Currently, most 'language' attributes in openEHR and AOM are specified as either 'ISO-639' or 'openEHR languages code set' (i.e. ISO 639-1).
We should be more precise about this, because implementations generally already handle at least 'xx-xx' language tags, i.e. a 2-character ISO 639-1 code, optionally followed by '-' and a 2-character region code, e.g. 'en-gb'.
If we look at the RFC 5646 grammar (https://tools.ietf.org/html/rfc5646#section-2.1), we see that the first rules are:

    langtag  = language

    language = 2*3ALPHA         ; shortest ISO 639 code
               ["-" extlang]    ; sometimes followed by
                                ; extended language subtags
             / 4ALPHA           ; or reserved for future use
             / 5*8ALPHA         ; or registered language subtag

    extlang  = 3ALPHA           ; selected ISO 639 codes
               *2("-" 3ALPHA)   ; permanently reserved

    script   = 4ALPHA           ; ISO 15924 code

I would propose that we at least support language[-script][-region], which really means 2*3ALPHA[-extlang][-script][-region].
Or we might say that we only guarantee 2ALPHA-region, i.e. 'en-gb', 'fr-ca' etc.
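
To make the two options concrete, here is a rough sketch of what each would accept. It is only an illustration, not a proposed implementation: the names `FULL_SUBSET`, `MINIMAL`, `parse_language_tag` and `is_minimal_tag` are made up for this example, the region is assumed to be a 2-letter code as described above, the region in the minimal form is assumed to be optional, and variants, extensions, private-use and grandfathered tags are deliberately ignored.

```python
import re

# Option 1 (assumed reading of the proposal): language[-extlang][-script][-region].
# Not a full RFC 5646 parser: variants, extensions and private-use subtags
# are not handled, and region is limited to two letters.
FULL_SUBSET = re.compile(
    r"""
    (?P<language>[A-Za-z]{2,3})                          # 2 or 3 letters
    (?:-(?P<extlang>[A-Za-z]{3}(?:-[A-Za-z]{3}){0,2}))?  # up to three 3-letter subtags
    (?:-(?P<script>[A-Za-z]{4}))?                        # 4-letter script code
    (?:-(?P<region>[A-Za-z]{2}))?                        # 2-letter region code
    """,
    re.VERBOSE,
)

# Option 2 (assumed): 2-letter language, optionally followed by a 2-letter region.
MINIMAL = re.compile(r"[A-Za-z]{2}(?:-[A-Za-z]{2})?")


def parse_language_tag(tag):
    """Split a tag into its subtags (lower-cased), or return None if it
    falls outside the language[-extlang][-script][-region] subset."""
    match = FULL_SUBSET.fullmatch(tag)
    if match is None:
        return None
    return {
        name: (value.lower() if value else None)
        for name, value in match.groupdict().items()
    }


def is_minimal_tag(tag):
    """True if the tag has the minimal 'xx' or 'xx-xx' shape."""
    return MINIMAL.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("en-gb", "fr-ca", "zh-Hant-TW", "zh-cmn-Hans-CN", "en_GB"):
        print(tag, parse_language_tag(tag), is_minimal_tag(tag))
```

Note that this only checks the shape of a tag. It does not check that the subtags are actually registered codes, which would need a lookup against the relevant code sets.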