Specify language attributes as being coded by a subset of the IETF RFC 5646 language tag standard.


Currently, most 'language' attributes in openEHR and AOM are specified as either 'ISO-639' or 'openEHR languages code set', which is ISO-639-1.

We should be more precise about this, because implementations generally already handle at least xx-xx language tags: an ISO 639-1 2-character code, optionally followed by '-' and a 2-character region code, e.g. 'en-gb'.

If we look at the RFC grammar (https://tools.ietf.org/html/rfc5646#section-2.1), we see that the first rules are:

    langtag  = language
               ["-" script]
               ["-" region]
               *("-" variant)
               *("-" extension)
               ["-" privateuse]

    language = 2*3ALPHA          ; shortest ISO 639 code
               ["-" extlang]     ; sometimes followed by
                                 ; extended language subtags
             / 4ALPHA            ; or reserved for future use
             / 5*8ALPHA          ; or registered language subtag

    extlang  = 3ALPHA            ; selected ISO 639 codes
               *2("-" 3ALPHA)    ; permanently reserved

    script   = 4ALPHA            ; ISO 15924 code

I would propose that we at least support language[-script][-region], which really means 2*3ALPHA[-extlang][-script][-region].

Or we might say that we only guarantee 2ALPHA-region, i.e. 'en-gb', 'fr-ca' etc.
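
To make the two options concrete, here is a minimal Python sketch of what a checker for each subset might look like. It is only an illustration, not a proposed normative definition. The names LANGTAG_SUBSET, LANG_REGION_ONLY and parse_language_tag are hypothetical. The region rule (2ALPHA / 3DIGIT) comes from the same RFC section but is not part of the excerpt quoted above. The case normalisation follows the RFC's recommended conventions, although tags themselves are case-insensitive. The sketch checks syntax only and does not look subtags up in the IANA registry.

```python
import re
from typing import Dict, Optional

# Broader option: language[-extlang][-script][-region].
# Variants, extensions and private-use subtags are deliberately rejected.
LANGTAG_SUBSET = re.compile(
    r"""
    (?P<language>[A-Za-z]{2,3})              # 2*3ALPHA, shortest ISO 639 code
    (?P<extlang>(?:-[A-Za-z]{3}){1,3})?      # up to three 3ALPHA extended subtags
    (?:-(?P<script>[A-Za-z]{4}))?            # 4ALPHA, ISO 15924 code
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?   # 2ALPHA or 3DIGIT region
    """,
    re.VERBOSE,
)

# Narrower option (one reading of it): 2ALPHA, optionally followed by a 2ALPHA region.
LANG_REGION_ONLY = re.compile(r"[A-Za-z]{2}(?:-[A-Za-z]{2})?")


def parse_language_tag(tag: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the subtags of `tag`, or None if it falls outside the broader subset."""
    m = LANGTAG_SUBSET.fullmatch(tag)
    if m is None:
        return None
    extlang = m.group("extlang")
    script = m.group("script")
    region = m.group("region")
    return {
        "language": m.group("language").lower(),
        # The extlang group is captured with its leading '-', so strip it.
        "extlang": extlang[1:].lower() if extlang else None,
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


if __name__ == "__main__":
    for candidate in ("en-gb", "fr-ca", "zh-Hant-TW", "es-419", "en-GB-oed"):
        print(candidate, parse_language_tag(candidate))
```

Under these assumptions, 'en-gb' and 'fr-ca' pass both patterns, 'zh-Hant-TW' passes only the broader one, and a tag carrying a variant such as 'en-GB-oed' is rejected by both.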


Thomas Beale

Raised By

Thomas Beale


Affects versions