Language as String


Current Situation
-----------------

The attributes discussed here, which occur, for example, in the class DV_TEXT (see …), use international standard codesets as follows:

  • language is represented by a CODE_PHRASE object containing the code-set id of openehr-languages, whose codes are the ISO 639 2-character language codes

  • encoding is similarly represented, but using codes from the IANA character sets (see …)

  • territory is similarly represented, using ISO 3166 2-character country codes
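A minimal Python sketch of the current arrangement follows. It assumes a simplified CODE_PHRASE that carries a plain code-set id string rather than a full terminology identifier object, and it uses small, truncated, hypothetical stand-ins for the code sets. The names `CodePhrase`, `check_code` and `DvText`, and every code-set id other than openehr-languages, are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical, truncated stand-ins for the openEHR code sets; the real sets
# wrap the full ISO 639, IANA character-set and ISO 3166 lists.
CODE_SETS = {
    "openehr-languages": {"en", "de", "fr", "nl"},
    "openehr-character-sets": {"UTF-8", "ISO-8859-1"},
    "openehr-countries": {"AU", "DE", "GB", "NL"},
}


@dataclass(frozen=True)
class CodePhrase:
    code_set_id: str   # simplified: the code set the code is drawn from
    code_string: str   # the code itself, e.g. "en"


def check_code(cp: CodePhrase, expected_set: str) -> None:
    """Invariant: cp must name expected_set and hold a code from it."""
    if cp.code_set_id != expected_set:
        raise ValueError(f"expected code set {expected_set}, got {cp.code_set_id}")
    if cp.code_string not in CODE_SETS[expected_set]:
        raise ValueError(f"{cp.code_string!r} is not in {expected_set}")


@dataclass(frozen=True)
class DvText:
    value: str
    language: CodePhrase
    encoding: CodePhrase

    def __post_init__(self) -> None:
        # Reference-model invariants, checked on construction.
        check_code(self.language, "openehr-languages")
        check_code(self.encoding, "openehr-character-sets")


text = DvText(
    value="blood pressure",
    language=CodePhrase("openehr-languages", "en"),
    encoding=CodePhrase("openehr-character-sets", "UTF-8"),
)
print(text.language.code_string)  # en
```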

All three of these codesets are currently 'wrapped' by openEHR code sets (see Support IM), and it is the openEHR code sets that are named in the reference model invariants, thus forcing each of these attributes always to hold a code from the appropriate code set. This level of indirection allows openEHR to use different code sets for this purpose in the future (e.g. the ISO 3-character code sets, an ISO replacement for the IANA character-set names, or even IANA equivalents for the ISO code sets); the reference model would remain valid regardless.
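The value of this indirection can be sketched as follows, assuming (for illustration only) that the binding from an openEHR code-set id to an external code set is held in a simple lookup table. The invariant only ever names openehr-languages, so re-binding it from the 2-character to the 3-character ISO 639 codes changes which codes are valid without touching the model. The `bindings` table and `language_invariant` function are hypothetical.

```python
# Truncated, hypothetical excerpts of two external code sets.
ISO_639_2_CHAR = frozenset({"en", "de", "fr"})     # ISO 639 2-character codes
ISO_639_3_CHAR = frozenset({"eng", "deu", "fra"})  # ISO 639 3-character codes

# Hypothetical binding table: the openEHR code-set id is the only name the
# reference model refers to; this table decides which external set backs it.
bindings = {"openehr-languages": ISO_639_2_CHAR}


def language_invariant(code: str) -> bool:
    # The invariant mentions "openehr-languages", never ISO 639 directly.
    return code in bindings["openehr-languages"]


print(language_invariant("en"))   # True

# Re-bind the openEHR code set to the 3-character codes; the model is unchanged.
bindings["openehr-languages"] = ISO_639_3_CHAR
print(language_invariant("en"))   # False
print(language_invariant("eng"))  # True
```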

The logic for choosing to model these codes as CODE_PHRASEs in openEHR was consistency: every coded entity in openEHR is either a DV_CODED_TEXT (which contains a CODE_PHRASE) or a CODE_PHRASE (used when the codes themselves carry the meaning, as most of the ISO and IANA codesets do). In practical terms it does, of course, mean slightly more data instances at a fine-grained level; e.g. in XML you would see more tags and data items for each CODE_PHRASE than for a simple String field.
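To make the size difference concrete, the following snippet (Python standard library only) serialises one language code both ways. The element names in the CODE_PHRASE form loosely follow the openEHR XML schema (terminology_id/value and code_string), and the terminology id value ISO_639-1 is used for illustration; treat both as assumptions rather than a normative rendering.

```python
import xml.etree.ElementTree as ET

# CODE_PHRASE form (element names approximate the openEHR XML schema).
phrase = ET.Element("language")
tid = ET.SubElement(phrase, "terminology_id")
ET.SubElement(tid, "value").text = "ISO_639-1"
ET.SubElement(phrase, "code_string").text = "en"

# Proposed String form: the code is simply the element content.
plain = ET.Element("language")
plain.text = "en"

print(ET.tostring(phrase, encoding="unicode"))
print(ET.tostring(plain, encoding="unicode"))

# iter() yields the element itself plus all descendants.
print(len(list(phrase.iter())), len(list(plain.iter())))  # 4 1
```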

Proposed Situation
-------------------

Sam Heard has proposed that these three types of codes should be hard-wired into the reference model as direct String attributes, and that the reference model documentation should simply state that the particular ISO or IANA codes are mandatory in each case.
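Under the proposal, the same attributes become plain strings checked directly against the ISO and IANA lists. A minimal sketch, again with hypothetical, truncated code lists and illustrative names:

```python
from dataclasses import dataclass

# Hypothetical, truncated copies of the ISO 639 and IANA lists, hard-wired
# into the model as the proposal suggests.
ISO_639_CODES = frozenset({"en", "de", "fr", "nl"})
IANA_CHARSETS = frozenset({"UTF-8", "ISO-8859-1"})


@dataclass(frozen=True)
class DvText:
    value: str
    language: str   # plain ISO 639 2-character code, e.g. "en"
    encoding: str   # plain IANA character-set name, e.g. "UTF-8"

    def __post_init__(self) -> None:
        # The model now knows the external code sets directly.
        # Note: exact-match comparison; IANA names are case-insensitive in practice.
        if self.language not in ISO_639_CODES:
            raise ValueError(f"unknown ISO 639 code {self.language!r}")
        if self.encoding not in IANA_CHARSETS:
            raise ValueError(f"unknown IANA character set {self.encoding!r}")


text = DvText("blood pressure", language="en", encoding="UTF-8")
print(text.language)  # en
```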

This is a reasonable position: these codesets appear to be very stable, and some would say they are the most stable of any coded entity today. There is undoubtedly software around that hard-wires such codes and has never had a problem. There is also an argument for simpler object structures: a String is simpler than a CODE_PHRASE. However, semantically the current and proposed solutions are the same: in the current situation, invariants guarantee that the codes must come from the appropriate codeset for each particular attribute.
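The claimed semantic equivalence can be checked mechanically in a small sketch. With a hypothetical, truncated ISO 639 list wrapped by openehr-languages, both invariants accept and reject the same codes, and converting between a String and a CODE_PHRASE loses nothing. Names such as `current_ok`, `proposed_ok` and `to_phrase` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical truncated ISO 639 list, and the openEHR code set that wraps it.
ISO_639 = frozenset({"en", "de", "fr"})
OPENEHR_LANGUAGES = "openehr-languages"
CODE_SETS = {OPENEHR_LANGUAGES: ISO_639}


@dataclass(frozen=True)
class CodePhrase:
    code_set_id: str
    code_string: str


def current_ok(cp: CodePhrase) -> bool:
    # Current model: the invariant goes through the openEHR code set.
    return cp.code_set_id == OPENEHR_LANGUAGES and cp.code_string in CODE_SETS[cp.code_set_id]


def proposed_ok(code: str) -> bool:
    # Proposed model: the documentation mandates ISO 639 directly.
    return code in ISO_639


def to_phrase(code: str) -> CodePhrase:
    return CodePhrase(OPENEHR_LANGUAGES, code)


def to_string(cp: CodePhrase) -> str:
    return cp.code_string


# Both invariants agree on every sample, and the conversion round-trips.
for code in ["en", "de", "fr", "xx", "EN", ""]:
    assert current_ok(to_phrase(code)) == proposed_ok(code)
    assert to_string(to_phrase(code)) == code
print("equivalent")
```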

Possible objections are:

  • the indirection we currently have is useful: there is no guarantee that we won't have to move to another code-set which better serves the same purpose

  • the consistency in the software (all coded entities are always dealt with via the terminology service, no matter what they are) is preferable to having certain fields that the software itself directly knows the codes of
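The consistency objection can be illustrated with a hypothetical minimal terminology service: a single generic routine validates every CODE_PHRASE attribute of any object through one call, whatever code set it belongs to, whereas String attributes would each need their own hard-wired check. `TerminologyService`, `has_code` and `invalid_codes` are assumptions for this sketch, not part of any openEHR API.

```python
from dataclasses import dataclass, fields


class TerminologyService:
    """Hypothetical minimal terminology service: code-set id -> known codes."""

    def __init__(self, code_sets: dict[str, set[str]]) -> None:
        self._code_sets = code_sets

    def has_code(self, code_set_id: str, code: str) -> bool:
        return code in self._code_sets.get(code_set_id, set())


@dataclass(frozen=True)
class CodePhrase:
    code_set_id: str
    code_string: str


def invalid_codes(obj: object, ts: TerminologyService) -> list[str]:
    """Return the names of CodePhrase fields whose code the service rejects.

    Every coded attribute goes through the same service call, so adding a new
    coded attribute to a class needs no new validation code."""
    bad = []
    for f in fields(obj):
        val = getattr(obj, f.name)
        if isinstance(val, CodePhrase) and not ts.has_code(val.code_set_id, val.code_string):
            bad.append(f.name)
    return bad


@dataclass(frozen=True)
class DvText:
    value: str
    language: CodePhrase
    encoding: CodePhrase


ts = TerminologyService({
    "openehr-languages": {"en", "de"},
    "openehr-character-sets": {"UTF-8"},
})
ok = DvText("pulse", CodePhrase("openehr-languages", "en"), CodePhrase("openehr-character-sets", "UTF-8"))
bad = DvText("pulse", CodePhrase("openehr-languages", "xx"), CodePhrase("openehr-character-sets", "UTF-8"))
print(invalid_codes(ok, ts))   # []
print(invalid_codes(bad, ts))  # ['language']
```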





Raised By