Introduction

The openEHR Terminology is a simple terminology that includes all the terms found in the terminology specification. The specification defines two kinds of vocabularies: code sets, where the codes stand for themselves (including ISO 3166 and ISO 639 codes, IANA MIME types, etc.), and term sets, where each term has a numeric code and a description that can be translated into multiple languages.
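
The distinction between the two kinds of vocabulary can be sketched in Java as follows. This is only an illustration under stated assumptions: the class names `CodeSetSketch` and `TermSetSketch`, the `externalId` field and the method names are hypothetical and do not come from the Java project or any openEHR tool.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** A code set: the codes stand for themselves (e.g. ISO 3166 country codes). */
final class CodeSetSketch {
    final String externalId; // hypothetical identifier, e.g. "ISO_3166-1"
    final Set<String> codes = new LinkedHashSet<String>();

    CodeSetSketch(String externalId, String... codes) {
        this.externalId = externalId;
        this.codes.addAll(Arrays.asList(codes));
    }

    boolean contains(String code) {
        return codes.contains(code);
    }
}

/** A term set: each term has a numeric code and one description per language. */
final class TermSetSketch {
    // numeric code -> (language -> description)
    private final Map<Integer, Map<String, String>> rubrics =
            new HashMap<Integer, Map<String, String>>();

    void addRubric(int code, String language, String rubric) {
        Map<String, String> byLanguage = rubrics.get(code);
        if (byLanguage == null) {
            byLanguage = new HashMap<String, String>();
            rubrics.put(code, byLanguage);
        }
        byLanguage.put(language, rubric);
    }

    /** Returns the description for the code in the given language, or null if none is recorded. */
    String rubric(int code, String language) {
        Map<String, String> byLanguage = rubrics.get(code);
        return byLanguage == null ? null : byLanguage.get(language);
    }
}
```

A code set only needs a membership test, whereas a term set needs a lookup keyed by both code and language; that difference is what drives the translation questions discussed below.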

Current Status

The openEHR terminology is being used by the Java project, by the Archetype Editor, and by various other tools. However, different tools use different source files, a situation which we are trying to rectify. The two representations are as follows:

  • The 'java terminology' - the Java project tools now use the terminology files in the knowledge2 SVN repository. These files are designed as follows:
    • structured as 2 files, one for code sets and one for the openEHR term sets
    • each file pair covers one language only - translating the terminology means copying and renaming the files and translating the contents.
  • The 'AE terminology' - Archetype Editor uses a differently structured file, which is based on this XML schema. The characteristics of this file are:
    • one file for all translations
    • includes significant amounts of UI 'terms' specific to the Archetype Editor UI, i.e. the file contents are not limited to the openEHR terminology.

Issues

The currently known content issues with the above situation are:

  • The AE terminology file contains numerous Archetype Editor GUI elements which do not belong in the openEHR Terminology. In its current state, it is not an appropriate file to publish as the openEHR terminology.
  • Managing translations in the AE terminology is clumsy, because a) every time a new language is added, the original file has to be modified, b) the file keeps growing in size, and c) it is not easy to see whether any given translation is complete, because the file needs to be cut up to compare language sections with each other.
  • The codes in the openEHR code set 'media types' are not literally used, because they are just a placeholder for IANA types like 'text/plain' etc., whose real definition is at http://www.iana.org/assignments/media-types/index.html. However, there appears to be no standard file obtainable.
  • Computable files for ISO 639 (language names) should similarly not be provided by openEHR, although it is still not clear where they should come from. The official Library of Congress ISO 639 page is here; however, no computable file is provided.
  • Computable files for ISO 3166 (country codes) are actually available here, in TXT and XML format.
  • Unit types: TBD
  • Currently there is no defined process for obtaining a new code
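
The translation-completeness problem listed above becomes a simple set difference once each language is held separately. The following is a minimal sketch, assuming each language has already been loaded into a map from code to translated text; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TranslationCheckSketch {
    /** Returns the codes present in the reference language but absent from the translation. */
    static Set<String> missingCodes(Map<String, String> reference,
                                    Map<String, String> translation) {
        Set<String> missing = new TreeSet<String>(reference.keySet());
        missing.removeAll(translation.keySet());
        return missing;
    }
}
```

An empty result indicates that every code in the reference language has a counterpart in the translation; it says nothing about the quality of the translated text.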

Recommendations

Short Term

Based on the problems above, the following recommendations have been made by an initial analysis group (Rong Chen, Heath Frankel, Sebastian Garde, Thomas Beale):

  • Adopt the java files as the basis for going forward.
  • Adopt the design approach that openEHR will need to create its own 'code-set' files for external code-sets, in a standard format, based on the lists and pages available on the internet from IANA, ISO etc.
    • Define an XML 'adjunct format' for this type of file
    • Remove the IANA codes from the existing 'external_terminologies' file, and create a separate file which is derived from the iana.org link above.
    • Remove the ISO 639 codes from the existing 'external_terminologies' file and create a new adjunct format file derived from the above Library of Congress ISO 639-1 page.
    • Remove the ISO 3166 codes from the existing 'external_terminologies' file, then obtain the ISO 3166 file and convert it to a separate file in the adjunct format.
  • Units: TBD
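
As an illustration of the conversion step recommended above, the sketch below turns a plain-text code list into an XML document. Everything about the formats is an assumption: the input is taken to be lines of the form `NAME;CODE`, and the `codeset`, `code`, `external_id`, `value` and `description` names are hypothetical, since the adjunct format has not been defined yet.

```java
import java.util.List;

final class AdjunctFormatSketch {
    /** Escapes the characters that are special inside an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * Converts lines of the assumed form "NAME;CODE" into a hypothetical adjunct XML document.
     * Blank lines and lines without a usable ';' separator are skipped; any header line
     * in the source file must be removed by the caller.
     */
    static String toAdjunctXml(String externalId, List<String> lines) {
        StringBuilder xml = new StringBuilder();
        xml.append("<codeset external_id=\"").append(escape(externalId)).append("\">\n");
        for (String line : lines) {
            int sep = line.lastIndexOf(';');
            if (sep <= 0 || sep == line.length() - 1) {
                continue; // not a NAME;CODE line
            }
            String name = line.substring(0, sep).trim();
            String code = line.substring(sep + 1).trim();
            if (name.isEmpty() || code.isEmpty()) {
                continue;
            }
            xml.append("  <code value=\"").append(escape(code))
               .append("\" description=\"").append(escape(name)).append("\"/>\n");
        }
        xml.append("</codeset>\n");
        return xml.toString();
    }
}
```

The same routine could be reused for the IANA and ISO 639 lists if they are first reduced to the same two-column form.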

Obtaining a new code:

TBD

Future

A long term solution most likely involves discussion with IHTSDO in order to determine their coverage of the code sets.

Secondly, we should consider SNOMED CT style representation, which is to say, 3 separate tables as described in the IHTSDO TIG section on RF2. This approach would enable each separate vocabulary in openEHR to be managed as a small hierarchy of its own. In order to use actual SNOMED CT codes, we would need to obtain an openEHR SNOMED Extension.

6 Comments

  1. Thanks for putting this up, Tom!

    Regarding your questions:

    1. I would prefer to have a separate file for each translation. Part of the reason is ease of maintenance: the stable ones, like the English translation, won't get changed accidentally when new translations are introduced.

    2. The XSD file of the current terminology XML is attached here: http://www.openehr.org/wiki/download/attachments/5996909/openehr_terminology.xsd

    3. Any XML editor would suffice. The ones I am using now are the Altova XMLSpy 2007 and the built-in XML editor in Eclipse 3.4.

  2. I have developed a simple and RESTful terminology server with Ruby on Rails.
    To look up a terminology value by terminology name, language and ID, send an HTTP GET request to a URL of the following form; the server returns simple XML:

    http://server address/terminology/name/lang/id

    For example,
    http://ts.openehr.jp/terminology/openehr/en/0

    returns:
    <terminology>
    <openehr-id>0</openehr-id>
    <rubric>"self"</rubric>
    </terminology>

    Notice:
    Since this is an experimental implementation, you can currently query only by ID, against the 'openehr' terminology in the 'en' language.
    Please comment on this simple terminology server.
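
    A client for the URL scheme described in this comment could look roughly like the sketch below. It only builds the URL and parses a response body; the class and method names are hypothetical, the response shape is assumed to match the example shown in this comment, and the actual HTTP GET (for instance via java.net.HttpURLConnection) is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class TerminologyClientSketch {
    /** Builds the lookup URL following the pattern http://server/terminology/name/lang/id. */
    static String lookupUrl(String server, String terminology, String language, String id) {
        return "http://" + server + "/terminology/" + terminology + "/" + language + "/" + id;
    }

    /** Extracts the text of the first <rubric> element of a response body, or null if absent. */
    static String parseRubric(String responseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            responseXml.getBytes(StandardCharsets.UTF_8)));
            NodeList rubrics = doc.getElementsByTagName("rubric");
            return rubrics.getLength() == 0 ? null : rubrics.item(0).getTextContent();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Response is not well-formed XML", ex);
        }
    }
}
```

    Note that in the example response the rubric text itself contains the double quotes, so a caller may want to strip them before display.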

  3. Here are some thoughts on making a usable multi-language openEHR terminology server:
    - If we need codes, it is important to define the domain of the codes (like the current groups or code sets in openehr_terminology_en) and the code system of the codes (like the current external_id in code sets). The description, however, must itself be a code, so that the translations of the descriptions can be specified in other files and the main terminology files can be reused in all our systems.
    - If we have concepts in groups, the rubric must also be a code, so that the translations of the rubrics can be defined in other files, and synonyms for these rubrics can be defined in the same translation file. Translated descriptions can also be attached to the rubric.

    I'd like to know what you think.

    Example of the ideas:
    openehr_terminology.xml

    <terminology version="1.0" status="stable">
       <codeset domain="ISO_3166-1" issuer="ISO" description_id="cd00000">
          <code value="AF" description_id="cd00001"/>
          <code value="AX" description_id="cd00002"/>
          <code value="AL" description_id="cd00003"/>
          ...
       </codeset>
       <group description_id="cd10000">
          <concept code="249" description_id="cd10001"/>
          <concept code="250" description_id="cd10002"/>
          <concept code="251" description_id="cd10003"/>
          ...
       </group>
    </terminology>

    openehr_terminology_description_en.xml

    <descriptions language="en">
       <description id="cd00000" value="ISO country codes" />
       <description id="cd00001" value="AFGHANISTAN" />
       <description id="cd00002" value="ÅLAND ISLANDS" />
       <description id="cd00003" value="ALBANIA" />
       ...
       <description id="cd10000" value="audit change type" />
       <description id="cd10001" value="creation">
          <term code="249.1" value="creation type" /> <!-- terminology synonym for the concept with code 249 -->
          ...
       </description>
       <description id="cd10002" value="amendment">
          <term code="250.1" value="amendment type" />
          ...
       </description>
       <description id="cd10003" value="modification">
          <term code="251.1" value="modification type" />
          ...
       </description>
       ...
    </descriptions>
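
    The two-file indirection proposed in this comment (concept code to description_id in one file, description_id to translated text in another) can be resolved as in the sketch below. It assumes the element and attribute names used in the example above; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DescriptionResolverSketch {
    /** Maps every element with the given tag name to (keyAttr value -> valueAttr value). */
    static Map<String, String> attributeMap(String xml, String tag,
                                            String keyAttr, String valueAttr) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName(tag);
            Map<String, String> result = new HashMap<String, String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                result.put(element.getAttribute(keyAttr), element.getAttribute(valueAttr));
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Input is not well-formed XML", ex);
        }
    }

    /** Looks up the translated text for a concept code, or null if either link is missing. */
    static String describe(String code, Map<String, String> codeToDescriptionId,
                           Map<String, String> descriptionsById) {
        String descriptionId = codeToDescriptionId.get(code);
        return descriptionId == null ? null : descriptionsById.get(descriptionId);
    }
}
```

    Switching language then only means loading a different descriptions file; the map built from the main terminology file is reused unchanged.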

  4. I would also prefer to separate translations into separate files.

    As Rong says, there is a variety of XML editors which can be used to maintain this.

    As for the missing translations for the approach used by the Java project, I am happy to provide the German one, if this is to become the standard structure.

    What is clear is that we need to move to a single source of truth that

    • can be maintained easily,
    • can be translated easily and safely and
    • is either usable directly by tools in different programming languages or can be converted to a format usable for that language more or less automatically.

    At the moment, the process for including new codes is to make a Jira request to include a new code in a new release, see e.g. http://www.openehr.org/issues/browse/SPECPR-39
    This may simply be too slow in practice.

    I see that the Archetype Editor uses a terminology file that includes not only the official openEHR codes, but also codes only relevant for the Archetype Editor tool and codes that probably should be official codes (see e.g. the Jira issue above), but didn't make it into the official openEHR terminology (so far). Not sure this mixture is desirable.

  5. One issue I have with the current Java implementation file structure is that it defines the codes within each group of openEHR terms: some terms exist in more than one group, and such a term is redescribed in each group. This can cause conflicts, as has been the case in the past. I think a schema with one section defining the terms and another section defining the group membership would ensure referential integrity, although it certainly makes the file more complex to edit. The interesting thing is that the group membership only needs to be defined once and only the term descriptions need to be translated, which opens up the possibility of a separate file for group membership.
    It should be noted that if we consider the CTS2 CodeSystem and ValueSet entities, the code definitions represent a code system, while the group membership is a value set. I currently have schemas that represent these two CTS2 entities, but they are unlikely to be the same as the CTS2 OMG technical specifications. I would be happy to offer these schemas (although the code system schema is more complicated than is probably necessary, as it supports multiple languages and additional CTS features), but using the CTS2 technical specifications should be the first preference.
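
    The referential-integrity check that such a split makes possible can be sketched as follows, assuming the term definitions and the group memberships have already been loaded into plain collections. The names are hypothetical and this is not the CTS2 model.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class GroupIntegritySketch {
    /**
     * Returns "group:code" entries for every group member that has no term definition.
     * An empty result means the membership section is referentially intact.
     */
    static Set<String> danglingMembers(Set<String> definedCodes,
                                       Map<String, Set<String>> groups) {
        Set<String> dangling = new TreeSet<String>();
        for (Map.Entry<String, Set<String>> group : groups.entrySet()) {
            for (String code : group.getValue()) {
                if (!definedCodes.contains(code)) {
                    dangling.add(group.getKey() + ":" + code);
                }
            }
        }
        return dangling;
    }
}
```

    Because each term is defined exactly once, a term that belongs to several groups can no longer be described differently in each of them.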

  6.

    Some requirements which are borrowed from HL7 (it is fair game, I am going to use a lot of openEHR data structures in creating the HL7 Detailed Clinical Models DSTU!) and which are part of the CTS2 specification:

    Need to consider the use of compositional grammars, such as is possible with SNOMED-CT.

    Based on my experience with SNOMED, I would suggest that the display names/surface forms/terms be kept separate from the actual list of concepts, and only used when there was a need to display to a human.  You can then have multiple types of terms, and support multiple languages, for any given concept.

    I would also suggest maintaining a table of relationships between concepts.  This is key if you want to do any machine processing of the terms.

    The other (cooler) option would be to keep them in a triple store (see Virtuoso's community open source edition; it is what is behind the KDE semantic desktop).

    If you are using SNOMED, you probably want to define the description logic which defines your value set as well. 

    If you cannot create your value set this way, you probably have some problems in the model.  In any case, you will need to pick through the transitive closure of that expression and make sure there are not any "stinkers" (codes in there that just make no sense to be there) or abstract terms (useful for defining the hierarchy, not so much for actual values in a running EHRS).

    Testing for equivalence in SNOMED-CT (or similar terminologies) is a challenge. 

    When I have implemented this (Java) in the past, I have always created semi-normative (concept model) forms for each concept, even if represented by a single code.  It made for a lot of internal work (i.e. every concept needed to be looked up to define its internal structure). That way, "post coordinated" and "pre coordinated" ended up with comparable classes. 

    You then want to test for symmetric subsumption.  I.e.  

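    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        // Taking Object (not Concept) makes this an override, so collections use it too.
        if (!(o instanceof Concept)) return false;
        Concept c = (Concept) o;
        // Symmetric subsumption: each concept must subsume the other.
        return this.isA(c) && c.isA(this);
    }
    // hashCode() should be overridden consistently with equals().
    // isA(Concept c) is left as an exercise for the reader; it will differ between implementations.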
    

    This assumes you have a good 'isA()' method.  :)
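
    For concepts represented by a single code, one possible isA() is a reachability test over the table of relationships suggested earlier in this comment. The sketch below is only that simple case: it uses plain string ids rather than Concept objects, the names are hypothetical, and post-coordinated expressions would need the normal-form comparison described above instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IsATableSketch {
    // child concept id -> ids of its direct parents
    private final Map<String, Set<String>> parents = new HashMap<String, Set<String>>();

    void addIsA(String child, String parent) {
        Set<String> direct = parents.get(child);
        if (direct == null) {
            direct = new HashSet<String>();
            parents.put(child, direct);
        }
        direct.add(parent);
    }

    /** True when 'ancestor' is reachable from 'concept' via is-a links (a concept isA itself). */
    boolean isA(String concept, String ancestor) {
        Deque<String> toVisit = new ArrayDeque<String>();
        Set<String> seen = new HashSet<String>();
        toVisit.push(concept);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            if (current.equals(ancestor)) {
                return true;
            }
            if (!seen.add(current)) {
                continue; // already expanded; this also guards against cycles
            }
            Set<String> direct = parents.get(current);
            if (direct != null) {
                toVisit.addAll(direct);
            }
        }
        return false;
    }
}
```

    With this definition, two distinct ids are only equivalent under symmetric subsumption if the relationship table contains a cycle between them, which is usually a sign of a modelling error worth reporting.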

    But either go with RDF-S or separate the terms from the concepts. Also remember in any given language you may have multiple terms for the same thing, and dialects also become important (en-US v. en-GB v. en-AU).  You may have some preferred, close-to-user forms based on the type of user (patient v. physician v. tech).  So plan on some additional metadata for each term.  Keep in mind capitalization (is it fixed or is it based on usual sentence case grammar per language) as well as how to represent the term in systems which only support ASCII or fonts which lack the Unicode glyphs.  Some languages also have multiple alphabets in use. 
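
    Keeping terms separate from concepts, with per-term metadata, might look like the sketch below. It covers only a language tag and a preferred flag, with a fallback from a dialect such as en-AU to the base language; the audience, capitalization and ASCII-fallback metadata mentioned above are omitted. All names, and the sample terms in the usage note, are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class TermLookupSketch {
    /** One display term for a concept, with the metadata needed to choose between terms. */
    static final class Term {
        final String conceptId;
        final String languageTag; // e.g. "en" or "en-AU"
        final boolean preferred;
        final String text;

        Term(String conceptId, String languageTag, boolean preferred, String text) {
            this.conceptId = conceptId;
            this.languageTag = languageTag;
            this.preferred = preferred;
            this.text = text;
        }
    }

    private final List<Term> terms = new ArrayList<Term>();

    void add(Term term) {
        terms.add(term);
    }

    /** Returns the preferred term for the exact tag, else for its base language, else null. */
    String preferredText(String conceptId, String languageTag) {
        String exact = find(conceptId, languageTag);
        if (exact != null) {
            return exact;
        }
        int dash = languageTag.indexOf('-');
        return dash < 0 ? null : find(conceptId, languageTag.substring(0, dash));
    }

    private String find(String conceptId, String languageTag) {
        for (Term term : terms) {
            if (term.preferred && term.conceptId.equals(conceptId)
                    && term.languageTag.equalsIgnoreCase(languageTag)) {
                return term.text;
            }
        }
        return null;
    }
}
```

    For example, if a hypothetical concept "c1" has the preferred term "paracetamol" for "en" and "acetaminophen" for "en-US", a lookup for "en-AU" falls back to the "en" term while "en-US" gets its own.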

    The concepts and their relationships are the easy part.  Getting the human language right is tricky!