openEHR Terminology

Introduction

The openEHR Terminology is a simple terminology containing all the terms defined in the openEHR terminology specification. The specification defines two kinds of vocabulary: code sets, where the codes stand for themselves (for example ISO 3166 and ISO 639 codes and IANA MIME types), and term sets, where each term has a numeric code and a description that can be translated into multiple languages.
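The distinction between the two kinds of vocabulary can be sketched as a small data model. This is only an illustration: the class names `CodeSet` and `Term`, the `describe` method, the English fallback behaviour and the sample values are assumptions made for the example, not part of the openEHR specification or of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeSet:
    """A vocabulary whose codes stand for themselves (e.g. ISO 639 codes)."""
    name: str
    codes: frozenset


@dataclass
class Term:
    """A term-set entry: a numeric code plus one description per language."""
    code: int
    descriptions: dict = field(default_factory=dict)  # language code -> text

    def describe(self, language, fallback="en"):
        # Use the fallback language when no translation exists (assumed policy).
        return self.descriptions.get(language, self.descriptions[fallback])


# Illustrative values only; not copied from the published terminology files.
languages = CodeSet(name="languages", codes=frozenset({"en", "es", "de"}))
event = Term(code=433, descriptions={"en": "event", "es": "evento"})

print("es" in languages.codes)  # membership is the only lookup a code set needs
print(event.describe("es"))     # translated description
print(event.describe("de"))     # no German entry, so the English text is used
```

A code set needs no translation layer because its codes are the content, whereas a term set is essentially a map from numeric code to per-language text.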

Current Status

The openEHR terminology is used by the Java project, by the Archetype Editor, and by various other tools. However, different tools use different source files, a situation which we are trying to rectify. The two representations are as follows:

The 'java terminology' - the Java project tools now use the terminology files in the knowledge2 SVN repository. These files are designed as follows:
- structured as 2 files, one for code sets and one for the openEHR term sets
- each file pair covers one language only - translating the terminology means copying and renaming the files and translating the contents.
The 'AE terminology' - the Archetype Editor uses a differently structured file, which is based on this XML schema. The characteristics of this file are:
- one file for all translations
- includes significant amounts of UI 'terms' specific to the Archetype Editor UI, i.e. the file contents are not limited to the openEHR terminology.

Issues

The currently known content issues with the above situation are:

The AE terminology file contains numerous Archetype Editor GUI elements which do not belong in the openEHR Terminology. In its current state, it is not an appropriate file to publish as the openEHR terminology.
Managing translations in the AE terminology is clumsy, because a) every time a new language is added, the original file has to be modified, b) the file keeps growing in size, and c) it is not easy to see whether any given translation is complete, because the file has to be cut up in order to compare the language sections with each other.
The codes in the openEHR code set 'media types' are not used literally, because they are just placeholders for IANA types such as 'text/plain', whose real definition is at http://www.iana.org/assignments/media-types/index.html.
Computable files for ISO 639 (language names) should similarly not be provided by openEHR, although it is still not clear where they should come from. The official Library of Congress ISO 639 page is here; however, no computable file is provided.
Computable files for ISO 3166 (country codes) are actually available here, in TXT and XML format.
Unit types: currently there is no defined process for obtaining a new code.
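The translation-completeness problem noted in the issues above becomes a simple comparison once each language is held separately, as in the one-language-per-file layout. The sketch below assumes each language's term set has already been loaded into a dictionary mapping numeric code to description; the function name `missing_translations` and the sample data are hypothetical, and how the files are actually parsed is left out.

```python
def missing_translations(reference, translation):
    """Return the codes present in the reference language but absent,
    or blank, in the translation."""
    return sorted(
        code for code in reference
        if not translation.get(code, "").strip()
    )


# Hypothetical per-language term maps (numeric code -> description),
# as they might look after loading one term-set file per language.
english = {431: "persistent", 433: "event", 249: "creation"}
spanish = {431: "persistente", 433: ""}

print(missing_translations(english, spanish))  # → [249, 433]
```

With a single file holding all languages, the same check first requires splitting the file into per-language sections, which is the clumsiness described above.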

Recommendations

Based on the problems above, the following recommendations have been made by an initial analysis group.

Adopt the 'java terminology' files as the basis for going forward
Separate out the IANA codes

Another possible solution might be to define the terminology in the same format as SNOMED, which would make it usable with SNOMED tools. This may also provide useful answers for how to represent translations.