ISO 13606 2012 revision openEHR proposal

Introduction

This page is for a proposal (or more than one) for the revision of ISO 13606 in 2012. One possibility is to develop a formal openEHR Foundation proposal to CEN and ISO, containing a fully worked up description of changes. The goal would be to have a single standard for an EHR and EHR Extract reference model (RM), a revised archetype specification, and a single methodology for building software and tools, regardless of which RM variant was in use.

There is a dedicated mailing list for discussions about ISO 13606 and openEHR.

ISO 13606 part 1 - EHR Extract reference model

Current status and issues

Part 1 of the ISO 13606 standard has not changed since 2007 when it was accepted by CEN. It was derived from the combination of the earlier revision of the standard and the emerging openEHR information models of the time. A number of issues with its Extract model have been identified by users over the years. In openEHR, some 77 CRs relating to its RM have been processed in this time, which gives an idea of the number of lessons learned and the amount of evolution likely to be needed.

One thing that is clear so far is that 13606 has never been implemented literally according to the standard, but instead in a variety of locally customised forms, each specific to the needs of the implementers. This essentially means that the current form of the standard is a 'useful starting point' but is not in fact acting as a proper standard in the sense of enabling interoperable software or data.

With respect to the published form of the standard, a key issue historically has been the lack of a normative XML schema or other computable expression of the standard. This has resulted in a number of custom schemas being created in different locations.

Some of the major technical issues include:

  • the demographic model - this is a hard-wired model, not archetypable in a practical way, and appears to have found limited use;
    • archetyping is prevented primarily because none of the demographic classes inherits from RECORD_COMPONENT in the way that openEHR archetypable classes always inherit from LOCATABLE, which provides the meta-data attributes (archetype node id etc) enabling archetyping to work.
  • the EHR Extract model (the main part of the model) accommodates the data of only one patient. This means an Extract containing e.g. lab results for multiple patients is not expressible.
  • there is no clear way to include demographic entities once within an extract so that they can be referred to multiple times from the clinical content;
  • it remains unclear which data types specification is to be used with the main model. If it is intended to be ISO 21090, a 'profile' will be needed (see here for some discussion of this).
    • There is a fundamental problem with the 'profile' approach used in ISO 21090, in that it is subtractive, meaning that the desired model is derived from the published model by the removal of unwanted attributes. In a world with multiple users of this standard, each creating their own custom profile, non-interoperability is a virtual guarantee. Other problems with 21090 include over-complexity.
    • An alternative data types proposal for 13606 that could make sense is the new HL7 Fast Healthcare Interoperability Resources (FHIR), designed by Grahame Grieve. This is in early development, but doesn't suffer from the profiling problem or over-complexity.
  • it is not clear how the rc_id attribute can be practically used in real systems.
    • Even if UUIDs were in use on all data nodes in existing EHR systems, it would not be possible to set this attribute in all cases, since it has to be set on nodes in the 13606 data model, which will differ from the original system model. The sheer space impost of UUIDs on every atom of data is unlikely to make this an attractive option.
    • If on the other hand the guidance in the standard is followed, and UUIDs are instead generated for each node in the extract, the source system would then need a separate database simply for the purpose of mapping between these rc_ids and its internal way of identifying data elements. If a lot of 13606 Extracts were created, this is likely to become a serious impost on each system because of the number of ids required. For example, 1m EHRs with an average of 60 Compositions each, at an average of 3 versions per Composition = 180m Compositions - call it 200m. Let's say the average amount of data per Composition is 50 nodes; we now have 200m x 50 = 10,000m, or 10bn possible nodes. Now, let's assume that only 10% of the data are ever requested as 13606 Extracts; we are still left with a database of 1bn cross-references, or 1000 x-refs per EHR on average. This is likely to significantly complicate EHR system implementations.
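
The cross-reference estimate in the last point can be reproduced with a few lines of arithmetic. The following Python sketch only restates the figures given above; the function name and parameter names are illustrative and do not come from the standard.

```python
# Back-of-envelope estimate of the rc_id cross-reference table size.
# All default figures are the assumptions stated in the text above;
# the function and parameter names are hypothetical.

def estimate_rc_id_xrefs(
    ehrs=1_000_000,               # 1m EHRs
    compositions_per_ehr=60,      # average Compositions per EHR
    versions_per_composition=3,   # average versions per Composition
    nodes_per_composition=50,     # average data nodes per Composition
    percent_extracted=10,         # share of data ever sent as Extracts
    rounded_compositions=200_000_000,  # the text rounds 180m up to 200m
):
    exact_compositions = ehrs * compositions_per_ehr * versions_per_composition
    total_nodes = rounded_compositions * nodes_per_composition
    # Integer arithmetic avoids floating-point rounding surprises.
    xrefs = total_nodes * percent_extracted // 100
    return {
        "exact_compositions": exact_compositions,
        "total_nodes": total_nodes,
        "xrefs": xrefs,
        "xrefs_per_ehr": xrefs // ehrs,
    }


if __name__ == "__main__":
    for name, value in estimate_rc_id_xrefs().items():
        print(f"{name}: {value:,}")
```

Varying the parameters shows how sensitive the result is to the assumed share of data that is ever extracted: the table grows linearly with each factor.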

There are also a number of minor technical issues, to do with various attributes:

  • in RECORD_COMPONENT:
    • sensitivity: xxx
    • synthesised: xxx

Any new proposal needs to address these in an integrated fashion.

Proposals for revision

General architecture

One proposal is to redefine the reference model to be more comprehensive than it is today, but with the possibility of enabling a choice of semantic richness, with fully defined mappings between parts of the model which are more semantically rich (the openEHR flavour) and less so (the 13606 flavour). Further, the openEHR EHR Extract and Demographic models could be used to provide flexible, fully archetypable structures in these areas. Simplifications to openEHR's current RM should aid the development of a single model.

TBC

The following are specific resources for use in defining a new 13606-1 model.

Node identification and 'rc_id'

The primary requirement to safely identify information shared between systems is for a guaranteed globally unique identifier for COMPOSITIONs. It would be reasonable to require a GUID for all COMPOSITIONs. This provides a guarantee that any sharing party can agree on which instance (e.g. original or copy) of a COMPOSITION it has. openEHR does this, and it would therefore be easy to state the same rule for 13606 and openEHR.

For identifying nodes lower in the hierarchy: it would not be unreasonable to require GUIDs on the root of any ENTRY instance (in openEHR-speak, an OBSERVATION, ADMIN_ENTRY, ACTION, INSTRUCTION or EVALUATION). This would greatly ease the definition of LINKs that want to point to other ENTRYs, and may also make query return results less ambiguous, assuming that ENTRYs are the smallest unit of returned data across enterprise boundaries (see below for more on this).

The question then is how to identify nodes below COMPOSITION and ENTRY (other than the ENTRY root nodes themselves). To understand this question properly, we need to understand the reasons why node-level identification is thought to be needed. These are:

  • to enable LINKs to point to anything (in 13606, LINKs use rc_ids as references);
  • to enable two systems to unambiguously share a reference to any item of data;
  • to enable references to fine-grained elements for the purpose of version and attestation representation, if differential (aka 'delta') representation is required.
    • NB: openEHR does not specify differential version encoding, and a recommendation for simplifying 13606 would be to get rid of this feature.

Some of the above will need to be reviewed. However, we can probably assume at least some remaining need to refer to the finest grain of information. We can assume that every type of feeder system has its own method of doing this. All openEHR systems identify information nodes via archetype paths, in some cases with further predicates if there are multiples. Any system that is 13606- or openEHR-enabled and uses archetypes can do the same.

The difficulty for 13606 is that it can't assume all users of its specification actually use archetypes. Therefore, it may make sense for the new 13606 revision to state the following:

  • for non-archetyped use, GUIDs need to be generated for each interior node in a message, together with rules on how these are to be managed;
  • for archetyped 13606 content, the openEHR-style archetype path-based referencing can be used.
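
As a rough illustration of the two rules above, the following Python sketch models a node reference that always carries the GUID of the containing COMPOSITION and then identifies the node either by an archetype path or by a generated GUID. The class `NodeRef`, the helper `make_node_ref` and the example path are hypothetical names invented for this sketch; neither 13606 nor openEHR defines them.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeRef:
    """Hypothetical reference to a node inside a shared COMPOSITION."""
    composition_uid: str                   # GUID of the containing COMPOSITION
    archetype_path: Optional[str] = None   # used for archetyped content
    node_guid: Optional[str] = None        # used for non-archetyped content

    def __post_init__(self):
        # Exactly one of the two identification styles must be used.
        if (self.archetype_path is None) == (self.node_guid is None):
            raise ValueError(
                "set exactly one of archetype_path or node_guid")


def make_node_ref(composition_uid, archetype_path=None):
    """Build a reference: path-based if a path is known, else a new GUID."""
    if archetype_path is not None:
        return NodeRef(composition_uid, archetype_path=archetype_path)
    return NodeRef(composition_uid, node_guid=str(uuid.uuid4()))


if __name__ == "__main__":
    comp_uid = str(uuid.uuid4())
    # Illustrative openEHR-style path; the node ids are made up.
    path = "/content[at0001]/data[at0002]/items[at0004]/value"
    print(make_node_ref(comp_uid, path))   # archetyped content
    print(make_node_ref(comp_uid))         # non-archetyped content
```

In the path-based case nothing extra has to be stored by the source system, whereas every GUID generated in the second case needs to be recorded somewhere if it is to be resolved later, which is the management burden the rules would have to address.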

Query result representation

Issues:

  • across enterprise boundaries, potentially require no finer grain than Entry?
  • what structures? See e.g. openEHR spec

ISO 13606 part 2 - Archetype Specification

Current status and issues

The current form of ISO 13606 part 2 is from a snapshot of the openEHR ADL and AOM 1.4 specifications donated by openEHR to CEN in 2007. This was re-formatted into CEN and then later ISO document format. Extensive experience in the openEHR and 13606 communities has now been gained with this specification. It has been deployed for openEHR archetypes both in a number of national e-health programmes and in production hospital information systems (HIS) around the world. Its limitations are now fairly well understood, and substantially addressed in the openEHR ADL/AOM 1.5 specifications, now in late draft (see link below).

Resources for improvement

Potential New Sections

Archetype-based query language

Currently there is no query language in ISO 13606. However, a query language and methodology is probably the most critical part of the archetype-based framework with respect to the use and re-use of data. The openEHR Archetype Query Language (AQL) has been in production use for over 4 years, is implemented in C#, Java and Ruby, and is relatively stable and mature.

A related specification called a-path - an archetype form of XPath - is also expected to be useful and may eventually be integrated into ADL 1.5 and AQL.

Terminology Issues

TBC