Coded text fields - Coded With Extensions or Exceptions (CWE) in HL7?

The question of how to specify a coded field in an archetype, particularly with respect to optionality, is often more complex than it appears. In particular, how does one state the idea: this field is preferably coded, but if not, it is text?

[The discussion here came out of the above question being raised by Stephen Royce, Clinical Information Lead, NEHTA.]

Technical Possibilities

In openEHR, the technical possibilities are as follows in ADL (and equivalent XML and in-memory AOM structures):

  • DV_TEXT only: says nothing about coding at design time, but nothing stops (or guides) coding at runtime
  • DV_CODED_TEXT only: must be coded
  • DV_TEXT and DV_CODED_TEXT as alternatives, which means: this is a text field, which is preferably coded, and if coded, the coding follows the DV_CODED_TEXT constraint

The last of these is documented in sections 5.3.5.1 and 10.5.3 of the ADL 1.5 draft specification; example:

name matches {
    DV_CODED_TEXT matches {
        defining_code matches {[ac0001]} -- internal code of ref-set, e.g. from SNOMED CT
    }
    DV_TEXT matches {
        value matches {/.+/} -- non-empty string
    }
}

The above means: if the data item is coded (i.e. the instance is a DV_CODED_TEXT), it MUST conform to the DV_CODED_TEXT constraint above, else it is plain text.
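As a rough illustration of that rule, here is a minimal Python sketch (not openEHR reference code). The classes DvText, DvCodedText and CodePhrase are simplified stand-ins for the reference model types, and allowed_codes is a hypothetical, already-resolved expansion of the ac0001 value set; how that set is resolved against a terminology is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class CodePhrase:
    """Simplified stand-in for CODE_PHRASE (hashable so it can live in a set)."""
    terminology_id: str
    code_string: str


@dataclass
class DvText:
    """Simplified stand-in for DV_TEXT."""
    value: str


@dataclass
class DvCodedText(DvText):
    """Simplified stand-in for DV_CODED_TEXT, which specialises DV_TEXT."""
    defining_code: CodePhrase


def conforms(item: DvText, allowed_codes: Set[CodePhrase]) -> bool:
    """Apply the two alternatives from the ADL example above.

    allowed_codes is an assumed, pre-resolved expansion of [ac0001].
    """
    if isinstance(item, DvCodedText):
        # Coded instance: it MUST satisfy the DV_CODED_TEXT alternative.
        return item.defining_code in allowed_codes
    # Plain text instance: value matches {/.+/}, i.e. a non-empty string.
    return len(item.value) > 0


if __name__ == "__main__":
    # Placeholder codes for illustration only, not real terminology content.
    value_set = {CodePhrase("SNOMED-CT", "code-1")}
    print(conforms(DvCodedText("label", CodePhrase("SNOMED-CT", "code-1")), value_set))
    print(conforms(DvCodedText("label", CodePhrase("local", "x")), value_set))
    print(conforms(DvText("free text"), value_set))
```

Note that a coded item using a terminology outside the value set fails here rather than falling back to the DV_TEXT alternative, which is the reading given in the paragraph above.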

Real-world Requirements

Content

The openEHR DV_TEXT is always codeable and so covers the system requirement that is posed by CWE. The 'canned' responses by pathologists with codes in text fields are an example of when it is difficult to say something could never be coded. Pure narrative text is often richer than just a string these days; it can usually be formatted in some way (XHTML, HTML, RTF etc.).

So I would take the argument a little further and say systems need four categories of text:

  1. Formatted text/MIME type for long and formatted narrative (much as email is now either text or HTML): DV_PARSABLE in openEHR, or even DV_MULTIMEDIA if you want to mix inline images with other data.
  2. Text which is narrative but has a classification associated with it, such as a reason for encounter that has an ICPC code.
  3. Text which can be succinctly labelled and captured, and which can be either coded (derived directly from the terminology) or narrative: CWE/DV_TEXT.
  4. Text which is only derived from the terminology.

You have raised the issue of the last three. I think we will have a debate about whether text must be derived from the terminology or not. There is good evidence that, if you do not enforce this, many errors creep into the data, although classifying (option 2) is usually safe.

You raise the issue of wanting to put terminology as an optional binding: there are usually two ways to do this, either with a 'limit to list' idea or with a fixed terminology plus 'other' and a field to carry the text. The CWE and DV_TEXT allow the former.

The need to bind the terminology when it is considered optional is strictly covered in openEHR as a constraint by having a choice between DV_TEXT and DV_CODED_TEXT, with a binding to terminology on the DV_CODED_TEXT option. This is only sensible at the level of the archetype, in my opinion, if the only terminology that is permitted is known and will remain so for a substantial period of time. A change of the terminology constraint will introduce a breaking change in the validity of data in the future, with sweeping consequences: everyone then needs to update their version of the archetype to retain compliance, and backward compatibility is lost. This is a very important consideration in validation environments, as archetypes need to be stable for long periods of time to ensure systems can cope with the requirements. It is for this reason that binding to terminology has been considered most appropriate in templates rather than archetypes. At the Fresh Look meeting in Orlando this was raised as an issue by many people when Stan Huff said he wanted to bind terminology at the model level.

Having said that, it would be easy to present the optional DV_CODED_TEXT with binding/DV_TEXT to the user as a single choice (a la CWE), if considered necessary. This would offer the three levels as choices and allow binding at the archetype level. I would counsel that, while this may appear attractive from a modelling point of view, it does carry the issue of backward compatibility at the level of archetypes into a space that may prove difficult.

We are all seeking to achieve interoperability using models as the foundation. I am certain that our efforts need to be at two levels – the solid and concrete agreement that we can stick with for a period of time that outlives many software iterations (archetypes) and the current agreed best practice for system interoperability (templates). Optional codes will rarely achieve useful interoperability gains unless the codes are already used widely in existing systems.

Systems interoperability

It is worth considering what the possible scenarios are in various views.

There are six possible states for any system sharing information with other systems involving a text element:

  1. The system uses only coded text and the terminology is the one used in the shared space
  2. The system uses only coded text and the terminology is NOT the one used in the shared space
  3. The system uses coded text optionally and the terminology is the one used in the shared space
  4. The system uses coded text optionally and the terminology is NOT the one used in the shared space
  5. The system uses text and classifies the response using the terminology that is the one used in the shared space
  6. The system uses text and classifies the response using the terminology that is NOT the one used in the shared space
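The six states are simply the combination of three ways of using text with two possibilities for the terminology. A small Python sketch makes that structure explicit; the names TextUse and system_states are my own, purely for illustration.

```python
from enum import Enum
from itertools import product


class TextUse(Enum):
    """The three ways a system may handle a text element (hypothetical names)."""
    CODED_ONLY = "uses only coded text"
    CODED_OPTIONAL = "uses coded text optionally"
    CLASSIFIED_TEXT = "uses text and classifies the response"


def system_states():
    """Return (number, text use, uses shared terminology) for states 1 to 6.

    Enum members iterate in definition order and True comes before False,
    so the numbering lines up with the list above.
    """
    combos = product(TextUse, (True, False))
    return [(n, use, shared) for n, (use, shared) in enumerate(combos, start=1)]


if __name__ == "__main__":
    for number, use, shared in system_states():
        terminology = "the shared one" if shared else "NOT the shared one"
        print(f"{number}. The system {use.value}; terminology is {terminology}")
```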

Now, let's consider which systems can deal with the information being shared with a CWE. For each system (1-6 above):

  1. This system has to code the information as it comes in (reasonable for problem lists, medication lists and allergies, but not for all information). It is a lot of work when the coding rates are low. Natural language processing may assist.
  2. This system has to code everything again but can benefit from mapping tables. This is a lot of work regardless of the coding rates.
  3. This system will find it easy to comply but data is likely to be unusable inside the system for key decision support.
  4. This system will have the worst of situations as it will rely on mapping tables and user coding and it could get worse.
  5. This system can allow codes to accumulate and code with specific terminologies when appropriate. Outgoing data can carry the approved terminology if it is available.
  6. Similar to system 5 but will be able to process less data.

My point here is that it is not an easy situation for anyone until systems align. Consider also what happens in a change such as the ICD-9 to ICD-10 transition that is now taking place in the USA: you cannot shift everyone at the same time.

This raises the question: if coding is optional, why would you do it? Generally this will be decided by vendors rather than individuals, except where the list of codes with an 'other' option is present. The 'other' option is more likely to mean that the user has considered the list and decided that the option is not available, rather than just typed something that did not return a suitable value from a terminology list. This approach is usually only suitable for small lists, because with large sets (like SNOMED) it is very difficult in practice to say that something is not there, and it can also be a problem to find what you are looking for. If it is possible to enter non-coded data, then people will do it more often because they can't find the appropriate response than because it is not there.

Within a system this may be a safe thing to do, as it is clear what the dependencies are to decision support etc. Optional coding is a much more complex situation when you are sharing information, as I hope my analysis above demonstrates.

I think that the right approach is to insist on coding for key data – Allergies and Medications in the first instance. Make it possible for patients or clinicians to enter allergy/adverse reaction data directly from terminology ref sets if systems are unable to do it. Medication sharing can be coded through appropriate extensions to systems. It will take time but it is probably more problematic to send uncoded medications in the medium term.

Looking ahead, the solution appears obvious to me, but it does involve some complexity. The real value of the approach of using (code) 'translations' or openEHR 'mappings' is that it goes on working when terminology requirements change in the future. That is, allow CD (with the CWE constraint if appropriate) and allow coded texts using the terminologies that people already have in their systems, adding a SNOMED mapping where available (or ICD-10, LOINC or whatever else is deemed the lingua franca) as a translation. Wait until the SNOMED presence reaches a certain level and then enforce SNOMED coding 3 years hence. Add as many mapping services to SNOMED as possible.
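To make the 'mappings' idea concrete, here is a minimal Python sketch under stated assumptions. CodedTextWithMappings is a hypothetical, simplified structure: in the openEHR reference model the mappings on DV_TEXT are TERM_MAPPING objects carrying a match indicator and purpose as well as the target code, all of which are omitted here. The function code_for is likewise my own helper, not part of any specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class CodePhrase:
    """Simplified stand-in for CODE_PHRASE."""
    terminology_id: str
    code_string: str


@dataclass
class CodedTextWithMappings:
    """Hypothetical simplification: a coded text plus mapped codes."""
    value: str
    defining_code: CodePhrase  # the code the source system actually uses
    mappings: List[CodePhrase] = field(default_factory=list)  # translations


def code_for(item: CodedTextWithMappings, terminology_id: str) -> Optional[CodePhrase]:
    """Return the item's code in the requested terminology, if it carries one.

    The defining code is checked first, then each mapping in order.
    """
    for code in [item.defining_code, *item.mappings]:
        if code.terminology_id == terminology_id:
            return code
    return None


if __name__ == "__main__":
    # Placeholder identifiers for illustration only, not real codes.
    item = CodedTextWithMappings(
        value="some local label",
        defining_code=CodePhrase("local-vendor", "L-001"),
        mappings=[CodePhrase("SNOMED-CT", "S-001")],
    )
    print(code_for(item, "SNOMED-CT"))   # the mapped code
    print(code_for(item, "ICD-10"))      # None: no such mapping yet
```

The point of the design is that the receiver asks for whichever terminology is currently the lingua franca; if that changes, new mappings are added and existing data does not need to be recoded.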

A real benefit of this approach is that commonly used codes, such as Medical Director codes, can be used in other Medical Director systems even if they do not have or use SNOMED. Likewise for ICPC.

Re-reading the ISO data types raises a couple of issues. The only type is CD, and CWE (with extensions) is described as a constraint on the CD type. CWE is actually designed to allow local extensions (i.e. local terminologies). So if we have a CD(CWE) with a SNOMED binding, it will mean that you can have SNOMED coded data, just text, or coded text using other terminologies. Is this your intent? Although the CWE is a complex constraint on CD, it is also clear that binding a CD to SNOMED and using CWE does give some indication as to what is recommended for transmission. The openEHR option of a CODED_TEXT, with a SNOMED binding, or a TEXT allows the same flexibility. However, using the template to express the CWE appears preferable to me unless we are really sure what the future holds.

The terminology ecosystem will be complex for many years to come, potentially forever. It rather depends on how sensible we are and our vision for a future evolving system. Until models and their attendant terminology subsets are widely adopted we will be living in a world where text rules. We will learn a lot in the next 12 months.