...

Codes have been around since before the days of computers, but all digital computers must rely on codes for their very operation. Each instruction is represented by a combination of 0s and 1s. So too is every piece of data. With respect to data, codes can be applied at different structural levels. Thus, each character is represented by a code or codes, from the 128 characters of simple ASCII used to write this article, to the complex kanji, Chinese and other characters and symbols that are represented by more elaborate coding schemes such as Unicode. So, by combining character codes, we can represent and store words and phrases - i.e. text strings. The string "openEHR" is built from the ASCII codes:

Code Block
  01101111  01110000  01100101  01101110  01000101  01001000  01010010
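
As a rough illustration of the same idea in modern terms, the following minimal Python sketch (not taken from any openEHR specification) encodes a string into 8-bit ASCII codes and decodes it again:

Code Block
  # Minimal sketch: encode a string as 8-bit ASCII codes and decode it again.
  text = "openEHR"
  codes = [format(ord(ch), "08b") for ch in text]          # ['01101111', '01110000', ...]
  print("  ".join(codes))                                  # the binary codes shown above
  decoded = "".join(chr(int(code, 2)) for code in codes)
  print(decoded)                                           # -> openEHR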

In the early days of computing, memory, storage space and communications bandwidth were very limited. It made sense not only to code individual characters within a string, but even to code text strings themselves, by replacing the set of codes representing each character of the string with a single code representing the entire text string, particularly if the string was likely to be repeated in other locations. "Diabetes Mellitus" could be replaced by the code "1101110011010001", for example, or simply by a shorter string, say "DM". It saved precious space and bandwidth. Codes were easier for computers to identify in searches and to place into predefined message structures.
But most of these barriers have evaporated over the years, as computer storage first increased a thousandfold, then a millionfold and onwards. The bandwidth of many of our network links has scaled by several orders of magnitude each decade. Now our programming languages and programmers support sophisticated pattern matching through "regular expressions" and other advances which allow them to operate directly on text strings instead of codes. Still, the legacy of those old constraints lives on in the specifications and the mindset of many current authors and standards development bodies.
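To make the point about operating on text directly concrete, here is a minimal Python sketch (the clinical note and the pattern are invented for illustration) that finds mentions of diabetes mellitus in free text without any intermediate code:

Code Block
  import re

  # Minimal sketch: match free-text mentions of a term directly,
  # without first reducing the text to an opaque code such as "DM".
  note = "Known history of diabetes mellitus, type 2, diagnosed 1998."
  pattern = re.compile(r"\bdiabetes\s+mellitus\b", re.IGNORECASE)
  if pattern.search(note):
      print("Diabetes mellitus is mentioned in this note")
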
The above history tells only part of the story. Surely there are other factors influencing the scene when it comes to representing clinical terms and data through the use of codes?

...

As medical knowledge expands and evolves, so too do the concepts and the language used to describe them. Sometimes the concept morphs but the term used remains. Sometimes the concept remains the same but an alternative term is used. Sometimes the two occur simultaneously. Sometimes one old concept is cleaved into two or more new concepts and each new concept is given a new term; the original concept may even fade out of our daily lexicon. It is desirable, particularly in a longitudinal health record spanning many decades, for the current viewers and users of the record to be able to make sense of the concepts and language of yesteryear. But the terminology has to be designed and managed well for these purposes, the electronic health architecture needs to support this, and the implementations current at the time of viewing may need access to the state of the terminology at the time the entries were made.
Concept permanence is probably of even greater importance for statistical comparisons and research analyses that span long periods of time and patient cohorts. One only has to trace the history of the classification of, say, the various manifestations of hepatitis through successive versions of the International Statistical Classification of Diseases and Related Health Problems (the most recent release being ICD-10) to appreciate the complexities and interaction of changing concepts and changing codes.
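As a rough sketch of what access to the prior state of a terminology might look like in an implementation, consider a versioned code table in which each code carries the period during which a given rubric applied (the code, dates and rubrics below are invented purely for illustration):

Code Block
  from datetime import date

  # Hypothetical versioned code table: each entry records the period during
  # which a given meaning (rubric) applied. All codes, dates and rubrics are invented.
  code_history = {
      "H123": [
          (date(1979, 1, 1), date(1993, 12, 31), "Infectious hepatitis"),
          (date(1994, 1, 1), None,               "Acute hepatitis A"),
      ],
  }

  def meaning_at(code, on_date):
      """Return the rubric that was current for `code` on `on_date`."""
      for start, end, rubric in code_history.get(code, []):
          if start <= on_date and (end is None or on_date <= end):
              return rubric
      return None

  print(meaning_at("H123", date(1990, 6, 1)))   # -> Infectious hepatitis
  print(meaning_at("H123", date(2005, 6, 1)))   # -> Acute hepatitis A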

...

Other codesets are larger, but still often only flat lists of terms and corresponding codes.
HL7 v3, for example, has some 250 'supported vocabularies', about half of which are managed by HL7 and half by organisations external to HL7. Even among the codesets internal to HL7, most of which are supposedly flat, many are not stable from release to release in any sense, have contradictory definitions in different places, and exhibit a plethora of different code forms and inconsistent information. Some codesets have a specialisation hierarchy implicit in the set. Some have a specialisation hierarchy encoded into their codes. Some have a combination of both approaches. If HL7 International cannot manage its own codesets consistently and effectively, how can systems trying to parse incoming HL7-based messages ever be expected to cope?
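The difference between a hierarchy that is merely encoded into the shape of the codes and one held as explicit relationships can be sketched as follows (the codes and parent links are invented; neither form is taken from any actual HL7 codeset):

Code Block
  # Invented example codes, for illustration only.
  # (a) Hierarchy encoded in the code string itself: each dotted segment
  #     narrows the meaning, so parentage must be inferred from prefixes.
  code = "DIAB.T2.CX"          # "diabetes" -> "type 2" -> "with complication"
  ancestors = [code[:i] for i in range(len(code)) if code[i] == "."]
  print(ancestors)             # -> ['DIAB', 'DIAB.T2']

  # (b) Hierarchy held as explicit parent links, independent of code shape.
  parent = {"C3": "C2", "C2": "C1"}    # child -> parent
  node, chain = "C3", []
  while node in parent:
      node = parent[node]
      chain.append(node)
  print(chain)                 # -> ['C2', 'C1']
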
For many of these codesets, it is left to local implementers, national standards bodies, vendors and possibly even clinicians and others to decide whether the codeset is appropriate for their scope of implementation. If not, they must decide whether to replace the set, modify it, or augment it with the codes and corresponding terms peculiar to their scope. The ongoing synchronisation often becomes an impossible treadmill of reaction to change, well beyond the control of the clinicians and clinical institutions trying to provide health care based on such an ad hoc approach.
A small number of large, well-designed terminologies offer much for decision support. However, terminologies of this ilk, such as SNOMED CT, are much more complex than simple flat codesets. A single release of SNOMED CT has millions of codes, pointing to concepts, terms and relationships. Its codes form a multipurpose polyhierarchy of concepts and terms, with multiple relationship types. It has mechanisms for extension, multi-language translation and subsetting for defined purposes. Its compositional grammar, as already mentioned, allows terminological expressions to be constructed as needed, based on the concepts available. Even without SNOMED CT's significant and documented problems, implementing and harnessing all this power in any one real system is a profound challenge. Deploying it broadly and effectively across a range of systems to aid semantic interoperability takes the challenge to even greater heights.
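
To give a flavour of such compositional expressions, the sketch below assembles a SNOMED CT-style post-coordinated expression from its parts; the structure follows the published compositional grammar, but the numeric identifiers are placeholders, not real SNOMED CT concept identifiers:

Code Block
  # Sketch of a SNOMED CT-style post-coordinated expression, built from parts.
  # The numeric identifiers below are placeholders, NOT real SNOMED CT codes.
  focus     = ("1111111", "Fracture of bone")
  attribute = ("2222222", "Finding site")
  value     = ("3333333", "Bone structure of femur")

  expression = "{} |{}| : {} |{}| = {} |{}|".format(
      focus[0], focus[1], attribute[0], attribute[1], value[0], value[1])
  print(expression)
  # -> 1111111 |Fracture of bone| : 2222222 |Finding site| = 3333333 |Bone structure of femur|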

...

As a general principle, codes are for computers, not humans. Codes should work behind the scenes and not be exposed to users, particularly busy clinicians. Nor should they be deliberately exposed, unless absolutely necessary, to those who are only peripherally likely to understand their meaning, such as software developers or data modellers. Writing standards and specifications for humans that are littered with abbreviations and codes, often dreamed up on a whim, which then have to be understood, transcribed, embedded in program code, put into test scripts and test specifications, otherwise discussed and manipulated, and above all remembered, is fraught with danger. It is not sound engineering practice. It dramatically narrows the pool of experts who can understand and use the specifications, and risks misunderstanding, transcription errors and the clinical errors that can ensue.

The above notwithstanding, there are places where codes and humans legitimately need to meet. These are situations where textual descriptions are too awkward to use. Common examples in daily life are things like postal codes and bus codes. It is far simpler to refer to postcode "5068" than to "that area bounded by the Sunnybank River to the north, Franklin Bridge, Rainsford Rd and Elm St. to the east, holes 6-14, 17 and 18 of the Royal Plunkett Golf Course to the south, and ....", or to bus "J1E" instead of "the bus that departs from the corner of Edmund St. Walkerville and ...".
In health IT, examples might be genes and gene sequences, tumour staging, or the classification of diseases. In these circumstances, it is often easier for the humans involved to refer to these concepts by codes. Where codes are to be used by humans, it is sensible for the codes to carry additional meaning or representational hints to help the humans disambiguate the codes and reduce the chance of error during human processing and transcription. Thus in the bus code "J1E", the 'E' might denote express. Similarly, where codes are to be used by humans, the shorter the code, the lower the likelihood of error. In Australian hospitals, it is common practice for a patient's identity to be verbally cross-checked by nurses prior to procedures, including administration of some drugs. This cross-check usually uses the hospital's own Unit Record Number for the patient, which usually has few digits and so is relatively human-friendly.

It is probably the history of human abbreviations in the early days of coding and messaging that has led to the proliferation of a vast array of semi-interpretable "codes" creeping into what should be purely computer-processable identifiers in many codesets. Even in the most recent versions of HL7, these are variously and conflictingly referred to as "mnemonics", "codes" and "conceptIds".

...

When small termsets such as gender are used within a given language realm, what possible gain is there in replacing the value "male" by a code such as "1", or "M"? There is certainly plenty to lose! Why should every information system that receives such a code have to deal with this? Humans can understand "male" easily. Computers can process "male" easily. Humans cannot understand the code "1" in any meaningful way. Computers cannot process "1" in any meaningful way, other than perhaps concluding that "1" (male) is less than "2" (female)! This may be true in one sense, but is this the intention of the sender of such coded data - to obfuscate and compromise patient safety? The code is absolutely useless without access to the accompanying meaning, e.g. via some code table. Who can guarantee that that access will always be available? Why place such a burden on every clinical system needing to process gender, for absolutely no benefit? It is far more important to give the clinicians definitional information about the meaning of appropriate terms in the particular context of the data field. Does this refer to administrative or physical gender?
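
A trivial sketch of the burden this places on a receiving system follows; the code table is invented, and in practice it must be obtained from whichever vocabulary the sender happened to use:

Code Block
  # Invented code table, for illustration only. Without it, the received
  # value "1" is meaningless; the plain text "male" needs no lookup at all.
  GENDER_CODES = {"1": "male", "2": "female"}

  received = "1"
  meaning = GENDER_CODES.get(received)
  if meaning is None:
      raise ValueError("Cannot interpret gender code %r without its code table" % received)
  print(meaning)   # -> male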

The more small codesets that information systems have to deal with, where disambiguation of multiple-meaning terms is not required, the less chance we have of achieving a reasonable level of useful information exchange. We should not be blindly advocating that all data be coded. We should stop and think of the ramifications of such recommendations.
One ramification is that we are forced to build code maps between many different standards and coding systems in order to meet the coding requirements demanded by each system. There is no longer any opportunity to consider the importance of insisting on an appropriate code for each data item. Instead, software developers, implementers and message "integrators" are left trying to force square pegs into round holes. Continuing the simple gender example above, we have many examples such as the following internet discussion forum snippet:

...

In the above example, one keen clinician wanting to send recall notices to patients deemed to be candidates for Prostate Specific Antigen (PSA) tests has delved into the bowels of his patient records, determined the relevant database tables, determined that LOINC has been used to code test names, determined the LOINC code (from some 40,000+ codes) historically used by the particular pathology lab in its HL7 messages for the PSA test, determined how gender is coded in this specific clinical system, built and run the requisite SQL query, and hopes that nothing changes next time the query is run! A great piece of detective work, but clearly neither an acceptable nor a sustainable way to empower clinicians with usable, semantically interoperable electronic health records that meet their requirements.
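
Something like the following captures the flavour of that query. It is a sketch only: the table names, column names and codes shown are illustrative, hard-wired to one particular system and one lab's historical messages, which is precisely what makes the approach so fragile:

Code Block
  import sqlite3

  # Entirely hypothetical schema, codes and data, for illustration only.
  PSA_LOINC = "2857-1"   # illustrative PSA test code; the lab's historical code may differ
  MALE_CODE = "M"        # illustrative internal gender code for this particular system

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, surname TEXT, gender_code TEXT);
      CREATE TABLE results  (patient_id INTEGER, loinc_code TEXT, result_date TEXT);
      INSERT INTO patients VALUES (1, 'Smith', 'M');
      INSERT INTO results  VALUES (1, '2857-1', '2009-03-14');
  """)

  # The hard-wired codes are exactly what makes this query fragile: change the
  # lab, the message mapping or the gender codeset and it silently breaks.
  rows = conn.execute(
      "SELECT p.surname, MAX(r.result_date) AS last_psa "
      "FROM patients p JOIN results r ON r.patient_id = p.patient_id "
      "WHERE r.loinc_code = ? AND p.gender_code = ? GROUP BY p.patient_id",
      (PSA_LOINC, MALE_CODE),
  ).fetchall()
  print(rows)   # -> [('Smith', '2009-03-14')]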

...

  • linking meaning in clinical guidelines to meaning in data. Guidelines need to be written by humans, yet be processable by computer. If they end up in a coded form in a computer, we must have the tools to reverse the coded form of the guideline for clinical use. Because of the patient-specific context required when applying a guideline to an individual patient, is there any realistic option for linking guidelines to data other than through openEHR archetypes? The links cannot simply be made through coded terminology.
  • similarly for assisting the classification of patient data for research and reporting using ICD-10 or similar classification systems. The need to link contextual patient data means that codesets and mapping tables alone are insufficient.
  • gaining a better understanding of the value of storing information pertaining to real entities and events (an ontology-of-reality perspective) vs storing codes for concepts in patient records (an ontology-of-use perspective). Barry Smith and others [CEU2006] have argued that we should uniquely identify (in order to refer to) each real instance of a bone fracture (for example) in each patient, rather than some generic concept of a bone fracture, thus allowing us to track and disambiguate bone fracture instances.
  • human interfaces - the way in which text data representing clinical context is captured and displayed is an area needing far more research. How we translate from concepts to terms, from codes to words and vice versa in every system interface is critical to ensuring clinical safety. We need consistent, coherent, repeatable, reliable solutions to these problems, not a mish-mash of a myriad different approaches constrained by the nuances of individual coding schemes and vendor architectures. The UK's National Health Service project on the Common User Interface is a good start in this direction.

...