Ref Sets

Overview

One of the main ways users experience terminology is in the form of 'ref sets', i.e. intentional reference subsets. These are structured subsets of a larger terminology (such as ICD-10 or SNOMED CT) and consist of a set of values that are sensible potential answers to a particular question, such as 'kind of lung infection', 'blood group', or 'prosthesis type'.

A ref set is called 'intentional' (more usually spelled 'intensional' in the terminology literature) when it is defined by a query or expression that has to be evaluated against the terminology to produce the actual subset (such as everything that 'IS_A' infection). This is in contrast to an 'extensional' subset, which is a 'hard-wired' list of terminology codes. Intentional ref sets are more powerful because they can be re-evaluated to take into account changes to the base terminology, and also because the result retains (or should retain) structure from the original terminology, such as IS_A and other relationships.
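The difference between the two kinds of subset can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the concept names are invented rather than real ICD-10 or SNOMED CT codes, the hierarchy gives each concept a single parent (real terminologies may allow several), and `evaluate_is_a` is a hypothetical helper, not part of any terminology product.

```python
from collections import defaultdict

# Toy IS_A hierarchy, child -> parent. All concept names are invented.
IS_A = {
    "infection": "disorder",
    "lung_infection": "infection",
    "bacterial_pneumonia": "lung_infection",
    "viral_pneumonia": "lung_infection",
    "fracture": "disorder",
}


def evaluate_is_a(is_a, root):
    """Return root plus every concept that reaches root via IS_A links."""
    children = defaultdict(set)
    for child, parent in is_a.items():
        children[parent].add(child)
    result, stack = set(), [root]
    while stack:
        concept = stack.pop()
        if concept not in result:
            result.add(concept)
            stack.extend(children[concept])
    return result


# Extensional subset: a hard-wired list of codes.
extensional = {"lung_infection", "bacterial_pneumonia", "viral_pneumonia"}

# Intensional subset: the result of evaluating "everything that IS_A lung_infection".
intensional_v1 = evaluate_is_a(IS_A, "lung_infection")

# A later release of the base terminology adds a concept.
IS_A_V2 = dict(IS_A, fungal_pneumonia="lung_infection")
intensional_v2 = evaluate_is_a(IS_A_V2, "lung_infection")
```

Against the first hierarchy both subsets contain the same codes. After the new release, re-evaluating the definition picks up `fungal_pneumonia`, whereas the hard-wired list would have to be edited by hand.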

Ref sets are a key resource in openEHR. They are designed in concert with archetypes, to provide the candidate values for particular data fields in what are sometimes called '2nd order' archetypes, i.e. archetypes whose meaning is given by one of their fields, such as 'index diagnosis'. In general, that field will be a coded field whose value comes from a ref set.

Accordingly, several key aspects of the relationship between archetypes / templates and ref sets need to be considered.

Ref set definition language (aka Terminology query language)

Intentional ref sets, of which static or extensional ref sets are a special case, are defined using a specialised query language that includes operators for selection, exclusion and comparison, as well as some means of influencing the structure of the result. There is currently no standard language for this, although both IHTSDO and OMG/HL7 (CTS2) have activities under way to create one.
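As a rough illustration of selection and exclusion operators, and of how an extensional list fits in as a special case, the following Python sketch evaluates ref set expressions written as nested tuples. The operator names (`members`, `descendants_or_self`, `union`, `minus`), the tuple notation and the concept names are all invented for this example; they do not correspond to any existing or proposed ref set language.

```python
# Toy IS_A hierarchy, child -> set of parents. All concept names are invented.
PARENTS = {
    "infection": {"disorder"},
    "lung_infection": {"infection"},
    "bacterial_pneumonia": {"lung_infection"},
    "viral_pneumonia": {"lung_infection"},
    "skin_infection": {"infection"},
}
# Every concept mentioned anywhere in the hierarchy.
CONCEPTS = set(PARENTS) | set().union(*PARENTS.values())


def ancestors(concept):
    """All concepts reachable from `concept` by following IS_A upwards."""
    seen, stack = set(), list(PARENTS.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def evaluate(expr):
    """Evaluate a ref set expression given as a nested tuple."""
    op, *args = expr
    if op == "members":  # extensional special case: an explicit code list
        return set(args[0])
    if op == "descendants_or_self":  # selection via IS_A
        root = args[0]
        return {c for c in CONCEPTS if c == root or root in ancestors(c)}
    if op == "union":
        return evaluate(args[0]) | evaluate(args[1])
    if op == "minus":  # exclusion
        return evaluate(args[0]) - evaluate(args[1])
    raise ValueError(f"unknown operator: {op}")


# "All infections except skin infections and their subtypes."
non_skin_infections = evaluate(
    ("minus",
     ("descendants_or_self", "infection"),
     ("descendants_or_self", "skin_infection"))
)

# The same evaluator handles a purely extensional definition.
pneumonias = evaluate(("members", ["bacterial_pneumonia", "viral_pneumonia"]))
```

This sketch returns flat sets only; a real language would also need the comparison operators and the control over result structure mentioned above.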

Ref set languages are currently included in a number of terminology products, and also in the NHS edition of the IHTSDO workbench.

Languages whose definitions are currently openly available include the following:

Design of ref sets

Intentional ref sets will be designed by clinicians and other experts in response to data points found in archetypes or, more frequently, templates. We expect that many ref sets will be for local use, so it makes sense for these relationships to be defined within templates. There remains the need to know which ref set provides the value set for which node or nodes in a template or archetype.

Reference to ref sets

A technical means is required to formally express archetype and template references to ref sets. This is currently defined to be a URI within the archetype or template, which is sensible because it means that ref sets can be retrieved over the web. In practice, however, the terminology service actually in use will be queried, possibly with the URI as a key.
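The idea of using the URI as a lookup key rather than as a web address to fetch can be sketched in Python as follows. The class `LocalTerminologyService`, the function `allowed_values`, the example URI and the node path are all hypothetical and exist only for this illustration; a real terminology service will have its own interface.

```python
# Hypothetical binding held in a template: node path -> ref set URI.
TEMPLATE_BINDINGS = {
    "/data/items[at0002]/value": "http://example.org/refsets/lung-infection",
}


class LocalTerminologyService:
    """Stand-in for whatever terminology service a site actually deploys."""

    def __init__(self):
        self._refsets = {}

    def register(self, uri, codes):
        """Store the evaluated members of a ref set under its URI."""
        self._refsets[uri] = frozenset(codes)

    def value_set(self, uri):
        """Look the ref set up by URI; no web request is made."""
        try:
            return self._refsets[uri]
        except KeyError:
            raise LookupError(f"no ref set registered for {uri}") from None


def allowed_values(service, bindings, node_path):
    """Resolve the value set for one template node via its ref set URI."""
    return service.value_set(bindings[node_path])


service = LocalTerminologyService()
service.register(
    "http://example.org/refsets/lung-infection",
    ["bacterial_pneumonia", "viral_pneumonia"],  # invented codes
)
values = allowed_values(service, TEMPLATE_BINDINGS, "/data/items[at0002]/value")
```

Keeping the URI as an opaque key means the same template can be used against different local services, provided each of them recognises the identifier.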

Since ref sets are defined in terms of a terminology (or other ref sets), they need to be managed as the terminology changes, and also as the ref set definitions themselves change (e.g. due to improved understanding, error corrections and so on). Ref sets therefore require:

  • a means of identification;
  • a means of storage and access that makes them available to both human-facing applications (e.g. browsers) and production systems;
  • a defined approach to governance, versioning (of the query statement and the base terminology) and quality control.
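
One possible way to picture the identification and versioning requirements listed above is a record that ties an evaluated ref set to both the version of its query statement and the release of the base terminology. The Python sketch below is only an assumption about how such a record might look; the names `RefSetRelease` and `needs_reevaluation`, the field layout, the query text and the version strings are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefSetRelease:
    refset_id: str            # means of identification, e.g. a URI
    definition: str           # the query statement, treated as opaque text here
    definition_version: int   # changes when the query statement is revised
    terminology_version: str  # base terminology release it was evaluated against
    members: frozenset        # result of the evaluation


def needs_reevaluation(release, current_definition_version, current_terminology_version):
    """True if either the query statement or the base terminology has moved on."""
    return (
        release.definition_version != current_definition_version
        or release.terminology_version != current_terminology_version
    )


release = RefSetRelease(
    refset_id="http://example.org/refsets/lung-infection",
    definition="descendants_or_self(lung_infection)",
    definition_version=1,
    terminology_version="2010-01",
    members=frozenset({"lung_infection", "bacterial_pneumonia"}),
)

up_to_date = not needs_reevaluation(release, 1, "2010-01")
stale_after_new_terminology = needs_reevaluation(release, 1, "2010-07")
stale_after_definition_fix = needs_reevaluation(release, 2, "2010-01")
```

Recording both versions makes it possible to tell whether a stored result is out of date because the terminology changed or because the definition itself was corrected.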

These topics are described in more detail on this page and its children.