Multi-value Leaf Elements

Overview

In a number of situations it has been found that leaf-level values, carried in the ELEMENT.value attribute in the openEHR model, need to allow for multiple items rather than just a single one. One example is questions in a questionnaire, illustrated by the following (non-medical) example from Alessandro Torrisi, Medical Centre Alkmaar (MCA), Netherlands:

In real life:

Q: What do you like to eat?
A: Pizza, Pasta

In openEHR:

Q: What do you like to eat?
A: Pizza
Q: What do you like to eat?
A: Pasta.

I did not ask the question twice, so it is not correct this way. Besides that, there is another problem:
In openEHR:

Q: What do you like to eat?
A: Pizza
Q: What do you like to eat?
A: Pizza
Q: What do you like to eat?
A: Pizza

The need here is to be able to instead support the following:

Q: What do you like to eat?
A: Pizza, Pasta, Fish

It turns out that the same problem arises in a number of circumstances. Currently, although this kind of data can be represented, the openEHR RM does not directly model multiple values on a single data node, nor is there a clear guideline for archetyping it. Below, we describe these requirements in detail and consider possible solutions, including how to do it in the current model and archetyping infrastructure.

The Problem

Text / coded responses in questionnaires

A clinical version of the 'pizza' problem above arises with questions like the following in questionnaires, such as those given to a new patient at a GP clinic:

Q: list the substances to which you have known allergies
Q: list foods to which you have intolerances

Typical answers to such questions are things like:

A: pollen, grass, penicillin
A: shellfish, peanuts

In these and many other cases, the answers are often a) multiple and b) unique, i.e. items are not repeated. In the above cases, the response values might or might not be coded.

In professional questionnaires, such as http://www.cap.org/apps/docs/committees/cancer/cancer_protocols/2005/breast05_ckw.pdf#page=5, there are questions that could easily have multiple answers, and the answers may be coded from SNOMED CT or elsewhere.

Date/Time Data

A similar thing can occur with dates, e.g. Date of prescription: 12/12/2008, 1/4/2009, 23/5/2009. The same is true here: the values are multiple and unique.

The Current Solution

The openEHR Release 1.0.2 RM does not directly provide a way to represent the above kind of data. In this release, data items are always atomic DATA_VALUE descendant objects, attached to the ELEMENT.value attribute in a data hierarchy. Representing multiple items is normally done with a CLUSTER or ITEM_LIST, e.g.

CLUSTER; name = 'allergies question'
- items =
  - ELEMENT; name = 'question'; value = 'what allergies do you have?'
  - CLUSTER; name = 'responses'
    - items =
      - ELEMENT; name = 'response(1)'; value = 'grass'
      - ELEMENT; name = 'response(2)'; value = 'pollen'
      - ELEMENT; name = 'response(3)'; value = 'penicillin'

This structure is a reasonable representation of the requirement. The problem is how to control it properly within an archetype. Consider the following archetype fragment:

CLUSTER[at0004] matches {    -- allergies question
    items cardinality matches {*; unique} matches {
        ELEMENT[at0005] matches { -- question
            value matches {"what allergies do you have"} -- would normally be coded
        }
        CLUSTER[at0006] matches {            -- response
            items cardinality matches {*; unique} matches {
                 DV_TEXT matches {*}
                 DV_CODED_TEXT occurrences matches {*} matches {
                      defining_code matches {
                           [local::at0010, -- grass
                            at0011,          -- pollen
                            at0012]          -- penicillin
                      }
                 }
            }
        }
    }
}

The above archetype does what is needed. However, there are some anomalies to consider:

Each answer is path-addressable separately, which is probably not the design intention.
Mapping this data structure to a GUI control, e.g. in a form code generator, will not be easy, because it cannot be distinguished from the situation of N separate questions, each tagged with 'Response 1', 'Response 2' etc. and each with a separate edit field in the GUI. What is most likely wanted is a single 'Response' field with a multi-select drop-down, in the manner of some email client address select controls.
Archetype-building tools would need to implement a 'multi-answer question node' and generate the above pattern when it was selected by the archetype author.
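
The first two anomalies can be illustrated with a minimal Python sketch of the data structure shown earlier. The Element and Cluster classes and the paths helper are simplified stand-ins invented for this illustration, not the openEHR RM classes, and the name-based path syntax only approximates real openEHR paths.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Simplified stand-in for ELEMENT: a name and one atomic value."""
    name: str
    value: str


@dataclass
class Cluster:
    """Simplified stand-in for CLUSTER: a name and a list of child items."""
    name: str
    items: List[Union["Cluster", Element]] = field(default_factory=list)


def paths(item, prefix=""):
    """Return (path, value) pairs for every leaf Element under item."""
    here = f"{prefix}/{item.name}"
    if isinstance(item, Element):
        return [(here, item.value)]
    result = []
    for child in item.items:
        result.extend(paths(child, here))
    return result


question = Cluster("allergies question", [
    Element("question", "what allergies do you have?"),
    Cluster("responses", [
        Element("response(1)", "grass"),
        Element("response(2)", "pollen"),
        Element("response(3)", "penicillin"),
    ]),
])

leaf_paths = paths(question)
for path, value in leaf_paths:
    print(path, "=", value)
```

Walking the structure yields four leaf paths, one for the question and one for each response. Nothing in the data itself tells a form generator whether the three response leaves are one multi-valued answer or three independent single-valued fields.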

Solutions

Solution #1 - Implicit data to GUI mapping (no RM change)

The default approach is to make no changes to the RM or archetype model, and to require the above archetype pattern to always be used in this circumstance. This solution does not address the GUI mapping problem described in the section above.

Solution #2 - Specific multi-value data types

A data type such as DV_MULTI_TEXT could be added to the Data_value package in openEHR, whose value would be a List<DV_TEXT>. To deal with dates, we would need DV_MULTI_DATE, and so on.

Pros:

Superficially, this solution is simple.
No impact on existing data.

Cons:

This option seems clumsy, since it is likely that it will spread to all current atomic data value types.
It is not clear whether all data types should be given a DV_MULTI_XX version or not. If we only provide the ones we know about now, we will be adding new ones for quite some time.
The grouping logic of each multi data type might end up being specific to each type, reducing the possibility of code re-use across all such types, particularly in the GUI and query engine.
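
The duplication concern can be made concrete with a rough Python sketch. DvText, DvDate, DvMultiText and DvMultiDate are simplified, hypothetical stand-ins for DV_TEXT, DV_DATE and the proposed DV_MULTI_TEXT and DV_MULTI_DATE; the add method and its uniqueness check are assumptions made for illustration, not part of any openEHR specification.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class DvText:
    """Simplified stand-in for DV_TEXT."""
    value: str


@dataclass(frozen=True)
class DvDate:
    """Simplified stand-in for DV_DATE."""
    value: date


@dataclass
class DvMultiText:
    """Hypothetical DV_MULTI_TEXT: an ordered list of unique DV_TEXT values."""
    values: List[DvText] = field(default_factory=list)

    def add(self, item: DvText) -> None:
        if item not in self.values:  # enforce uniqueness
            self.values.append(item)


@dataclass
class DvMultiDate:
    """Hypothetical DV_MULTI_DATE: the same logic, repeated for DV_DATE."""
    values: List[DvDate] = field(default_factory=list)

    def add(self, item: DvDate) -> None:
        if item not in self.values:  # same check, written a second time
            self.values.append(item)


allergies = DvMultiText()
for name in ["pollen", "grass", "penicillin", "pollen"]:
    allergies.add(DvText(name))  # the repeated 'pollen' is ignored

prescriptions = DvMultiDate()
prescriptions.add(DvDate(date(2008, 12, 12)))
prescriptions.add(DvDate(date(2009, 4, 1)))
```

Each new multi-value type repeats the same container logic, and any GUI or query code written against DvMultiText would have to be written again for DvMultiDate.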

Solution #3 - New Multi-value Element type

A more generic approach is to change the Data_structure.Representation package to add a new MULTI_ELEMENT class, whose value is a List<DATA_VALUE>. To make things work properly, a new parent of ELEMENT and MULTI_ELEMENT would have to be inserted as well, say ELEMENT_ITEM, so that the types ITEM_LIST.items and ITEM_SINGLE.item could be respecified to be this new type (thus allowing both kinds of ELEMENT).

Pros:

This is a generic solution and would make it clear in tools and data that the structure was Question + multi-item Response. It would enable GUI form generation to work.
This is unlikely to affect many archetypes, since in places where ELEMENT occurs (almost always under CLUSTER.items), the child types 'ELEMENT' and 'CLUSTER' are explicit in the archetype. Where MULTI_ELEMENT is needed it would have to be explicitly added in an archetype under CLUSTER.items.
No impact on existing data.

Cons:

This would require changes to the existing package. It could be done in such a way as to not change the semantics of the current model and preserve existing data, but it would cause changes to all software implementations of this part of the model.
In XML, it would require a new class MultiElement (easy) and new pseudo-classes for the types List<DATA_VALUE> and all possible descendants, i.e. List<DV_TEXT> etc. But... that's life in XML-schema.
Query engines would have to be slightly modified to ensure a data value of X within a MULTI_ELEMENT was also matched along with an X within a normal ELEMENT in a result for a query for positive matches for X.
It would also diverge from the ISO 13606 model of the same thing. (However, we should not be preventing progress in openEHR because of the static nature of existing official standards.)
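
A minimal Python sketch of this proposal, including the query-matching behaviour mentioned in the cons, might look as follows. The class shapes are simplified assumptions: DataValue stands in for DATA_VALUE, the has_value and find methods are hypothetical helpers rather than RM features, and the item names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataValue:
    """Simplified stand-in for DATA_VALUE."""
    value: str


class ElementItem:
    """Proposed common parent (ELEMENT_ITEM) of both element kinds."""

    def has_value(self, wanted: DataValue) -> bool:
        raise NotImplementedError


@dataclass
class Element(ElementItem):
    """Existing ELEMENT: exactly one value."""
    name: str
    value: DataValue

    def has_value(self, wanted: DataValue) -> bool:
        return self.value == wanted


@dataclass
class MultiElement(ElementItem):
    """Proposed MULTI_ELEMENT: a list of values on a single node."""
    name: str
    value: List[DataValue] = field(default_factory=list)

    def has_value(self, wanted: DataValue) -> bool:
        return wanted in self.value


@dataclass
class ItemList:
    """ITEM_LIST.items respecified to accept either kind of element."""
    items: List[ElementItem]

    def find(self, wanted: DataValue) -> List[str]:
        """Names of all items holding the wanted value, single or multi."""
        return [i.name for i in self.items if i.has_value(wanted)]


record = ItemList([
    Element("main allergy", DataValue("pollen")),
    MultiElement("all allergies", [DataValue("pollen"), DataValue("grass")]),
    Element("intolerance", DataValue("shellfish")),
])
matches = record.find(DataValue("pollen"))
```

Here a query for 'pollen' matches both the ordinary element and the multi-valued one, which is the behaviour a modified query engine would need to provide.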

Solution #4 - New Container Data type

An alternative generic approach is to add a new generic data value class DV_LIST<T: DATA_VALUE> which would allow the creation of types like DV_LIST<DV_TEXT>, DV_LIST<DV_DATE> and so on.

Pros:

This is a generic solution and would make it clear in tools and data that the structure was Question + multi-item Response. It would enable GUI form generation to work.
This approach would have the advantage that it is a pure addition to the model, and requires no changes to existing software, only additions.

Cons:

This requires a change to the RM, to insert some new abstract classes between DATA_VALUE and the existing types, such as DV_ATOM and DV_CONTAINER, with DV_LIST inheriting from the latter. This is needed so that DV_LIST.items can be specified to be List<DV_ATOM> (leaving out the two intermediate classes would allow DV_LIST<DV_LIST<T>>, which we do not want).
MAJOR: A lot of archetypes that currently assume ELEMENT.value is an atomic DATA_VALUE type would need to be changed to constrain it to DV_ATOM. This might affect a large number of archetypes.
Query engines would have to be modified to handle DV_LIST<T> similarly to solution 3 above.
Because XML schema cannot properly support generic types, the schemas would require kludge classes like DvListDvText etc, for each possible type DV_X that might have a DV_LIST<DV_X> counterpart. More or less as for solution 3.
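
The role of the two intermediate classes can be sketched in Python as below. This is an illustration under assumptions: the classes are simplified stand-ins for DATA_VALUE, DV_ATOM, DV_CONTAINER, DV_TEXT and DV_LIST, and the run-time check in append is added only because Python does not enforce generic type bounds itself.

```python
from dataclasses import dataclass, field
from typing import Generic, List, TypeVar


class DataValue:
    """Simplified stand-in for DATA_VALUE."""


class DvAtom(DataValue):
    """Proposed abstract parent (DV_ATOM) of the existing atomic types."""


class DvContainer(DataValue):
    """Proposed abstract parent (DV_CONTAINER) of container types."""


@dataclass(frozen=True)
class DvText(DvAtom):
    """Simplified stand-in for DV_TEXT."""
    value: str


# Items are bound to DvAtom, so a DvList of DvList is not a valid type.
T = TypeVar("T", bound=DvAtom)


@dataclass
class DvList(DvContainer, Generic[T]):
    """Sketch of DV_LIST<T: DV_ATOM>."""
    items: List[T] = field(default_factory=list)

    def append(self, item: T) -> None:
        # Python does not enforce the type bound at run time, so check here.
        if not isinstance(item, DvAtom):
            raise TypeError("DV_LIST items must be DV_ATOM descendants")
        self.items.append(item)


answers: DvList[DvText] = DvList()
answers.append(DvText("pollen"))
answers.append(DvText("grass"))

try:
    answers.append(DvList())  # attempt to nest a list inside a list
    nested_allowed = True
except TypeError:
    nested_allowed = False
```

Because DvList descends from DvContainer rather than DvAtom, the attempt to nest one list inside another is rejected, which is the purpose of the two intermediate classes.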

Discussion

From the above, it would appear that Solution #3 above is the best. This seems to best represent the design intention corresponding to the requirement, and while it would have some impact on software, it will have no impact on existing data. It would require changes to the reference model, the XML schema, and existing query engines, but none of these changes appears to be difficult or to create problems with existing data.

Note that in solutions 2, 3, and 4 above, the semantics of paths change slightly: a leaf-level path can no longer be assumed to point to an atom, since it might point to a list. Existing queries would need to be reviewed, and new ones written in a slightly different way, such that matching a value of X is done by testing for either equality, e.g. x/y/z/value = 'pizza', or set membership, e.g. x/y/z/value.has('pizza'). Solving this might be best done with a new operator, say ~=, which tests for both, e.g. x/y/z/value ~= 'pizza'.
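
The intended behaviour of such an operator can be sketched as a small Python function. The function name matches and the use of plain Python strings and lists in place of openEHR data values are assumptions for illustration only; no such operator is defined in the source beyond the suggestion above.

```python
def matches(node_value, wanted):
    """Sketch of the proposed '~=' test.

    Tests set membership when the node holds a collection of values,
    and plain equality when it holds a single atomic value.
    """
    if isinstance(node_value, (list, tuple, set, frozenset)):
        return wanted in node_value
    return node_value == wanted


single = "pizza"                     # value at a normal leaf node
multi = ["pizza", "pasta", "fish"]   # value at a multi-value leaf node

single_hit = matches(single, "pizza")
multi_hit = matches(multi, "pizza")
multi_miss = matches(multi, "salad")
```

With this test, the same query expression finds 'pizza' whether the path resolves to an atom or to a list, so query authors would not need to know in advance which kind of node they are addressing.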

A related Issue - choice semantics in single attributes

The semantics of 'select' are essentially a flexible choice concept: in the general case, M from N, where M <= N. It initially seemed that 'select' would be a '1 from N' concept, in which case there was an idea that it might be used to make the choice semantics of single attributes with multiple children clearer, but this is not the case due to the more general 'M from N' meaning.