Archetype Group Construct

Introduction

Over the last few years of using archetypes and templates in various contexts, including in Australia, the UK NHS, and various provider sites, some subtle problems have emerged, many to do with items within container (i.e. multiply-valued) attributes in the openEHR Reference Model (RM). Most of the attributes in the openEHR RM are of this type, in common with other models used in health informatics (i.e. in a general sense, most such models are some kind of tree structure). Another area of possible ambiguity is the 'slot' concept, which it appears has not been clearly enough defined. This page aims to elucidate the requirements, current difficulties, workarounds, and possible long-term solutions.

Thanks to the numerous people from the community, including people working within the UK NHS and Dutch modelling efforts. It has taken quite some time to gather sufficiently clear explanations of numerous problems and requirements so as to create something resembling a comprehensive analysis.

Choice in Single Attributes

Requirement

The first requirement we consider is the notion of 'alternatives' within single-valued attributes, which is already built in to the Archetype Definition Language (ADL). This says that at any single-valued attribute, multiple object constraints can be stated, with the meaning that each such constraint is an alternative, i.e. the data at runtime can be matched to any one of the constraints. Here is the example from section 5.3.2 of the ADL specification:

items cardinality matches {*} matches {
    ELEMENT[at0004] matches {                           -- speed limit
        value matches {
            DV_QUANTITY[at0022] matches {                -- miles per hour
                magnitude matches {|0..55|}
                property matches {"velocity"}
                units matches {"mph"}                           
            }
            DV_QUANTITY[at0023] matches {                -- km per hour
                magnitude matches {|0..100|}
                property matches {"velocity"}
                units matches {"km/h"}
            }
        }
    }
}

Here we have two alternative constraints on a DV_QUANTITY object that can occur as the value of ELEMENT.value somewhere within a larger structure. Note that both branches have an at-code, as required by the ADL 1.5 identification rules, which say that multiple siblings of a single-valued attribute, if of the same reference model type, must carry a node identifier, permitting data nodes to be associated with the correct archetype nodes. A more interesting example of this feature might be multiple alternatives for the protocol attribute of one of the Entry subtypes.
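
The matching rule for alternatives can be sketched in Python. This is only an illustration under stated assumptions, not openEHR tooling: the class QuantityConstraint and the function match_alternative are hypothetical names, and a data value is reduced to a magnitude and a units string.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QuantityConstraint:
    """Hypothetical stand-in for one DV_QUANTITY alternative in the archetype."""
    node_id: str
    units: str
    min_magnitude: float
    max_magnitude: float

    def accepts(self, magnitude: float, units: str) -> bool:
        return units == self.units and self.min_magnitude <= magnitude <= self.max_magnitude


def match_alternative(alternatives: Sequence[QuantityConstraint],
                      magnitude: float, units: str) -> Optional[str]:
    """Return the node id of the first alternative the value conforms to, else None."""
    for alt in alternatives:
        if alt.accepts(magnitude, units):
            return alt.node_id
    return None


# The two alternatives from the speed limit example above.
SPEED_LIMIT = [
    QuantityConstraint("at0022", "mph", 0, 55),
    QuantityConstraint("at0023", "km/h", 0, 100),
]
```

The returned node id is what allows a data node to be associated with the archetype node it was matched against.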

So far, this requirement appears to correspond to a current need.

Tooling

The current archetype tools in use are less clear with respect to single-valued attribute alternatives. The Ocean Archetype Editor (AE) supports the capability via a 'choice' node, but because it does not work the same way under multiple-valued attributes, it can be confusing for users.

In the LinkEHR archetype editor, 'choice' nodes are automatically created when an alternative over a single-valued attribute is created, and removed if the alternative is removed.

Choice in Container Attributes

Requirements

When it comes to multiply-valued attributes, things are more complicated. ADL provides the ability to define multiple members of a multiply-valued attribute, each defining a possible constraint that could be used to control data. Because we are talking about a container attribute, data could be created conforming to more than one, and possibly all, of the object children of such an attribute defined in an archetype. Consider for example the following:

    EVALUATION[at0000] matches {    -- Problem
        data matches {
            ITEM_TREE[at0001] matches {    -- structure
                items cardinality matches {0..*; ordered} matches {
                    ELEMENT[at0002] occurrences matches {1} matches {    -- Problem
                        value matches {
                            DV_TEXT matches {*}
                        }
                    }
                    ELEMENT[at0003] occurrences matches {0..1} matches {    -- Date of initial onset
                        value matches {
                            DV_DATE matches {
                                value matches {yyyy-??-??}
                            }
                        }
                    }
                    ELEMENT[at0004] occurrences matches {0..1} matches {...}    -- Age at initial onset
                    ELEMENT[at0005] occurrences matches {0..1} matches {...}    -- Severity
                    ELEMENT[at0009] occurrences matches {0..1} matches {...}    -- Clinical description
                    ELEMENT[at0010] occurrences matches {0..1} matches {...}    -- Date clinically recognised
                    CLUSTER[at0011] occurrences matches {0..*} matches {...}    -- Location


                    -- etc
                }
            }
        }
    }

In this example, there are multiple child objects defined for the attribute items of the ITEM_TREE object. Some have different occurrences set. Occurrences defines the number of times an object matching a given object constraint in the archetype can appear in data. So data that conform to the above must contain an ELEMENT node conforming to the [at0002] constraint, and optionally single ELEMENTs conforming to each of the following nodes, up until the CLUSTER [at0011] block, for which any number of instances may exist in the data. The items attribute is marked as ordered, so it is assumed that the ordering of the archetype object constraints will be respected in the data.
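
As a rough sketch of these rules, the following Python checks a list of data node ids against per-node occurrences and, optionally, archetype order. The names PROBLEM_ITEMS and check_container are hypothetical, data nodes are reduced to their at-codes, and the ordering check encodes the assumption stated above (data follow the order of the archetype constraints).

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (lower, upper) occurrences per node id; an upper bound of None means unbounded (*).
Occurrences = Dict[str, Tuple[int, Optional[int]]]

PROBLEM_ITEMS: Occurrences = {
    "at0002": (1, 1),
    "at0003": (0, 1),
    "at0004": (0, 1),
    "at0005": (0, 1),
    "at0009": (0, 1),
    "at0010": (0, 1),
    "at0011": (0, None),
}


def check_container(data_node_ids: List[str], constraints: Occurrences,
                    ordered: bool = True) -> List[str]:
    """Return a list of violations; an empty list means the data conform."""
    errors = []
    counts = Counter(data_node_ids)
    for node_id, (lower, upper) in constraints.items():
        n = counts.get(node_id, 0)
        if n < lower or (upper is not None and n > upper):
            errors.append(f"{node_id}: {n} occurrence(s) outside allowed range")
    if ordered:
        # Rank each constraint by its position in the archetype (dicts keep insertion order).
        rank = {node_id: i for i, node_id in enumerate(constraints)}
        ranks = [rank[n] for n in data_node_ids if n in rank]
        if ranks != sorted(ranks):
            errors.append("items are not in archetype order")
    return errors
```

For example, check_container(["at0002", "at0003", "at0011", "at0011"], PROBLEM_ITEMS) returns an empty list, while data lacking an [at0002] node produce one violation.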

As far as it goes, this is fine, and serves the purposes of many archetypes, where the requirement is simply to state the possible node types in a container. However some other needs have arisen over the course of the last couple of years' modelling.

Choice on a node in a container

The first is the need to be able to define an archetype so that one (or possibly more) of the items in some container consists of a choice of distinct items. For example, if a list of 8 object constraints in an ITEM_TREE.items list like the one above were defined, it may be that for item 4 we actually want to define more than one possibility, only one of which can be chosen. Consider the following structure of data relating to a clinical investigation:

  • ITEM_TREE
    • items
      • investigation type: ELEMENT
      • reason for investigation: ELEMENT
      • methodology: choice of (ELEMENT -- reported as a DV_TEXT ; ELEMENT -- reported using a DV_CODED_TEXT ; CLUSTER -- reported in a structured way )
      • some other details: ELEMENT
      • some other details: CLUSTER

Here, the 3rd item is used to report the Methodology, but the requirement is to be able to constrain it to be in one of three forms, a DV_TEXT, a DV_CODED_TEXT or a structure. In ADL 1.4, two approaches can be used to model this.

1. Define an invariant

In this approach, the following kind of archetype structure would be defined:

definition

     EVALUATION[at0000] matches {    -- Investigation report
        data matches {
            ITEM_TREE[at0001] matches {    -- Tree
                items cardinality matches {0..*; unordered} matches {
                    ELEMENT[at0002] occurrences matches {1} matches {...}    -- Investigation type
                    ELEMENT[at0003] occurrences matches {0..1} matches {...}    -- Investigation reason
                    ELEMENT[at0004] occurrences matches {0..1} matches {    -- Methodology as a Text description
                        value matches {
                            DV_TEXT matches {*}
                        }
                    }
                    ELEMENT[at0005] occurrences matches {0..1} matches {    -- Methodology as a Coded list
                        value matches {
                            DV_CODED_TEXT matches {
                                defining_code matches {
                                    [local::
                                    at0020,     -- method A
                                    at0021]    -- method B
                                }
                            }
                        }
                    }
                    CLUSTER[at0006] occurrences matches {0..1} matches {    -- Methodology as a Structured description
                        items cardinality matches {0..*; unordered} matches {
                            allow_archetype ITEM[at0007] occurrences matches {0..*} matches {    -- Item
                                include
                                    archetype_id/value matches {/.*/}
                            }
                            CLUSTER[at0008] occurrences matches {0..1} matches {    -- Details
                                items cardinality matches {0..*; unordered} matches {
                                    ELEMENT[at0009] occurrences matches {0..1} matches {    -- Route
                                        value matches {
                                            DV_TEXT matches {*}
                                        }
                                    }
                                    ELEMENT[at0010] occurrences matches {0..1} matches {    -- Site
                                        value matches {
                                            DV_TEXT matches {*}
                                        }
                                    }
                                }
                            }
                        }
                    }
                    ELEMENT[at0011] occurrences matches {0..1} matches {...}    -- (some other details)
                    CLUSTER[at0012] occurrences matches {0..1} matches {...}    -- (some other details)
                }
            }
        }
    }
rules
    exists /data[at0001]/items[at0004] xor exists /data[at0001]/items[at0005] xor exists /data[at0001]/items[at0006]
 

Here the definition part of the archetype includes 3 nodes, one for each way of reporting 'methodology', and each with optional occurrences. In fact we only want one of these in the data, so we add a constraint in the rules section (this section was called 'invariants' in ADL 1.4) which is intended to force one and only one of these items to exist. Note that a chain of two binary xor operators is also satisfied when all three items exist, so a strictly exact rule would need to exclude that case as well.

This approach can express the constraint, but it is arguably not obvious from the definition above what the real intention is, which is to provide a 'choice' among the 3 methodology nodes.
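
One point worth checking when writing such a rule is how a chained xor behaves with three operands. The short Python sketch below assumes xor is an ordinary binary Boolean operator evaluated left to right; the function names are illustrative only. It compares the chained form with a strict 'exactly one' test over all eight combinations.

```python
from itertools import product


def chained_xor(a: bool, b: bool, c: bool) -> bool:
    """Left-to-right evaluation of 'a xor b xor c' (true for an odd number of true inputs)."""
    return (a != b) != c


def exactly_one(*flags: bool) -> bool:
    """True only when a single flag is set."""
    return sum(flags) == 1


# Print the truth table for the three 'exists' tests in the rule above.
for combo in product([False, True], repeat=3):
    print(combo, chained_xor(*combo), exactly_one(*combo))
```

The two functions differ in one case only: when all three nodes exist, the chained xor is satisfied but the 'exactly one' test is not.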

2. Use an extra CLUSTER node for grouping

This approach has been used by modellers in the NHS and elsewhere, and looks like the following:

     EVALUATION[at0000] matches {    -- Investigation report
        data matches {
            ITEM_TREE[at0001] matches {    -- Tree
                items cardinality matches {0..*; unordered} matches {
                    ELEMENT[at0002] occurrences matches {1} matches {...}    -- Investigation type
                    ELEMENT[at0003] occurrences matches {0..1} matches {...}    -- Investigation reason

                    CLUSTER[at0004] occurrences matches {1} matches { -- Methodology  -- NOTE NEW CLUSTER NODE
                        items cardinality matches {1} matches { -- NOTE CARDINALITY OF 1
                            ELEMENT[at0005] occurrences matches {0..1} matches {...}    -- as a Text description
                            ELEMENT[at0006] occurrences matches {0..1} matches {...}    -- as a Coded list
                            CLUSTER[at0007] occurrences matches {0..1} matches {...}    -- as a Structured description
                        }
                    }

                    ELEMENT[at0011] occurrences matches {0..1} matches {...}    -- (some other details)
                    CLUSTER[at0012] occurrences matches {0..1} matches {...}    -- (some other details)
                }
            }
        }
    }

This method makes the modelling intention quite clear, but at the cost of introducing an unnecessary CLUSTER object (node [at0004] in the above), whose items attribute has its cardinality constrained to 1, i.e. it is not being used as a normal container, but as a placeholder for one of the three sub-items, being the three methodology variants. The problem with this method is that it makes the data include an extra cluster, as shown in the first structure below, instead of the originally desired flat list, shown in the second:

ITEM_TREE

  • items
    • investigation type: ELEMENT
    • reason for investigation: ELEMENT
    • methodology: CLUSTER
      • methodology: ELEMENT -- reported as a DV_CODED_TEXT
    • some other details: ELEMENT
    • some other details: CLUSTER

ITEM_TREE

  • items
    • investigation type: ELEMENT
    • reason for investigation: ELEMENT
    • methodology: ELEMENT  -- reported using a DV_CODED_TEXT
    • some other details: ELEMENT
    • some other details: CLUSTER

It is debatable whether the first structure of data is any worse than the second in terms of viewing / comprehensibility. It should be remembered that the extra CLUSTER has its cardinality restricted to a single item, and so is not acting as a proper container. In a theoretical modelling sense, we have introduced a problem with the CLUSTER approach (which can also be effected using SECTIONs at higher levels in the hierarchy) in that now the method for dealing with 'choice' under a multiply-valued attribute is different from that under a single-valued attribute. However, the main thing we need to be aware of with respect to this solution is how it affects code generation, particularly for GUI building. If the extra-CLUSTER method is used, an extra grouping element will appear in any generated GUI, which may not be desirable.
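
The effect on generated code can be seen by listing the paths of the leaf nodes in each structure. The Python below is a minimal sketch: data nodes are plain dictionaries with an 'id' and optional 'items', the function leaf_paths is a hypothetical helper, and the node ids follow the extra-CLUSTER example for both structures so that they can be compared.

```python
from typing import Dict, Iterator, List

Node = Dict[str, object]


def leaf_paths(node: Node, prefix: str = "") -> Iterator[str]:
    """Yield an archetype-style path for every node that has no child items."""
    children: List[Node] = node.get("items", [])  # type: ignore[assignment]
    if not children and prefix:
        yield prefix
    for child in children:
        yield from leaf_paths(child, f"{prefix}/items[{child['id']}]")


# Data built with the extra CLUSTER [at0004] wrapping the chosen methodology node.
WITH_CLUSTER: Node = {"id": "at0001", "items": [
    {"id": "at0002"}, {"id": "at0003"},
    {"id": "at0004", "items": [{"id": "at0006"}]},
    {"id": "at0011"},
]}

# The originally desired flat list.
FLAT: Node = {"id": "at0001", "items": [
    {"id": "at0002"}, {"id": "at0003"}, {"id": "at0006"}, {"id": "at0011"},
]}
```

With the extra CLUSTER the methodology value is reached at /items[at0004]/items[at0006], one level deeper than /items[at0006] in the flat list, and a generator that emits one grouping widget per container would emit one more.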

Solutions

We need to keep in mind the difference between the following things:

  • what modellers see in the tools - i.e. how a 'choice' option is visually presented
  • what is in the ADL formalism, which needs to be a reliable transform of the visualisation in the tools, but not necessarily the same
  • what is in data, which has a direct relationship with the archetype structure.

What we need to consider is the tradeoff between complexity in the ADL formalism and AOM specification, and its expressivity. In other words, we could add further features to ADL to more directly support a choice concept in the tooling, but we need to be mindful of how complex ADL (and the AOM) should be made.

The two solutions appear to be:

  • use the extra-CLUSTER approach (extra-SECTION in the level above ENTRY), and modify tools to use this behind the scenes to implement the logical 'choice' concept;
    • optionally a modification could be made to ADL to add a new construct to be used in place of 'cardinality matches {0..1}', such as 'selector' or similar. This would ease the mapping to tools.
    • but otherwise, no changes would be made to ADL. The main changes would be in tooling.
  • add a more complex construct to ADL / AOM that supports a 'choice of subgroup' idea, as shown in the following example.
     EVALUATION[at0000] matches {    -- Investigation report
        data matches {
            ITEM_TREE[at0001] matches {    -- Tree
                items cardinality matches {0..*; unordered} matches {
                    ELEMENT[at0002] occurrences matches {1} matches {...}    -- Investigation type
                    ELEMENT[at0003] occurrences matches {0..1} matches {...}    -- Investigation reason

                    ITEM[at0005] group cardinality matches {0..1} matches { -- Methodology ************** NEW GROUP CONSTRUCT
                        ELEMENT[at0006] occurrences matches {0..1} matches {...}    -- Methodology as a Text description
                        ELEMENT[at0007] occurrences matches {0..1} matches {...}    -- Methodology as a Coded list
                        CLUSTER[at0008] occurrences matches {0..1} matches {...}    -- Methodology as a Structured description
                    }

                    ELEMENT[at0011] occurrences matches {0..1} matches {...}    -- (some other details)
                    CLUSTER[at0012] occurrences matches {0..1} matches {...}    -- (some other details)
                }
            }
        }
    }

In the above, a new 'group' construct is suggested, introduced by the ITEM group block. We are doing two things here:

  • firstly, we create an object block that corresponds to the intended single object in the data. If we treat it as an object block, it needs an RM type, and the RM type must be the same as or a super-type of all types intended to be choices within the node.
  • then we include a 'group' clause, which indicates how many of the choices should be allowed in the data. This could be used as follows:
    • group cardinality matches {1} -- one of the choices only can be included in the data
    • group cardinality matches {0..1} -- one of the choices, or none
    • group cardinality matches {2..3} -- at least 2, no more than 3 choices in the data
  • inside the group block, we include normal node definitions which are to be considered as alternative siblings for this position in the overall container.
  • Note that we have to put at-code node identifiers both on the outer ITEM object, and on all its concrete group-children. This is somewhat subtle. At runtime, one (or perhaps more) of the group-children will be chosen from the group, and its at-code will be found in the data; the [at0005] code will never be used in data or any path.
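
The intended effect of the 'group cardinality' clause can be sketched as a simple count over the data. This Python fragment is an illustration only: group_satisfied and METHODOLOGY are hypothetical names, and data nodes are reduced to their at-codes. Note that the outer [at0005] code does not appear in the data, only the codes of the chosen group-children.

```python
from typing import Iterable, Optional, Set


def group_satisfied(data_node_ids: Iterable[str], members: Set[str],
                    lower: int, upper: Optional[int]) -> bool:
    """True if the number of data nodes matching group members lies in lower..upper.

    An upper bound of None means unbounded (*).
    """
    chosen = sum(1 for node_id in data_node_ids if node_id in members)
    return chosen >= lower and (upper is None or chosen <= upper)


# The three group-children from the example above.
METHODOLOGY = {"at0006", "at0007", "at0008"}
```

For instance, with 'group cardinality matches {0..1}', data containing only [at0007] from the group conform, while data containing both [at0006] and [at0007] do not.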

Now, an alternative, slightly simplified form of the above does away with the object typing, and treats 'group' as a kind of inline constraint statement, e.g.:

     EVALUATION[at0000] matches {    -- Investigation report
        data matches {
            ITEM_TREE[at0001] matches {    -- Tree
                items cardinality matches {0..*; unordered} matches {
                    ELEMENT[at0002] occurrences matches {1} matches {...}    -- Investigation type
                    ELEMENT[at0003] occurrences matches {0..1} matches {...}    -- Investigation reason

                    group cardinality matches {0..1} matches { -- Methodology ************** NEW GROUP CONSTRUCT
                        ELEMENT[at0006] occurrences matches {0..1} matches {...}    -- Methodology as a Text description
                        ELEMENT[at0007] occurrences matches {0..1} matches {...}    -- Methodology as a Coded list
                        CLUSTER[at0008] occurrences matches {0..1} matches {...}    -- Methodology as a Structured description
                    }

                    ELEMENT[at0011] occurrences matches {0..1} matches {...}    -- (some other details)
                    CLUSTER[at0012] occurrences matches {0..1} matches {...}    -- (some other details)
                }
            }
        }
    }

This method gets rid of the ITEM[at0005], which is mostly redundant anyway. Reasons we might want to keep it:

  • it provides an at-code that can be used to document the design intention of the group;
  • it provides an archetype node id that can be used to create a path in the normal way, which might be useful in tools (but note that some special rules would be required to show that paths like /items[at0005] might refer to any of the objects at /items[at0006], /items[at0007] etc.).

So far we can control the membership of a given group. However, another basic need is to allow repetition of the group as a whole. To achieve this, we can add an occurrences constraint, as follows:

     EVALUATION[at0000] matches {    -- Investigation report
        data matches {
            ITEM_TREE[at0001] matches {    -- Tree
                items cardinality matches {0..*; unordered} matches {
                    ELEMENT[at0002] occurrences matches {1} matches {...}    -- Investigation type
                    ELEMENT[at0003] occurrences matches {0..1} matches {...}    -- Investigation reason

                    group cardinality matches {0..1} occurrences matches {0..*} matches { -- Methodology ************** NEW GROUP CONSTRUCT
                        ELEMENT[at0006] occurrences matches {0..1} matches {...}    -- Methodology as a Text description
                        ELEMENT[at0007] occurrences matches {0..1} matches {...}    -- Methodology as a Coded list
                        CLUSTER[at0008] occurrences matches {0..1} matches {...}    -- Methodology as a Structured description
                    }

                    ELEMENT[at0011] occurrences matches {0..1} matches {...}    -- (some other details)
                    CLUSTER[at0012] occurrences matches {0..1} matches {...}    -- (some other details)
                }
            }
        }
    }

This now indicates that the group contains at most one element (any of the three members) but can repeat any number of times. This has the same effect as the XML DTD choice concept in the following: a, (b | c)*, d; where the (b | c) part is a choice.
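
The DTD analogy can be tried out directly with a regular expression, since a content model such as a, (b | c)*, d over single-letter element names is a regular language. The sketch below is only an analogy for the repetition semantics (the function name conforms is hypothetical, and each child element is written as one letter); it does not model the unordered container in the ADL example.

```python
import re

# Content model 'a, (b | c)*, d' with each child written as a single letter.
CONTENT_MODEL = re.compile(r"a(?:b|c)*d")


def conforms(children: str) -> bool:
    """True if the whole sequence of children matches the content model."""
    return CONTENT_MODEL.fullmatch(children) is not None
```

Each repetition of the (b | c) part picks exactly one member, which corresponds to a group cardinality of at most one combined with repeating occurrences.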

A third question to consider is whether 'group' statements may be nested, as in the following example:

            ITEM_TREE[at0001] matches {    -- Tree
                items cardinality matches {0..*; unordered} matches {
                    ELEMENT[at0002] occurrences matches {1} matches {...}    -- Investigation type
                    ELEMENT[at0003] occurrences matches {0..1} matches {...}    -- Investigation reason

                    group cardinality matches {2} matches {    -- pick any 2 of below
                        ELEMENT[at0006] occurrences matches {0..1} matches {...}
                        ELEMENT[at0007] occurrences matches {0..1} matches {...}
                        CLUSTER[at0008] occurrences matches {0..1} matches {...}

                        group cardinality matches {1} matches {    -- must include one of these
                            ELEMENT[at0009] occurrences matches {0..1} matches {...}
                            CLUSTER[at0010] occurrences matches {0..1} matches {...}
                        }
                    }

                    ELEMENT[at0011] occurrences matches {0..1} matches {...}    -- (some other details)
                    CLUSTER[at0012] occurrences matches {0..1} matches {...}    -- (some other details)
                }
            }

The above says that from the outer group, 2 items must be chosen; the inner group statement forces one of these 2 to be either one of the nodes [at0009] or [at0010]. It appears that such constraints could in fact be quite sensible, and probably should be supported.
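
One possible reading of nested groups can be sketched recursively in Python. The Group class is hypothetical, and the sketch assumes that nodes chosen from an inner group also count towards the enclosing group, which is how the paragraph above describes the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set


@dataclass
class Group:
    """Hypothetical model of a group statement; upper=None means unbounded (*)."""
    lower: int
    upper: Optional[int]
    members: Set[str] = field(default_factory=set)
    subgroups: List["Group"] = field(default_factory=list)

    def all_members(self) -> Set[str]:
        """Node ids of direct members plus those of all nested groups."""
        ids = set(self.members)
        for sub in self.subgroups:
            ids |= sub.all_members()
        return ids

    def satisfied_by(self, data_node_ids: Sequence[str]) -> bool:
        """Check this group's interval, then every nested group, against the data."""
        members = self.all_members()
        chosen = sum(1 for n in data_node_ids if n in members)
        in_range = chosen >= self.lower and (self.upper is None or chosen <= self.upper)
        return in_range and all(s.satisfied_by(data_node_ids) for s in self.subgroups)


# The nested example: pick 2 overall, one of which must be [at0009] or [at0010].
OUTER = Group(2, 2, {"at0006", "at0007", "at0008"},
              [Group(1, 1, {"at0009", "at0010"})])
```

Under this reading, data containing [at0006] and [at0009] conform, whereas [at0006] with [at0007] fail the inner group and [at0009] with [at0010] exceed it.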

With respect to the AOM, the above proposal would require a group construct, which has not yet been designed. This solution would be more complex in terms of ADL / AOM, but would produce the correct outcome in the data, that is to say, no extra 'junk' objects. It would also require some work in editing tools to visualise the construct as a 'choice' construct in a way comprehensible to clinical model authors.

Archetype Slots, Fillers, and ordering

Another problem with (un)ordered lists of multiple items within a container attribute relates to archetype slots. An archetype slot is an object node that defines possible archetypes that may be used, rather than defining further constraints inline. The following example shows a typical use of slots.

     SECTION[at0000] matches {    -- SOAP
        items cardinality matches {0..*; unordered} matches {
            SECTION[at0001] occurrences matches {0..1} matches {    -- S
                items cardinality matches {0..*; unordered} matches {
                    allow_archetype OBSERVATION[at0006] occurrences matches {0..1} matches {    -- Subjective observations
                        include
                            archetype_id/value matches {/.*/}
                    }
                    allow_archetype SECTION[at0007] occurrences matches {0..1} matches {    -- Subjective sections
                        include
                            archetype_id/value matches {/.*/}
                    }
                }
            }
            ....
        }
     }

The problem comes when we want to define multiple slots to constrain multiple items that may occur in runtime data - there is not always a 1:1 correspondence. This then calls into question exactly what the 'ordered' constraint on a container attribute means. Let us first consider some data that we want, of the general structure: 'any number of problem and diagnosis Entries, followed by one or more plan & treatment Entries'. An example of data following this structure would be:

  • EVALUATION: problem #1
  • EVALUATION: diagnosis #1
  • EVALUATION: problem #2
  • EVALUATION: problem #3
  • EVALUATION: plan
  • INSTRUCTION: medication #1
  • INSTRUCTION: therapy #1

The slot constraints needed to define this are as follows:

   SECTION[at0001] occurrences matches {0..1} matches {    -- S
        items cardinality matches {0..*; ordered} matches {
            allow_archetype EVALUATION[at0006] occurrences matches {*} matches {    -- Problem
                include
                    archetype_id/value matches {/openEHR-EHR-EVALUATION\.problem\.v*/}
            }
            allow_archetype EVALUATION[at0007] occurrences matches {*} matches {    -- Diagnosis
                include
                    archetype_id/value matches {/openEHR-EHR-EVALUATION\.problem-diagnosis\.v*/}
            }
            allow_archetype EVALUATION[at0008] occurrences matches {1} matches {    -- Plan
                include
                    archetype_id/value matches {/openEHR-EHR-EVALUATION\.plan\.v*/}
            }
            allow_archetype INSTRUCTION[at0009] occurrences matches {*} matches {    -- Intervention
                include
                    archetype_id/value matches {/openEHR-EHR-INSTRUCTION\.plan\.v*/}
            }
        }
   }

The above says that the SECTION.items attribute is an ordered list, and that its contents include multiple EVALUATION objects representing problem, diagnosis and plan, and also multiple INSTRUCTION objects representing interventions. The problem is now apparent. Each slot definition is a set of possibilities, but we do not necessarily want the ordering of the archetypes chosen to fill the slots to follow the ordering of the slots. This means we need to understand what the 'ordered' constraint actually means in this context: does it mean that the slots define order? Regardless, they do not give us a way to impose any internal structure on the list. We could use the group construct above to help. We could for example define the following:

   SECTION[at0001] occurrences matches {0..1} matches {    -- Subjective
        items cardinality matches {0..*; ordered} matches {

            group cardinality matches {0..1} occurrences matches {0..*} matches {    -- sub-group of any number of problems & diagnoses
                allow_archetype EVALUATION[at0006] matches {    -- Problem
                    include
                        archetype_id/value matches {/openEHR-EHR-EVALUATION\.problem\.v*/}
                }
                allow_archetype EVALUATION[at0007] matches {    -- Diagnosis
                    include
                        archetype_id/value matches {/openEHR-EHR-EVALUATION\.problem-diagnosis\.v*/}
                }
            }

            allow_archetype EVALUATION[at0008] occurrences matches {1} matches {    -- Plan
                include
                    archetype_id/value matches {/openEHR-EHR-EVALUATION\.plan\.v*/}
            }
            allow_archetype INSTRUCTION[at0009] occurrences matches {*} matches {    -- Intervention
                include
                    archetype_id/value matches {/openEHR-EHR-INSTRUCTION\.plan\.v*/}
            }
        }
   }

Now the above has the desired result in data: a group of any number of problems & diagnoses in any order, followed by a plan, followed by any number of Interventions.
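
The combined effect of the slots and the group on ordering can be sketched in Python by mapping each archetype id in the data to the slot it fills and then checking the resulting sequence. Everything here is an assumption made for illustration: the names SLOT_PATTERNS, slot_symbol and section_conforms are hypothetical, and the version suffix is written as \.v\d+ rather than the v* used in the example, so that whole-string matching can be used.

```python
import re
from typing import List, Optional

# One symbol per slot; id patterns follow the example, with an assumed version suffix.
SLOT_PATTERNS = {
    "P": re.compile(r"openEHR-EHR-EVALUATION\.problem\.v\d+"),            # Problem
    "D": re.compile(r"openEHR-EHR-EVALUATION\.problem-diagnosis\.v\d+"),  # Diagnosis
    "L": re.compile(r"openEHR-EHR-EVALUATION\.plan\.v\d+"),               # Plan
    "I": re.compile(r"openEHR-EHR-INSTRUCTION\.plan\.v\d+"),              # Intervention
}

# Repeating group of (problem | diagnosis), then one plan, then any number of interventions.
SECTION_ORDER = re.compile(r"(?:P|D)*LI*")


def slot_symbol(archetype_id: str) -> Optional[str]:
    """Return the symbol of the slot whose pattern the archetype id matches, if any."""
    for symbol, pattern in SLOT_PATTERNS.items():
        if pattern.fullmatch(archetype_id):
            return symbol
    return None


def section_conforms(archetype_ids: List[str]) -> bool:
    """True if every id fills some slot and the sequence respects the group ordering."""
    symbols = [slot_symbol(a) for a in archetype_ids]
    if None in symbols:
        return False
    return SECTION_ORDER.fullmatch("".join(symbols)) is not None
```

A sequence such as problem, diagnosis, problem, plan, intervention conforms, while placing the plan before a problem does not.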

Semantics of 'group'

Based on the above, the current preference is to define the syntax of 'group' as a pure inline constraint addition to ADL, i.e. not as a pseudo-object block. Nesting would be allowed. Group statements should be viewed as the equivalent of special invariant statements that control choice of sub-groups of nodes within larger lists. A group block would have the following syntax:

group_block ::= 'group' 'cardinality' 'matches' cardinality_constraint [ 'occurrences' 'matches' occurrences_constraint ] 'matches' '{' ( object_block | group_block )+ '}'
cardinality_constraint ::= '{' multiplicity [ ';' uniqueness_constraint ] [ ';' ordering_constraint ] '}'
occurrences_constraint ::= '{' multiplicity '}'
multiplicity ::= n | n '..' m | n '..' '*' | '*'
uniqueness_constraint ::= 'unique'
ordering_constraint ::= 'ordered' | 'unordered'

The uniqueness constraint indicates that multiple items in the data can only match distinct archetype nodes, not the same one. The ordering constraint can be used to turn ordering on or off for the group of data nodes matching the archetype nodes within the group. The default ordering setting is provided by the enclosing cardinality constraint, or a parent group constraint, if one exists. Thus, ordering that is required by the cardinality statement can be turned off by a group statement, allowing the data items matching the archetype nodes within to be unordered, while retaining their position in the overall order of the items within the containing attribute.

These semantics allow for a range of variations, for example:

  • group cardinality matches {1} -- must match 1
  • group cardinality matches {0..1} -- optionally match one node from the following group
  • group cardinality matches {0..3} occurrences matches {*} -- optionally match up to 3 nodes from the following group, allowing repeats
  • group cardinality matches {2..3} occurrences matches {*} -- match between 2 and 3 nodes from the following group, allowing repeats
  • group cardinality matches {3; unique} -- match exactly 3 unique nodes from the following group (i.e. 2 objects in data cannot match one node in the group)
  • group cardinality matches {*} occurrences matches {*} -- match any number of nodes from the following group, including repeats
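
To make the variations concrete, the following Python sketch parses the braces part of a group cardinality into a lower bound, an upper bound and a uniqueness flag. It is a toy parser written for this page, not part of any ADL tool: the names Cardinality and parse_cardinality are hypothetical, '*' is read as 0..*, and the ordering keyword is not handled.

```python
import re
from typing import NamedTuple, Optional


class Cardinality(NamedTuple):
    lower: int
    upper: Optional[int]  # None means unbounded (*)
    unique: bool


# '{' n | n..m | n..* | * , optionally followed by '; unique' '}'
_PATTERN = re.compile(r"^\{\s*(\*|\d+)(?:\s*\.\.\s*(\*|\d+))?\s*(?:;\s*(unique))?\s*\}$")


def parse_cardinality(text: str) -> Cardinality:
    """Parse strings such as '{1}', '{0..1}', '{2..*}', '{*}' or '{3; unique}'."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a cardinality constraint: {text!r}")
    first, second, unique = m.groups()
    if first == "*":
        if second is not None:
            raise ValueError(f"'*' cannot be a lower bound: {text!r}")
        return Cardinality(0, None, unique is not None)
    lower = int(first)
    if second is None:
        upper: Optional[int] = lower      # '{n}' means exactly n
    elif second == "*":
        upper = None                      # '{n..*}' is unbounded above
    else:
        upper = int(second)
    return Cardinality(lower, upper, unique is not None)
```

For example, parse_cardinality("{3; unique}") yields a lower and upper bound of 3 with the uniqueness flag set.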

Nesting

Where nested group statements occur, the main requirement is that the multiplicity intervals are compatible.

TBC

Specialisation

Overrides made in a specialised archetype to a part of a parent archetype containing a group statement must obey the general rules of archetype specialisation: the specialised archetype cannot violate the constraints in the parent. This leads to the following rules:

  • a group statement cannot be removed in a child archetype (note that this prevents child archetypes imposing alternative sub-groups);
  • the multiplicity interval of a group statement can only be redefined to an interval the same as or contained within the multiplicity interval of the same statement in the parent;
  • a group 'unique' constraint in the parent cannot be removed in the child, but the child can add a 'unique' constraint where none existed.
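
The last two rules amount to an interval containment test plus a check on the 'unique' flag. A minimal Python sketch, with hypothetical function names and intervals written as (lower, upper) pairs where an upper bound of None means unbounded:

```python
from typing import Optional, Tuple

Interval = Tuple[int, Optional[int]]  # upper None means unbounded (*)


def interval_within(child: Interval, parent: Interval) -> bool:
    """True if the child interval is the same as or contained within the parent interval."""
    c_lo, c_hi = child
    p_lo, p_hi = parent
    if c_lo < p_lo:
        return False
    if p_hi is None:
        return True
    return c_hi is not None and c_hi <= p_hi


def valid_group_redefinition(parent_interval: Interval, parent_unique: bool,
                             child_interval: Interval, child_unique: bool) -> bool:
    """Check a child's redefinition of a group statement against its parent."""
    if parent_unique and not child_unique:
        return False  # 'unique' cannot be removed in the child
    return interval_within(child_interval, parent_interval)
```

So a parent group of {0..3} may be narrowed to {1..2; unique} in a child, but a parent of {0..1} may not be widened to {0..2}.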

Semantics of Ordering

Ordering is considered to apply to data objects that conform to object constraint nodes defined in a container in an archetype.