Semantic slot proposal

Status

This proposal appears unlikely to go into ADL/AOM 1.5, and will probably evolve based on specifications emerging from IHTSDO. Some form of it is likely to be added to ADL 1.6 or later.

Introduction

In ADL 1.4 and lower, the idea of a 'slot' is a joining point from one archetype to another, similar to the idea of an 'association' or 'aggregation' relationship in UML. It is usually defined by a constraint on archetype identifiers, although other kinds of constraint can be used. In the openEHR 'template' concept, slots are normally 'filled' in order to create large aggregate structures, such as whole COMPOSITIONs.

Over the last few years, experience with archetypes and templates has shown that there are subtleties to do with slot definition and slot-filling that were not originally thought of. At the same time, the description of the semantics of specialisation in archetypes has made it clearer that an improved semantics of slots could probably be wholly accommodated within the archetype concept.

Known problems:

  • ordering of slot definitions within an attribute does not adequately address the need to order items in the data.
  • defining slots with regular expressions on archetype identifiers is limited, especially if we change the way specialised identifiers work (see here), because of the need to be able to constrain using ontological relationships rather than lexical patterns. Example: not all lab archetypes come under the 'lab_observation' archetype parent.
  • currently it is not possible to include a reference to another archetype that has been factored out; the only way to do this is to create a slot whose inclusion pattern matches only one archetype.
  • the ability to simultaneously match on different parts of an archetype identifier in different ways is limited.
  • the semantics of 'redefinition' of slots over specialisation is not defined
  • the semantics of the 'include' and 'exclude' parts of the slot statement have not been properly defined.

This page discusses the requirements as we understand them today, and an improved semantics for slots that should satisfy these needs.

Modelling Background

In ADL, constraints are stated for an attribute in terms of:

  • its existence, defining whether it is 'null' or not, i.e. whether there is an object at all, in the data
  • for container attributes, its cardinality, which defines the allowed possible number and ordering (or not) of members in the container;
  • its possible values, defined in terms of object constraints.

A 'slot' is one kind of object constraint, and has the meaning 'archetypes that match this constraint may go here'. A typical use of slots is shown below:

    SECTION[at0000] matches {    -- SOAP       
        items cardinality matches {0..*; unordered} matches {
            SECTION[at0001] occurrences matches {0..1} matches {    -- S
                items cardinality matches {0..*; unordered} matches {
                    allow_archetype OBSERVATION[at0006] occurrences matches {0..1} matches {    -- Subjective observations
                        include
                            archetype_id/value matches {/.*/}
                    }
                    allow_archetype SECTION[at0007] occurrences matches {0..1} matches {    -- Subjective sections
                        include
                            archetype_id/value matches {/.*/}
                    }
                }
            }
            ....
        }
    }

Each block introduced by 'allow_archetype' is a slot, and stands for 'possible' archetypes that may be used at that point. Note that:

  • a container attribute may have more than one slot definition, each standing for a different set of possible included archetypes;
  • a single-valued attribute could also have multiple slot definitions, each standing for alternatives;
  • slot definitions can be mixed with non-slot object constraints;
  • the concept of 'filling a slot' is something of a misnomer: it is not the slot that is 'filled' but the containing attribute, which might contain more than one slot definition;
  • a slot filler does not necessarily correspond to a single object; in general it acts like any other constraint, since its occurrences may be e.g. 0..*.

Requirements

Slot-related requirements known for archetypes include the following.

  • A1: Basic slot definition: use of a single slot to define the archetypes allowed to be used under a certain attribute.
  • A2: direct archetype reference: there has been a great increase in re-usable component archetypes, typically CLUSTER type archetypes. When a sub-tree of constraints gets refactored into a new, separate archetype, the archetypes that used to include this content now need to reference the new archetype. There is currently no provision in ADL for doing this.
  • A3: slot-narrowing in a specialised child: it should be possible to redefine a slot as corresponding to a narrower set of archetypes in specialisations of the archetype containing the slot definition.
  • A4: recommended slot-fillers: it should be possible to define a slot whose status is a recommendation, but which doesn't limit the archetypes that could be used in the enclosing attribute. This is currently done with includes set to X, excludes empty.
  • A5: constrained slot-fillers: it should be possible to define a slot whose status is a hard constraint. This is currently done with includes set to X, excludes set to 'any archetype' (meaning 'any archetype not in the includes list').

Slot-related requirements known for templates include the following.

  • T1: basic filling: templates specify one or more 'fillers' for a slot (but see notes above).
  • T2: complex ordering and occurrences: see the bottom of this page here for an example.
  • T3: open/closed: ability to fully specify or leave open an attribute. This means that even if filler archetypes are specified, the author might want to allow those archetypes not to be used at runtime, instead allowing different archetypes that still fit one or more of the available slots in the attribute to be chosen at runtime. In this case, the filler archetypes can be understood as 'recommendations'. On the contrary, if the attribute is defined such that no further fillers could be used, the combined object constraints can be understood as 'required' archetypes.
  • T4: ordering: ordering of slot 'fillers' should be independently constrainable from the definition of the slots.
  • T5: a template can include another template in a slot rather than an archetype.

Solutions

There are various new elements that help provide a complete solution:

  • Use of structured archetype slot specifications rather than archetype identifier regular expressions.
  • Archetype references: the ability to directly refer to another archetype from within an archetype, without using a slot definition;
  • Use of specialisation semantics: a slot definition can be specialised into either or both of a narrower slot definition and one or more archetype references that fit the slot. This allows archetype references to be used as slot-fillers;
  • Use of the 'select' construct to support more complex ordering and occurrences requirements (see this page here).

Structured Slot Definition

One of the key things we need to fix is how slots are actually defined. Currently, slots are defined as regular expressions on lexical identifiers. There are at least three difficulties with the use of regular expressions in this way:

  • normal human beings, e.g. doctors, don't understand them. This means they have to be more or less hidden and mapped to in archetype / template authoring tools
  • they are hard to compute, for example it is quite a difficult computational task to determine if regex_2 corresponds to a clean subset of the strings matched by regex_1. This makes computing slot specialisations in archetypes difficult.
  • most importantly, they ignore the semantic aspect of what we are trying to match, namely archetypes in an ontological space.
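
The third difficulty can be illustrated with a minimal Python sketch. The archetype identifiers and the specialisation table below are hypothetical, invented purely for illustration; they are not taken from any published archetype library. A lexical pattern finds the archetypes whose names happen to start with 'lab', but misses one that is related to the lab concept only by specialisation.

```python
import re

# Hypothetical archetype identifiers, invented for illustration only.
ARCHETYPE_IDS = [
    "openEHR-EHR-OBSERVATION.lab_observation.v1",
    "openEHR-EHR-OBSERVATION.lab_observation-glucose.v1",
    "openEHR-EHR-OBSERVATION.microbiology.v1",
    "openEHR-EHR-OBSERVATION.blood_pressure.v1",
]

# Hypothetical specialisation relationships: child concept -> parent concept.
PARENT = {
    "lab_observation-glucose": "lab_observation",
    "microbiology": "lab_observation",
}


def concept_of(archetype_id):
    """Return the concept part, i.e. the middle dot-separated section."""
    return archetype_id.split(".")[1]


def is_a(concept, ancestor):
    """True if concept equals ancestor or specialises it, directly or not."""
    current = concept
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


# Lexical matching: a regular expression over the identifier string.
LEXICAL_PATTERN = re.compile(r"openEHR-EHR-OBSERVATION\.lab.*\.v1")
lexical_matches = [a for a in ARCHETYPE_IDS if LEXICAL_PATTERN.fullmatch(a)]

# Ontological matching: follow the specialisation relationships instead.
ontological_matches = [
    a for a in ARCHETYPE_IDS if is_a(concept_of(a), "lab_observation")
]

print(lexical_matches)
print(ontological_matches)
```

In this sketch the hypothetical 'microbiology' archetype is found only by the ontological match, because nothing in its name reveals its parent.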

In fact, what we need is a way of separately matching the various parts of an archetype id, and especially for the concept part, some ontological operators. One reason this is needed is that in the future, if the recent governance and identification proposals are taken up, the concept part of an archetype id will be weakened so that it no longer has to reflect the specialisation relationships of an archetype. The following example shows how such a constraint could look in ADL:

allow_archetype CLUSTER[at0004.1] occurrences matches {0..1} matches {
    include matches {...} -- C_BOOLEAN
    archetype_id matches {
        ARCHETYPE_ID matches {
            namespace matches {...} -- a C_STRING
            qualified_rm_entity matches {...} -- a C_STRING
            domain_concept matches {...} -- a C_CONCEPT
            version_id matches {...} -- a C_STRING
        }
    }
}

Here we have defined a standard ADL-style constraint on ARCHETYPE_ID, which is the type of the archetype_id attribute of an ARCHETYPE. Within this constraint, we have separated out the various attributes in the usual way. The first one we have included is the namespace attribute, part of the new identification proposal mentioned above. The remaining three attributes are from the classic definition of an archetype identifier. While string matching is sufficient for the qualified_rm_entity part (i.e. the part that looks like 'openEHR-EHR-CLUSTER'), it is not sufficient for concept matching. Therefore we have introduced a new constraint type, C_CONCEPT, which allows constraints to be expressed on concept identifiers within a concept space.
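
As a rough illustration of matching the parts separately, the following Python sketch splits a classic three-part identifier into the attributes named above. The class and function names are assumptions made for this example. The namespace is passed in separately because this page does not say how it would be written inside the identifier string, and the version is stored without its leading dot, which is also a choice made only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchetypeId:
    """Hypothetical structured form of an archetype identifier."""
    qualified_rm_entity: str          # e.g. "openEHR-EHR-CLUSTER"
    domain_concept: str               # e.g. "investigation_methodology"
    version_id: str                   # e.g. "v1" (stored without the dot)
    namespace: Optional[str] = None   # supplied separately (assumption)


def parse_archetype_id(text, namespace=None):
    """Split a classic 'rm_entity.concept.version' identifier into parts."""
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a three-part archetype id: {text!r}")
    rm_entity, concept, version = parts
    return ArchetypeId(rm_entity, concept, version, namespace)


aid = parse_archetype_id(
    "openEHR-EHR-CLUSTER.investigation_methodology.v1",
    namespace="org.openehr.clinical",
)

# Each part can now be constrained on its own: plain string comparison for
# the RM entity and version, something richer for the concept.
print(aid.qualified_rm_entity == "openEHR-EHR-CLUSTER")
print(aid.domain_concept)
```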

For a given reference model (e.g. the openEHR RM) and class (e.g. CLUSTER), archetypes are defined within an ontological space. Specialisation hierarchies occur in a similar way to object-oriented software, also equivalent to the way terms are defined within terminologies. Rather than matching on name, we need to match on concept in various ways:

  • exactly this concept
  • this concept or any specialisation
  • any specialisation of this concept, but not the concept itself

These kinds of operators are typical within the ontology / terminology world, but do not currently exist in openEHR ADL. One idea here would be to use the SNOMED constraint operators as syntax, as follows:

Symbol  Meaning
=       This concept only
<<      This concept or any subtype permitted
<       Any subtype of this concept (but not the concept itself)

Note on '=': SNOMED uses no operator here, but it may be easier if we do in ADL, since it will alert the parser to a 'concept' match expression rather than some other kind of expression.

Note that in another proposal, the concept part of an archetype id will be weakened so that it no longer has to reflect the specialisation relationships of an archetype. The following example shows how a constraint using the << operator could look in ADL:

allow_archetype CLUSTER[at0004.1] occurrences matches {0..1} matches {
    include matches {True}
    archetype_id matches {
        ARCHETYPE_ID matches {
            ...
            domain_concept matches {<< investigation_methodology}
            ...
        }
    }
}

In the above, 'investigation_methodology' is a concept identifier within the archetype definitional space, and the << operator indicates that this concept or any subtype (i.e. specialised form) is to be allowed. Using this approach, it does not matter what the concept names of archetypes are, only how the specialisation relationships are constructed.
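
The three operators can be sketched in Python as follows. This is a minimal illustration only: the function names, the expression format (operator followed by a concept identifier) and the small hierarchy are assumptions made for the example, not part of ADL or of any SNOMED tooling.

```python
# Hypothetical specialisation hierarchy: child concept -> parent concept.
PARENT = {
    "investigation_methodology": "methodology",
    "microbiology_methodology": "investigation_methodology",
}


def proper_ancestors(concept):
    """Return all ancestors of a concept, excluding the concept itself."""
    found = []
    current = PARENT.get(concept)
    while current is not None:
        found.append(current)
        current = PARENT.get(current)
    return found


def parse_expr(expr):
    """Split e.g. '<< methodology' into ('<<', 'methodology')."""
    expr = expr.strip()
    # '<<' must be tested before '<', since it shares the same first character.
    for op in ("<<", "<", "="):
        if expr.startswith(op):
            return op, expr[len(op):].strip()
    raise ValueError(f"no concept operator in {expr!r}")


def match_concept(expr, candidate):
    """Evaluate a concept match expression against a candidate concept."""
    op, target = parse_expr(expr)
    is_same = candidate == target
    is_subtype = target in proper_ancestors(candidate)
    if op == "=":
        return is_same
    if op == "<":
        return is_subtype
    return is_same or is_subtype  # op == "<<"


print(match_concept("<< investigation_methodology", "microbiology_methodology"))
print(match_concept("< methodology", "methodology"))
```

Note that the match depends only on the relationships in the hierarchy, never on the spelling of the concept names.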

Semantics of 'include' and 'exclude'

In ADL 1.4, the semantics of the include and exclude parts of an archetype slot are not well-defined. A given slot can (and often does) use both 'include' and 'exclude'. Where a slot uses only 'include', it does not actually constrain the slot, but behaves more like a suggestion or recommendation to use archetypes that match the 'include' patterns. If the intention is actually to constrain the slot to use only those matches, then currently the slot must also specify an 'exclude' pattern (typically /.*/).

It is proposed that this be changed as follows:

  • only one of these qualifiers can be used in a given slot:
    • if 'include' is used, the meaning is that only the archetypes matching the constraints will be allowed
    • if 'exclude' is used, the meaning is that any archetype apart from those matching the constraints will be allowed
  • this choice cannot be changed over specialisation, only the constraints can be specialised

The above structured slot definition implements this by treating include/exclude as a Boolean flag on a slot.
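
A minimal Python sketch of the proposed semantics, assuming a hypothetical Slot class whose constraint is any predicate over archetype identifiers: the Boolean flag decides whether a match admits or rejects an archetype.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Slot:
    """Hypothetical slot: one Boolean flag plus one constraint."""
    include: bool                      # True = 'include', False = 'exclude'
    constraint: Callable[[str], bool]  # predicate over archetype identifiers

    def admits(self, archetype_id):
        matched = self.constraint(archetype_id)
        # include: only matching archetypes are allowed.
        # exclude: every archetype except the matching ones is allowed.
        return matched if self.include else not matched


def is_methodology(archetype_id):
    """Stand-in constraint for the example; purely lexical for brevity."""
    return archetype_id.split(".")[1].endswith("methodology")


include_slot = Slot(include=True, constraint=is_methodology)
exclude_slot = Slot(include=False, constraint=is_methodology)

candidate = "openEHR-EHR-CLUSTER.investigation_methodology.v1"
print(include_slot.admits(candidate))
print(exclude_slot.admits(candidate))
```

Because the flag is a single attribute, a slot cannot carry both qualifiers at once, which is the restriction proposed above.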

Changes in the AOM and ADL

A new kind of Slot type would be required in the AOM to support the above kind of structured definition. In the interests of maintaining compatibility with existing archetypes, it is proposed to add a new class ARCHETYPE_SLOT_2, which would co-exist with the current ARCHETYPE_SLOT class. No changes would be required in ADL, since a parser can easily distinguish the two types of slot definition and generate an instance of the correct AOM class.

Archetype Reference

A new ADL/AOM construct is proposed, called an Archetype Reference. This is similar in concept to an 'internal reference', but rather than pointing to another part of the same archetype, it points to a different archetype. The archetype reference would allow requirement A2 to be met.

Archetype references could be used in two ways. First, they can be used directly in an attribute, just as any other object constraint; the only difference being that they refer to another archetype containing the required definition, rather than providing the definition inline. Example:

items cardinality matches {0..*; unordered} matches {
    ELEMENT[at0002] occurrences matches {1} matches {...}    -- Investigation type
    ELEMENT[at0003] occurrences matches {0..1} matches {...}    -- Investigation reason
    use_archetype CLUSTER[at0004] occurrences matches {0..1} openEHR-EHR-CLUSTER.investigation_methodology.v1 -- Investigation methodology
}

In this example, the archetype openEHR-EHR-CLUSTER.investigation_methodology.v1 is being referred to in the at0004 node, which is a top-level node. This is an example of an archetype reference completely distinct from any archetype slot.

The semantics of an archetype reference are shown in the following UML diagram, which shows it as a type in the Archetype Object Model.


We can see it has both the occurrences and node_id attributes of other C_OBJECT types. In addition it defines an archetype identifier. We can understand the meaning of the occurrences as for any other C_OBJECT, i.e. it defines the number of objects in data that match the object in the referenced archetype. This could be 0..1, 1..1, 0..*, or something specific like 2..5. The node_id tells us what the meaning of the node in the originating archetype is, even if the included archetype is a far more general one.
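
The attributes described here can be sketched as a small Python data structure. The class names ArchetypeReference and Occurrences are assumptions made for this example and do not come from the AOM; an upper bound of None stands for '*'.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrences:
    lower: int
    upper: Optional[int]  # None means unbounded, i.e. '*'

    def allows(self, count):
        """True if 'count' matching data objects satisfies the interval."""
        if count < self.lower:
            return False
        return self.upper is None or count <= self.upper


@dataclass(frozen=True)
class ArchetypeReference:
    """Hypothetical model of a use_archetype node."""
    rm_type_name: str         # e.g. "CLUSTER"
    node_id: str              # meaning of the node in the originating archetype
    occurrences: Occurrences  # as for any other C_OBJECT
    archetype_id: str         # the archetype being referred to


ref = ArchetypeReference(
    rm_type_name="CLUSTER",
    node_id="at0004",
    occurrences=Occurrences(0, 1),
    archetype_id="openEHR-EHR-CLUSTER.investigation_methodology.v1",
)

print(ref.occurrences.allows(1))
print(ref.occurrences.allows(2))
```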

Archetype References for filling Slots

In the interests of including all 'template' semantics in the same formalism, it is useful to see how to relate an archetype reference to a slot definition. Clearly an archetype reference logically could be a 'slot-filler'.

Doing it by specialisation

One approach would be to consider one or more archetype references as a specialisation of a slot definition. The definition of the slot must remain intact, since it may be used at runtime to determine what further archetypes can go into a slot. This would lead to the rules:

  • a slot can be specialised into either or both of the following:
    • a narrower slot
    • one or more archetype references that conform to the slot

We can visualise this in ADL with the following archetypes. Firstly the parent:

items cardinality matches {0..*; unordered} matches {
    ELEMENT[at0002] occurrences matches {1} matches {...}    -- Investigation type
    ELEMENT[at0003] occurrences matches {0..1} matches {...}    -- Investigation reason
    allow_archetype CLUSTER[at0004] occurrences matches {0..1} matches {
        ARCHETYPE_ID matches {
            namespace matches {"org.openehr.clinical"}
            qualified_rm_entity matches {"openEHR-EHR-CLUSTER"}
            domain_concept matches {<< methodology}
            version_id matches {".v1"}
        }
    }
}

Now the specialisation:

..../items matches {
    use_archetype CLUSTER[at0004.1] occurrences matches {0..1} openEHR-EHR-CLUSTER.investigation_methodology.v1 -- investigation methodology
}

Here we have redefined the items attribute such that an archetype reference constraint redefines the archetype slot constraint. Note that the node_id at0004.1 is a specialisation of the at0004 code of the slot being redefined. The effect we want here is to leave the archetype slot 'in force', i.e. if more fillers were to be added, they would still have to conform to the slot definition.
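
A tool could check such a redefinition along the following lines. This Python sketch is only an illustration under stated assumptions: the function names are invented, the hierarchy table assumes that 'investigation_methodology' specialises 'methodology', and the slot constraint is reduced to its '<<' concept part.

```python
# Assumed specialisation relationship: child concept -> parent concept.
PARENT = {"investigation_methodology": "methodology"}


def specialises_node(child_node_id, parent_node_id):
    """True if e.g. 'at0004.1' is a specialised code under 'at0004'."""
    return child_node_id.startswith(parent_node_id + ".")


def conforms_to(concept, slot_concept):
    """'<<' semantics: the slot concept itself or any specialisation of it."""
    current = concept
    while current is not None:
        if current == slot_concept:
            return True
        current = PARENT.get(current)
    return False


def valid_redefinition(slot_node_id, slot_concept, ref_node_id, ref_archetype_id):
    """Check a use_archetype node that redefines an allow_archetype node."""
    ref_concept = ref_archetype_id.split(".")[1]
    return (specialises_node(ref_node_id, slot_node_id)
            and conforms_to(ref_concept, slot_concept))


ok = valid_redefinition(
    "at0004", "methodology",
    "at0004.1", "openEHR-EHR-CLUSTER.investigation_methodology.v1",
)
print(ok)
```

The parent slot is not consumed by this check; it stays available for validating any further fillers.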

Alternatively, we might want to redefine the slot definition itself, in order to narrow it. This can be done as follows.

Firstly the parent:

items cardinality matches {0..*; unordered} matches {
    allow_archetype CLUSTER[at0004] occurrences matches {0..1} matches {
        ARCHETYPE_ID matches {
            namespace matches {"org.openehr.clinical"}
            qualified_rm_entity matches {"openEHR-EHR-CLUSTER"}
            domain_concept matches {<< methodology}
            version_id matches {".v1"}
        }
    }
}

Now the specialisation:

..../items matches {
    allow_archetype CLUSTER[at0004.1] occurrences matches {0..1} matches {
        ARCHETYPE_ID matches {
            domain_concept matches {<< investigation_methodology}
        }
    }
}

We could also do both. Assume the parent as above. Now the specialisation:

..../items matches {
    allow_archetype CLUSTER[at0004.1] occurrences matches {0..1} matches {
        ARCHETYPE_ID matches {
            domain_concept matches {<< investigation_methodology}
        }
    }
    use_archetype CLUSTER[at0004.2] occurrences matches {0..1} openEHR-EHR-CLUSTER.microbiology_methodology.v1 -- microbiol investigation methodology
}

Here we have two specialised nodes, one a narrowed slot, the other an archetype reference, with node_ids of at0004.1 and at0004.2 respectively. The meaning of this is that a 'microbiology investigation methodology' CLUSTER object (which conforms to the original parent archetype slot) could be used, as well as any archetype corresponding to the narrowed slot definition, i.e. any 'investigation_methodology' archetype.

A better specialisation approach?

The above idea treats an archetype reference as a direct kind of specialisation of an archetype slot definition. This would be practically convenient, but is semantically unsatisfying, and is likely to get in the way with more complex examples.

An improved approach might be to explicitly include the idea of a 'binding' within the definition of the new-style structured slot. This would entail a definition of a slot in ADL as follows:

allow_archetype CLUSTER[at0004.1] occurrences matches {0..1} matches {
    include matches {...} -- C_BOOLEAN
    archetype_id matches {...}  -- as above
    bindings matches {
    }
}

xxxx

Ordering of Slots and Fillers

One problem that has come up in archetypes and templates to date is the need to specify the ordering of slot fillers. There appear to be two ways to deal with this. The first is to treat the position and related constraints of slots themselves (remembering that there can be more than one within the same attribute) as significant. See the discussion on the lower part of this page for this solution. With the help of the 'select' construct, this approach looks reasonable.

Q: Is there a need for complex ordering, occurrences etc of fillers within a given slot?

A Complex Example

An example of a template expressed as a set of specialised archetypes, based on the above principles, is here (HTML files zipped; download and unzip in a local directory, then double click on the COMPOSITION template file). This shows the following features commonly found in templates, but expressed using the ADL 1.5 syntax, including the above suggestions:

  • override occurrences to {0} (i.e. remove) or {1} (i.e. mandatory)
  • rename a node, i.e. constrain the value of the LOCATABLE.name attribute inherited on most nodes
  • archetype 'reference' (see above) as slot refinement
  • specialised node meanings - overridden at-codes (see OBSERVATION result archetype in zip file)
  • some nodes are marked with 'passthrough', which is a possible way of including the constraint 'hide on form' currently in use in templates

The examples currently do not attempt to use any inline slot redefinition.