GDL3 - reworked CHOPS-21 Example

Status: this page is in development.

Overview

NOTE: In this page, we use the term ‘decision logic module’ (DLM) for convenience, and it should not be taken to be a permanent name. Other names such as ‘computerised guideline module’ (CGM), ‘clinical practice guideline’ (CPG) etc are also common.

The problem

The aim of this page is to describe the form of a ‘decision logic module’ (DLM) that can be used in at least the following scenarios:

  • Task Plans, to provide the subject data items and conditions required for decision branches;

  • GDL3 guidelines, which in general will have more sophisticated rules, rule-sets, decision tables and so on;

  • Form calculators, to provide a way to express the computations of derived fields in an application form, e.g. score-based forms.

Background

Recently we created a representation of an NHS CHOPS (chemotherapy) Guideline in the form of:

  • an openEHR Task Planning plan;

  • a ‘subject data set’ (subject variables needed to execute rules);

  • a ‘decision support’ module containing rules.

The example may be found here. This page presents a modified version of the last two parts, based on feedback from @Rong Chen and @Ian McNicoll, particularly taking account of GDL experience to date. One thing Ian and Rong didn't like is my use of P.bilirubins, P.heart_rate etc to distinguish subject variables from rule symbols in a rules module. I am not yet convinced it is a good idea to put all symbols in the same namespace, but for now, I've gone with that approach.

The version below is a progression in the evolution, not a final specification. Feedback is welcome.

Decision logic module

Task Plans and GDL3 guidelines both require a way of expressing rules and declaring input variables. Guidelines additionally need a way of declaring output variables. These needs are achieved with the flexible use of a single kind of multi-section module called a decision logic module which has the following structure:

    dlm <identifier>

    [language
        <lang and translations meta-data, as for archetypes>]

    [description
        <descriptive meta-data>]

    [use_model
        <reference model ids; defaults to openEHR Foundation & Base types>]

    -- declare all supplier DLMs
    [use
        <local identifier>: <dlm identifier>
        <local identifier>: <dlm identifier>]

    [preconditions
        <conditions that act as pre-conditions for this module>]

    [reference
        <constant definitions>]

    -- declare all variables needed by this DLM, including currency
    input
        <subject variable declarations>

    conditions
        <simple boolean conditions (functions)>

    rules
        <complex rules (functions), returning any data type>

    [output
        <output variable declarations>]

    [terminology
        <symbol definitions in archetype format>]

    [bindings
        <local bindings of symbols to data set paths>]

Identifiers

All identifiers in a DLM are strings with no whitespace, and if the terminology section is present, these identifiers are included with linguistic definitions and translations, in the same way as archetypes.

Declarations in the use, input, conditions, rules, and output sections create identifiers that may be used in other sections.

Identification

DLMs are identified internally using a string identifier of the form:

  • openEHR-DLM.<concept>.vN.N.N

The final part represents a 3-part version identifier consisting of major.minor.patch, which follows the semver.org rules.

A reference to a DLM may be effected by quoting the whole identifier, or a form with the minor and / or patch parts of the version missing. The abbreviated forms identify the most recent matching DLM available. For example, the reference openEHR-DLM.NHS-chops14.v4 identifies the most recent locally available version within the v4 major version of the NHS-chops14 DLM.
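The resolution of a partial reference can be illustrated with a minimal Python sketch. It is not part of any specification: the function names parse and resolve, the list of locally available DLMs and the identifier openEHR-DLM.BSA.v1.0.0 are assumptions made for illustration, and semver pre-release and build suffixes are ignored.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches e.g. "openEHR-DLM.NHS-chops14.v4", "...v4.1" or "...v4.1.0".
_REF = re.compile(r"^(openEHR-DLM\.[^.]+)\.v(\d+)(?:\.(\d+))?(?:\.(\d+))?$")


def parse(dlm_id: str) -> Tuple[str, Tuple[int, ...]]:
    """Split an identifier into (stem, version parts actually present)."""
    m = _REF.match(dlm_id)
    if m is None:
        raise ValueError(f"not a DLM identifier: {dlm_id!r}")
    parts = tuple(int(p) for p in m.groups()[1:] if p is not None)
    return m.group(1), parts


def resolve(reference: str, available: Iterable[str]) -> Optional[str]:
    """Return the most recent available DLM id matching the reference, or None."""
    stem, wanted = parse(reference)
    best = None
    for candidate in available:
        c_stem, c_version = parse(candidate)
        if len(c_version) != 3:
            continue  # only fully versioned DLMs can be resolved to
        # A candidate matches when its version starts with the parts given.
        if c_stem == stem and c_version[:len(wanted)] == wanted:
            if best is None or c_version > best[0]:
                best = (c_version, candidate)
    return best[1] if best else None


# Hypothetical set of locally available DLMs.
local_dlms = [
    "openEHR-DLM.NHS-chops14.v4.0.2",
    "openEHR-DLM.NHS-chops14.v4.1.0",
    "openEHR-DLM.NHS-chops14.v3.9.9",
    "openEHR-DLM.BSA.v1.0.0",
]
print(resolve("openEHR-DLM.NHS-chops14.v4", local_dlms))
# prints openEHR-DLM.NHS-chops14.v4.1.0
```

Comparing versions as integer tuples gives the numeric major.minor.patch ordering, which is the part of the semver precedence rules this sketch relies on.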

Formal Relationships

DLMs as currently conceived have the following formal relationships:

  • a DLM cannot ‘inherit’ from another DLM;

  • a DLM can use other DLMs, by declaring each DLM with an identifier in the use section.

Uses

A DLM can be used in different ways to represent:

  • Plan context: just a data-set and Boolean conditions, as required for Task Plans;

  • A re-usable Guideline: that may be referenced by other DLMs, consisting of data set and rules, e.g. BMI, BSA calculators;

  • A top-level Guideline: such as for CHOPS-14, consisting of input data set, rules; generally developed from a published guideline;

  • An in-form calculator that calculates values for fields derived from other directly input fields.

Across these different usages, a given section may be mandatory in some and optional in others. For example, for deployment as a GDL3 guideline or an in-form calculator, the bindings section would be mandatory, whereas for Task Planning, it would not be used.

A ‘raw’ form might be useful, which doesn’t include the sections language, description, or terminology. This would allow DLMs to be developed with untranslated variables or other meta-data.

Deployment and Data Binding

The intention here is to define a single formalism that can be used flexibly for TP, GDL and other uses (e.g. forms with computational elements). The key semantic differences between the TP and GDL conceptual approach appear to be:

  • GDL guidelines are executed in a single-shot fashion; they may be re-executed multiple times (e.g. due to new events being notified), but each time is a new execution from scratch; there is no temporal aspect to the execution;

  • GDL subject data is defined by a pre-defined custom structured data-set, that contains all the data needed in the guideline, and the ‘variables’ are just mappings to paths within that;

  • for the TP scenario, there is in general no predefined data-set that can be created for one or more TP Work Plans; instead, each work plan references a set of variables it needs, and these are populated by the proxy service as the plan runs;

  • for a form calculator, the form data acts as the data set.

These differences are shown in the deployment architectures for Task Planning and GDL below.

Task Planning

The following shows an operational (i.e. service) architecture for Task Planning, using DLMs to provide the data set and conditions for the Plan decision nodes.

Here, the Subject Proxy service provides values for the variables (e.g. systolic_blood_pressure, weight etc) declared in the GDL3 dataset artefacts. It abstracts away the various source systems and data models / standards, and also tries to obtain needed data at the UI if it cannot be found in e.g. the EMR.

The TP engine receives notifications from the EHR indicating e.g. commit of new lab result, new patient creation etc.

Variables declared in a DLM input section of an executing guideline are registered in the Subject Proxy service at guideline load time.

A Plan context DLM only needs some of the DLM sections, as shown in this example:

    dlm simple_cardiology

    language
        original_language = <[ISO_639-1::en]>

    description
        lifecycle_state = <"stable">
        original_author = <
            ["name"] = <"Rong Chen">
            ["organisation"] = <"Acme healthcare">
            ["date"] = <"2020-03-22">
        >
        details = <
            ["en"] = <
                language = <[ISO_639-1::en]>
                purpose = <"Simple cardiology example">
            >
        >

    input
        systolic_blood_pressure: Quantity
            currency = 1 min
            ranges = {
                high:   |> 140 mm[Hg]|,
                normal: |> 80 mm[Hg] .. <= 140 mm[Hg]|,
                low:    |<= 80 mm[Hg]|
            }

        resting_heart_rate: Quantity
            currency = 30 sec

        resting_heart_rhythm: CodedTerm
            currency = 30 sec

    conditions
        high_blood_pressure: Result <- systolic_blood_pressure.range = high
        heart_rate_irregular: Result <- resting_heart_rhythm != regular

    terminology
        term_definitions = <
            ["en"] = <
                ["resting_heart_rate"] = <
                    text = <"heart rate at rest">
                    description = <"...">
                >
                ...
                ["heart_rate_irregular"] = <
                    text = <"heart rate is irregular">
                    description = <"...">
                >
            >
        >

This provides 3 variables (systolic_blood_pressure, resting_heart_rate, resting_heart_rhythm) and 2 conditions (high_blood_pressure, heart_rate_irregular) that may be used in a Task Plan or other context.

GDL3

The GDL3 architecture is somewhat different, being built around CDS-hooks notifications from the EHR environment.

The GDL3 engine receives notifications from the EHR indicating e.g. commit of new lab result, new patient creation etc.

The subject variables (e.g. weight, is_diabetic, neutrophils) defined in a GDL3 guideline DLM are mapped to paths within a custom dataset, which is the basis of data retrieval.

The custom data set thus needs to be defined for GDL, e.g. as an openEHR ‘template’ that is retrieved at ‘load’ and possibly subsequent ‘execute’ points in time. The mappings are achieved with a bindings section at the bottom of the DLM, as shown below.

    dlm simple_cardiology

    language
        original_language = <[ISO_639-1::en]>

    description
        lifecycle_state = <"stable">
        original_author = <
            ["name"] = <"Rong Chen">
            ["date"] = <"2020-03-22">
        >
        details = <
            ["en"] = <
                language = <[ISO_639-1::en]>
                purpose = <"Simple cardiology guideline">
            >
        >

    input
        systolic_bp: Quantity
            currency = 1 min
            ranges = {
                high:   |> 140 mm[Hg]|,
                normal: |> 80 mm[Hg] .. <= 140 mm[Hg]|,
                low:    |<= 80 mm[Hg]|
            }

        heart_rate: Quantity
            currency = 30 sec

        heart_rhythm: CodedTerm
            currency = 30 sec

    rules
        ...

    terminology
        term_definitions = <
            ["en"] = <
                ["heart_rate"] = <
                    text = <"heart rate at rest">
                    description = <"...">
                >
                ...
            >
        >

    bindings
        datasets = <
            ["Cardiology_dataset_4"] = <
                dataset = <"Cardiology_dataset_4">
                bindings = <
                    ["systolic_bp"] = <"/vital_signs/blood_pressure/systolic">
                    ["diastolic_bp"] = <"/vital_signs/blood_pressure/diastolic">
                    ["heart_rate"] = <"/vital_signs/heart_rate">
                    ["heart_rhythm"] = <"/vital_signs/heart_rhythm">
                >
            >
        >

Questions:

  • how does ‘currency’ work in the custom dataset scheme?

Form calculator

This scenario uses a DLM to define the rules / functions that calculate derived fields in a form from primary input fields. As such, it is deployed within the client application space, not the server side. It would presumably use a bindings section to map a set of local variables to form field paths or ids.

TBD

DLM Semantics

Apart from the data-set definition and binding, the rest of the semantic requirements appear to be the same for TP and GDL, i.e. to do with rule representation, handling null values, and so on. Some of these are dealt with below.

External access

The above DLM structure allows a DLM to reference other DLM instances in the use section, each of which is locally identified by a convenient identifier. This allows the symbols of the referenced guideline to be accessed via the notation local_id.symbol, e.g. BSA.bsa_m2 to mean the symbol bsa_m2 in a DLM referenced via the declaration BSA: Body_surface_area.

Null value handling

One of the differences between the variables declared in the dataset and in the guideline is that the former values may potentially be null, assuming that no value was available from the Subject Proxy service (bearing in mind that it is responsible for trying to obtain missing data from live users in the circumstance where no data can be found in the EMR or other connected systems). Null data items could be handled two ways in guideline rules:

  • guideline rules are written as if all variables are guaranteed to be non-null at time of execution;

  • guideline rules are written to handle null values.

Following the second option means that guideline rules have to continually check for values being defined, e.g. using a predicate check such as defined (x) before every use of a variable x.

This is likely to be cumbersome, and obscure the clarity of rules from an authoring point of view. Experience with guidelines over the years indicates that the vast majority of guidelines are executable only when values for their rule input variables are available. Nevertheless, some subject variables will be unavailable when a guideline is executed, e.g. test results that have a complex procedure (e.g. biopsy) or take time (e.g. micro culture), and for which missing data are handled in the guideline.

A practical approach is to declare such variables as potentially void within the data set declaration. This could be done in a similar way to modern programming languages, using a '?' to indicate a nullable variable, e.g. by declaring ejection_fraction: Quantity? in the input section.

In the above, the ejection_fraction variable is declared as nullable, and within a guideline or in a plan, a null check like if defined (ejection_fraction) could be used. Behind the scenes, the Quantity? would translate to something like Data_item<Quantity>, where Data_item is a wrapper type with the following definition:

This achieves the same effect as the openEHR Element type, just in a simpler type system. This approach enables typical null-value rules to be defined, such that an expression such as ejection_fraction > 20% will not cause an exception even if the ejection_fraction raw value is null. Of course it is usually better to test for defined (ejection_fraction) first, if ejection_fraction was defined as Quantity?.
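A wrapper of this kind can be sketched in Python as follows. This is an illustration only, not the definition intended on this page: the class name DataItem (standing in for Data_item), its fields and the is_defined method are assumptions, as is the choice that a comparison against a missing value evaluates to False rather than raising an exception.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class DataItem(Generic[T]):
    """Hypothetical wrapper for a subject variable whose value may be unavailable."""

    value: Optional[T] = None
    effective_time: Optional[datetime] = None

    def is_defined(self) -> bool:
        # Plays the role of the 'defined (x)' check used in rules.
        return self.value is not None

    def __gt__(self, other: T) -> bool:
        # Assumption: comparing a missing value yields False, not an exception.
        return self.value is not None and self.value > other

    def __lt__(self, other: T) -> bool:
        return self.value is not None and self.value < other


# A variable declared as Quantity? for which no value was obtained.
ejection_fraction = DataItem[float]()
print(ejection_fraction.is_defined())  # False
print(ejection_fraction > 20)          # False, and no exception is raised

# The same variable once a measurement is available.
measured = DataItem(35.0, datetime(2020, 3, 22))
print(measured.is_defined())  # True
print(measured > 20)          # True
```

Under this choice a rule cannot tell "value missing" from "value not greater than 20" from the comparison alone, which is one reason to test for defined (ejection_fraction) first.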

QUESTION: We might have to consider whether using the ‘?' declaration obscures the difference between a 'non-available’ variable (i.e. no value available from the outside world) and one that is simply null in the standard programming language sense. If we assume that the latter can happen within datasets and guidelines, we might want to allow both Type? and e.g. the use of the explicit type Data_item<T> to distinguish the two cases.

Effective time

Sometimes the clinically significant time of a data item needs to be accessed in rules. This can be done by the reference x.effective_time, e.g. ejection_fraction.effective_time returns when the ejection fraction measurement being used was taken. For a variable like is_diabetic, it means when the diagnosis was made.

Dataset meta-data / hints

In the dataset declarations below, variables are declared under an input keyword, which may have structured comments attached to it, e.g.:

In the above, there are two meta-data ‘hints’ to potentially be used by the Subject Proxy service to know how to formulate a data request or query. These include:

  • time_window: a period denoted by either:

    • a symbolic name: “all” | “current episode” | “prior history” each of which is convertible to an interval of the form start = -D1, end = -D2, where D1 and D2 are Durations backward from the current moment.

    • an explicit window formed by |D1..D2|, where D1 and D2 have the same definitions as above.

  • source: a name referring to a kind of data or system.

Dataset variable ‘currency’

Currency, i.e. how current a variable value is, can be set on a dataset variable declaration, as can be seen in the example below. This is used by the Subject Proxy service to determine how often to renew the value (e.g. by reading an EMR, devices etc).

For GDL3, currency is applied as a check on the recency of the effective clinical time of the dataset elements.

  • not clear how this would work for vital signs, ICU, any real-time situation.
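The recency check described for GDL3 can be sketched in Python. This is only one possible reading of ‘currency’: the function name is_current and the rule that a value is current when its effective time lies no further back than the currency duration are assumptions, and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta


def is_current(effective_time: datetime, currency: timedelta, now: datetime) -> bool:
    """True if the value's effective time lies within `currency` of `now`."""
    return now - effective_time <= currency


now = datetime(2020, 3, 22, 10, 0, 0)

# Heart rate taken 20 seconds ago, declared with currency = 30 sec.
print(is_current(datetime(2020, 3, 22, 9, 59, 40), timedelta(seconds=30), now))  # True

# Systolic pressure taken 2 minutes ago, declared with currency = 1 min.
print(is_current(datetime(2020, 3, 22, 9, 58, 0), timedelta(minutes=1), now))  # False
```

In the Task Planning case the same comparison could tell the Subject Proxy service when a cached value needs to be refreshed.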

Dataset quantitative variable ‘ranges’

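A set of ranges can be defined for a quantitative variable in a dataset, e.g. the high, normal and low ranges declared for systolic_blood_pressure in the Task Planning example above.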

These ranges are understood to be those required by the guideline(s) / plan(s) using the data set, and may not be the standard laboratory ranges for a normal healthy person. The idea here is that given such a declaration, expressions like systolic_blood_pressure.range = high are now possible within rules. Here, .range is understood as an argumentless function that converts the variable value to whichever range it falls in, a very common need in guidelines. It also allows decision branches in plans to be defined on the basis of such ranges.
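The behaviour of .range can be illustrated with a short Python sketch. The range bounds are those of the systolic_blood_pressure declaration in the Task Planning example; the names SYSTOLIC_RANGES and range_of, and the representation of each range as a predicate on the magnitude, are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# Range name -> predicate on the magnitude in mm[Hg].
SYSTOLIC_RANGES: Dict[str, Callable[[float], bool]] = {
    "high": lambda v: v > 140,
    "normal": lambda v: 80 < v <= 140,
    "low": lambda v: v <= 80,
}


def range_of(value: float, ranges: Dict[str, Callable[[float], bool]]) -> Optional[str]:
    """Return the name of the first range containing the value, or None."""
    for name, contains in ranges.items():
        if contains(value):
            return name
    return None


print(range_of(150, SYSTOLIC_RANGES))  # high
print(range_of(140, SYSTOLIC_RANGES))  # normal
print(range_of(80, SYSTOLIC_RANGES))   # low

# Equivalent of the condition 'systolic_blood_pressure.range = high'.
high_blood_pressure = range_of(150, SYSTOLIC_RANGES) == "high"
print(high_blood_pressure)  # True
```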

Comments and documentation

I have changed the multi-line comment character to the vertical bar, as an experiment. Makes things nice to read. We could potentially consider that any text to the right of one of these bars is assumed to be in Asciidoc markdown or similar.

We might potentially consider annotations like in Java e.g. @xxx to mark various parts of documentation text for smart extractors.

Implementation

Implementation of the rules aspect of DLMs is likely to be relatively straightforward, since it is based on evaluation of statements and expressions that constitute value-returning rules.

The more challenging implementation question is to do with the subject dataset representation, and the Subject Proxy service.

The primary question here is how a variable access from either a Task Plan or a GDL3 guideline is effected. The use of a variable such as is_diabetic or platelets within an expression has to ultimately result in a value retrieval, either:

  • (TP) from an intermediate cache which is populated by the Subject Proxy component / service;

  • (GDL3) from the latest copy of the custom data set.

If we think of how it can work very concretely in the TP environment, using a Subject Proxy component, a symbolic reference like is_diabetic from within an expression being evaluated by an interpreter has to result in either a passive data access (like in a typical computer program execution) or a function call. In the former case, the symbol is effectively a data variable or property defined in some ‘class’ or module, and the connection between it and the data retrieval is undefined.

A more useful way to understand a variable reference could be as a function call from a local ‘proxy’ object. We could consider that the declaration above of systolic_blood_pressure: Quantity, along with the currency and ranges, really generates an instance of an object of a type such as Proxy_var_quantity, which looks like this:

In an expression such as platelets < 60 x 10^9/L , the platelets reference is understood as a call to Proxy_var_quantity.value, which retrieves the actual value (it would require an internal reference to the Subject Proxy to do this). In an expression like platelets.in_range (low) , the platelets.in_range() is a call to Proxy_var_quantity.in_range().

According to this scheme, the execution of the declarations of is_diabetic or platelets causes creation of Proxy_var_xxx objects that have access to the Subject Proxy data cache by making calls like get_value("is_diabetic") or perhaps typed calls like get_quantity_value("platelets"). How the Subject Proxy gets its data depends on the bindings of named variables like is_diabetic to back-end binding meta-data such as AQL queries (for openEHR access), HL7 FHIR calls/resources for FHIR, and whatever other methods are available for other kinds of data source.

When a Proxy_var object is created, it would need to register the named variable, type, currency and possibly the ‘hint’ meta-data described earlier, in the Subject Proxy component, for the current patient.
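The proxy-object scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the design itself: the classes SubjectProxy and ProxyVarQuantity (standing in for Proxy_var_quantity), the register method, the in-memory cache and the representation of ranges as (exclusive lower bound, inclusive upper bound) pairs are all hypothetical. Only get_value, value and in_range are names taken from the text above.

```python
from typing import Any, Dict, Optional, Tuple

Bounds = Tuple[Optional[float], Optional[float]]  # (exclusive lower, inclusive upper)


class SubjectProxy:
    """Hypothetical stand-in for the Subject Proxy data cache."""

    def __init__(self) -> None:
        self.registered: Dict[str, Dict[str, Any]] = {}
        self.cache: Dict[str, Any] = {}

    def register(self, name: str, type_name: str, currency_sec: Optional[int]) -> None:
        self.registered[name] = {"type": type_name, "currency_sec": currency_sec}

    def get_value(self, name: str) -> Any:
        # A real service would query the EMR, devices or the UI as needed.
        return self.cache.get(name)


class ProxyVarQuantity:
    """Local proxy object generated by an input declaration."""

    def __init__(self, name: str, proxy: SubjectProxy,
                 currency_sec: Optional[int] = None,
                 ranges: Optional[Dict[str, Bounds]] = None) -> None:
        self.name = name
        self._proxy = proxy
        self._ranges = ranges or {}
        # Creating the proxy variable registers it with the Subject Proxy.
        proxy.register(name, "Quantity", currency_sec)

    @property
    def value(self) -> Any:
        # Every reference to the variable becomes a call to the Subject Proxy.
        return self._proxy.get_value(self.name)

    def in_range(self, range_name: str) -> bool:
        value = self.value
        if value is None:
            return False
        low, high = self._ranges[range_name]
        return (low is None or value > low) and (high is None or value <= high)


proxy = SubjectProxy()
sbp = ProxyVarQuantity(
    "systolic_blood_pressure", proxy, currency_sec=60,
    ranges={"high": (140, None), "normal": (80, 140), "low": (None, 80)},
)
print(sbp.value)  # None, since nothing has been retrieved yet

proxy.cache["systolic_blood_pressure"] = 150  # simulate a retrieved value
print(sbp.value)             # 150
print(sbp.in_range("high"))  # True
```

Treating value as a property keeps rule expressions such as a bare variable reference readable while still routing each access through the proxy.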

Reworked Example

The following is the example referenced at the top of the page reworked as a small number of DLMs, as described on this page.

Core patient state dataset

Reusable Guidelines

The CHOPS guideline

The following is the GDL3 guideline (i.e. ruleset) for CHOPS-21.

Questions

DLM - DLM relationships

It could make sense to allow DLMs to both ‘inherit’ (extend keyword) and ‘use’, i.e. be composed. Inheriting a re-usable DLM like BSA (body surface area) would allow height, weight, bsa_m2 to be referenced directly in the CHOPS guideline. Whereas if the CHOPS guideline wanted to reference rules in some other DLM based on a published (say) Lymphoma diagnostic guideline, it would still use a use declaration.