Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Changed a dead link from svn to git

...

...

...

...

Table of Contents
maxLevel3
indent20px
styledisc

Overview

Developer

Ocean Informatics Team: Chunlan Ma MD, Heath Frankel, Thomas Beale

...

Archetype Query Language (AQL) is a declarative query language developed specifically for expressing queries used for searching and retrieving the clinical data found in archetype-based EHRs. It is applied to the openEHR EHR Reference Model (RM) and the openEHR clinical archetypes, but the syntax is independent of applications, programming languages, system environment, and storage models. The minimum requirement for data to be querying with AQL (including with archetype structures and terminology) is for the data to be marked at a fine granularity with the appropriate archetype codes and terminology codes. This may be native openEHR-structured data, or legacy system data to which the relevant data markers (mainly archetype paths and terminology codes) have been added. Unlike other query languages, such as SQL or XQuery, AQL expresses the queries at the archetype level, i.e. semantic level, other than at the data instance level. This is the key in achieving sharing queries across system boundaries or enterprise boundaries.

AQL has the following distinctive features:

...

The openEHR path syntax has been described in a detail (http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/overview.pdfsee the overview specification). AQL utilises the basic openEHR path syntax to represent both coarse-grained and fine-grained archetype nodes and openEHR class instances and attributes. The syntax details are not repeated here. A set of openEHR path examples are provided.

...

Archetype path is the path referring to any nodes within an archetype. The picture below is an ADL workbench screen shot of apgar observation node map. The table below shows some examples.

Archetype RM type

Archetype node name

Path syntax

OBSERVATION

Apgar score

/

OBSERVATION

1 minute

/data[at0002]/events[at0003]

OBSERVATION

Heart rate

/data[at0002]/events[at0003]/data[at0001]/items[at0005]

...

OpenEHR RM attribute name

Path syntax

COMPOSITION.category

/category

COMPOSITION.context.start_time

/context/start_time

COMPOSITION.uid.value

/uid/value

...

A parameter can be used for all criteria values within an AQL statement, e.g.

  • within a predicate

    No Format
    [$archetypeId], [at0003, $nameValue], [ehr_id/value=$ehrId]
    
  • outside a predicate

    No Format
    o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value > $systolicCriteria
    

...

  • left operand is normally an openEHR path, such as ehr_id/value, name/value
  • right operand is normally a criteria value or a parameter, such as '123456', $ehrUid. It can also be an openEHR path (based on the BNF), but we do not have an example of this situation yet.
  • operators include:

    No Format
    >, >=, =, <, <=, !=
    

...

FROM EHR [ehr_id/value='1234'] CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS
OBSERVATION o [openEHR-EHR-OBSERVATION.blood_pressure.v1]

...

Node predicate is also a shortcut of standard predicate. It has the following forms:

  • containing an archetype node id (known as atcode) only, e.g.

    No Format
    [at0002]
    
  • containing an archetype node id and a name value criteria, e.g.

    No Format
    [at0002 and name/value=$nameValue]
    [at0002 and name/value='real name value']
    
  • containing an archetype node id and a shortcut of name value criteria, e.g.

    No Format
    [at0002, $nameValue]
    [at0002, 'real name value']
    
  • The above three forms are the most common node predicates. A more advanced form is to include a general criteria instead of the name/value criteria within the predicate. The general criteria consists of left operand, operator, and right operand, e.g.

    No Format
    [at0002 and value/defining_code/terminology_id/value=$terminologyId]
    

...

Operator

Meaning

Example

=

Equal

No Format
name/value = $nameValue

>

Greater than

No Format
o/data[at0001]/.../data[at0003]/items[at0004]/value/value >140

>=

Greater than or equal to

No Format
o/data[at0001]/..../data[at0003]/items[at0004]/value/value >=140

<

Smaller than

No Format
o/data[at0001]/.../data[at0003]/items[at0004]/value/value <160

<=

Smaller than or equal to

No Format
o/data[at0001]/.../data[at0003]/items[at0004]/value/value <=160

!=

not equals to

No Format
c/archetype_details/template_id/value != ''

...

matches operator is used in WHERE clause. This operator needs left operand and right operand. Left operand is an AQL identified path. Right operand is enclosed within curly braces. Right operand of matches operator has three forms:

  1. type of constraint ADL (cADL): this type of right operand can be a value list or a value range. Value range is not supported in the current AQL grammar. Below is an example with matches a string value list:

    Code Block
    
    SELECT o/data[at0002]/events[at0003]/data/items[at0015]/items[at0018]/name
    FROM Ehr [uid=$ehrUid] CONTAINS Composition c CONTAINS Observation o[openEHR-EHR-OBSERVATION.microbiology.v1]
    WHERE o/data[at0002]/events[at0003]/data/items[at0015]/items[at0018]/items[at0019]/items[at0021]/name/defining_code/code_string matches {'18919-1', '18961-3', '19000-9'}
    
  2. URI: can be either terminology URI or EHR URI, or some other URIS. Only terminology URI is supported in the current AQL grammar. An example with matches a terminology URI is shown below:

    Code Block
    
    SELECT e/ehr_status/subject/external_ref/id/value, diagnosis/data/items[at0002.1]/value
    FROM EHR e CONTAINS Composition c[openEHR-EHR-COMPOSITION.problem_list.v1] CONTAINS Evaluation diagnosis[openEHR-EHR-EVALUATION.problem-diagnosis.v1]
    WHERE c/name/value='Current Problems' AND diagnosis/data/items[at0002.1]/value/defining_code matches { terminology://Snomed-CT/hierarchy?rootConceptId=50043002 }
    
  3. embedded AQL query: means an AQL query statement, i.e. embedded query, within the curly braces. Embedded query is not supported in the current AQL grammar. It looks like:

    Code Block
    
    matches {SELECT ... FROM .. WHERE}
    

...

The example below means retrieving the identifiers of the EHRs which do not have discharge admin entry instance.

Code Block

SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1]

This example below means from a particular EHR, retrieving all composition uid values and the contained instruction instances if the instruction instances have links with particular EHR_URI value.

Code Block

SELECT c/uid/value, instruction
FROM EHR e [ehr_id/value=$ehrid] CONTAINS COMPOSITION c
     CONTAINS INSTRUCTION instruction[openEHR-EHR-INSTRUCTION.referral.v1]
WHERE EXISTS instruction/links[target='ehr://327000002/87284370-2D4B-4e3d-A3F3-F303D2F4F34B@latest_trunk_version']

...

The example below uses NOT operator followed by a single identified expression

Code Block

SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1]

The example below uses NOT operator followed by a boolean identified expression.

Code Block

SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT (EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] AND
           e/ehr_status/subject/external_ref/namespace='CEC')

The above example is equivallent to the two expressions shown below:

Code Block

SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] OR
      NOT e/ehr_status/subject/external_ref/namespace='CEC'


SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] OR
      e/ehr_status/subject/external_ref/namespace != 'CEC'

These advanced operators are not yet supported by the grammar. The operator syntax is borrowed from ADL specifications. These are proposed to improve the richness and flexibility of AQL so that AQL syntax supports more complicated query scenarios.

...

AQL identified path has the following forms:

  • consisting an AQL variable name defined within the FROM clause, followed by an openEHR path, e.g.

    No Format
    o/data[at0001]/.../data[at0003]/items[at0004]/value/value
    
  • consisting an AQL variable name followed by a predicate, e.g.

    No Format
    o[name/value=$nameValue]
    
  • consisting an AQL variable name followed by a predicate and an openEHR path, e.g.

    No Format
    o[name/value=$nameValue]/data[at0001]/.../data[at0003]/items[at0004]/value/value
    

...

No matter the date/date time is complete or partial, it is allowed in AQL as long as it is in ISO8601 date/date time format. Quotes (either single quotes or double quotes) are required to enclose the date/date time value. In AQL grammar, it is treated as a date value if the value is a complete calendar date represented in ISO8601 extended format and if single quotes are used. If the value is in other ISO8601 date/date time format, AQL treats it as a string value, but the AQL query engine processes it differently from a normal string value. It is a temporary solution. In future, a date/date time value should be identified as date/date time value if it is a valid ISO8601 date/date time. Examples are as below:

Code Block

WHERE composition/context/start_time > '19860101' // a complete date. AQL grammar identifies it as a string value.
WHERE composition/context/start_time < "120000+0930"  // ISO8601 time. AQL grammar identifies it as a string value.
WHERE composition/context/start_time <= '1986-01-01'  // complete calendar date in ISO8601 extended format. AQL grammar identifies this value as a date value.

...

Some examples of value list:

Code Block

matches {'string item 1', 'string item 2', 'string item3'}  // a string list. It is equivalent to matches 'string item1' or .. matches 'string item 2' or .. matches 'string item 3'
matches {'20060101', '20070101', '20080101'}            // a date value list
matches {1, 2, 3}                                                      // a integer list
matches {1.1, 2.5, 3.8}                                             // a real value list

...

Value range leaf data is enclosed within curly braces after matches operator. It is only used in AQL WHERE clause to specify a criteria within a range. The syntax is similar as intervals of ordered primitive types detailed in Section 4.5.2, adl.pdf. Some examples:

Code Block

matches {|3.0..5.0|} // means 3.0 <= x <=5.0
matches {|<8|}       // means x<8

...

 A simple FROM clause consists of three parts: keyword - FROM, class expression and/or containment constraints, e.g.

No Format
FROM EHR e [ehr_id/value=$ehrId] CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]

...

Two examples of a class expression are shown below:

No Format
EHR e [ehr_id/value=$ehrId]                                          \\EHR class, class identifier/variable, and a standard predicate
COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]   \\COMPOSITION class, class identifier/variable, and an archetype predicate


Class expressions are used for two purposes:

  1. indicating the constraints on RM classes so as to scope the data source for the query. For instance, EHR e[ehr_id/value='123456'] indicates that the required data must be from a specific EHR with ehr_id value '123456'; while "COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1] " indicates the required data must be from or must be associated with a composition instance with archetype id - openEHR-EHR-COMPOSITION.report.v1.
  2. defining a RM class variable that may be used by other clauses to indicate the required data or data items on which query criteria are applied. The example below uses the class expression to define a variable e which is used by the SELECT clause indicating all relevant ehr_id values are retrieved, and a variable c used by WHERE clause indicating that the query criteria is set on the composition template id value.

    No Format
    SELECT e/ehr_id/value
    FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.referral.v1]
    WHERE c/archetype_details/template_id/value=$templateId
    

...

The syntax of containment constraint is very simple: using keyword CONTAINS between two class expressions. Left class expression is the the parent object of the right class expression, e.g.

No Format
EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1]


Boolean operators (AND, OR, NOT) and parentheses are used when multiple containment constrains are required, e.g.

No Format
EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] AND
COMPOSITION c1 [openEHR-EHR-COMPOSITION.report.v1]

...

The WHERE clause syntax has the following parts (in order): keyword WHERE and identified expression(s). Boolean operator (AND, OR, NOT) and parenthesis can be used to represent multiple identified expressions. Examples:

No Format
WHERE c/name/value=$nameValue AND c/archetype_details/template_id/value=$templateId

...

  1. Left operand is normally an identified path,
  2. Operator is normally the basic operator,
  3. Right operand is the criteria value, which normally is primitive type, such as string, integer, boolean, double, or float. When it is a string value, single quotation marks or double quotation marks are required. Right operand can also be a parameter or an identified path as well.

Examples:

  • Left operand is an identified path and right operand is a real criteria value - primitive type:

    No Format
    o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140
    c/archetype_details/template_id/value='health_encounter'
    
  • Left operand is an identified path and right operand is a parameter:

    No Format
    c/archetype_details/template_id/value=$templateParameter
    
  • Both left operand and right operand are an identified path (this is an advanced feature):

    No Format
    o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >
    o1/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value
    

...

Example 1: retrieve all compositions' name value, context start time and composer name from a specific EHR.

No Format
SELECT c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer
 FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c

Example 2: Retrieve all composition objects of a specific EHR.

No Format
SELECT c
FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c

...

It starts with keyword TOP, followed by an integer number and/or the direction (i.e. BACKWARD, FORWARD), e.g.

No Format
SELECT TOP 10 c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer
FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c

...

The ORDER BY clause is used to sort  the returned results. It starts with the keyword ORDER BY, followed by an identified path and the keyword DESC, DESCENDING, ASCE, or ASCENDING, e.g.

No Format
ORDER BY c/name/value

...

The first example below constrains the query source to data committed to the system before 2006-01-01.

No Format
TIMEWINDOW /2006-01-01

The second example constrains the query source to data committed within the period of two years before 2006-01-01.

No Format
TIMEWINDOW P2Y/2006-01-01

...

A FROM clause is to scope the data source for the query. Normally the first step is to shape FROM clause, which has EHR class expression and archetype class expressions.

  • EHR class expression. You need to determine whether this query is applied to a single EHR or all EHRs. The latter is called population queries. If it is for all EHRs, you don't need to specify ehr_id/value in FROM clause. Otherwise, you need to either specify a specific value or use a query parameter for ehr_id/value in FROM clause. For this particular scenario, it is for a single EHR. Parameter called ehrUid is used for the ehr_id/value, I would have this AQL query statement:

    No Format
    FROM EHR [ehr_id/value=$ehrUid]
    
  • Archetype expressions.
    • To write archetype expressions in the FROM clause, we need to determine what archetypes are required in the scenario. Since archetypes are used to represent clinical concepts. The easy way to identify archetypes from a scenario is to identify clinical concepts mentioned in the scenario. Each clinical concept is generally associated with one archetype. Two clinical concepts are mentioned in the aforementioned scenario: blood pressure and health encounter. These concepts are used to identify two archetypes used in query: an observation archetype with archetype id - openEHR-EHR-OBSERVATION.blood_pressure.v1, and a composition archetype with archetype id - openEHR-EHR-COMPOSITION.encounter.v1.
    • Determine whether we need a variable name for each archetype class expression or not. We may need one if the reference to the archetype class expression is required by other AQL clauses. At the moment, I don't put any variable names for the archetype expressions. The two archetype expressions are:

      No Format
      COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
      OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1]
      
  • Containment expression. openEHR RM is used to identify the hierarchical relationships among the found archetypes so as to define the containment expression. For example, a composition archetype is the parent of an observation archetype, so the AQL query would look like:

    No Format
    FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
    CONTAINS OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1]
    

...

  1. Find the criteria. The criteria required by the scenario are abnormal blood pressure values. Different guidelines may have different definitions for abnormal blood pressure values. Here we interpret that abnormal blood pressure means 1) the systolic pressure value is greater than or equal to 140; OR 2) diastolic pressure value is greater than or equal to 90.
  2. Write an identified expression for criterion 1).
    1. Write the Identified pathfor systolic data value. We need a class variable name as the reference to the blood pressure class expression defined in the FROM clause, so a variable name - obs - is added into the FROM clause. A path to systolic data value is also required to form the identified path. Two tools can be used to get the path: openEHR ADL WORKBENCHor Ocean Template Designer. The path to the systolic data value is /data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value.  
    2. Get appropriate operator: '>=' is chosen for criterion 1.
    3. Get the criteria value - 140. The query statement including the identified expression for criterion 1) is shown below:

      No Format
      FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
      CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
      WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140
      
  3. Write an identified expression for criterion 2). Following the previous instruction to write the identified expression for criterion 2) which is shown as:

    No Format
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value >= 90
    
  4. Find the appropriate boolean operator to join criterion 1) and 2) - OR. The query statement looks like:

    No Format
    FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
    CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
    WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140 OR
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value >= 90
    

...

  1. Find what data is required. The aforementioned scenario requires abnormal blood pressure values are returned, including both systolic pressure and diastolic pressure.
  2. Write the appropriate identified path to the required item (variable name is required if the class object is required). Multiple identified path is separated using comma. For this particular scenario, two identified paths are found for systolic and diastolic data value by following Step 2. The completed query statement looks like:

    No Format
    SELECT
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude,
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude
    
    FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
    CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
    
    WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude>= 140 OR
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude>= 90
    

...

The example below shows a scenario which requires that a composition report (c1) is issued in the last year. It illustrates that a type of ISO 8601 date string (current-date()) subtracts a type of ISO 8601 interval string (P1Y).

No Format
c1/context/other_context/items[at0006]/items[at0013]/value > current-date() - PIY

...

The existing AQL grammar and syntax maybe further enhanced in the following areas:

  • Reduce the length of the query statement. The use of archetype path in AQL query makes the query lengthy and hard to read. Local variables with meaningful names that are assigned with path can be used to reduce the length of the query as well as improve readability of the query. One example is shown below:

    Code Block
    
    let $systolic_bp="data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude"
    let $diastolic_bp="data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude"
    
    
    SELECT obs/$systolic_bp, obs/$diastolic_bp
    FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] 
         CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
    WHERE obs/$systolic_bp>= 140 OR obs/$diastolic_bp>=90
    
  • To be continued