Archetype Query Language Description

STATUS: this page is historical; its contents are now maintained as the official AQL specification.


Overview

Developer

Ocean Informatics Team: Chunlan Ma MD, Heath Frankel, Thomas Beale

Existing Query Languages

Currently, the available query languages, such as SQL, XQuery, or Object-Oriented Query Language, have dependencies on particular system data structure and working environment. Users must know the persistence data structure of an EHR in order to write an appropriate query. The query statement cannot be used by other systems which have different data store. Consequently, none of the these languages meet the aforementioned requirements and none of these can be used directly as the query language required by integrated care EHRs

What is AQL?

Archetype Query Language (AQL) is a declarative query language developed specifically for expressing queries used for searching and retrieving the clinical data found in archetype-based EHRs. It is applied to the openEHR EHR Reference Model (RM) and the openEHR clinical archetypes, but the syntax is independent of applications, programming languages, system environment, and storage models. The minimum requirement for data to be querying with AQL (including with archetype structures and terminology) is for the data to be marked at a fine granularity with the appropriate archetype codes and terminology codes. This may be native openEHR-structured data, or legacy system data to which the relevant data markers (mainly archetype paths and terminology codes) have been added. Unlike other query languages, such as SQL or XQuery, AQL expresses the queries at the archetype level, i.e. semantic level, other than at the data instance level. This is the key in achieving sharing queries across system boundaries or enterprise boundaries.

AQL has the following distinctive features:

  1. the utilization of openEHR archetype path syntax in AQL. openEHR path syntax is used to locate clinical statements and data values within them using Archetypes. This path syntax is used to represent the query criteria and returned results. It allows setting query criteria using archetype and node identifiers, data values within the archetypes, and class attributes defined within the openEHR RM. It also allows the returned results to be top-level archetyped RM objects, data items within the archetypes or RM attribute values.
  2. the utilization of containment mechanisms to indicate the data hierarchy and constrain the source data to which the query is applied.
  3. the utilization of ADL-like operator syntaxes, such as matches, exists, in, negation. 
  4. Neutral expression syntax. AQL does not have any dependencies on the underlying RM of the archetypes. It is neutral to system implementation and environment.
  5. Supporting queries with logical time-based data rollback.

AQL has some other features which can be found from other query languages:

  1. Supporting naming returned results.
  2. Supporting query criteria parameters.
  3. Supporting arithmetic operations (such as count, addition, subtraction, multiplication, and division), relational operations (>, >=, =, !=, <=, <) and Boolean operations (or, and, xor, not).
  4. Supporting some functions that are supported in XQuery, such as current-date().
  5. Users can specify their preference on the retrieved data, such as ordering preferences, or total number of retrieved results.
  6. Supporting queries for individual clinical subjects at the point of care, administrative purposes and clinical research purposes.

Structure & Example

Like SQL, AQL has five clauses: SELECT, FROM, WHERE, ORDER BY, and TIMEWINDOW. The SELECT clause specifies the data elements to be returned. The FROM clause specifies the result source and the corresponding containment criteria. The WHERE clause specifies data value criteria within the result source. The ORDER BY clause indicates the data items used to order the returned result set. TIMEWINDOW clause is to constrain the query to data that was available in the system within the specified time criteria.

Here is an example of AQL statement. This statement returns all blood pressure values where systolic value is greater or equal to 140 or diastolic value is greater or equals to 90 within a specified EHR. AQL syntax is a synthesis of SQL structural syntax and openEHR path syntax.

AQL Syntax Description

Overview

AQL has SELECT, FROM, WHERE and ORDER BY  clauses. openEHR path syntax is used to locate any node or data values within archetypes. The SELECT clause uses openEHR path syntax to indicate expected archetypes, elements, or data values that are need to be returned. FROM clause uses containment mechanisms indicating the scope of the query data source. WHERE clause utilises archetype path to set query criteria on archetypes or any node within the archetypes. The following sections introduce the AQL syntax in detail.

Reserved words and characters

 AQL has the following reserved keywords and characters:

  • select, from, where, order by, top, and, or
  • contains, max, now, current-date, current-date-time
  • timewindow (need to be confirmed)
  • matches, in, exists, not in
  • '"' and ''': double quote characters and single quote characters are used to delimit string values;
  • '|': bar characters are used to delimit intervals;
  • []: brackets are used to delimit coded terms, archetype id values or openEHR reference model class attribute values.
  • {}: brackets are used with MATCHES delimiting matches criteria
  • <, >, =, %, !, $

openEHR path syntax

The openEHR path syntax has been described in a detail (see the overview specification). AQL utilises the basic openEHR path syntax to represent both coarse-grained and fine-grained archetype nodes and openEHR class instances and attributes. The syntax details are not repeated here. A set of openEHR path examples are provided.

openEHR path examples - archetype path

Archetype path is the path referring to any nodes within an archetype. The picture below is an ADL workbench screen shot of apgar observation node map. The table below shows some examples.

Archetype RM type

Archetype node name

Path syntax

OBSERVATION

Apgar score

/

OBSERVATION

1 minute

/data[at0002]/events[at0003]

OBSERVATION

Heart rate

/data[at0002]/events[at0003]/data[at0001]/items[at0005]

openEHR path examples - path to RM class attributes

Another type of openEHR path is the path pointing to an attribute of openEHR Reference Model classes. The examples shown below uses the attributes from COMPOSITION class.

OpenEHR RM attribute name

Path syntax

COMPOSITION.category

/category

COMPOSITION.context.start_time

/context/start_time

COMPOSITION.uid.value

/uid/value

AQL variables

AQL variables are used to refer a specific archetype or openEHR class defined within an AQL statement. It has following featueres:

  • AQL variable names must be defined in an AQL FROM clause.
  • A variable name must be unique within an AQL statement.
  • Not all classes or archetypes defined within an AQL statement need a variable name. It is required only when other clauses within the AQL statement need to refer to the class or archetype defined in the FROM clause. For instance, the example doesn't define a variable name for EHR class. It has a variable name - c - indicating the composition archetype with 'openEHR-EHR-COMPOSITION.encounter.v1' as the archetype id, and varialbe name - o - referring to the observation archetype with archetype id 'openEHR-EHR-OBSERVATION.blood_pressure.v1'. Variable name o is used by both WHERE clause and SELECT clause.
  • Variable names are used by other clauses within the AQL statement along with openEHR path to form a complete AQL path - Identified path. Refer back to Section 2.2. example, o/data[at0001/events[at0006]/data[at0003]/items[at0004]/value/value] points to the systolic value node of the observation with 'openEHR-EHR-OBSERVATION.blood_pressure.v1'.
  • Except for the reserved words and characters, and space, any letters, Alphanumeric, or underscore can be used for a variable name. It is not case sensitive. Digit numbers are not recommenced to be used at the beginning of a variable name even though no restriction on this at the current stage.

Parameters

AQL syntax supports parameterisation. AQL parameters are used to indicate criteria, which will be substituted with values at run time. Supporting parameterisation is one of the mechanisms used to realise sharing AQL statements within a system or across system boundaries.

The parameters can be EHR specific (such as the parameter of EHR identifier or EHR creation date time), archetype specific (such as an archetype identifier, or archetype constraints), or template specific (such as a template identifier or template constraints).

When parameters are needed

Parameters are needed when the same AQL query statement is used with different criteria values. This AQL example is to return all abnormal blood pressure values for a single specific EHR. This query has a parameter $ehrUid. This parameter will be substituted by a specific EHR.ehr_id value at run time. Consequently, this query can be reused for all EHRs either within an EHR system or by other EHR systems (assuming these EHR systems support AQL).

Where and how parameters are resolved

AQL query parameters can be resolved at application level, or EHR system level. It depends on what the query parameters are used for and the design/implementation of the system or components. Query parameters would be normally resolved outside of a query engine. 

If a query needs to be reusable across different EHR systems, any query parameters normally need to be registered in these EHR systems so that they can be resolved with real values from each environment. A typical example of this type of query parameter is $ehrUid. If the query is only used within an application, then the query parameters would be resolved by the application, such as the parameter of healthcare facility identifier or template identifier. 

There are no specific guidelines on how to resolve query parameters. Generally speaking, a parameter name is used as a key (or a key is associated with a parameter name) and the key needs to be unique within the boundary where the parameters are resolved. The EHR system or application needs to have the API functions to get the real value with a given parameter name or key.

Parameter syntax 

A parameter always starts with a dollar sign $ followed by the parameter name, e.g. $ehrUid. The parameter name can consist of letters, digits and underscores _. It cannot have spaces and it cannot be an AQL reserved word.

A parameter can be used for any criteria values within an AQL statement, e.g.:

  • within a predicate

    [$archetypeId], [at0003, $nameValue], [ehr_id/value=$ehrId]
    
  • outside a predicate

    o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value > $systolicCriteria
    

Predicates

AQL has three types of predicates: the standard predicate, the archetype predicate, and the node predicate.

Standard predicate

Standard predicates always have a left operand, operator and right operand, e.g. [ehr_id/value='123456'].

  • The left operand is normally an openEHR path, such as ehr_id/value, name/value.
  • The right operand is normally a criterion value or a parameter, such as '123456', $ehrUid. It can also be an openEHR path (based on the BNF), but we do not have an example of this situation yet.
  • The operator can be one of the following:

    >, >=, =, <, <=, !=
    

Archetype predicate

An archetype predicate is a shortcut of a standard predicate, i.e. the predicate does not have the left operand and operator. It only has an archetype id, e.g. [openEHR-EHR-COMPOSITION.encounter.v1]. The archetype predicate is a specific type of query criterion indicating what archetype instances are relevant to this query. It is used to scope the the data source from which the query expected data is to be retrieved. Therefore, an archetype predicate is only used within an AQL FROM clause, for example,

FROM EHR [ehr_id/value='1234'] CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS
OBSERVATION o [openEHR-EHR-OBSERVATION.blood_pressure.v1]

Node predicate

A node predicate is also a shortcut of a standard predicate. It has the following forms:

  • Containing an archetype node id (known as atcode) only, e.g.:

    [at0002]
    
  • Containing an archetype node id and a name value/criterion, e.g.:

    [at0002 and name/value=$nameValue]
    [at0002 and name/value='real name value']
    
  • Containing an archetype node id and a shortcut of a name/value criterion, e.g.:

    [at0002, $nameValue]
    [at0002, 'real name value']
    
  • The above three forms are the most common node predicates. A more advanced form is to include a general criterion instead of the name/value criterion within the predicate. The general criterion consists of left operand, operator, and right operand, e.g.:

    [at0002 and value/defining_code/terminology_id/value=$terminologyId]
    

A node predicate defines criteria on fine-grained data. It is only used within an identified path.

Operators

Basic operators

The table below shows the basic AQL operators, meaning and example.

Operator

Meaning

Example

=

Equal

name/value = $nameValue

>

Greater than

o/data[at0001]/.../data[at0003]/items[at0004]/value/value >140

>=

Greater than or equal to

o/data[at0001]/..../data[at0003]/items[at0004]/value/value >=140

<

Smaller than

o/data[at0001]/.../data[at0003]/items[at0004]/value/value <160

<=

Smaller than or equal to

o/data[at0001]/.../data[at0003]/items[at0004]/value/value <=160

!=

not equals to

c/archetype_details/template_id/value != ''

Advanced operators

matches

matches operator is used in WHERE clause. This operator needs left operand and right operand. Left operand is an AQL identified path. Right operand is enclosed within curly braces. Right operand of matches operator has three forms:

  1. type of constraint ADL (cADL): this type of right operand can be a value list or a value range. Value range is not supported in the current AQL grammar. Below is an example with matches a string value list:

    SELECT o/data[at0002]/events[at0003]/data/items[at0015]/items[at0018]/name
    FROM Ehr [uid=$ehrUid] CONTAINS Composition c CONTAINS Observation o[openEHR-EHR-OBSERVATION.microbiology.v1]
    WHERE o/data[at0002]/events[at0003]/data/items[at0015]/items[at0018]/items[at0019]/items[at0021]/name/defining_code/code_string matches {'18919-1', '18961-3', '19000-9'}
    
  2. URI: can be either terminology URI or EHR URI, or some other URIS. Only terminology URI is supported in the current AQL grammar. An example with matches a terminology URI is shown below:

    SELECT e/ehr_status/subject/external_ref/id/value, diagnosis/data/items[at0002.1]/value
    FROM EHR e CONTAINS Composition c[openEHR-EHR-COMPOSITION.problem_list.v1] CONTAINS Evaluation diagnosis[openEHR-EHR-EVALUATION.problem-diagnosis.v1]
    WHERE c/name/value='Current Problems' AND diagnosis/data/items[at0002.1]/value/defining_code matches { terminology://Snomed-CT/hierarchy?rootConceptId=50043002 }
    
  3. embedded AQL query: means an AQL query statement, i.e. embedded query, within the curly braces. Embedded query is not supported in the current AQL grammar. It looks like:

    matches {SELECT ... FROM .. WHERE}
    
EXISTS

EXISTS is a unary operator, which is always followed by a single operand. The single operand is an AQL identified path. This operator return boolean results: true means the data associated with the specified path exists, false otherwise. EXISTS expression is a single identified expression. The latest grammar only allows this operator in WHERE clause.It may be required as part of openEHR path in future.

The example below means retrieving the identifiers of the EHRs which do not have discharge admin entry instance.

SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1]

This example below means from a particular EHR, retrieving all composition uid values and the contained instruction instances if the instruction instances have links with particular EHR_URI value.

SELECT c/uid/value, instruction
FROM EHR e [ehr_id/value=$ehrid] CONTAINS COMPOSITION c
     CONTAINS INSTRUCTION instruction[openEHR-EHR-INSTRUCTION.referral.v1]
WHERE EXISTS instruction/links[target='ehr://327000002/87284370-2D4B-4e3d-A3F3-F303D2F4F34B@latest_trunk_version']
NOT

NOT is a unary boolean operator and it is always followed by either a single identified expression or boolean identified expression. It returns boolean results: true means the followed expression (single or boolean expression) is false.

The example below uses NOT operator followed by a single identified expression

SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1]

The example below uses NOT operator followed by a boolean identified expression.

SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT (EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] AND
           e/ehr_status/subject/external_ref/namespace='CEC')

The above example is equivallent to the two expressions shown below:

SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] OR
      NOT e/ehr_status/subject/external_ref/namespace='CEC'


SELECT e/ehr_id/value
FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1]
     CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1]
WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] OR
      e/ehr_status/subject/external_ref/namespace != 'CEC'

These advanced operators are not yet supported by the grammar. The operator syntax is borrowed from ADL specifications. These are proposed to improve the richness and flexibility of AQL so that AQL syntax supports more complicated query scenarios.

AQL identified paths

Usage

Identified paths are used to locate data items within an archetype RM class. Except for FROM clause, an identified path could appears in any clauses of an AQL statement. For example, it can be used to indicate the data to be returned in a SELECT clause, or the data item on which query criteria are applied in a WHERE clause.

Syntax

AQL identified path has the following forms:

  • consisting an AQL variable name defined within the FROM clause, followed by an openEHR path, e.g.

    o/data[at0001]/.../data[at0003]/items[at0004]/value/value
    
  • consisting an AQL variable name followed by a predicate, e.g.

    o[name/value=$nameValue]
    
  • consisting an AQL variable name followed by a predicate and an openEHR path, e.g.

    o[name/value=$nameValue]/data[at0001]/.../data[at0003]/items[at0004]/value/value
    

Leaf Data

Primitive Types

String Data

All strings are enclosed in double quotes or single quotes. It doesn't support line breaks.

TODO list: 

  • allows double quotes and single quotes in the string.
Integer data

 Integers are represented as numbers, such as 1, 2, 365. Commas or periods for breaking long numbers are not allowed. Hex integer is not supported.

Real data

 Real numbers are the numbers with a decimal, such as 3.1415926. Commas or periods for breaking long numbers are not allowed. Hexadecimal is not supported.

Boolean data

 Boolean values can be indicated as true or false. It is case-insensitive.

Dates and Times

No matter the date/date time is complete or partial, it is allowed in AQL as long as it is in ISO8601 date/date time format. Quotes (either single quotes or double quotes) are required to enclose the date/date time value. In AQL grammar, it is treated as a date value if the value is a complete calendar date represented in ISO8601 extended format and if single quotes are used. If the value is in other ISO8601 date/date time format, AQL treats it as a string value, but the AQL query engine processes it differently from a normal string value. It is a temporary solution. In future, a date/date time value should be identified as date/date time value if it is a valid ISO8601 date/date time. Examples are as below:

WHERE composition/context/start_time > '19860101' // a complete date. AQL grammar identifies it as a string value.
WHERE composition/context/start_time < "120000+0930"  // ISO8601 time. AQL grammar identifies it as a string value.
WHERE composition/context/start_time <= '1986-01-01'  // complete calendar date in ISO8601 extended format. AQL grammar identifies this value as a date value.

Other Built-in Types 

URI

URI leaf data is enclosed within curly braces after "matches" operator. URI can be expressed as a normal URI described in  http://www.ietf.org/rfc/rfc3986.txt. URI is not case sensitive. In AQL, a URI can be either terminology URI or EHR URI.

Terminology URI 

A terminology URI consists of the following components:

  1. terminology: the URI schemes value;
  2. terminology service: the URI authority value, such as SNOMED-CT;
  3. terminology function name: the URI path, e.g. "hierarchy" is the function name in the example shown below;
  4. argument values required by the terminology functions: URI queries;

This is an example of a terminology URI:

EHR URI

to be continued... 

Value List

Value list leaf data is enclosed within curly braces after matches operator. Each item is separated by comma if there are multiple items in the list. Value list can be string value list, date time value list, integer value list, and real value list. Quotes are required for string value list and date time value list.

A value list is only used in AQL WHERE clause when the criteria is to match one item of the list. The relationships among these value list items are "OR".

Some examples of value list:

matches {'string item 1', 'string item 2', 'string item3'}  // a string list. It is equivalent to matches 'string item1' or .. matches 'string item 2' or .. matches 'string item 3'
matches {'20060101', '20070101', '20080101'}            // a date value list
matches {1, 2, 3}                                                      // a integer list
matches {1.1, 2.5, 3.8}                                             // a real value list
Value Range

It is not supported in the current AQL grammar.

Value range leaf data is enclosed within curly braces after matches operator. It is only used in AQL WHERE clause to specify a criteria within a range. The syntax is similar as intervals of ordered primitive types detailed in Section 4.5.2, adl.pdf. Some examples:

matches {|3.0..5.0|} // means 3.0 <= x <=5.0
matches {|<8|}       // means x<8

Query structure

Overview

AQL structure has been briefly introduced in Section 2.2. This section describes the syntax in more formal detail.

AQL structure has the following clauses and these clauses must be listed in the same order as the list below in an AQL statement. An AQL statement must contain SELECT and FROM clauses. 

  • SELECT (mandatory)
  • FROM (mandatory)
  • WHERE (optional)
  • ORDER BY (optional)
  • TIMEWINDOW (optional)

FROM

The FROM clause utilises class expressions and a set of containment criteria to specify the data source from which the query required data is to be retrieved.  Its function is similar as the FROM clause of an SQL expression.

FROM Syntax

 A simple FROM clause consists of three parts: keyword - FROM, class expression and/or containment constraints, e.g.

FROM EHR e [ehr_id/value=$ehrId] CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]
Class expressions

Two examples of a class expression are shown below:

EHR e [ehr_id/value=$ehrId]                                          \\EHR class, class identifier/variable, and a standard predicate
COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]   \\COMPOSITION class, class identifier/variable, and an archetype predicate


Class expressions are used for two purposes:

  1. indicating the constraints on RM classes so as to scope the data source for the query. For instance, EHR e[ehr_id/value='123456'] indicates that the required data must be from a specific EHR with ehr_id value '123456'; while "COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1] " indicates the required data must be from or must be associated with a composition instance with archetype id - openEHR-EHR-COMPOSITION.report.v1.
  2. defining a RM class variable that may be used by other clauses to indicate the required data or data items on which query criteria are applied. The example below uses the class expression to define a variable e which is used by the SELECT clause indicating all relevant ehr_id values are retrieved, and a variable c used by WHERE clause indicating that the query criteria is set on the composition template id value.

    SELECT e/ehr_id/value
    FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.referral.v1]
    WHERE c/archetype_details/template_id/value=$templateId
    

Class expressions syntax include three parts. A class expression must have part one and at least one of part two or part three.

  1. part one (mandatory): openEHR RM class name, such as EHR, COMPOSITION, OBSERVATION etc.
  2. part two (optional): AQL variable name
  3. part three (optional): a standard predicate or an archetype predicate.
Containment

Since archetypes are in hierarchical structure, AQL has a containment constraint which specifies the hierarchical relationships between parent and child data items. FROM clause utilises this hierarchical constraint along with class expression to determine the data source to which the AQL query is applied.

The syntax of containment constraint is very simple: using keyword CONTAINS between two class expressions. Left class expression is the the parent object of the right class expression, e.g.

EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1]


Boolean operators (AND, OR, NOT) and parentheses are used when multiple containment constrains are required, e.g.

EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] AND
COMPOSITION c1 [openEHR-EHR-COMPOSITION.report.v1]
EHRR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] CONTAINS
(OBSERVATION o openEHR-EHR-OBSERVATION-laboratory-hba1c.v1 AND
OBSERVATION o1 openEHR-EHR-OBSERVATION-laboratory-glucose.v1)

WHERE

An AQL WHERE clause is used to represent further criteria applied to the data items within the objects declared in the FROM clause. A WHERE clause expresses the query criteria that cannot be represented in other AQL clauses, such as criteria on archetype id, composition committal date time, and the criteria on in which order the returned results should be listed.

Syntax

The WHERE clause syntax has the following parts (in order): keyword WHERE and identified expression(s). Boolean operator (AND, OR, NOT) and parenthesis can be used to represent multiple identified expressions. Examples:

WHERE c/name/value=$nameValue AND c/archetype_details/template_id/value=$templateId
WHERE (c/name/value=$nameValue OR c/archetype_details/template_id/value=$templateId) AND
o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140
Identified expression

Identified expression specifies the criteria within WHERE clause. It consists of left operand, operator and right operand.

  1. Left operand is normally an identified path,
  2. Operator is normally the basic operator,
  3. Right operand is the criteria value, which normally is primitive type, such as string, integer, boolean, double, or float. When it is a string value, single quotation marks or double quotation marks are required. Right operand can also be a parameter or an identified path as well.

Examples:

  • Left operand is an identified path and right operand is a real criteria value - primitive type:

    o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140
    c/archetype_details/template_id/value='health_encounter'
    
  • Left operand is an identified path and right operand is a parameter:

    c/archetype_details/template_id/value=$templateParameter
    
  • Both left operand and right operand are an identified path (this is an advanced feature):

    o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >
    o1/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value
    

SELECT

A SELECT clause specifies what data is to be retrieved by the AQL query. The data can be any types from openEHR RM and any primitive data types. In addition, SELECT clause uses TOP to indicate the number of result setS that should be returned and name alias to rename the retrieved data.

The SELECT syntax always starts with keyword SELECT, followed by TOP (optional), and identified path(s) or variable name(s) defined in the FROM clause. Each variable name or an identified path may have a name alias renaming the associated data.

If variable name(s) is(are) required, the full object of the type associated with the variable is retrieved, such as a COMPOSITION, an OBSERVATION object etc. If identified paths are required, the path-associated data items are returned. Multiple identifiers or identified paths are separated using a comma.

Some examples are shown below.

Example 1: retrieve all compositions' name value, context start time and composer name from a specific EHR.

SELECT c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer
 FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c

Example 2: Retrieve all composition objects of a specific EHR.

SELECT c
FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c
TOP

TOP syntax was borrowed from SQL language for representing the number of result sets that should be returned by the AQL query. It uses BACKWARD and FORWARD to indicate the direction where to start to get the number of results to be returned.

It starts with keyword TOP, followed by an integer number and/or the direction (i.e. BACKWARD, FORWARD), e.g.

SELECT TOP 10 c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer
FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c
Name alias

Like SQL, AQL supports using name alias to rename the retrieved data. It starts with keyword - AS, followed by the name which conforms to the syntax rule of AQL variable. 

ORDER BY

The ORDER BY clause is used to sort  the returned results. It starts with the keyword ORDER BY, followed by an identified path and the keyword DESC, DESCENDING, ASCE, or ASCENDING, e.g.

ORDER BY c/name/value

TIMEWINDOW 

In addition to FROM clause, TIMEWINDOW helps to scope the data source from which the required data is retrieved. TIMEWINDOW is an addition query clause used in AQL to constrain the query to data that was available in the system (also know the data committal time) within the specified time criteria. This supports a timebased logical system rollback allowing a query to be executed as though it was performed at that specified time, which is essential for medico-legal reporting.

It starts with the keyword - TIMEWINDOW, and followed by a string compatible with the ISO 8601 representation of time interval.

The first example below constrains the query source to data committed to the system before 2006-01-01.

TIMEWINDOW /2006-01-01

The second example constrains the query source to data committed within the period of two years before 2006-01-01.

TIMEWINDOW P2Y/2006-01-01

Result structure

The result structure of an AQL query is described here as part of the EHR service definition.

How to write an AQL query statement manually

Ocean Informatics has implemented an AQL query builder which can be used to generate AQL query statements based on archetypes automatically. We strongly recommend to use this tool to generate an AQL query statement because it can be very complicated. This section is for the people who either do not have the access to AQL query builder or for users who want to have more understanding on AQL so that they can edit an existing AQL query manually.

Query scenario 

An example is used to illustrate how to write an AQL statement. The query scenario is:

"Get all abnormal blood pressure values that are recorded in a health encounter for a specific patient."

Step 1. Write FROM clause

A FROM clause is to scope the data source for the query. Normally the first step is to shape FROM clause, which has EHR class expression and archetype class expressions.

  • EHR class expression. You need to determine whether this query is applied to a single EHR or all EHRs. The latter is called population queries. If it is for all EHRs, you don't need to specify ehr_id/value in FROM clause. Otherwise, you need to either specify a specific value or use a query parameter for ehr_id/value in FROM clause. For this particular scenario, it is for a single EHR. Parameter called ehrUid is used for the ehr_id/value, I would have this AQL query statement:

    FROM EHR [ehr_id/value=$ehrUid]
    
  • Archetype expressions.
    • To write archetype expressions in the FROM clause, we need to determine what archetypes are required in the scenario. Since archetypes are used to represent clinical concepts. The easy way to identify archetypes from a scenario is to identify clinical concepts mentioned in the scenario. Each clinical concept is generally associated with one archetype. Two clinical concepts are mentioned in the aforementioned scenario: blood pressure and health encounter. These concepts are used to identify two archetypes used in query: an observation archetype with archetype id - openEHR-EHR-OBSERVATION.blood_pressure.v1, and a composition archetype with archetype id - openEHR-EHR-COMPOSITION.encounter.v1.
    • Determine whether we need a variable name for each archetype class expression or not. We may need one if the reference to the archetype class expression is required by other AQL clauses. At the moment, I don't put any variable names for the archetype expressions. The two archetype expressions are:

      COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
      OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1]
      
  • Containment expression. openEHR RM is used to identify the hierarchical relationships among the found archetypes so as to define the containment expression. For example, a composition archetype is the parent of an observation archetype, so the AQL query would look like:

    FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
    CONTAINS OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1]
    

Step 2. Write WHERE clause

WHERE clause represents all query criteria that cannot be represented in the FROM clause. To write WHERE clause expression, you need to:

  1. Find the criteria. The criteria required by the scenario are abnormal blood pressure values. Different guidelines may have different definitions for abnormal blood pressure values. Here we interpret that abnormal blood pressure means 1) the systolic pressure value is greater than or equal to 140; OR 2) diastolic pressure value is greater than or equal to 90.
  2. Write an identified expression for criterion 1).
    1. Write the Identified pathfor systolic data value. We need a class variable name as the reference to the blood pressure class expression defined in the FROM clause, so a variable name - obs - is added into the FROM clause. A path to systolic data value is also required to form the identified path. Two tools can be used to get the path: openEHR ADL WORKBENCHor Ocean Template Designer. The path to the systolic data value is /data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value.  
    2. Get appropriate operator: '>=' is chosen for criterion 1.
    3. Get the criteria value - 140. The query statement including the identified expression for criterion 1) is shown below:

      FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
      CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
      WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140
      
  3. Write an identified expression for criterion 2). Following the previous instruction to write the identified expression for criterion 2) which is shown as:

    obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value >= 90
    
  4. Find the appropriate boolean operator to join criterion 1) and 2) - OR. The query statement looks like:

    FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
    CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
    WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140 OR
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value >= 90
    

Step 3. Write SELECT clause

Writing the SELECT clause needs to:

  1. Find what data is required. The aforementioned scenario requires abnormal blood pressure values are returned, including both systolic pressure and diastolic pressure.
  2. Write the appropriate identified path to the required item (variable name is required if the class object is required). Multiple identified path is separated using comma. For this particular scenario, two identified paths are found for systolic and diastolic data value by following Step 2. The completed query statement looks like:

    SELECT
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude,
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude
    
    FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
    CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
    
    WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude>= 140 OR
    obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude>= 90
    

Current status

AQL features not yet supported in the Ocean Informatics AQL Parser

This is the list of all features that are not yet supported in the AQL Parser implemented by Ocean Informatics (Nov 2012):

  • TOP
  • XOR. Currently AND, OR, NOT, EXISTS are supported in WHERE clause.
  • All functions are not supported.
  • Embedded query is not supported
  • ORDER BY
  • TIMEWINDOW

AQL features that are proposed, but not finalised

Arithmetic functions

 It has been proposed that AQL would support some basic arithmetic functions, such as addition (plus) , subtraction (minus) . Multiplication, and division may be supported as well. We do not have any scenario requiring for these two functions yet.

Most cases, it is required that the left operand and the right operand must be the same type in order to do the calculation, e.g. numeric types. openEHR RM supports addition and subtraction for the types of: DV_QUANTITY, DV_COUNT, DV_INTERVAL, DV_PROPORTION, DV_DATE_TIME, DV_DATE, and DV_TIME. Again, it is required that the left and right operands must be the same type. However, one exception to this is that openEHR RM also supports subtraction between all subtypes of DV_TEMPORAL and type of DV_DURATION. AQL would support these arithmetic functions that openEHR RM supports.

The example below shows a scenario which requires that a composition report (c1) is issued in the last year. It illustrates that a type of ISO 8601 date string (current-date()) subtracts a type of ISO 8601 interval string (P1Y).

c1/context/other_context/items[at0006]/items[at0013]/value > current-date() - PIY

Other functions

It is proposed that AQL may also support other functions, such as:

  • current-date(): a build-in function returning the current date value in ISO date string format. 
  • current-date-time(): a build-in function returning the current date time value in ISO date/time string format. 
  • max: a build-in function returning the max value out of an expression.

Further discussions

Ocean products have been using AQL to search and retrieve openEHR-conformed EHR data for couple of years. The major usages of AQL are:

  • Retrieving candidate compositions with given criteria on EHR identifier and composition meta data, such as composition type (i.e. archetype id), healthcare facility, event start time, composer and so on.
  • Retrieving EHR data from a single EHR for dedicated purposes, such as risk factor calculations 
  • Reporting, in which case AQL is used to retrieve EHR data that are cross multiple EHRs, i.e. population AQL query. 
  • to be continued...

The existing AQL grammar and syntax maybe further enhanced in the following areas:

  • Reduce the length of the query statement. The use of archetype path in AQL query makes the query lengthy and hard to read. Local variables with meaningful names that are assigned with path can be used to reduce the length of the query as well as improve readability of the query. One example is shown below:

    let $systolic_bp="data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude"
    let $diastolic_bp="data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude"
    
    
    SELECT obs/$systolic_bp, obs/$diastolic_bp
    FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] 
         CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
    WHERE obs/$systolic_bp>= 140 OR obs/$diastolic_bp>=90
    
  • To be continued