Developer

Ocean Informatics.

Team: Chunlan Ma MD, Heath Frankel, Thomas Beale

Introduction

 Integrated Care EHR (Electronic Health Record) is defined as:
"a repository of information regarding the health of a suject of care in computer processable form, stored and transmitted securely, and accessible by multiple authorised users" [1].
 

To make EHR be accessible by multiple authorised users, it means the EHR data (both coarse-grained and fine-grained data) can be searched and retrieved by the authorised users who may use the same EHR systems or different EHR systems. It is not possible at the current stage. To achieve this, it is required that:

Currently, the available query languages, such as SQL, XQuery, or Object-Oriented Query Language, have dependencies on particular system data structure and working environment. The authorised users must know the persistence data structure of an EHR in order to write an appropriate query. The query statement cannot be used by other systems which have different data store. Consequently, none of the these language meets the aforementioned requirements and none of these can be used as the query language required by integrated care EHRs.

The key innovation of openEHR architecture is two-level modelling: model of information (openEHR Reference Model - known as openEHR RM) and models of content. Models of content includes archetypes (domain-specific model), templates (situation-specific model), and GUI (application-specific model). Archetypes models

Overview

What is AQL?

Archetype Query Language (AQL) is a declarative query language developed specifically for expressing the queries used for searching and retrieving the clinical data found in archetype-based EHRs. It is applied to the openEHR EHR Reference Model (RM) and the openEHR clinical archetypes, but the syntax is generic across applications, programming languages, system environment, and reference model. AQL is designed as a common language used for expressing clinical data requests across multiple openEHR-based applications.
Unlike other query languages, such as SQL or XQuery, AQL expresses the queries at the archetype level, i.e. semantic level, other than at the data instance level. This is the key in achieving sharing queries across system boundaries or enterprise boundaries.

AQL has the following distinctive features:

  1. the utilization of openEHR path syntax in AQL. openEHR path syntax is used to locate clinical statements and data values within them using Archetypes. This path syntax is used to represent the query criteria and returned results. It  allows setting query criteria using archetype and node identifiers, data values within the archetypes, and class attributes defined within the openEHR RM. It also allows the returned results to be top-level archetyped RM objects, data items within the archetypes or RM attribute values.
  2. the utilization of containment mechanisms to indicate the data hierarchy and constrain the source data to which the query is applied.
  3. the utilization of ADL-like operator syntaxes, such as matches, exists, in, negation. 
  4. Neutral expression syntax. AQL does not have any dependencies on the underlying RM of the archetypes. It is neutral to system implementation and environment.
  5. Supporting queries with logical time-based data rollback.

AQL has some other features which can be found from other query languages:

  1. Supporting naming returned results.
  2. Supporting query criteria parameters.
  3. Supporting arithmetic operations (such as count, addition, subtraction, multiplication, and division), relational operations (>, >=, =, !=, <=, <) and Boolean operations (or, and, xor, not).
  4. Supporting some functions that are supported in XQuery, such as current-date().
  5. Users could specify their preference on the retrieved data, such as ordering preferences, or total number of retrieved results.
  6. Supporting the queries for individual clinical subjects at the point of care, administrative purposes and clinical research purposes.

Structure & Example

Like SQL, AQL has five clauses: SELECT, FROM, WHERE, ORDER BY, and TIMEWINDOW. The SELECT clause specifies the data elements to be returned. The FROM clause specifies the result source and the corresponding containment criteria. The WHERE clause specifies data value criteria within the result source. The ORDER BY clause indicates the data items used to order the returned result set. TIMEWINDOW clause is to constrain the query to data that was available in the system within the specified time criteria.

Here is an example of AQL statement. This statement returns all blood pressure values where systolic value is greater or equal to 140 or diastolic value is greater or equals to 90 within a specified EHR. AQL syntax is a synthesis of SQL structural syntax and openEHR path syntax.

AQL Syntax

Overview

AQL has SELECT, FROM, WHERE and ORDER BY  clauses. openEHR path syntax is used to locate any node or data values within archetypes. The SELECT clause uses openEHR path syntax to indicate expected archetypes, elements, or data values that are need to be returned. FROM clause uses containment mechanisms indicating the scope of the query data source. WHERE clause utilises archetype path to set query criteria on archetypes or any node within the archetypes. The following sections introduces the AQL syntax in details.

Reserved words and characters

 AQL has the following reserved keywords and characters:

openEHR path syntax

 The openEHR path syntax has been introduced in a great detail by Thomas Beal et. al. (http://svn.openehr.org/specification/TRUNK/publishing/architecture/overview.pdf). AQL utilises the basic openEHR path syntax to represent both coarse-grained and fine-grained archetype nodes and openEHR class instances and attributes. The syntax details are not repeated. A set of openEHR path examples are provided.

openEHR path examples - archetype path

Archetype path is the path referring to any nodes within an archetype. The picture below is an ADL workbench screen shot of apgar observation node map. The table below shows some examples.

Archetype node name

Path syntax

Apgar score

/

1 minute

/data[at0002]/events[at0003]

Heart rate

/data[at0002]/events[at0003]/data[at0001]/items[at0005]


openEHR path examples - path to RM class attributes

Another type of openEHR path is the path pointing to an attribute of openEHR Reference Model classes. The examples shown below uses the attributes from COMPOSITION class.

OpenEHR RM attribute name

Path syntax

COMPOSITION.category

/category

COMPOSITION.context.start_time

/context/start_time

COMPOSITION.uid.value

/uid/value

AQL variables

AQL variables are used to refer a specific archetype or openEHR class defined within an AQL statement. It has following featueres:

Parameters

AQL syntax supports parameterization. AQL parameters are used to indicate a criteria, which will be substituted with real criteria value at run time. Supporting parameterization is one of the mechanism used to realise sharing AQL statements within a system or cross system boundaries.

When parameters are needed

Parameters are needed when same AQL query statement is used with different criteria values. This AQL exampleis to return all abnormal blood pressure values for a single specific EHR. This query has a parameter $ehrUid. This parameter will be substituted by a real EHR.ehr_id value at run time. Consequently, this query can be reused for all EHRs either within an EHR system or by other EHR systems (Assuming these EHR systems support AQL).

Parameter syntax 

A parameter always starts with a dollar sign $ followed by the parameter name, e.g. $ehrUid. Expect for AQL reserved words or characters, parameter name can be any letter(s), alphanumeric values or underscore _. It cannot have a space.

A parameter can be used for all criteria values within an AQL statement, e.g.

Predicate

AQL has three types of predicates: standard predicate, archetype predicate, and node predicate.

Standard predicate

Standard predicate always has left operand, operator and right operand, e.g. [ehr_id/value='123456'].

Archetype predicate

Archetype predicate is a shortcut of standard predicate, i.e. the predicate does not have left operand and operator. It only has an archetype id, e.g. [openEHR-EHR-COMPOSITION.encounter.v1]. Archetype predicate is a specific type of query criteria indicating what archetype instances are relevant to this query. It is used to scope the the data source from which the query expected data is to be retrieved. Therefore, an archetype predicate is only used within an AQL FROM clause, for example,

FROM EHR [ehr_id/value='1234'] CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.blood_pressure.v1]

Node predicate

Node predicate is also a shortcut of standard predicate. It has the following forms:

Node predicate defines criteria on fine-grained data. It is only used within an identified path.

AQL operators

Basic AQL operators

The table below shows the basic AQL operators, meaning and example.

Operator

Meaning

Example

=

Equal

name/value = $nameValue

>

Greater than

o/dataat0001/.../dataat0003/itemsat0004/value/value >140

>=

Greater than or equal to

o/dataat0001/..../dataat0003/itemsat0004/value/value >=140

<

Smaller than

o/dataat0001/.../dataat0003/itemsat0004/value/value <160

<=

Smaller than or equal to

o/dataat0001/.../dataat0003/itemsat0004/value/value <=160

!=

not equals to

c/archetype_details/template_id/value != ''

 

 

 

Advanced AQL operators

These advanced operators are not supported by the BNF. These operator syntax are borrowed from ADLspecifications. These are proposed to improve the richness and flexibility of AQL so that AQL syntax supports more complicated query scenarios. This linkshows these operators, scenarios and the syntax.

AQL identified paths

Usage

Identified paths are used to locate data items within an archetype RM class. Except for FROM clause, an identified path could appears in any clauses of an AQL statement. For example, it can be used to indicate the data to be returned in a SELECT clause, or the data item on which query criteria are applied in a WHERE clause.

Syntax

AQL identified path has the following forms:

Overview

AQL structure has been briefly introduced in Section 2.2. This section introduces AQL structure in details.

AQL structure has the following clauses and these clauses must be listed in the same order as the list below in an AQL statement. An AQL statement must contain SELECT and FROM clauses. 

FROM

The FROM clause utilises class expressions and a set of containment criteria to specify the data source from which the query required data is to be retrieved.  Its function is similar as the FROM clause of an SQL expression.

FROM Syntax

 A FROM clause consists of three parts: keyword - FROM, class expression and/or containment constraints, e.g.
FROM EHR e ehr_id/value=$ehrId CONTAINS COMPOSITION copenEHR-EHR-COMPOSITION.report.v1

Class expressions

Two examples of a class expression are shown below:
EHR e ehr_id/value=$ehrId
COMPOSITION copenEHR-EHR-COMPOSITION.report.v1
Class expressions are used for two purposes:

  1. indicating the constraints on RM classes so as to scope the data source for the query. For instance, EHR e[ehr_id/value='123456'] indicates that the required data must be from a specific EHR with ehr_id value '123456'; while "COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1] " indicates the required data must be from or must be associated with a composition instance with archetype id - openEHR-EHR-COMPOSITION.report.v1.
  2. defining a RM class variable that may be used by other clauses to indicate the required data or data items on which query criteria are applied. The example below uses the class expression to define a variable e which is used by the SELECT clause indicating all relevant ehr_id values are retrieved, and a variable c used by WHERE clause indicating that the query criteria is set on the composition template id value.
    SELECT e/ehr_id/value
    FROM EHR e CONTAINS COMPOSITION copenEHR-EHR-COMPOSITION.referral.v1
    WHERE c/archetype_details/template_id/value=$templateId

Class expressions syntax include three parts. A class expression must have part one and at least one of part two or part three.

  1. part one (mandatory): openEHR RM class name, such as EHR, COMPOSITION, OBSERVATION etc.
  2. part two (optional): AQL variable name
  3. part three (optional): a standard predicate or an archetype predicate.
Containment

Since archetypes are in hierarchical structure, AQL has a containment constraint which specifies the hierarchical relationships between parent and child data items. FROM clause utilises this hierarchical constraint along with class expression to determine the data source to which the AQL query is applied.

The syntax of containment constraint is very simple: using keyword CONTAINS between two class expressions. Left class expression is the the parent object of the right class expression, e.g.
EHR e CONTAINS COMPOSITION c openEHR-EHR-COMPOSITION.referral.v1
Boolean operators (AND, OR, NOT) and parentheses are used when multiple containment constrains are required, e.g.
EHR e CONTAINS COMPOSITION c openEHR-EHR-COMPOSITION.referral.v1 AND
COMPOSITION c1 openEHR-EHR-COMPOSITION.report.v1EHRR e CONTAINS COMPOSITION c openEHR-EHR-COMPOSITION.referral.v1 CONTAINS
(OBSERVATION o openEHR-EHR-OBSERVATION-laboratory-hba1c.v1 AND
OBSERVATION o1 openEHR-EHR-OBSERVATION-laboratory-glucose.v1)

WHERE

AQL WHERE clause is used to represent further criteria applied to the data items within the objects declared in the FROM clause. WHERE clause expresses the query criteria that cannot be represented in other AQL clauses, such as criteria on archetype id, composition committal date time, and the criteria on in which order the returned results should be listed.

Syntax

WHERE clause syntax has the following parts (in order): keyword WHERE and identified expression(s). Boolean operator (AND, OR, NOT) and parenthesis can be used to represent multiple identified expressions. The examples are listed below.
WHERE c/name/value=$nameValue AND c/archetype_details/template_id/value=$templateIdWHERE (c/name/value=$nameValue OR c/archetype_details/template_id/value=$templateId) AND
o/dataat0001/eventsat0006/dataat0003/itemsat0004/value/value >= 140

Identified expression

Identified expression specifies the criteria within WHERE clause. It consists of left operand, operator and right operand.

  1. Left operand is normally an identified path,
  2. Operator is normally the basic operator,
  3. Right operand is the criteria value, which normally is primitive type, such as string, integer, boolean, double, or float. When it is a string value, single quotation marks or double quotation marks are required. Right operand can also be a parameter or an identified path as well.

Examples are listed below:

3.9.4 SELECT

SELECT clause specifies what data is to be retrieved by the AQL query. The data can be any types from openEHR RM and any primitive data types. In addition, SELECT clause uses TOP to indicate the number of result setS that should be returned and name alias to rename the retrieved data.

SELECT syntax always starts with keyword SELECT, followed by TOP (optional), and identified path(s) or variable name(s) defined in the FROM clause. Each variable name or an identified path may have a name alias renaming the associated data.

If variable name(s) is(are) required, it means the variable associated full class object is retrieved, such as a COMPOSITION object or an OBSERVATION object maybe returned. If identified paths are required, the path associated data items are returned. Multiple identifiers or identified paths are separated using a comma.

Some examples are shown below.

Example 1: retrieve all compositions' name value, context start time and composer name from a specific EHR.
SELECT c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer
FROM EHR eehr_id/value=$ehrUid CONTAINS COMPOSITION c
Example 2: Retrieve all composition objects of a specific EHR.
SELECT c
FROM EHR eehr_id/value=$ehrUid CONTAINS COMPOSITION c

TOP

TOP syntax was borrowed from SQL language for representing the number of result sets that should be returned by the AQL query. It uses BACKWARD and FORWARD to indicate the direction where to start to get the number of results to be returned.

It starts with keyword TOP, followed by an integer number and/or the direction (i.e. BACKWARD, FORWARD), e.g.
SELECT TOP 10 c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer
FROM EHR eehr_id/value=$ehrUid CONTAINS COMPOSITION c

Name alias

Like SQL, AQL supports using name alias to rename the retrieved data. It starts with keyword - AS, followed by the name which conforms to the syntax rule of AQL variable. 

3.9.5 ORDER BY

The ORDER BY clause is used to sort  the returned results. It starts with the keyword ORDER BY, followed by an identified path and the keyword DESC, DESCENDING, ASCE, or ASCENDING, e.g.
ORDER BY c/name/value

3.9.6 TIMEWINDOW 

In addition to FROM clause, TIMEWINDOW helps to scope the data source from which the required data is retrieved. TIMEWINDOW is an addition query clause used in AQL to constrain the query to data that was available in the system (also know the data committal time) within the specified time criteria. This supports a timebased logical system rollback allowing a query to be executed as though it was performed at that specified time, which is essential for medico-legal reporting.

It starts with the keyword - TIMEWINDOW, and followed by a string compatible with the ISO 8601 representation of time interval.

The first example below constrains the query source to data committed to the system before 2006-01-01.
TIMEWINDOW /2006-01-01
The second example constrains the query source to data committed within the period of two years before 2006-01-01.
TIMEWINDOW P2Y/2006-01-01

BNF

AQL BNF grammar file can be found from the attachment. For the sake of reading purpose, the wiki page of AQL BNF grammar is also generated.

How to write an AQL query statement manually

Ocean Informatics has implemented an AQL query builder which can be used to generate AQL query statements based on archetypes automatically. We strongly recommend to use this tool to generate an AQL query statement because it can be very complicated. This section is for the people who either do not have the access to AQL query builder or for the people  who want to have more understanding on AQL so that they can edit an existing AQL query manually.

Query scenario 

An example is used to illustrate how to write an AQL statement. The query scenario is:

"Get all abnormal blood pressure values that are recorded in a health encounter for a specific patient."

How to write AQL query statement - step by step

Step 1. Write FROM clause

FROM clause is to scope the data source for the query. Normally the first step is to shape FROM clause, which has EHR class expression and archetype class expressions.

WHERE clause represents all query criteria that cannot be represented in the FROM clause. To write WHERE clause expression, you need to:

  1. Find the criteria. The criteria required by the scenario are abnormal blood pressure values. Different guidelines may have different definitions for abnormal blood pressure values. Here we interpret that abnormal blood pressure means 1) the systolic pressure value is greater than or equal to 140; OR 2) diastolic pressure value is greater than or equal to 90.
  2. Write an identified expression for criterion 1).
    1. Write the Identified pathfor systolic data value. We need a class variable name as the reference to the blood pressure class expression defined in the FROM clause, so a variable name - obs - is added into the FROM clause. A path to systolic data value is also required to form the identified path. Two tools can be used to get the path: openEHR ADL WORKBENCHor Ocean Template Designer. The path to the systolic data value is /data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value.  
    2. Get appropriate operator: '>=' is chosen for criterion 1.
    3. Get the criteria value - 140. The query statement including the identified expression for criterion 1) is shown below:
      FROM EHR ehr_id/value=$ehrUid CONTAINS COMPOSITION openEHR-EHR-COMPOSITION.encounter.v1
      CONTAINS OBSERVATION obs openEHR-EHR-OBSERVATION.blood_pressure.v1
      WHERE obs/dataat0001/eventsat0006/dataat0003/itemsat0004/value/value >= 140
  3. Write an identified expression for criterion 2). Following the previous instruction to write the identified expression for criterion 2) which is shown as:
    obs/dataat0001/eventsat0006/dataat0003/itemsat0005/value/value >= 90
  4. Find the appropriate boolean operator to join criterion 1) and 2) - OR. The query statement looks like:
    FROM EHR ehr_id/value=$ehrUid CONTAINS COMPOSITION openEHR-EHR-COMPOSITION.encounter.v1
    CONTAINS OBSERVATION obs openEHR-EHR-OBSERVATION.blood_pressure.v1
    WHERE obs/dataat0001/eventsat0006/dataat0003/itemsat0004/value/value >= 140 OR
    obs/dataat0001/eventsat0006/dataat0003/itemsat0005/value/value >= 90

Step 3. Write SELECT clause

Writing the SELECT clause needs to:

  1. Find what data is required. The aforementioned scenario requires abnormal blood pressure values are returned, including both systolic pressure and diastolic pressure.
  2. Write the appropriate identified path to the required item (variable name is required if the class object is required). Multiple identified path is separated using comma. For this particular scenario, two identified paths are found for systolic and diastolic data value by following Step 2. The completed query statement looks like:
    SELECT
    obs/dataat0001/eventsat0006/dataat0003/itemsat0004/value/value,
    obs/dataat0001/eventsat0006/dataat0003/itemsat0005/value/value

FROM EHR ehr_id/value=$ehrUid CONTAINS COMPOSITION openEHR-EHR-COMPOSITION.encounter.v1
CONTAINS OBSERVATION obs openEHR-EHR-OBSERVATION.blood_pressure.v1

WHERE obs/dataat0001/eventsat0006/dataat0003/itemsat0004/value/value >= 140 OR
obs/dataat0001/eventsat0006/dataat0003/itemsat0005/value/value >= 90

AQL features that are not supported in the AQL Parser implemented by Ocean Informatics

This is the list of all features that are not supported in the AQL Parser implemented by Ocean Informatics:

AQL features that are proposed, but not finalised

Arithmetic functions

 It has been proposed that AQL would support some basic arithmetic functions, such as addition (plus) , subtraction (minus) . Multiplication, and division may be supported as well. We do not have any scenario requiring for these two functions yet.

Most cases, it is required that the left operand and the right operand must be the same type in order to do the calculation, e.g. numeric types. openEHR RM supports addition and subtraction for the types of: DV_QUANTITY, DV_COUNT, DV_INTERVAL, DV_PROPORTION, DV_DATE_TIME, DV_DATE, and DV_TIME. Again, it is required that the left and right operands must be the same type. However, one exception to this is that openEHR RM also supports subtraction between all subtypes of DV_TEMPORAL and type of DV_DURATION. AQL would support these arithmetic functions that openEHR RM supports.

The example below shows a scenario which requires that a composition report (c1) is issued in the last year. It illustrates that a type of ISO 8601 date string (current-date()) subtracts a type of ISO 8601 interval string (P1Y).
c1/context/other_context/itemsat0006/itemsat0013/value > current-date() - PIY

Other functions

It is proposed that AQL may also support other functions, such as:

Advanced operators

All advanced operators have been proposed, but not supported yet.

References

 [1]. ISO/TC 215 Technical Report: Electronic Health Record Definition, Scope, and Context. Second Draft, August 2003. (Accessed on 14/01/2008)