Ocean Informatics Team: Chunlan Ma MD, Heath Frankel, Thomas Beale
Currently, the available query languages, such as SQL, XQuery, or Object-Oriented Query Language, have dependencies on particular system data structure and working environment. Users must know the persistence data structure of an EHR in order to write an appropriate query. The query statement cannot be used by other systems which have different data store. Consequently, none of the these languages meet the aforementioned requirements and none of these can be used directly as the query language required by integrated care EHRs
Archetype Query Language (AQL) is a declarative query language developed specifically for expressing queries used for searching and retrieving the clinical data found in archetype-based EHRs. It is applied to the openEHR EHR Reference Model (RM) and the openEHR clinical archetypes, but the syntax is independent of applications, programming languages, system environment, and storage models. The minimum requirement for data to be querying with AQL (including with archetype structures and terminology) is for the data to be marked at a fine granularity with the appropriate archetype codes and terminology codes. This may be native openEHR-structured data, or legacy system data to which the relevant data markers (mainly archetype paths and terminology codes) have been added. Unlike other query languages, such as SQL or XQuery, AQL expresses the queries at the archetype level, i.e. semantic level, other than at the data instance level. This is the key in achieving sharing queries across system boundaries or enterprise boundaries.
AQL has the following distinctive features:
AQL has some other features which can be found from other query languages:
Like SQL, AQL has five clauses: SELECT, FROM, WHERE, ORDER BY, and TIMEWINDOW. The SELECT clause specifies the data elements to be returned. The FROM clause specifies the result source and the corresponding containment criteria. The WHERE clause specifies data value criteria within the result source. The ORDER BY clause indicates the data items used to order the returned result set. TIMEWINDOW clause is to constrain the query to data that was available in the system within the specified time criteria.
Here is an example of AQL statement. This statement returns all blood pressure values where systolic value is greater or equal to 140 or diastolic value is greater or equals to 90 within a specified EHR. AQL syntax is a synthesis of SQL structural syntax and openEHR path syntax.
AQL has SELECT, FROM, WHERE and ORDER BY clauses. openEHR path syntax is used to locate any node or data values within archetypes. The SELECT clause uses openEHR path syntax to indicate expected archetypes, elements, or data values that are need to be returned. FROM clause uses containment mechanisms indicating the scope of the query data source. WHERE clause utilises archetype path to set query criteria on archetypes or any node within the archetypes. The following sections introduce the AQL syntax in detail.
AQL has the following reserved keywords and characters:
The openEHR path syntax has been described in a detail (http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/overview.pdf). AQL utilises the basic openEHR path syntax to represent both coarse-grained and fine-grained archetype nodes and openEHR class instances and attributes. The syntax details are not repeated here. A set of openEHR path examples are provided.
Archetype path is the path referring to any nodes within an archetype. The picture below is an ADL workbench screen shot of apgar observation node map. The table below shows some examples.
Archetype RM type |
Archetype node name |
Path syntax |
---|---|---|
OBSERVATION |
Apgar score |
/ |
OBSERVATION |
1 minute |
/data[at0002]/events[at0003] |
OBSERVATION |
Heart rate |
/data[at0002]/events[at0003]/data[at0001]/items[at0005] |
Another type of openEHR path is the path pointing to an attribute of openEHR Reference Model classes. The examples shown below uses the attributes from COMPOSITION class.
OpenEHR RM attribute name |
Path syntax |
---|---|
COMPOSITION.category |
/category |
COMPOSITION.context.start_time |
/context/start_time |
COMPOSITION.uid.value |
/uid/value |
AQL variables are used to refer a specific archetype or openEHR class defined within an AQL statement. It has following featueres:
AQL syntax supports parameterization. AQL parameters are used to indicate a criteria, which will be substituted with real criteria value at run time. Supporting parameterization is one of the mechanism used to realise sharing AQL statements within a system or cross system boundaries.
Parameters are needed when the same AQL query statement is used with different criteria values. This AQL example is to return all abnormal blood pressure values for a single specific EHR. This query has a parameter $ehrUid. This parameter will be substituted by a real EHR.ehr_id value at run time. Consequently, this query can be reused for all EHRs either within an EHR system or by other EHR systems (Assuming these EHR systems support AQL).
(This topic might be out of the scope of this article)
AQL query parameters can be resolved at application level, or EHR system level. It depends on what the query parameters are used for and the design/implementation of the system or components. Query parameters would be normally resolved outside of a query engine.
If a query needs to be reusable across different EHR systems, the query parameters (if there are any) normally need to be registered in these EHR systems so that these parameters can be resolved with real values there. A typical example of this type of query parameter is $ehrUid. While if a query is only reusable within an application, then the query parameters would be resolved within the application, such as the parameter of healthcare facility identifier or template identifier.
There is no specific guidelines on how to resolve query parameters. Generally speaking, a parameter name is used as a key (or a key is associated with a parameter within a query context) and the key needs to be unique within the boundary where the parameters are resolved. The EHR system or application needs to have the API functions to get the real value with a given parameter name or key.
A parameter always starts with a dollar sign $ followed by the parameter name, e.g. $ehrUid. Expect for AQL reserved words or characters, parameter name can be any letter(s), alphanumeric values or underscore _. It cannot have a space.
A parameter can be used for all criteria values within an AQL statement, e.g.
[$archetypeId], [at0003, $nameValue], [ehr_id/value=$ehrId] |
o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value>$systolicCriteria |
AQL has three types of predicates: standard predicate, archetype predicate, and node predicate.
Standard predicate always has left operand, operator and right operand, e.g. [ehr_id/value='123456'].
>, >=, =, <, <=, != |
Archetype predicate is a shortcut of standard predicate, i.e. the predicate does not have left operand and operator. It only has an archetype id, e.g. [openEHR-EHR-COMPOSITION.encounter.v1]. Archetype predicate is a specific type of query criteria indicating what archetype instances are relevant to this query. It is used to scope the the data source from which the query expected data is to be retrieved. Therefore, an archetype predicate is only used within an AQL FROM clause, for example,
FROM EHR [ehr_id/value='1234'] CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS |
Node predicate is also a shortcut of standard predicate. It has the following forms:
[at0002] |
[at0002 and name/value=$nameValue] [at0002 and name/value='real name value'] |
[at0002, $nameValue] [at0002, 'real name value'] |
[at0002 and value/defining_code/terminology_id/value=$terminologyId] |
Node predicate defines criteria on fine-grained data. It is only used within an identified path.
The table below shows the basic AQL operators, meaning and example.
Operator |
Meaning |
Example |
|
---|---|---|---|
= |
Equal |
|
|
> |
Greater than |
|
|
>= |
Greater than or equal to |
|
|
< |
Smaller than |
|
|
<= |
Smaller than or equal to |
|
|
!= |
not equals to |
|
matches operator is used in WHERE clause. This operator needs left operand and right operand. Left operand is an AQL identified path. Right operand is enclosed within curly braces. Right operand of matches operator has three forms:
SELECT o/data[at0002]/events[at0003]/data/items[at0015]/items[at0018]/name FROM Ehr [uid=$ehrUid] CONTAINS Composition c CONTAINS Observation o[openEHR-EHR-OBSERVATION.microbiology.v1] WHERE o/data[at0002]/events[at0003]/data/items[at0015]/items[at0018]/items[at0019]/items[at0021]/name/defining_code/code_string matches {'18919-1', '18961-3', '19000-9'} |
SELECT e/ehr_status/subject/external_ref/id/value, diagnosis/data/items[at0002.1]/value FROM EHR e CONTAINS Composition c[openEHR-EHR-COMPOSITION.problem_list.v1] CONTAINS Evaluation diagnosis[openEHR-EHR-EVALUATION.problem-diagnosis.v1] WHERE c/name/value='Current Problems' AND diagnosis/data/items[at0002.1]/value/defining_code matches { terminology://Snomed-CT/hierarchy?rootConceptId=50043002 } |
matches {SELECT ... FROM .. WHERE} |
EXISTS is a unary operator, which is always followed by a single operand. The single operand is an AQL identified path. This operator return boolean results: true means the data associated with the specified path exists, false otherwise. EXISTS expression is a single identified expression. The latest grammar only allows this operator in WHERE clause.It may be required as part of openEHR path in future.
The example below means retrieving the identifiers of the EHRs which do not have discharge admin entry instance.
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] |
This example below means from a particular EHR, retrieving all composition uid values and the contained instruction instances if the instruction instances have links with particular EHR_URI value.
SELECT c/uid/value, instruction FROM EHR e [ehr_id/value=$ehrid] CONTAINS COMPOSITION c CONTAINS INSTRUCTION instruction[openEHR-EHR-INSTRUCTION.referral.v1] WHERE EXISTS instruction/links[target='ehr://327000002/87284370-2D4B-4e3d-A3F3-F303D2F4F34B@latest_trunk_version'] |
NOT is a unary boolean operator and it is always followed by either a single identified expression or boolean identified expression. It returns boolean results: true means the followed expression (single or boolean expression) is false.
The example below uses NOT operator followed by a single identified expression
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] |
The example below uses NOT operator followed by a boolean identified expression.
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE NOT (EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] AND e/ehr_status/subject/external_ref/namespace='CEC') |
The above example is equivallent to the two expressions shown below:
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] OR NOT e/ehr_status/subject/external_ref/namespace='CEC' SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] OR e/ehr_status/subject/external_ref/namespace != 'CEC' |
These advanced operators are not yet supported by the grammar. The operator syntax is borrowed from ADL specifications. These are proposed to improve the richness and flexibility of AQL so that AQL syntax supports more complicated query scenarios.
Identified paths are used to locate data items within an archetype RM class. Except for FROM clause, an identified path could appears in any clauses of an AQL statement. For example, it can be used to indicate the data to be returned in a SELECT clause, or the data item on which query criteria are applied in a WHERE clause.
AQL identified path has the following forms:
o/data[at0001]/.../data[at0003]/items[at0004]/value/value |
o[name/value=$nameValue] |
o[name/value=$nameValue]/data[at0001]/.../data[at0003]/items[at0004]/value/value |
All strings are enclosed in double quotes or single quotes. It doesn't support line breaks.
TODO list:
Integers are represented as numbers, such as 1, 2, 365. Commas or periods for breaking long numbers are not allowed. Hex integer is not supported.
Real numbers are the numbers with a decimal, such as 3.1415926. Commas or periods for breaking long numbers are not allowed. Hexadecimal is not supported.
Boolean values can be indicated as true or false. It is case-insensitive.
No matter the date/date time is complete or partial, it is allowed in AQL as long as it is in ISO8601 date/date time format. Quotes (either single quotes or double quotes) are required to enclose the date/date time value. In AQL grammar, it is treated as a date value if the value is a complete calendar date represented in ISO8601 extended format and if single quotes are used. If the value is in other ISO8601 date/date time format, AQL treats it as a string value, but the AQL query engine processes it differently from a normal string value. It is a temporary solution. In future, a date/date time value should be identified as date/date time value if it is a valid ISO8601 date/date time. Examples are as below:
WHERE composition/context/start_time > '19860101' // a complete date. AQL grammar identifies it as a string value. WHERE composition/context/start_time < "120000+0930" // ISO8601 time. AQL grammar identifies it as a string value. WHERE composition/context/start_time <= '1986-01-01' // complete calendar date in ISO8601 extended format. AQL grammar identifies this value as a date value. |
URI leaf data is enclosed within curly braces after "matches" operator. URI can be expressed as a normal URI described in http://www.ietf.org/rfc/rfc3986.txt. URI is not case sensitive. In AQL, a URI can be either terminology URI or EHR URI.
A terminology URI consists of the following components:
This is an example of a terminology URI:
to be continued...
Value list leaf data is enclosed within curly braces after matches operator. Each item is separated by comma if there are multiple items in the list. Value list can be string value list, date time value list, integer value list, and real value list. Quotes are required for string value list and date time value list.
A value list is only used in AQL WHERE clause when the criteria is to match one item of the list. The relationships among these value list items are "OR".
Some examples of value list:
matches {'string item 1', 'string item 2', 'string item3'} // a string list. It is equivalent to matches 'string item1' or .. matches 'string item 2' or .. matches 'string item 3' matches {'20060101', '20070101', '20080101'} // a date value list matches {1, 2, 3} // a integer list matches {1.1, 2.5, 3.8} // a real value list |
It is not supported in the current AQL grammar.
Value range leaf data is enclosed within curly braces after matches operator. It is only used in AQL WHERE clause to specify a criteria within a range. The syntax is similar as intervals of ordered primitive types detailed in Section 4.5.2, adl.pdf. Some examples:
matches {|3.0..5.0|} // means 3.0 <= x <=5.0 matches {|<8|} // means x<8 |
AQL structure has been briefly introduced in Section 2.2. This section describes the syntax in more formal detail.
AQL structure has the following clauses and these clauses must be listed in the same order as the list below in an AQL statement. An AQL statement must contain SELECT and FROM clauses.
The FROM clause utilises class expressions and a set of containment criteria to specify the data source from which the query required data is to be retrieved. Its function is similar as the FROM clause of an SQL expression.
A simple FROM clause consists of three parts: keyword - FROM, class expression and/or containment constraints, e.g.
FROM EHR e [ehr_id/value=$ehrId] CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1] |
Two examples of a class expression are shown below:
EHR e [ehr_id/value=$ehrId] \\EHR class, class identifier/variable, and a standard predicate COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1] \\COMPOSITION class, class identifier/variable, and an archetype predicate |
Class expressions are used for two purposes:
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.referral.v1] WHERE c/archetype_details/template_id/value=$templateId |
Class expressions syntax include three parts. A class expression must have part one and at least one of part two or part three.
Since archetypes are in hierarchical structure, AQL has a containment constraint which specifies the hierarchical relationships between parent and child data items. FROM clause utilises this hierarchical constraint along with class expression to determine the data source to which the AQL query is applied.
The syntax of containment constraint is very simple: using keyword CONTAINS between two class expressions. Left class expression is the the parent object of the right class expression, e.g.
EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] |
Boolean operators (AND, OR, NOT) and parentheses are used when multiple containment constrains are required, e.g.
EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] AND COMPOSITION c1 [openEHR-EHR-COMPOSITION.report.v1] |
EHRR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] CONTAINS (OBSERVATION o openEHR-EHR-OBSERVATION-laboratory-hba1c.v1 AND OBSERVATION o1 openEHR-EHR-OBSERVATION-laboratory-glucose.v1) |
An AQL WHERE clause is used to represent further criteria applied to the data items within the objects declared in the FROM clause. A WHERE clause expresses the query criteria that cannot be represented in other AQL clauses, such as criteria on archetype id, composition committal date time, and the criteria on in which order the returned results should be listed.
The WHERE clause syntax has the following parts (in order): keyword WHERE and identified expression(s). Boolean operator (AND, OR, NOT) and parenthesis can be used to represent multiple identified expressions. Examples:
WHERE c/name/value=$nameValue AND c/archetype_details/template_id/value=$templateId |
WHERE (c/name/value=$nameValue OR c/archetype_details/template_id/value=$templateId) AND o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140 |
Identified expression specifies the criteria within WHERE clause. It consists of left operand, operator and right operand.
Examples:
o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140 c/archetype_details/template_id/value='health_encounter' |
c/archetype_details/template_id/value=$templateParameter |
o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value > o1/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value |
A SELECT clause specifies what data is to be retrieved by the AQL query. The data can be any types from openEHR RM and any primitive data types. In addition, SELECT clause uses TOP to indicate the number of result setS that should be returned and name alias to rename the retrieved data.
The SELECT syntax always starts with keyword SELECT, followed by TOP (optional), and identified path(s) or variable name(s) defined in the FROM clause. Each variable name or an identified path may have a name alias renaming the associated data.
If variable name(s) is(are) required, the full object of the type associated with the variable is retrieved, such as a COMPOSITION, an OBSERVATION object etc. If identified paths are required, the path-associated data items are returned. Multiple identifiers or identified paths are separated using a comma.
Some examples are shown below.
Example 1: retrieve all compositions' name value, context start time and composer name from a specific EHR.
SELECT c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c |
Example 2: Retrieve all composition objects of a specific EHR.
SELECT c FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c |
TOP syntax was borrowed from SQL language for representing the number of result sets that should be returned by the AQL query. It uses BACKWARD and FORWARD to indicate the direction where to start to get the number of results to be returned.
It starts with keyword TOP, followed by an integer number and/or the direction (i.e. BACKWARD, FORWARD), e.g.
SELECT TOP 10 c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c |
Like SQL, AQL supports using name alias to rename the retrieved data. It starts with keyword - AS, followed by the name which conforms to the syntax rule of AQL variable.
The ORDER BY clause is used to sort the returned results. It starts with the keyword ORDER BY, followed by an identified path and the keyword DESC, DESCENDING, ASCE, or ASCENDING, e.g.
ORDER BY c/name/value |
In addition to FROM clause, TIMEWINDOW helps to scope the data source from which the required data is retrieved. TIMEWINDOW is an addition query clause used in AQL to constrain the query to data that was available in the system (also know the data committal time) within the specified time criteria. This supports a timebased logical system rollback allowing a query to be executed as though it was performed at that specified time, which is essential for medico-legal reporting.
It starts with the keyword - TIMEWINDOW, and followed by a string compatible with the ISO 8601 representation of time interval.
The first example below constrains the query source to data committed to the system before 2006-01-01.
TIMEWINDOW /2006-01-01 |
The second example constrains the query source to data committed within the period of two years before 2006-01-01.
TIMEWINDOW P2Y/2006-01-01 |
The result structure of an AQL query is described here as part of the EHR service definition.
Ocean Informatics has implemented an AQL query builder which can be used to generate AQL query statements based on archetypes automatically. We strongly recommend to use this tool to generate an AQL query statement because it can be very complicated. This section is for the people who either do not have the access to AQL query builder or for users who want to have more understanding on AQL so that they can edit an existing AQL query manually.
An example is used to illustrate how to write an AQL statement. The query scenario is:
"Get all abnormal blood pressure values that are recorded in a health encounter for a specific patient."
A FROM clause is to scope the data source for the query. Normally the first step is to shape FROM clause, which has EHR class expression and archetype class expressions.
FROM EHR [ehr_id/value=$ehrUid] |
COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1] |
FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1] |
WHERE clause represents all query criteria that cannot be represented in the FROM clause. To write WHERE clause expression, you need to:
FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140 |
obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value >= 90 |
FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140 OR obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value >= 90 |
Writing the SELECT clause needs to:
SELECT obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude, obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude>= 140 OR obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude>= 90 |
This is the list of all features that are not yet supported in the AQL Parser implemented by Ocean Informatics (Jun 2008):
It has been proposed that AQL would support some basic arithmetic functions, such as addition , subtraction . Multiplication, and division may be supported as well. We do not have any scenario requiring for these two functions yet.
Most cases, it is required that the left operand and the right operand must be the same type in order to do the calculation, e.g. numeric types. openEHR RM supports addition and subtraction for the types of: DV_QUANTITY, DV_COUNT, DV_INTERVAL, DV_PROPORTION, DV_DATE_TIME, DV_DATE, and DV_TIME. Again, it is required that the left and right operands must be the same type. However, one exception to this is that openEHR RM also supports subtraction between all subtypes of DV_TEMPORAL and type of DV_DURATION. AQL would support these arithmetic functions that openEHR RM supports.
The example below shows a scenario which requires that a composition report (c1) is issued in the last year. It illustrates that a type of ISO 8601 date string (current-date()) subtracts a type of ISO 8601 interval string (P1Y).
c1/context/other_context/items[at0006]/items[at0013]/value > current-date() - PIY |
It is proposed that AQL may also support other functions, such as: