Ocean Terminology Query Language (TQL)

Introduction

This page provides a description and grammar of the Terminology Query Language currently developed by Ocean Informatics for ref-set specification. The language is referred to here as 'TQL', although it does not purport to be a full general-purpose terminology query language. TQL is designed for use with structured terminologies, such as those used in health and finance. In health, terminologies such as Snomed-CT, ICDx and LOINC are targeted. A TQL query is applied with respect to a particular terminology, and generates a structured result in the same form as the original terminology, i.e. a "structured subset". Subsets can be used for choosing terms within a GUI application environment, and can be specified in conjunction with content model definitions, such as openEHR archetypes and templates.

A primary use case is the open definition of re-usable subsets, e.g. within government e-health programmes and provider organizations. This clearly requires an openly specified language and result representation.

Nomenclature

In the following, a Ref-set definition is usually referred to as a 'query', since it is a kind of query against a target terminology which returns matching components. The result is referred to as 'result', 'resulting ref-set' or 'ref-set' or 'subset'. From the point of view of this specification, a 'subset' and a 'ref-set' are equivalent things.

Development Status

TQL has been in production use within Ocean's products since 2006, and has been stable since the end of 2007. It has been shown to work in a high-performance terminology service environment and has been heavily tested against Snomed-CT, ICD10 and LOINC.

Areas of expected development:

  • An international agreement on an identification scheme for both ref-set queries and ref-set results, that incorporates the notion of subset version as well as release of the terminology;
  • A final standard proposal might use a different query syntax and ref-set serialized form from TQL (i.e. we are mainly interested in common semantics).

Intellectual Property Statement

Ocean Informatics proposes to donate the description and design of the TQL language to either IHTSDO or OMG (managing the CTS2 standardisation effort) on the understanding that it be made an open specification for use in any terminology environment, under an appropriate copyright and license, if required.

Requirements

The design of TQL was based on the following requirements.

  1. The language should maximise the use of the relationships in the terminology to express the subset, thereby minimising the need for maintenance, since new terms added to a complex terminology will likely be included in the subset automatically. This should also simplify the expression statement.
  2. Terms must be able to be excluded in a number of ways: globally from the resulting subset, or from a particular branch of a hierarchy. Child nodes of the excluded term must be able to be included locally if required.
  3. Terms must be able to be included in the browseable view of the subset without being selectable as terms by the user. These terms may be added by the user.
  4. The resulting subset must be browseable (the user can traverse any hierarchies) and be available and searchable over the web with a very rapid response time. 
  5. One subset definition must be able to be based on, and lie entirely within, another subset definition.
  6. A subset definition must be able to traverse more than one hierarchy as part of the same statement, preserving the relationship in the resulting subset. 
  7. A subset definition must be able to be presented without relationships as a flat list. 
  8. Logical operations must be able to be performed on one or more subsets, including union (OR), subtraction (NOT), intersection (AND) and exclusive or (XOR).
  9. Result subsets arising from a subset definition when applied to a different release of a terminology must be able to be differentiated and compared, and specific evaluation of a subset definition against a particular terminology release must be possible. 
  10. The language should be expressible as a UML class model and be independent of any particular implementation technology.
  11. Statements in the language should be efficiently serialisable and persistable as XML, and possibly in other serialized forms as well.
  12. The language should be implementable in such a way as to enable fast result generation, so that human authors of subsets can reasonably work in real-time in an iterative fashion. Practically this means that queries should return results in the 1-second time domain or better, for subsets of typical size.

All these requirements must be met for the key terminologies in use in the domains of health and finance.
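To make requirements 1, 2, 7 and 8 more concrete, the following is a minimal Python sketch and not TQL itself. It assumes a toy "is-a" hierarchy with made-up codes; the function names (`children`, `descendants`, `subset`) are hypothetical and chosen only for illustration. It shows a subset expressed through hierarchy relationships, a branch exclusion with local re-inclusion of a child node, a flat-list presentation, and the four logical operations on subsets.

```python
# Toy terminology: child -> parent links (an "is-a" hierarchy).
# All codes are invented for illustration; they are not real terminology codes.
PARENT = {
    "fracture": "injury",
    "burn": "injury",
    "arm_fracture": "fracture",
    "leg_fracture": "fracture",
    "open_leg_fracture": "leg_fracture",
}


def children(code):
    """Return the direct children of a code, sorted for stable output."""
    return sorted(c for c, p in PARENT.items() if p == code)


def descendants(code):
    """Return the code together with everything below it in the hierarchy."""
    found = {code}
    for child in children(code):
        found |= descendants(child)
    return found


def subset(root, exclude_branches=(), include_locally=()):
    """Select a branch, drop excluded branches, then re-include chosen nodes."""
    result = descendants(root)
    for code in exclude_branches:
        result -= descendants(code)
    return result | set(include_locally)


# Requirement 2: exclude a branch, but re-include one of its children locally.
injuries = subset("injury", exclude_branches=["leg_fracture"],
                  include_locally=["open_leg_fracture"])
fractures = descendants("fracture")

flat_list = sorted(injuries)           # requirement 7: flat presentation
both = injuries & fractures            # intersection (AND)
either = injuries | fractures          # union (OR)
only_injuries = injuries - fractures   # subtraction (NOT)
exactly_one = injuries ^ fractures     # exclusive or (XOR)
```

Because the subset is defined by a root term rather than by an enumerated list, a term later added under `injury` would be picked up without editing the definition, which is the maintenance benefit requirement 1 describes.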

Service Architecture

The following figure illustrates the service architecture assumed in this specification. In it three services are specified:

  • TQS (Terminology Query Service): a service for storing TQL queries
  • TS (Terminology Service): a service providing access to terminology/ies
  • TSS (Terminology Subset Service): a service for persisting and providing access to subsets generated as a result of running queries against the relevant terminologies.

Here we assume further that versioning may optionally be supported in some deployments of all three services. Any practical environment may have a different arrangement of services, but the functions of each service described here would need to be provided somewhere. This specification does not define the service interfaces, only the language in which TQL queries are expressed.
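As a purely illustrative sketch of how the three services might cooperate, the Python below stores a query, evaluates it against two terminology releases, and persists each result keyed by query and release so that the results can be compared (requirement 9). Since this specification does not define the service interfaces, every class, method and identifier here is hypothetical, the "query" is reduced to a single root code rather than real TQL, and the release labels and codes are invented.

```python
from dataclasses import dataclass, field


@dataclass
class QueryService:
    """Hypothetical stand-in for the TQS: stores queries by identifier."""
    queries: dict = field(default_factory=dict)

    def store(self, query_id, root):
        self.queries[query_id] = root


@dataclass
class TerminologyService:
    """Hypothetical stand-in for the TS: one child -> parent map per release."""
    releases: dict = field(default_factory=dict)

    def descendants(self, release, root):
        parent_of = self.releases[release]
        found = {root}
        changed = True
        while changed:  # repeat until no further descendants are discovered
            changed = False
            for child, parent in parent_of.items():
                if parent in found and child not in found:
                    found.add(child)
                    changed = True
        return found


@dataclass
class SubsetService:
    """Hypothetical stand-in for the TSS: results keyed by (query, release)."""
    subsets: dict = field(default_factory=dict)

    def save(self, query_id, release, codes):
        self.subsets[(query_id, release)] = frozenset(codes)

    def added_between(self, query_id, old, new):
        return self.subsets[(query_id, new)] - self.subsets[(query_id, old)]


tqs, ts, tss = QueryService(), TerminologyService(), SubsetService()
ts.releases["2007-07"] = {"arm_fracture": "fracture"}
ts.releases["2008-01"] = {"arm_fracture": "fracture", "rib_fracture": "fracture"}
tqs.store("q-fractures", "fracture")

# Evaluate the same stored query against each release and persist the results.
for release in ts.releases:
    root = tqs.queries["q-fractures"]
    tss.save("q-fractures", release, ts.descendants(release, root))

new_terms = tss.added_between("q-fractures", "2007-07", "2008-01")
```

Keying stored results by both query identifier and terminology release is one simple way to keep subsets from different releases distinguishable; an agreed identification scheme, as noted under Development Status, would replace the ad hoc keys used here.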

TQL Description

Acknowledgements

Thanks to Dr Sam Heard and Hugh Grady of Ocean Informatics for making this information available.