Population querying - Subject Proxies in AQL

For some time there have been discussions about how best to support anonymous population querying in openEHR EHR AQL, given that demographic entities are intentionally separate. By anonymous, we mean queries that are not directly identifiable e.g on Year of Birth, sex ,gender, broad location alive/dead etc.

e.g. How do we do “Give me a list of EHRIds for Females over 65 who have elevated blood pressures”?

Of course, most CDR implementation environments will have a way of ‘making this work’ but there is no ‘standard’ way of defining this in current AQL

The barriers:

1. Even if the Demographics RM was supported in AQL, it is not widely used, and in many cases, other demographics services are used. So we need something that is agnostic of the exact demographic service used.

2. There is also need to support minimal demographics support within the EHR itself. Although ‘discouraged’ we know there are examples of use of the ehr_status/other_details structure , or a separate composition holding simple subject demographic details, usually anonymised

The borad solution proposed is to define an AQL function(s) or to represent common population query variables such as Age/Year of birth, sex , gender etc. It would be up to the CDR implementation (probably needing quite local adapttion) to resolve the queried varaible to an actual datapoint

So.. in the context of an AQL like

SELECT c FROM EHR e CONTAINS COMPOSITION c WHERE YearOfBirth() > '1958'

YearofBirth() might resolve to ….

  • An openEHR Demographics data path

  • An openEHR EHR service path

  • a FHIR path

  • another api endpoint

depending on the local demographics service provision

Challenges

1. Exactly how do we describe/define the population query variables

  1. Do we try to align with existing demographics models e.g openEHR or FHIR

We are definitely not trying to suggest or enforce any particular actual demographics representation

 

Options

  1. Make use of an existing EHR/EHR_STATUS path and a pseduo archetype to act as the object definition

    SELECT e FROM EHR e WHERE EHR e/ehr_status/subject/@entities/[openEHR-EHR-CLUSTER.pop_demographics.v1]/items[at0001|Birth Sex|] = 'at0005|Male|'

In this case, although the CLUSTER.pop_demographics.v1 exists, it is only definitional and is not actually carried in the CDR

  1. Use an indepedent object such as SUBJECT(). (@Sebastian Iancu )
    2a

    SELECT SUBJECT('dateofBirth') as dob, FROM EHR e CONTAINS COMPOSITION c WHERE SUBJECT('dateofBirth') < '1958' and SUBJECT('gender') matches {'at0005|Male|'}

or a variation,
2b

or with clean link between EHR and SUBJECT:
2c

  1. Making use of some of the ideas around a broader subject proxy service

See more at https://discourse.openehr.org/t/demographic-virtual-values-workgroup/5359/18

Other thoughts

Some of the ideas in SQL on FHIR seem pertinent
https://www.health-samurai.io/articles/sql-on-fhir-an-inside-look

and AGE() might be better implemented as a simple AQL function, along with NOW() as a convenience

2. Kickoff Virtual Demographics 16.07.24

Participants

Ian McNicoll

Bostjan Lah

Matija Polajnar

Diego Bosca

Severin Kohler

 

All of the below solutions are still WIP

We discussed different AQL solutions. It was agreed that the following version is probably the best, but it still needs further discussion. The syntax introduces the @ symbol to indicate the use of a virtual demographics archetype. Dots were used, as they are easier to write.

The model behind this query is provided as an archetype for documentation and modeling purposes, using the FHIR Patient as an orientation. The scope of the virtual demographics is for population queries; therefore, there is no need for a non-pseudonymized/anonymized person representation. Codings used should also derive from FHIR values used in Patient.
The model can be found at https://github.com/freshehrteam/openehr-sec.git

Besides the archetype description, the idea of introducing an Implementation Guide (IG) as a usage guideline was welcomed.

3. Virtual Demographics 28.08.24

Participants

Ian McNicoll

Bostjan Lah

Matija Polajnar

Severin Kohler

 

 

We discussed Diego Boscas' and Chunla Ma's comments. Introducing age comes with a lot of challenges. We currently cannot anticipate all the problems and requirements, such as what to use as an output format (e.g., Julian date, ISO date). With ISO, there is the issue of converting days to months, for example. Therefore, we agreed not to support age for now and instead use a date difference. Chunla Ma's proposal would require supporting a generic PROXY_PARTY logic, which has implications beyond demographics, like querying a PROXY_PARTY inside a composition. Consequently, we did not adopt this solution.

We discussed how much of the new virtual demographics query logic needs to be consistent with the current AQL and to what extent we can introduce new grammar and syntax. We agreed that, for now, we want to maintain two versions of the grammar: Version One, which aligns with current AQL, and Version Two, which is a novel approach to a modern, intuitive AQL syntax for virtual demographics. Version Two is separated into two proposals. For Version Two, we want to make use of FHIR coding, as it provides a simpler approach to querying terminologies than having term mappings and defining codes. However, this is still open for discussion.

Archetype:

From: https://github.com/freshehrteam/openehr-sec

image-20240829-094944.png

Version 1


Version 2a


Version 2b

 

 

4. Virtual Demographics 05.09.24

Ian McNicoll

Bostjan Lah

Matija Polajnar

Severin Kohler

Diego Bosca

Sebastian Iancu

We discussed whether all functions should be allowed on the virtual demographics. We agreed that, for simplicity, we will only allow LIMIT, OFFSET, and ORDER BY. Functions like COUNT, MEAN, MAX, MIN, and AVERAGE will not be supported. We also agreed that the virtual demographics parameters should be based on FHIRsearch parameters. We agreed that we want to maintain only this version instead of 2, since a typical AQL syntax (/defining_code/….) could confuse users that these are RM elements.

Sebastian made a comment that we need to emphasize that the virtual demographics extension to AQL is agnostic of the format in which demographic data is maintained by vendors. It is oriented towards FHIRsearch parameters but does not require the underlying data to be in FHIR.

TODOs:

  • Diego agreed to define some sample queries.

  • Severin agreed to extend ANTLR AQL grammar to support the first draft of virtual demographics.

Open for discussion:

  • Define the e@subject return.

  • Should we create a REST API endpoint to connect to demographic data for the AQL engine?