...

Usually such layers are built from at least two internal layers of software: the first being the abstract interface, the second being a set of bindings, one for each target database. In practice, there may be three layers since there may be an internal division between the logic for object and relational (and other) storage mechanisms.
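The two-layer arrangement described above can be sketched roughly as follows. This is only an illustration: the class names `PersistenceLayer`, `PersistenceBinding` and `InMemoryBinding` are hypothetical, and the dictionary-backed binding stands in for a real database driver.

```python
from abc import ABC, abstractmethod


class PersistenceBinding(ABC):
    """Second layer: one concrete binding per target database (hypothetical name)."""

    @abstractmethod
    def put(self, key, payload):
        """Write raw bytes under a key."""

    @abstractmethod
    def get(self, key):
        """Read raw bytes back by key."""


class InMemoryBinding(PersistenceBinding):
    """Stand-in binding backed by a dict; a real one would wrap a database driver."""

    def __init__(self):
        self._rows = {}

    def put(self, key, payload):
        self._rows[key] = payload

    def get(self, key):
        return self._rows[key]


class PersistenceLayer:
    """First layer: the abstract interface that the application sees."""

    def __init__(self, binding):
        self._binding = binding

    def store(self, object_id, serialised):
        self._binding.put(object_id, serialised.encode("utf-8"))

    def retrieve(self, object_id):
        return self._binding.get(object_id).decode("utf-8")


# The application only talks to PersistenceLayer; swapping the binding
# changes the target database without touching application code.
layer = PersistenceLayer(InMemoryBinding())
layer.store("composition-1", '{"name": "birth note"}')
print(layer.retrieve("composition-1"))
```

A third layer, if present, would sit between these two and separate object-oriented from relational storage logic.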

...

Should I build my own openEHR persistence?

In most cases it is a bad idea to build your own openEHR persistence solution: like any implementation and optimization effort, it takes time away from building clinically useful things on top of an existing commercial or open-source openEHR persistence solution.

If, however, your main (research) interest is advanced storage solutions, make sure to read what has already been published about openEHR persistence before you build. In addition to publication search engines (such as Google Scholar), the openEHR Zotero Persistence category can be useful.

Is persistence different in openEHR?

The openEHR architecture is different from other architectures in the health domain, and in most other domains. The main difference is that it does not only have an object model from which to create software, database schemas etc.; it also has a layer of domain models called archetypes (see the archetype FAQs). As a consequence, the part of the architecture that is defined as object models (known as the "reference model" or RM) is smaller and more generic than many models. For most purposes the RM can be considered a typical object model. To get a feel for the architecture, the Architecture Overview is a good place to start.

On one level, therefore, persisting openEHR data is no different from persisting data in any object model with an equivalent number of classes (around 100, including all data types, EHR types and demographic types). There are two main challenges:

  • bridging the object/relational gap: how to persist object data in relational databases, which may be a requirement of the deployment environment
  • the query trade-off: in any database, storing and retrieving numerous fine-grained objects is costly in terms of disk access time, yet the finer the granularity of storage, the more likely it is that the database's built-in query engine can query the data directly rather than forcing the application to retrieve the data and process the query itself.

The solutions to these problems are related: if we opt for very coarse-grained data storage, the object/relational gap diminishes as well.

How can openEHR data be characterised?

Structure

To understand how to persist openEHR objects, we need to have a feel for the data. The business objects are known as "top-level structures" in openEHR, and are the items that are version controlled, including:

Each of these types is equivalent to a document in the sense that it defines the level of granularity of store and retrieve operations on a service. None of the top-level structure types contains any "live" references to objects outside its own hierarchy. This means that a "store" operation on such an object will cause only that object hierarchy to be stored. Changing the interior parts of such an object is generally done by retrieving the whole thing, modifying it in memory, and then committing the changed version. The real question of interest with respect to how data are persisted therefore concerns querying rather than the store/retrieve/modify/store cycle.
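The retrieve, modify in memory, and commit cycle can be sketched as below. This is a toy illustration under stated assumptions: `VersionedStore` is a hypothetical class, JSON stands in for whatever serialisation a real system uses, and the dictionary describing the note is invented sample data rather than a valid openEHR structure.

```python
import json


class VersionedStore:
    """Toy store keeping every committed version of a top-level structure as one blob."""

    def __init__(self):
        self._versions = {}  # object id -> list of serialised versions, oldest first

    def commit(self, object_id, structure):
        """Serialise the whole hierarchy, append it as a new version, return its number."""
        blob = json.dumps(structure)
        self._versions.setdefault(object_id, []).append(blob)
        return len(self._versions[object_id])

    def retrieve(self, object_id, version=None):
        """Return the whole structure; the latest version unless one is requested."""
        blobs = self._versions[object_id]
        blob = blobs[-1] if version is None else blobs[version - 1]
        return json.loads(blob)


store = VersionedStore()
note = {"name": "birth note", "content": [{"name": "Apgar score", "total": 7}]}
v1 = store.commit("comp-1", note)

# Modify an interior part: fetch the whole document, change it in memory, commit.
working = store.retrieve("comp-1")
working["content"][0]["total"] = 9
v2 = store.commit("comp-1", working)

print(v1, v2)
print(store.retrieve("comp-1", version=1)["content"][0]["total"])
print(store.retrieve("comp-1")["content"][0]["total"])
```

Because each version is stored as a single blob, store and retrieve are cheap, but the store itself cannot answer a question such as "which notes have a total below 7" without the application loading and inspecting each document.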

The following figure illustrates typical openEHR data and the scope of archetyping and templating.

Paths

Where openEHR data differs from most other object data is that it is archetyped: every data node carries two identifiers, archetype_node_id and name, both inherited from the LOCATABLE class, and the name is unique for every node (descendant of LOCATABLE). These attributes guarantee that an Xpath-style path can be defined to uniquely identify every single node in openEHR data, down to the leaf ELEMENTs (an ELEMENT holds a DATA_VALUE instance). See the openEHR Architecture Overview for a detailed explanation of paths.

However, these paths differ from Xpath paths in that they carry the node meanings from archetypes, not just the reference model attribute names as typical Xpaths do. For example, consider the following path:

  • [openEHR-EHR-COMPOSITION.birth_note.v1]/content[at0001]/items[openEHR-EHR-OBSERVATION.Apgar.v1]/data/events[at0003]/data/items[at0025]/value/magnitude
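To make the structure of such a path concrete, the following sketch splits it into (attribute name, archetype node identifier) pairs. The function `parse_path` and the regular expression are illustrative assumptions, not part of any openEHR library, and they only handle the simple predicate form used in this example.

```python
import re

# One segment is an optional attribute name followed by an optional [predicate].
SEGMENT = re.compile(r"^(?P<attribute>[A-Za-z_]\w*)?(?:\[(?P<predicate>[^\]]+)\])?$")


def parse_path(path):
    """Split an openEHR-style path into (attribute, predicate) tuples."""
    segments = []
    for raw in path.split("/"):
        if not raw:
            continue  # skip empty pieces from leading or doubled slashes
        match = SEGMENT.match(raw)
        if match is None:
            raise ValueError(f"unrecognised path segment: {raw!r}")
        segments.append((match.group("attribute"), match.group("predicate")))
    return segments


path = (
    "[openEHR-EHR-COMPOSITION.birth_note.v1]/content[at0001]"
    "/items[openEHR-EHR-OBSERVATION.Apgar.v1]"
    "/data/events[at0003]/data/items[at0025]/value/magnitude"
)

segments = parse_path(path)
for attribute, predicate in segments:
    print(attribute, predicate)

# Predicates are the part contributed by archetypes; attribute names come from the RM.
node_ids = [predicate for _, predicate in segments if predicate is not None]
```

The attribute names (content, items, data, ...) are what a plain Xpath over the reference model would contain; the bracketed identifiers are the additional archetype-derived meaning.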

...

By default, using an O/R product on a typical object model over a relational database will result in numerous tables and extremely fine-grained object storage and retrieval, with the consequent performance penalty. Most likely, an O/R product will not know about business object boundaries and will do the same thing as an object database with a naively designed object model: store and retrieve everything reachable by reference-following. Avoiding these problems means, at a minimum, reducing the granularity of the objects being stored; see below.

Examples of object/relational products include: Apache ObJectRelationalBridge (OJB) for Java, Grails ORM and DataObjects.NET.

If, instead of using default O/R framework mappings, you optimize the storage method for openEHR structures and query patterns, then it is possible to get reasonable performance even from persistence solutions based on tabular formats and relational algebra, e.g. Hadoop or relational databases. The paper

Relational databases

Object data can be stored directly in a relational database, but schema design then becomes the larger issue. If the schema is intended to be a derivative of the object model, i.e. the "classical" approach to mapping (typical strategies), then the schema design may not be trivial. This kind of schema design is what many O/R tools try to automate and/or hide. However, other strategies are available, including a particularly interesting one made possible by the paths in openEHR data.
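The path-based strategy is not spelled out at this point, so the following is only one plausible illustration of how paths might be used in a relational schema: leaf values are stored in a single table keyed by their path, so the database's own query engine can filter on them. The table name `leaf_value`, its columns, and the sample magnitudes are all assumptions made for this sketch.

```python
import sqlite3

# Path of the leaf value of interest, taken from the path example in the text.
APGAR_PATH = (
    "[openEHR-EHR-COMPOSITION.birth_note.v1]/content[at0001]"
    "/items[openEHR-EHR-OBSERVATION.Apgar.v1]"
    "/data/events[at0003]/data/items[at0025]/value/magnitude"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leaf_value (composition_id TEXT, path TEXT, magnitude REAL)"
)
# Index on path so that lookups by path do not scan the whole table.
conn.execute("CREATE INDEX idx_leaf_path ON leaf_value (path)")

# Invented sample data: one magnitude per composition.
rows = [
    ("comp-1", APGAR_PATH, 9.0),
    ("comp-2", APGAR_PATH, 4.0),
    ("comp-3", APGAR_PATH, 7.0),
]
conn.executemany("INSERT INTO leaf_value VALUES (?, ?, ?)", rows)

# The database can now answer the query without application-side filtering.
low = conn.execute(
    "SELECT composition_id FROM leaf_value "
    "WHERE path = ? AND magnitude < ? ORDER BY composition_id",
    (APGAR_PATH, 7),
).fetchall()
print(low)
```

The schema here does not mirror the object model at all: it has one generic table regardless of how many RM classes exist, which is what distinguishes it from the classical mapping approach.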

...