What is a persistence layer?

...

Usually such layers are built from at least two internal layers of software: the first being the abstract interface, the second being a set of bindings, one for each target database. In practice, there may be three layers since there may be an internal division between the logic for object and relational (and other) storage mechanisms.
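
The two-layer arrangement described above can be sketched roughly as follows. This is only an illustration, not part of any openEHR specification: the class names `PersistenceLayer`, `InMemoryBinding` and `SqliteBinding`, and the `save`/`load` methods, are hypothetical, and objects are represented as plain dictionaries.

```python
from abc import ABC, abstractmethod
import json
import sqlite3


class PersistenceLayer(ABC):
    """Abstract interface seen by the application (hypothetical names)."""

    @abstractmethod
    def save(self, uid, obj):
        """Persist a top-level object (here a plain dict) under uid."""

    @abstractmethod
    def load(self, uid):
        """Return the object stored under uid, or raise KeyError."""


class InMemoryBinding(PersistenceLayer):
    """Binding for a trivial in-process store, e.g. for tests."""

    def __init__(self):
        self._items = {}

    def save(self, uid, obj):
        # Keep a serialised copy so later changes by the caller are not shared.
        self._items[uid] = json.dumps(obj)

    def load(self, uid):
        return json.loads(self._items[uid])


class SqliteBinding(PersistenceLayer):
    """Binding for one relational target (SQLite from the standard library)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS objects (uid TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save(self, uid, obj):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO objects (uid, body) VALUES (?, ?)",
                (uid, json.dumps(obj)),
            )

    def load(self, uid):
        row = self._db.execute(
            "SELECT body FROM objects WHERE uid = ?", (uid,)
        ).fetchone()
        if row is None:
            raise KeyError(uid)
        return json.loads(row[0])
```

Application code written against `PersistenceLayer` does not need to change when one binding is swapped for another.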

Should I build my own openEHR persistence solution?

In most cases it is a bad idea to build your own openEHR persistence solution, since it (like any similar implementation and optimization work) takes time away from building clinically useful things on top of an existing commercial or open source openEHR persistence solution. See some known alternatives at https://openehr.org/products_tools/platform/.

If, however, your main (research) interest is advanced storage solutions, then make sure to read what has already been published about openEHR persistence before you build. In addition to publication search engines (like Google Scholar), the openEHR Zotero Persistence category can be useful.

Is persistence different in openEHR?

The openEHR architecture is different from other architectures in the health domain, and in most domains. The main difference is that it doesn't only have an object model from which to create software, database schemas etc.; it also has a layer of domain models called archetypes (see the archetype FAQs). As a consequence, the part of the architecture that is defined as object models (known as the "reference model" or RM) is smaller and more generic than many models. The RM can be considered for most purposes as a typical object model. To get a feel for the architecture, the Architecture Overview (PDF) is a good place to start.

...

The availability of such paths means that every node in openEHR data is addressable using a meaningful path, opening the way for some novel possibilities in data storage, particularly relational storage.

Is openEHR a proprietary data format?

No, it's the opposite of proprietary. openEHR data are defined by the Reference Model specifications published by openEHR since 2001. These specifications define every detail of openEHR data, and are available in UML (XMI) form and XML Schema form.

But what if openEHR data are stored on a proprietary (i.e. commercial) database?

A great deal of production data (probably the majority) in the world in all industries, including health, are stored on proprietary databases such as Oracle, IBM DB2 and Microsoft SQL Server. Indeed, some of the openEHR vendor solutions are deployed on these databases. However, the data always follow the openEHR Reference Model, and can always be retrieved in the standard open XML Schema form, as well as in the object form defined by the RM, via the EHR Service interface, whose calls return openEHR RM structures as Java, C# or other programming language objects.

This is in contrast with EMR solutions that define a proprietary schema for the database and no logical Reference Model.

What about Performance?

Regardless of what kind of persistence mechanism is chosen, performance of storage and retrieval is important if the system is to be scalable to large numbers of users and database accesses. Object-oriented data generally takes the form of fine-grained hierarchical structures, and openEHR data is no exception. Storing data at its finest granularity is almost guaranteed to be infeasible for scalable systems. Retrieval tests on typical object data stored in fine-grained form almost always reveal extreme inefficiency. Addressing this problem usually means storing the data at a coarser granularity, i.e. converting the fine-grained in-memory data into "blobs" and storing them instead. The questions raised by doing that include:

...

In tests at UCL on the Java implementation of openEHR, retrieval of 1 openEHR Party object stored as fine-grained objects via Hibernate over MySQL and queried by primary key took seconds; retrieval of the same object stored in a "blob" form took a few milliseconds.
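
The difference in granularity can be illustrated with a small sketch. It does not reproduce the UCL tests and makes no timing claims; it only shows how many rows the same structure occupies in each form. The `party` data, the table names `fine_node` and `coarse_object`, and the use of JSON as the blob format are all assumptions made for the example.

```python
import json
import sqlite3

# Made-up miniature of a hierarchical record; not real openEHR RM data.
party = {
    "name": "Example Person",
    "contacts": [
        {"purpose": "home", "addresses": [{"type": "phone", "value": "555-0100"}]},
        {"purpose": "work", "addresses": [{"type": "email", "value": "p@example.org"}]},
    ],
}

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE fine_node (id INTEGER PRIMARY KEY, parent INTEGER, attr TEXT, leaf TEXT)"
)
db.execute("CREATE TABLE coarse_object (uid TEXT PRIMARY KEY, body TEXT NOT NULL)")


def store_fine_grained(obj, parent=None, attr="root"):
    """Write one row per node of the tree; returns the id of the row written."""
    is_leaf = not isinstance(obj, (dict, list))
    cur = db.execute(
        "INSERT INTO fine_node (parent, attr, leaf) VALUES (?, ?, ?)",
        (parent, attr, json.dumps(obj) if is_leaf else None),
    )
    node_id = cur.lastrowid
    if isinstance(obj, dict):
        for key, child in obj.items():
            store_fine_grained(child, node_id, key)
    elif isinstance(obj, list):
        for index, child in enumerate(obj):
            store_fine_grained(child, node_id, str(index))
    return node_id


def store_blob(uid, obj):
    """Write the whole tree as a single serialised row."""
    db.execute("INSERT INTO coarse_object (uid, body) VALUES (?, ?)", (uid, json.dumps(obj)))


store_fine_grained(party)
store_blob("party-1", party)

fine_rows = db.execute("SELECT COUNT(*) FROM fine_node").fetchone()[0]
blob_rows = db.execute("SELECT COUNT(*) FROM coarse_object").fetchone()[0]
restored = json.loads(
    db.execute("SELECT body FROM coarse_object WHERE uid = ?", ("party-1",)).fetchone()[0]
)
```

Reading the blob back is a single primary-key lookup followed by deserialisation, whereas rebuilding the object from `fine_node` would need a row fetch for every node in the tree.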

...

Are openEHR data Versioned?

Yes. Versioning is a key part of the reference model. Its semantics are defined by the Common Information Model specification.

How versioning is implemented will have a major impact on the storage approach. Logically, every top-level object in openEHR is versioned, meaning that separate versions from each commit are always available. Further, the Contribution concept means that any particular commit causes a set of versions (often called a "change-set") to be committed in one go. Rolling back to previous states of the data means retrieving the state of the data at each Contribution commit point, not just at arbitrary previous points in time. See the change_control package in the Common IM.
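
The relationship between versions and Contributions can be sketched as below. This is a toy in-memory model, not the change_control package itself: the class `VersionedStore`, its methods, the integer Contribution numbers and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy store in which every commit is one Contribution (hypothetical API)."""

    # versions[object_id] is a list of (contribution_number, data), oldest first
    versions: dict = field(default_factory=dict)
    # contributions[i] lists the object ids committed by Contribution i + 1
    contributions: list = field(default_factory=list)

    def commit(self, change_set):
        """Commit {object_id: data} in one go; returns the Contribution number."""
        number = len(self.contributions) + 1
        for object_id, data in change_set.items():
            self.versions.setdefault(object_id, []).append((number, data))
        self.contributions.append(sorted(change_set))
        return number

    def state_at(self, contribution):
        """Latest version of every object as of the given Contribution."""
        state = {}
        for object_id, history in self.versions.items():
            visible = [data for number, data in history if number <= contribution]
            if visible:
                state[object_id] = visible[-1]
        return state


store = VersionedStore()
c1 = store.commit({"medications": "aspirin", "problems": "hypertension"})
c2 = store.commit({"medications": "aspirin, statin"})

before = store.state_at(c1)  # both objects at their first version
after = store.state_at(c2)   # new medications version, problems unchanged
```

Earlier versions are never overwritten, so the state at any Contribution can be reconstructed from the version histories.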

We also have to be mindful of the requirements of versioned openEHR data - any solution should take account of the following features of openEHR data:

...

Object databases often provide versioning. The Matisse database, for example, has inbuilt low-level versioning, achieved efficiently with its never-write-in-the-same-place storage approach. Such facilities on their own probably won't do what is required by openEHR, since openEHR has an explicit notion of identified versions of each top-level object. Given that the main data access need is for the latest version, it may be quite reasonable to treat the latest version as a normal database, and to manage older versions in a way that is not tightly bound to the latest version. This probably favours storing the latest version in full, and earlier versions as differences.
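
One way to picture "latest version in full, earlier versions as differences" is a reverse-delta history. The sketch below is an assumption about how that might look, limited to flat dictionaries; `ReverseDeltaHistory`, `diff` and `apply_delta` are hypothetical names, and a real system would need deltas over nested RM structures.

```python
_MISSING = object()  # marks a key that did not exist in the older version


def diff(new, old):
    """Reverse delta: the changes that turn `new` back into `old` (flat dicts only)."""
    delta = {}
    for key in new.keys() | old.keys():
        old_value = old.get(key, _MISSING)
        if new.get(key, _MISSING) != old_value:
            delta[key] = old_value
    return delta


def apply_delta(data, delta):
    """Return a copy of `data` with the reverse delta applied."""
    result = dict(data)
    for key, value in delta.items():
        if value is _MISSING:
            result.pop(key, None)
        else:
            result[key] = value
    return result


class ReverseDeltaHistory:
    """Keeps the latest version in full and older versions as reverse deltas."""

    def __init__(self, first):
        self.latest = dict(first)
        self.deltas = []  # deltas[i] turns version i + 2 back into version i + 1

    def commit(self, new):
        self.deltas.append(diff(new, self.latest))
        self.latest = dict(new)

    def version(self, n):
        """Versions are numbered from 1; the latest is len(self.deltas) + 1."""
        if not 1 <= n <= len(self.deltas) + 1:
            raise IndexError(n)
        data = self.latest
        # Walk backwards from the latest version, undoing one commit at a time.
        for delta in reversed(self.deltas[n - 1:]):
            data = apply_delta(data, delta)
        return dict(data)


history = ReverseDeltaHistory({"dose": "5 mg", "route": "oral"})
history.commit({"dose": "10 mg", "route": "oral"})
history.commit({"dose": "10 mg", "frequency": "daily"})
```

Reading the latest version costs nothing extra, while reading version `n` costs one delta application per commit made since then, which matches the assumption that older versions are accessed rarely.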

What are the Options for Storage?

Overview

There is a wealth of knowledge on the subject of persisting object data. One useful general reference is the Barry & Associates site, another is Scott Ambler's site. Here we try to cover just some of the major ideas, in rough order of priority. To summarise at the outset, we need to consider issues such as:

  • why not use an object database?
  • do the data need to be retrievable from the database by software written in other languages?
  • what granularity of data needs to be queryable?
  • can we use openEHR paths to help?

Object databases and persistence frameworks

One option may be to forget all about relational databases for your persistence, depending on whether you have constraints in your deployment environment on what kind of database or persistence mechanism you are allowed/encouraged to use. The attraction of object databases and other native object mechanisms is that you don't have to think too much about how your data fits the database - because there is no semantic gap between your objects and the database. If an object database or framework satisfies all the needs of the service you aim to provide, then this is a good option. You have to carefully consider all your requirements and assess them against the product you are considering. Issues to consider:

  • Adding an object database to an existing environment means adding more database administration, including start, stop, backup, archive and other operations, most likely in a tool that existing sysadmin / operations staff have never seen. Make sure the overhead is acceptable. An object persistence framework probably won't be visible at all to such people.
  • Some object databases and most object persistence frameworks store data in native object form, e.g. Java objects are stored in native binary form, only retrievable by the same software and instantiable as Java objects. This may be fine, but you need to be sure.
  • What is the finest grain of query you need to be able to do? There is probably no point in storing data smaller than this granularity.

An object persistence framework is typically a fairly lightweight library that provides: a persistence API, a method of persisting data to disk, and a smart cache. The API is typically of the form where calls like store(an_object) can be made, where an_object is the root object in a network of objects that together make up a whole top-level structure. Object persistence frameworks don't usually provide all the session management, querying, security, and transactional power of full database systems. They may or may not be scalable to large numbers of users, and may be more oriented to client-side persistence than to server-side persistence. Examples for Java include: db4o.
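
As a rough illustration of the store(an_object) style of API, the sketch below uses Python's standard `pickle` module as the native binary format. It is not modelled on db4o or any other product: the `ObjectStore` class, its file layout and its naive cache are assumptions, and a key argument is added to the store call so the object can be found again.

```python
import pickle
import tempfile
from pathlib import Path


class ObjectStore:
    """Hypothetical minimal persistence API: store/load an object network by key."""

    def __init__(self, directory):
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._cache = {}  # very naive cache with no eviction

    def store(self, key, an_object):
        # pickle follows references from the root, so the whole network is written.
        (self._dir / f"{key}.pickle").write_bytes(pickle.dumps(an_object))
        self._cache[key] = an_object

    def load(self, key):
        if key not in self._cache:
            data = (self._dir / f"{key}.pickle").read_bytes()
            # Only unpickle files you wrote yourself; pickle is unsafe on untrusted data.
            self._cache[key] = pickle.loads(data)
        return self._cache[key]


# Made-up object network in which one object is referenced from two places.
author = {"name": "Dr Example"}
root = {
    "type": "composition",
    "composer": author,
    "content": [{"type": "entry", "provider": author}],
}

with tempfile.TemporaryDirectory() as tmp:
    ObjectStore(tmp).store("c1", root)
    copy = ObjectStore(tmp).load("c1")  # a fresh store, so this reads from disk

shared_reference_kept = copy["composer"] is copy["content"][0]["provider"]
```

The stored bytes are in a Python-specific format, which is the kind of language lock-in raised in the list of issues above.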

An object database, on the other hand, is a proper scalable and secure database management system that supports querying as well as persistence - in other words, like a relational database system, except that it deals directly with objects rather than tables. Usually some object-flavoured SQL will be supported. Example products include Matisse (a language-neutral database with SQL querying). There are also clinical information systems based on object databases such as InterSystems Caché and Jade. Zope is a Python-based object database that is quite widely used behind active websites and has been used in health information systems, e.g. FreePM, OIO.

Object/Relational products

An object/relational (O/R) product is one that ultimately relies on an underlying relational database to store the data but does all the hard work of turning objects into relational form to write into the database. From the programmer's point of view, it may look just like an object database. The advantage of this approach is that it allows you to use an existing relational database in your environment that is already required for some other purpose. O/R products solve the problem of performing the object/relational mapping in a generic way, but they don't a priori know anything about your data. In particular, they don't know about what the patterns of querying are, where the business object boundaries are, or anything else. Some products may allow such things to be specified.

The default situation will be that using an O/R product on a typical object model over a relational database will result in numerous tables and extremely fine-grained object storage and retrieval, with the consequent performance penalty. Most likely, an O/R product will not know about business object boundaries and will do the same thing as an object database with a naively designed object model: store and retrieve everything reachable by reference-following. Avoiding these problems means at a minimum reducing the granularity of the objects being stored; see below.

Examples of object/relational products include: Apache ObJectRelationalBridge (OJB) for Java, Grails ORM and DataObjects.NET.

If instead of using default O/R framework mappings, you make sure the storage method is optimized to openEHR structures and query patterns then it is possible to get reasonable performance also in persistence solutions based on tabular formats and relational algebra, e.g. Hadoop or relational databases. The paper

Relational databases

Object data can be directly stored in a relational database, but the schema design is a greater issue. If the intention is that the schema is a derivative of the object model, i.e. the "classical" approach to mapping (see typical strategies), then the schema design may not be trivial. This kind of schema design is what many of the O/R tools try to automate and/or hide. However, other strategies are available, including one very interesting one which is possible due to the paths in openEHR data.

See this wiki page for an approach called 'node+path' which shows how a relational database could be used to store path-based archetyped data such as that found in openEHR.
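
The general idea of storing data as path/value rows can be sketched as follows. This is not the scheme described on the linked wiki page, whose details are not reproduced here; the table layout, the function names and the simplified path syntax (slash-separated keys with numeric list indexes rather than real archetype paths) are all assumptions.

```python
import sqlite3


def flatten(node, path=""):
    """Yield (path, value) for every leaf of a nested dict/list structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, node


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE node (object_id TEXT, path TEXT, value, PRIMARY KEY (object_id, path))"
)


def store(object_id, data):
    """Write one row per leaf node, keyed by object id and path."""
    with db:
        db.executemany(
            "INSERT INTO node (object_id, path, value) VALUES (?, ?, ?)",
            [(object_id, path, value) for path, value in flatten(data)],
        )


def values_at(path):
    """Return (object_id, value) for every stored object that has this path."""
    rows = db.execute(
        "SELECT object_id, value FROM node WHERE path = ? ORDER BY object_id", (path,)
    )
    return rows.fetchall()


# Made-up observations; real openEHR data would use archetype node ids in paths.
store("obs-1", {"data": {"events": [{"systolic": 120, "diastolic": 80}]}})
store("obs-2", {"data": {"events": [{"systolic": 135, "diastolic": 85}]}})

systolic = values_at("/data/events[0]/systolic")
```

Because every row carries its path, a query for one data point across all stored objects becomes a simple indexed lookup on the path column, with no need for a table per class.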