Data Validation

Purpose

This page is dedicated to the design and discussion of archetype/template-based data validation. It is crucial to implement archetype-based validation correctly and consistently across different products and platforms to ensure interoperability and data quality. To enable platform-independent validation across different implementations of the archetype formalism, it is necessary to have a common API for archetype-based data validation. Hopefully this page will serve as a starting point for such a fine-grained data validation API. Note that the openEHR template formalism is not yet finalized at the time of writing; the focus here is therefore on archetype-based data validation.

Related Documents

1. openEHR Archetype Object Model, see AOM

2. openEHR Archetype Definition Language, see ADL 1.4

3. openEHR Archetype Profile, see OAP

Archetype-based Validation

Prerequisites

Remember that archetypes are always created as constraints on an underlying reference model (RM), e.g. the openEHR RM or the 13606 RM; the data to be validated must therefore already be valid according to the underlying RM. Normally this is achieved by parsing a transport format, e.g. XML or dADL, into in-memory object form using a valid RM implementation.

Types of Validation Errors

The possible errors from archetype-based validation are as follows:

1. Structural Errors

Errors in this category are violations of structural constraints in the archetypes.

1.1 Existence Error

1.1.1 Attribute Missing

Existence is (1..1), but the required attribute is missing

1.1.2 Attribute Not Allowed

Existence is (0..0), but the attribute exists
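As an illustration of the two existence errors above, a check could look roughly like the following. This is only a sketch: the class, enum and method names are assumptions made for this example and are not taken from the AOM or from any openEHR library, and the attribute value is simply passed in as an object that may be null.

```java
import java.util.Optional;

// Illustrative sketch only: all names here are assumptions, not an openEHR API.
class ExistenceCheck {
    // Simplified view of the existence constraint: 1..1, 0..1 and 0..0
    enum Existence { REQUIRED, OPTIONAL, NOT_ALLOWED }

    enum ExistenceError { ATTRIBUTE_MISSING, ATTRIBUTE_NOT_ALLOWED }

    // Returns the existence error for one attribute, or empty if the constraint is met.
    static Optional<ExistenceError> check(Existence existence, Object attributeValue) {
        if (existence == Existence.REQUIRED && attributeValue == null) {
            return Optional.of(ExistenceError.ATTRIBUTE_MISSING);
        }
        if (existence == Existence.NOT_ALLOWED && attributeValue != null) {
            return Optional.of(ExistenceError.ATTRIBUTE_NOT_ALLOWED);
        }
        return Optional.empty();
    }
}
```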

1.2 Cardinality 

1.2.1 Items Too Many

The total number of member items exceeds the upper limit specified by the cardinality constraint

1.2.2 Items Too Few

The total number of member items is less than the lower limit required by the cardinality constraint

1.2.3 Items Not Ordered

The member items are not ordered

1.2.4 Items Not Unique

Some of the member items are not unique
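The cardinality errors above could be detected along the following lines. This is a sketch under stated assumptions: the names are invented for the example, a null upper limit stands for an unbounded cardinality, and uniqueness is judged with equals/hashCode. The "Items Not Ordered" case is left out because it depends on how the RM implementation represents ordered containers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Illustrative sketch only: class, enum and parameter names are assumptions.
class CardinalityCheck {
    enum CardinalityError { ITEMS_TOO_MANY, ITEMS_TOO_FEW, ITEMS_NOT_UNIQUE }

    // upper == null is taken to mean "unbounded" in this sketch.
    static List<CardinalityError> check(List<?> items, int lower, Integer upper, boolean unique) {
        List<CardinalityError> errors = new ArrayList<>();
        if (items.size() < lower) {
            errors.add(CardinalityError.ITEMS_TOO_FEW);
        }
        if (upper != null && items.size() > upper) {
            errors.add(CardinalityError.ITEMS_TOO_MANY);
        }
        // Duplicates collapse in a set, so a smaller set means non-unique members.
        if (unique && new HashSet<Object>(items).size() != items.size()) {
            errors.add(CardinalityError.ITEMS_NOT_UNIQUE);
        }
        return errors;
    }
}
```

Returning a list rather than a single value lets one container report several violations at once, e.g. too many items that are also not unique.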

1.3 Occurrences

1.3.1 Occurrences Too Many

The number of occurrences of a type exceeds the upper limit of the occurrences constraint

1.3.2 Occurrences Too Few

The number of occurrences of a type is below the lower limit of the occurrences constraint

2. Leaf Data Value Errors

Errors in this category are violations of leaf-level constraints on data types.

2.1 Primitive Data Type Constraints

Validation errors caused by unsatisfied constraints for primitive data types.

2.1.1 C_BOOLEAN

2.1.1.1 True Invalid

Value true not allowed

2.1.1.2 False Invalid

Value false not allowed

2.1.2 C_STRING

2.1.2.1 Mismatch Pattern

String value does not match regular expression pattern

2.1.2.2 Unknown String

String value is not included in the value list and the list is exhaustive
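A possible shape for the two C_STRING checks is sketched below. The names are assumptions for illustration, the pattern is assumed to be usable as a java.util.regex expression matched against the whole string, and the listOpen flag is assumed to mark a value list that is not exhaustive.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Illustrative sketch only: names and parameter layout are assumptions.
class CStringCheck {
    enum StringError { MISMATCH_PATTERN, UNKNOWN_STRING }

    // pattern == null or list == null means that kind of constraint is not used.
    static Optional<StringError> check(String value, String pattern,
                                       List<String> list, boolean listOpen) {
        if (pattern != null && !Pattern.matches(pattern, value)) {
            return Optional.of(StringError.MISMATCH_PATTERN);
        }
        // Only an exhaustive (closed) list can reject a value.
        if (list != null && !listOpen && !list.contains(value)) {
            return Optional.of(StringError.UNKNOWN_STRING);
        }
        return Optional.empty();
    }
}
```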

2.1.3 C_INTEGER

2.1.3.1 Integer Too Large

Integer value is above the upper limit of the given range

2.1.3.2 Integer Too Small

Integer value is below the lower limit of the given range

2.1.3.3 Unknown Integer

Integer value is not included in the value list of the constraint
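The three C_INTEGER errors might be told apart as in the sketch below. It assumes inclusive range limits, uses null for an unbounded limit or an absent value list, and all names are invented for the example. The same shape would carry over to C_REAL with a floating-point value.

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch only: names are assumptions; limits are treated as inclusive.
class CIntegerCheck {
    enum IntegerError { INTEGER_TOO_LARGE, INTEGER_TOO_SMALL, UNKNOWN_INTEGER }

    // lower/upper == null means unbounded; list == null means no value list.
    static Optional<IntegerError> check(int value, Integer lower, Integer upper,
                                        List<Integer> list) {
        if (lower != null && value < lower) {
            return Optional.of(IntegerError.INTEGER_TOO_SMALL);
        }
        if (upper != null && value > upper) {
            return Optional.of(IntegerError.INTEGER_TOO_LARGE);
        }
        if (list != null && !list.contains(value)) {
            return Optional.of(IntegerError.UNKNOWN_INTEGER);
        }
        return Optional.empty();
    }
}
```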

2.1.4 C_REAL

2.1.4.1 Real Too Large

Real value is above the upper limit of the given range

2.1.4.2 Real Too Small

Real value is below the lower limit of the given range

2.1.4.3 Unknown Real

Real value is not included in the value list of the constraint

2.1.5 C_DATE

2.1.5.1 Month Invalid

Month value is not allowed

2.1.5.2 Day Invalid

Day value is not allowed

2.1.5.3 Timezone Invalid

Timezone value is not allowed

2.1.5.4 Date Out Of Range

Date value is out of the specified range in the constraint

2.1.6 C_TIME

2.1.6.1 Minute Invalid

Minute value is not allowed

2.1.6.2 Second Invalid

Second value is not allowed

2.1.6.3 Millisecond Invalid

Millisecond value is not allowed

2.1.6.4 Timezone Invalid

Timezone value is not allowed

2.1.6.5 Time Out Of Range

Time value is out of the range specified by the constraint

2.1.7 C_DATE_TIME

2.1.7.1 Month Invalid

Month value is not allowed

2.1.7.2 Day Invalid

Day value is not allowed

2.1.7.3 Hour Invalid

Hour value is not allowed

2.1.7.4 Minute Invalid

Minute value is not allowed

2.1.7.5 Second Invalid

Second value is not allowed

2.1.7.6 Millisecond Invalid

Millisecond value is not allowed

2.1.7.7 Timezone Invalid

Timezone value is not allowed

2.1.7.8 Datetime Out Of Range

Datetime value is out of the range specified by the constraint

2.1.8 C_DURATION

2.1.8.1 Years Invalid

Years are not allowed in the constrained duration

2.1.8.2 Months Invalid

Months are not allowed in the constrained duration

2.1.8.3 Weeks Invalid

Weeks are not allowed in the constrained duration

2.1.8.4 Days Invalid

Days are not allowed in the constrained duration

2.1.8.5 Hours Invalid

Hours are not allowed in the constrained duration

2.1.8.6 Minutes Invalid

Minutes are not allowed in the constrained duration

2.1.8.7 Seconds Invalid

Seconds are not allowed in the constrained duration

2.1.8.8 Fractional Seconds Invalid

Fractional seconds are not allowed in the constrained duration

2.1.8.9 Duration Out Of Range

Duration value is out of the range specified by the constraint
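The "units not allowed" errors for C_DURATION could be found by comparing the unit designators of the value with those of the constraint pattern. The sketch below assumes the pattern is a string of ISO 8601 style designators such as PTHM and the value an ISO 8601 duration such as P1DT2H30M. The names are hypothetical, and fractional seconds and range checking are not covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: names and the assumed pattern syntax are not normative.
class CDurationCheck {
    // Returns the units present in the value but absent from the pattern.
    static List<String> disallowedUnits(String pattern, String value) {
        Set<String> allowed = units(pattern);
        List<String> bad = new ArrayList<>();
        for (String unit : units(value)) {
            if (!allowed.contains(unit)) {
                bad.add(unit);
            }
        }
        return bad;
    }

    // Collects designators; those after 'T' get a "T" prefix so that
    // months ("M") and minutes ("TM") are kept apart.
    static Set<String> units(String s) {
        Set<String> result = new LinkedHashSet<>();
        boolean timePart = false;
        for (char c : s.toUpperCase().toCharArray()) {
            if (c == 'T') {
                timePart = true;
            } else if (Character.isLetter(c) && c != 'P') {
                result.add(timePart ? "T" + c : String.valueOf(c));
            }
        }
        return result;
    }
}
```

Each returned designator would then map to one of the errors above, e.g. "D" to Days Invalid and "TM" to Minutes Invalid.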

2.2 Domain Data Type Constraints

Validation errors caused by unsatisfied constraints on openEHR domain data types.

2.2.1 C_DV_STATE

2.2.1.1 Unknown State

The state is unknown to the specified state-machine.

2.2.1.2 Unknown Transition

The transition between two states is not supported by the specified state-machine.

2.2.2 C_CODE_PHRASE

2.2.2.1 Unknown Terminology

The terminology id does not match the given terminology id in c_code_phrase

2.2.2.2 Unknown Code

The code is not included in the code_list of c_code_phrase
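A check for the two C_CODE_PHRASE errors might look like this. It is a sketch with hypothetical names, and it assumes that a null or empty code list places no restriction on the code.

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch only: names are assumptions; an empty code list is
// treated here as "any code from the terminology is acceptable".
class CCodePhraseCheck {
    enum CodePhraseError { UNKNOWN_TERMINOLOGY, UNKNOWN_CODE }

    static Optional<CodePhraseError> check(String terminologyId, String code,
                                           String allowedTerminologyId,
                                           List<String> codeList) {
        if (!allowedTerminologyId.equals(terminologyId)) {
            return Optional.of(CodePhraseError.UNKNOWN_TERMINOLOGY);
        }
        if (codeList != null && !codeList.isEmpty() && !codeList.contains(code)) {
            return Optional.of(CodePhraseError.UNKNOWN_CODE);
        }
        return Optional.empty();
    }
}
```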

2.2.3 C_DV_ORDINAL

2.2.3.1 Unknown Ordinal

The ordinal value is not in the ordinal value list specified by c_dv_ordinal

2.2.4 C_DV_QUANTITY

2.2.4.1 Invalid Magnitude

The magnitude is outside the range specified by c_dv_quantity

2.2.4.2 Invalid Precision

The precision is outside the range specified by c_dv_quantity

2.2.4.3 Invalid Units

The units do not match the units specified by c_dv_quantity

3. Ontological Errors

Errors in this category relate to term definitions and constraint definitions in the archetypes.

3.1 Term Definition Errors

3.1.1 Incorrect Name

Incorrect name according to the term definition for given language.

3.2 Term Constraint Errors

3.2.1 Incorrect Term

The term cannot satisfy the specified terminology query in the term constraint definition.

Design of the Validation API

The entry point of such a validation API needs the data instance, already expressed in the RM, and the archetype used to validate it. Since the data instance is a hierarchical tree of objects starting from a root object, validation should always start from the top level. Consequently, structural errors are likely to be the first errors the validator encounters. As validation proceeds, errors due to violations of leaf-level data value constraints or ontological constraints will arise. Any structural error should prevent further validation of the child objects of the object where the error occurs.

When an error occurs, it is important to understand 1) where in the data instance it happens; 2) which constraint of the archetype the validation is based on; 3) the type of the error and any further specific information about it, e.g. whether the value was too large or too small according to the constraint. The location of any item in the data instance can be given by its runtime path. Similarly, the location of an archetype node can be given by its archetype path. Different errors need different structures to hold the relevant information, but these should be organized to facilitate error reporting. Based on these assumptions and this reasoning, the following validation API is proposed (in near-Java syntax for clarity):

Validation API

Class Validator

This is the entry point of the validation API. The return type is a list of validation error instances. If the list is empty after validation, the data instance is valid according to the given archetype.

interface Validator {
    List<ValidationError> validate(RMObject dataInstance, Archetype archetype);
}
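The top-down traversal described in the design section, where a structural error stops validation of the children of the failing object, could be sketched as follows. To keep the example self-contained, a hypothetical Node class stands in for a data object already paired with its archetype constraint, with the outcome of each check precomputed as a flag. A real implementation would work on RM objects and archetype nodes instead.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: Node is a stand-in, not an RM or AOM class.
class TraversalSketch {
    static class Node {
        final String path;          // runtime path of this data item
        final boolean structureOk;  // result of the structural checks
        final boolean valueOk;      // result of the leaf value checks
        final List<Node> children = new ArrayList<>();

        Node(String path, boolean structureOk, boolean valueOk) {
            this.path = path;
            this.structureOk = structureOk;
            this.valueOk = valueOk;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Validates from the root down and returns all errors found.
    static List<String> validate(Node root) {
        List<String> errors = new ArrayList<>();
        collect(root, errors);
        return errors;
    }

    private static void collect(Node node, List<String> errors) {
        if (!node.structureOk) {
            errors.add("structural error at " + node.path);
            return; // do not validate the children of a structurally invalid object
        }
        if (!node.valueOk) {
            errors.add("value error at " + node.path);
        }
        for (Node child : node.children) {
            collect(child, errors);
        }
    }
}
```

Siblings of a failing object are still validated, so one pass reports every independent problem in the instance.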


Class ValidationError

This is the root class for all concrete validation error subclasses. Subclasses are free to add extra attributes to hold information relevant to a specific validation.

abstract class ValidationError {
    String pathToItem;          // runtime path to the data item where the error occurs
    String pathToArchetypeNode; // archetype path to the node where the constraint is specified
}
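To illustrate how a subclass might carry extra information, the sketch below adds a hypothetical IntegerTooLargeError that holds the offending value and the violated limit. The constructor, the describe method, the subclass and the example paths are assumptions added for this example and are not part of the proposed API.

```java
// Base class following the proposal above; the constructor and describe()
// are assumptions added for this sketch.
abstract class ValidationError {
    final String pathToItem;          // runtime path to the data item
    final String pathToArchetypeNode; // archetype path to the constraint node

    ValidationError(String pathToItem, String pathToArchetypeNode) {
        this.pathToItem = pathToItem;
        this.pathToArchetypeNode = pathToArchetypeNode;
    }

    // Human-readable summary for error reports.
    abstract String describe();
}

// Hypothetical subclass for the "Integer Too Large" error.
class IntegerTooLargeError extends ValidationError {
    final int actual;
    final int upperLimit;

    IntegerTooLargeError(String pathToItem, String pathToArchetypeNode,
                         int actual, int upperLimit) {
        super(pathToItem, pathToArchetypeNode);
        this.actual = actual;
        this.upperLimit = upperLimit;
    }

    @Override
    String describe() {
        return "Integer too large: " + actual + " > " + upperLimit
                + " at " + pathToItem + " (constraint " + pathToArchetypeNode + ")";
    }
}
```

A caller can then report every error uniformly through the base class while still reaching the specific fields when it knows the subclass.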

Implementation

An initial implementation of this design is hosted in the SANDBOX area of the openEHR Java project.

http://www.openehr.org/svn/ref_impl_java/SANDBOX/rm-validator/