Data Validation
Purpose
This page is dedicated to the design and discussion of archetype/template-based data validation. It is crucial to implement archetype-based validation correctly and consistently across different products and platforms to ensure interoperability and data quality. To enable platform-independent validation across different implementations of the archetype formalism, a common API for archetype-based data validation is necessary. This page is intended as a starting point for such a fine-grained data validation API. Note that the openEHR template formalism is not yet finalized at the time of writing; the focus here is therefore on archetype-based data validation.
Related Documents
1. openEHR Archetype Object Model, see AOM
2. openEHR Archetype Definition Language, see ADL 1.4
3. openEHR Archetype Profile, see OAP
Archetype-based Validation
Prerequisites
Remember that archetypes are always created as constraints on an underlying reference model (RM), e.g. the openEHR RM or the 13606 RM, so the data to be validated must already be valid according to that RM. Normally this is ensured by parsing a transport format, e.g. XML or dADL, into in-memory objects using a valid RM implementation.
Types of Validation Errors
The possible errors from archetype-based validation are as follows:
1. Structural Errors
This category covers violations of structural constraints in the archetypes.
1.1 Existence Error
1.1.1 Attribute Missing
Existence is 1..1 but the required attribute is missing
1.1.2 Attribute Not Allowed
Existence is 0..0 but the attribute is present
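In the AOM, existence is an interval on a C_ATTRIBUTE (0..0, 0..1 or 1..1). A minimal sketch of the two existence checks, assuming an absent attribute is represented as null; the class and enum names are hypothetical and not part of any existing library:

```java
import java.util.Optional;

/** Hypothetical error codes mirroring sections 1.1.1 and 1.1.2. */
enum ExistenceError { ATTRIBUTE_MISSING, ATTRIBUTE_NOT_ALLOWED }

final class ExistenceCheck {
    private ExistenceCheck() {}

    /**
     * Checks one attribute value against an existence interval lower..upper,
     * where each bound is 0 or 1. A null value means the attribute is absent.
     */
    static Optional<ExistenceError> check(Object value, int lower, int upper) {
        boolean present = value != null;
        if (!present && lower == 1) {
            return Optional.of(ExistenceError.ATTRIBUTE_MISSING);      // existence 1..1
        }
        if (present && upper == 0) {
            return Optional.of(ExistenceError.ATTRIBUTE_NOT_ALLOWED);  // existence 0..0
        }
        return Optional.empty();                                       // e.g. 0..1, either way
    }
}
```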
1.2 Cardinality
1.2.1 Items Too Many
The total number of member items exceeds the upper limit specified by the cardinality constraint
1.2.2 Items Too Few
The total number of member items is less than the lower limit required by the cardinality constraint
1.2.3 Items Not Ordered
The member items are not ordered
1.2.4 Items Not Unique
Some of the member items are not unique
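A cardinality constraint on a container attribute carries an interval plus ordering and uniqueness flags. The sketch below checks all four cardinality errors. How a validator decides whether an RM container is ordered is RM-specific, so here the caller is assumed to pass that in; uniqueness is assumed to mean `equals`-distinct. All names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

/** Hypothetical error codes mirroring sections 1.2.1 to 1.2.4. */
enum CardinalityError { ITEMS_TOO_MANY, ITEMS_TOO_FEW, ITEMS_NOT_ORDERED, ITEMS_NOT_UNIQUE }

/** Simplified cardinality: interval lower..upper (null upper = unbounded) plus flags. */
record Cardinality(int lower, Integer upper, boolean ordered, boolean unique) {}

final class CardinalityCheck {
    private CardinalityCheck() {}

    /**
     * @param items        member items of the container attribute in the RM object
     * @param itemsOrdered whether the RM container preserves order (supplied by the caller)
     */
    static List<CardinalityError> check(Collection<?> items, boolean itemsOrdered, Cardinality c) {
        List<CardinalityError> errors = new ArrayList<>();
        int n = items.size();
        if (c.upper() != null && n > c.upper()) errors.add(CardinalityError.ITEMS_TOO_MANY);
        if (n < c.lower()) errors.add(CardinalityError.ITEMS_TOO_FEW);
        if (c.ordered() && !itemsOrdered) errors.add(CardinalityError.ITEMS_NOT_ORDERED);
        // Equal items collapse in a HashSet, so a smaller set means duplicates exist.
        if (c.unique() && new HashSet<Object>(items).size() < n) errors.add(CardinalityError.ITEMS_NOT_UNIQUE);
        return errors;
    }
}
```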
1.3 Occurrences
1.3.1 Occurrences Too Many
The number of occurrences of a type exceeds the upper limit of the occurrences constraint
1.3.2 Occurrences Too Few
The number of occurrences of a type is below the lower limit of the occurrences constraint
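Occurrences constrain how many child objects may match a given archetype node. A minimal sketch, assuming children are matched to the node purely by archetype node id (a real validator would also consider the RM type); the names are hypothetical:

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical error codes mirroring sections 1.3.1 and 1.3.2. */
enum OccurrencesError { OCCURRENCES_TOO_MANY, OCCURRENCES_TOO_FEW }

final class OccurrencesCheck {
    private OccurrencesCheck() {}

    /**
     * Counts the children whose archetype node id equals nodeId and compares the count
     * with the occurrences interval lower..upper (upper == null means unbounded).
     */
    static Optional<OccurrencesError> check(List<String> childNodeIds, String nodeId, int lower, Integer upper) {
        long count = childNodeIds.stream().filter(nodeId::equals).count();
        if (upper != null && count > upper) return Optional.of(OccurrencesError.OCCURRENCES_TOO_MANY);
        if (count < lower) return Optional.of(OccurrencesError.OCCURRENCES_TOO_FEW);
        return Optional.empty();
    }
}
```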
2. Leaf Data Value Errors
This category covers violations of leaf-level constraints on data types.
2.1 Primitive Data Type Constraints
Validation errors caused by unsatisfied constraints for primitive data types.
2.1.1 C_BOOLEAN
2.1.1.1 True Invalid
Value true not allowed
2.1.1.2 False Invalid
Value false not allowed
2.1.2 C_STRING
2.1.2.1 Mismatch Pattern
String value does not match regular expression pattern
2.1.2.2 Unknown String
String value is not included in the value list and the list is exhaustive
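A C_STRING constraint is given either as a pattern or as a value list, where the list may be marked open or exhaustive. A minimal sketch of both checks; it uses Java's `java.util.regex` as an approximation of ADL regular-expression semantics, which is an assumption, and the record and class names are hypothetical:

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical error codes mirroring sections 2.1.2.1 and 2.1.2.2. */
enum StringError { MISMATCH_PATTERN, UNKNOWN_STRING }

/** Simplified C_STRING: a pattern or a list; listOpen == false means the list is exhaustive. */
record CString(String pattern, List<String> list, boolean listOpen) {}

final class StringCheck {
    private StringCheck() {}

    static Optional<StringError> check(String value, CString c) {
        // Pattern.matches requires the whole value to match, not just a substring.
        if (c.pattern() != null && !Pattern.matches(c.pattern(), value)) {
            return Optional.of(StringError.MISMATCH_PATTERN);
        }
        if (c.list() != null && !c.listOpen() && !c.list().contains(value)) {
            return Optional.of(StringError.UNKNOWN_STRING);
        }
        return Optional.empty();
    }
}
```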
2.1.3 C_INTEGER
2.1.3.1 Integer Too Large
Integer value is above the upper limit of the given range
2.1.3.2 Integer Too Small
Integer value is below the lower limit of the given range
2.1.3.3 Unknown Integer
Integer value is not included in the value list of the constraint
2.1.4 C_REAL
2.1.4.1 Real Too Large
Real value is above the upper limit of the given range
2.1.4.2 Real Too Small
Real value is below the lower limit of the given range
2.1.4.3 Unknown Real
Real value is not included in the value list of the constraint
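The C_INTEGER and C_REAL errors above have the same shape: a value is checked against either a range or a value list. A minimal generic sketch, assuming an interval with optional bounds and inclusion flags in the spirit of the AOM Interval type; all names are hypothetical:

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical codes covering "Too Large", "Too Small" and "Unknown" for integers and reals. */
enum NumberError { TOO_LARGE, TOO_SMALL, UNKNOWN_VALUE }

/** Simplified interval: a null bound means unbounded on that side. */
record Interval<T extends Comparable<T>>(T lower, T upper, boolean lowerIncluded, boolean upperIncluded) {
    boolean tooSmall(T v) {
        if (lower == null) return false;
        int cmp = v.compareTo(lower);
        return lowerIncluded ? cmp < 0 : cmp <= 0;
    }

    boolean tooLarge(T v) {
        if (upper == null) return false;
        int cmp = v.compareTo(upper);
        return upperIncluded ? cmp > 0 : cmp >= 0;
    }
}

final class NumberCheck {
    private NumberCheck() {}

    /** Use T = Integer for C_INTEGER and T = Double for C_REAL; range and list are each optional. */
    static <T extends Comparable<T>> Optional<NumberError> check(T value, Interval<T> range, List<T> list) {
        if (range != null) {
            if (range.tooLarge(value)) return Optional.of(NumberError.TOO_LARGE);
            if (range.tooSmall(value)) return Optional.of(NumberError.TOO_SMALL);
        }
        if (list != null && !list.contains(value)) return Optional.of(NumberError.UNKNOWN_VALUE);
        return Optional.empty();
    }
}
```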
2.1.5 C_DATE
2.1.5.1 Month Invalid
Month value is not allowed
2.1.5.2 Day Invalid
Day value is not allowed
2.1.5.3 Timezone Invalid
Timezone value is not allowed
2.1.5.4 Date Out Of Range
Date value is out of the specified range in the constraint
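A minimal sketch of the C_DATE checks, assuming date patterns in the style of ADL 1.4 (e.g. "yyyy-mm-dd", "yyyy-mm-XX", "yyyy-??-XX", where "XX" marks a component that must not be present) and an optional inclusive range applied only to complete dates. The timezone check is omitted, and the names are hypothetical:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical error codes for sections 2.1.5.1, 2.1.5.2 and 2.1.5.4. */
enum DateError { MONTH_INVALID, DAY_INVALID, DATE_OUT_OF_RANGE }

final class DateCheck {
    private DateCheck() {}

    /**
     * value:   ISO 8601 calendar date, possibly partial: "2009", "2009-05" or "2009-05-17".
     * pattern: three-part pattern such as "yyyy-mm-XX" ("XX" = component not allowed).
     * min/max: optional inclusive bounds, checked only when the value is a full date.
     */
    static List<DateError> check(String value, String pattern, LocalDate min, LocalDate max) {
        List<DateError> errors = new ArrayList<>();
        String[] parts = value.split("-");
        String[] pat = pattern.split("-");
        if (parts.length > 1 && pat[1].equals("XX")) errors.add(DateError.MONTH_INVALID);
        if (parts.length > 2 && pat[2].equals("XX")) errors.add(DateError.DAY_INVALID);
        if (parts.length == 3) {
            LocalDate d = LocalDate.parse(value); // ISO_LOCAL_DATE, e.g. 2009-05-17
            if ((min != null && d.isBefore(min)) || (max != null && d.isAfter(max))) {
                errors.add(DateError.DATE_OUT_OF_RANGE);
            }
        }
        return errors;
    }
}
```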
2.1.6 C_TIME
2.1.6.1 Minute Invalid
Minute value is not allowed
2.1.6.2 Second Invalid
Second value is not allowed
2.1.6.3 Millisecond Invalid
Millisecond value is not allowed
2.1.6.4 Timezone Invalid
Timezone value is not allowed
2.1.6.5 Time Out Of Range
Time value is out of the range specified by the constraint
2.1.7 C_DATE_TIME
2.1.7.1 Month Invalid
Month value is not allowed
2.1.7.2 Day Invalid
Day value is not allowed
2.1.7.3 Hour Invalid
Hour value is not allowed
2.1.7.4 Minute Invalid
Minute value is not allowed
2.1.7.5 Second Invalid
Second value is not allowed
2.1.7.6 Millisecond Invalid
Millisecond value is not allowed
2.1.7.7 Timezone Invalid
Timezone value is not allowed
2.1.7.8 Datetime Out Of Range
Datetime value is out of the range specified by the constraint
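The time-related checks for C_TIME and C_DATE_TIME follow the same idea: detect which components are present, reject the disallowed ones, then compare against the range. A minimal sketch for C_DATE_TIME, assuming a full ISO 8601 extended date-time with an optional fraction of a second and an optional zone, and comparing the range on the local part only (a real validator would need a policy for zone offsets). The names are hypothetical:

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical error codes for sections 2.1.7.6, 2.1.7.7 and 2.1.7.8. */
enum DateTimeError { MILLISECOND_INVALID, TIMEZONE_INVALID, DATETIME_OUT_OF_RANGE }

final class DateTimeCheck {
    private DateTimeCheck() {}

    // Group 1: local date-time, group 2: fraction of a second, group 3: zone ("Z" or +hh:mm / +hhmm).
    private static final Pattern ISO =
        Pattern.compile("(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2})([.,]\\d+)?(Z|[+-]\\d{2}:?\\d{2})?");

    static List<DateTimeError> check(String value, boolean fractionAllowed, boolean zoneAllowed,
                                     LocalDateTime min, LocalDateTime max) {
        Matcher m = ISO.matcher(value);
        if (!m.matches()) throw new IllegalArgumentException("not a full ISO 8601 date-time: " + value);
        List<DateTimeError> errors = new ArrayList<>();
        if (m.group(2) != null && !fractionAllowed) errors.add(DateTimeError.MILLISECOND_INVALID);
        if (m.group(3) != null && !zoneAllowed) errors.add(DateTimeError.TIMEZONE_INVALID);
        LocalDateTime local = LocalDateTime.parse(m.group(1));
        if ((min != null && local.isBefore(min)) || (max != null && local.isAfter(max))) {
            errors.add(DateTimeError.DATETIME_OUT_OF_RANGE);
        }
        return errors;
    }
}
```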
2.1.8 C_DURATION
2.1.8.1 Years Invalid
Years are not allowed in the constrained duration
2.1.8.2 Months Invalid
Months are not allowed in the constrained duration
2.1.8.3 Weeks Invalid
Weeks are not allowed in the constrained duration
2.1.8.4 Days Invalid
Days are not allowed in the constrained duration
2.1.8.5 Hours Invalid
Hours are not allowed in the constrained duration
2.1.8.6 Minutes Invalid
Minutes are not allowed in the constrained duration
2.1.8.7 Seconds Invalid
Seconds are not allowed in the constrained duration
2.1.8.8 Fractional Seconds Invalid
Fractional seconds are not allowed in the constrained duration
2.1.8.9 Duration Out Of Range
Duration value is out of the range specified by the constraint
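A minimal sketch of the C_DURATION component checks, assuming an ADL 1.4-style pattern string such as "PYMWDTHMS" in which each designator letter that appears allows that component (letters after "T" refer to time components), and a separate flag for fractional seconds. The range check would reuse an interval comparison and is omitted. The names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical error codes for sections 2.1.8.1 to 2.1.8.8. */
enum DurationError { YEARS_INVALID, MONTHS_INVALID, WEEKS_INVALID, DAYS_INVALID,
                     HOURS_INVALID, MINUTES_INVALID, SECONDS_INVALID, FRACTIONAL_SECONDS_INVALID }

final class DurationCheck {
    private DurationCheck() {}

    // Groups: 1=Y, 2=M(onths), 3=W, 4=D, 5=H, 6=M(inutes), 7=S, 8=fraction of seconds.
    private static final Pattern ISO = Pattern.compile(
        "P(?:(\\d+)Y)?(?:(\\d+)M)?(?:(\\d+)W)?(?:(\\d+)D)?(?:T(?:(\\d+)H)?(?:(\\d+)M)?(?:(\\d+)(?:[.,](\\d+))?S)?)?");

    static List<DurationError> check(String value, String allowed, boolean fractionAllowed) {
        Matcher m = ISO.matcher(value);
        if (!m.matches()) throw new IllegalArgumentException("not an ISO 8601 duration: " + value);
        int t = allowed.indexOf('T');
        String datePart = t < 0 ? allowed : allowed.substring(0, t);
        String timePart = t < 0 ? "" : allowed.substring(t + 1);
        List<DurationError> errors = new ArrayList<>();
        flag(m.group(1), datePart, 'Y', DurationError.YEARS_INVALID, errors);
        flag(m.group(2), datePart, 'M', DurationError.MONTHS_INVALID, errors);
        flag(m.group(3), datePart, 'W', DurationError.WEEKS_INVALID, errors);
        flag(m.group(4), datePart, 'D', DurationError.DAYS_INVALID, errors);
        flag(m.group(5), timePart, 'H', DurationError.HOURS_INVALID, errors);
        flag(m.group(6), timePart, 'M', DurationError.MINUTES_INVALID, errors);
        flag(m.group(7), timePart, 'S', DurationError.SECONDS_INVALID, errors);
        if (m.group(8) != null && !fractionAllowed) errors.add(DurationError.FRACTIONAL_SECONDS_INVALID);
        return errors;
    }

    // Records an error when a component is present but its letter is missing from the allowed part.
    private static void flag(String group, String allowedPart, char letter, DurationError e, List<DurationError> out) {
        if (group != null && allowedPart.indexOf(letter) < 0) out.add(e);
    }
}
```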
2.2 Domain Data Type Constraints
Validation errors caused by unsatisfied constraints on openEHR domain data types.
2.2.1 C_DV_STATE
2.2.1.1 Unknown State
The state is unknown to the specified state-machine.
2.2.1.2 Unknown Transition
The transition between the two states is not supported by the specified state machine.
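C_DV_STATE constrains a state value by a state machine of states and allowed transitions. A minimal sketch, assuming the machine is flattened into a map from each known state to the states reachable in one transition (terminal states map to an empty set), and assuming the previous state is available to the validator, which is not stated on this page. The names are hypothetical:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical error codes for sections 2.2.1.1 and 2.2.1.2. */
enum StateError { UNKNOWN_STATE, UNKNOWN_TRANSITION }

/** Simplified state machine: known state -> set of directly reachable states. */
record StateMachine(Map<String, Set<String>> transitions) {

    Optional<StateError> checkState(String state) {
        return transitions.containsKey(state) ? Optional.empty() : Optional.of(StateError.UNKNOWN_STATE);
    }

    Optional<StateError> checkTransition(String from, String to) {
        if (!transitions.containsKey(from) || !transitions.containsKey(to)) {
            return Optional.of(StateError.UNKNOWN_STATE);
        }
        return transitions.get(from).contains(to) ? Optional.empty() : Optional.of(StateError.UNKNOWN_TRANSITION);
    }
}
```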
2.2.2 C_CODE_PHRASE
2.2.2.1 Unknown Terminology
The terminology id does not match the given terminology id in c_code_phrase
2.2.2.2 Unknown Code
The code is not included in the code_list of c_code_phrase
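A minimal sketch of the two C_CODE_PHRASE checks, assuming the terminology id is compared as a plain string and that an empty code list means any code from the terminology is accepted. The record name is hypothetical:

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical error codes for sections 2.2.2.1 and 2.2.2.2. */
enum CodePhraseError { UNKNOWN_TERMINOLOGY, UNKNOWN_CODE }

/** Simplified C_CODE_PHRASE: a terminology id plus a code list (empty = any code). */
record CCodePhrase(String terminologyId, List<String> codeList) {

    Optional<CodePhraseError> check(String terminology, String code) {
        if (!terminologyId.equals(terminology)) return Optional.of(CodePhraseError.UNKNOWN_TERMINOLOGY);
        if (!codeList.isEmpty() && !codeList.contains(code)) return Optional.of(CodePhraseError.UNKNOWN_CODE);
        return Optional.empty();
    }
}
```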
2.2.3 C_DV_ORDINAL
2.2.3.1 Unknown Ordinal
The ordinal value is not in the list of ordinal values allowed by c_dv_ordinal
2.2.4 C_DV_QUANTITY
2.2.4.1 Invalid Magnitude
The magnitude is outside the range specified by c_dv_quantity
2.2.4.2 Invalid Precision
The precision is outside the range specified by c_dv_quantity
2.2.4.3 Invalid Units
The units do not match the units specified by c_dv_quantity
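In the openEHR Archetype Profile, C_DV_QUANTITY holds a list of quantity items, each with units, a magnitude interval and a precision interval. A minimal sketch, assuming a value is accepted if at least one item with matching units accepts both magnitude and precision, and reporting the first failing aspect; inclusive bounds are assumed and the names are hypothetical:

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical error codes for sections 2.2.4.1 to 2.2.4.3. */
enum QuantityError { INVALID_MAGNITUDE, INVALID_PRECISION, INVALID_UNITS }

/** Simplified quantity item: units plus inclusive bounds (null = unbounded). */
record QuantityItem(String units, Double minMagnitude, Double maxMagnitude, Integer minPrecision, Integer maxPrecision) {

    boolean magnitudeOk(double m) {
        return (minMagnitude == null || m >= minMagnitude) && (maxMagnitude == null || m <= maxMagnitude);
    }

    boolean precisionOk(int p) {
        return (minPrecision == null || p >= minPrecision) && (maxPrecision == null || p <= maxPrecision);
    }
}

final class QuantityCheck {
    private QuantityCheck() {}

    static Optional<QuantityError> check(double magnitude, int precision, String units, List<QuantityItem> items) {
        List<QuantityItem> sameUnits = items.stream().filter(i -> i.units().equals(units)).toList();
        if (sameUnits.isEmpty()) return Optional.of(QuantityError.INVALID_UNITS);
        if (sameUnits.stream().noneMatch(i -> i.magnitudeOk(magnitude))) {
            return Optional.of(QuantityError.INVALID_MAGNITUDE);
        }
        if (sameUnits.stream().noneMatch(i -> i.magnitudeOk(magnitude) && i.precisionOk(precision))) {
            return Optional.of(QuantityError.INVALID_PRECISION);
        }
        return Optional.empty();
    }
}
```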
3. Ontological Errors
This category covers errors related to term definitions and constraint definitions in the archetypes.
3.1 Term Definition Errors
3.1.1 Incorrect Name
Incorrect name according to the term definition for given language.
3.2 Term Constraint Errors
3.2.1 Incorrect Term
The term does not satisfy the terminology query specified in the term constraint definition.
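For the Incorrect Name error, a validator needs the archetype's term definitions per language. A minimal sketch, assuming the term definitions are available as a map from language to (node id to term text) and that the data node's name must equal that text exactly; an unknown language or code is also reported as an incorrect name here. The names are hypothetical:

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical error code for section 3.1.1. */
enum OntologyError { INCORRECT_NAME }

/** Simplified term definitions: language -> (node id -> term text). */
record TermDefinitions(Map<String, Map<String, String>> byLanguage) {

    /** Compares the name recorded on a data node with the archetype term for its node id. */
    Optional<OntologyError> checkName(String language, String nodeId, String name) {
        String expected = byLanguage.getOrDefault(language, Map.of()).get(nodeId);
        return name.equals(expected) ? Optional.empty() : Optional.of(OntologyError.INCORRECT_NAME);
    }
}
```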
Design of the Validation API
The entry point of such a validation API needs two inputs: the data instance, already expressed in the RM, and the archetype used to validate it. Since the data instance is a hierarchical tree of objects starting from a root object, validation should always start from the top level. Consequently, structural errors are likely to be the first errors the validator encounters. As validation proceeds, errors due to violations of leaf-level data value constraints or ontological constraints will be raised. Any structural error should prevent further validation of the children of the object where the error occurs.

When an error occurs, it is important to know 1) where in the data instance it happens; 2) which constraint in the archetype the validation was based on; and 3) the type of the error, along with any further specific information, e.g. whether the value was too large or too small according to the constraint. The location of any item in the data instance can be given by its runtime path; similarly, the location of an archetype node can be given by its archetype path. Different errors need different structures to hold the relevant information, but these structures should be organized to facilitate error reporting. The validation API proposed on the basis of these assumptions is described below, in near-Java syntax for clarity.
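The traversal order described above (top-down, structural checks before descending, no descent below a structural error, and each error tagged with a runtime path and an archetype path) can be sketched as follows. The data and constraint node types are heavily simplified, hypothetical stand-ins for RM objects and AOM nodes, only occurrences and a single leaf predicate are checked, and the path syntax is made up for brevity rather than following the openEHR path syntax:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical data node: archetype node id, optional leaf value, children. */
record DataNode(String nodeId, Object value, List<DataNode> children) {}

/** Hypothetical constraint node: occurrences, optional leaf check, child constraints. */
record ConstraintNode(String nodeId, int minOcc, Integer maxOcc, Predicate<Object> leafCheck,
                      List<ConstraintNode> children) {}

/** Where the error is in the data, which constraint applied, and what kind of error it is. */
record Finding(String runtimePath, String archetypePath, String errorType) {}

final class TreeValidator {
    private TreeValidator() {}

    static List<Finding> validate(DataNode root, ConstraintNode rootConstraint) {
        List<Finding> out = new ArrayList<>();
        visit(root, rootConstraint, "/", "/", out);
        return out;
    }

    private static void visit(DataNode data, ConstraintNode c, String runtimePath, String archetypePath,
                              List<Finding> out) {
        // Leaf-level check on this node's own value, if the constraint defines one.
        if (c.leafCheck() != null && !c.leafCheck().test(data.value())) {
            out.add(new Finding(runtimePath, archetypePath, "LEAF_VALUE"));
        }
        for (ConstraintNode cc : c.children()) {
            List<DataNode> matches = data.children().stream()
                    .filter(d -> d.nodeId().equals(cc.nodeId())).toList();
            String childArchetypePath = join(archetypePath, cc.nodeId());
            // Structural check first; on failure, skip validation of these children entirely.
            if (matches.size() < cc.minOcc()) {
                out.add(new Finding(runtimePath, childArchetypePath, "OCCURRENCES_TOO_FEW"));
                continue;
            }
            if (cc.maxOcc() != null && matches.size() > cc.maxOcc()) {
                out.add(new Finding(runtimePath, childArchetypePath, "OCCURRENCES_TOO_MANY"));
                continue;
            }
            for (int i = 0; i < matches.size(); i++) {
                visit(matches.get(i), cc, join(runtimePath, cc.nodeId() + "[" + i + "]"), childArchetypePath, out);
            }
        }
    }

    private static String join(String path, String segment) {
        return path.endsWith("/") ? path + segment : path + "/" + segment;
    }
}
```

Returning all findings in a flat list, each carrying both paths, keeps reporting simple while still telling the caller where in the data and where in the archetype each problem was found.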
Validation API
Class Validator
This is the entry point of the validation API. It returns a list of validation errors; if the list is empty after validation, the data instance is valid according to the given archetype.
Interface Validator
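The interface definition itself is not reproduced on this page. A minimal sketch of what it might look like, assuming the RM object is passed as a plain Object and using hypothetical placeholder types for the archetype and the error so that the sketch compiles on its own:

```java
import java.util.List;

/** Placeholder for the archetype type (e.g. an AOM archetype object); an assumption. */
interface ArchetypeRef { String archetypeId(); }

/** Placeholder error type so that this sketch is self-contained. */
record ValidationErrorInfo(String runtimePath, String archetypePath, String message) {}

/**
 * Hypothetical signature: validates an RM object tree against an archetype.
 * An empty result list means the data instance is valid according to the archetype.
 */
interface Validator {
    List<ValidationErrorInfo> validate(Object rmObject, ArchetypeRef archetype);

    /** Convenience check derived from validate(). */
    default boolean isValid(Object rmObject, ArchetypeRef archetype) {
        return validate(rmObject, archetype).isEmpty();
    }
}
```

Returning a list rather than throwing on the first problem lets one pass over the data report every independent error.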
Class ValidationError
This is the root class for all concrete validation error subclasses. Subclasses are free to add extra attributes to hold the information relevant to a specific kind of validation error.
Abstract Class ValidationError
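The class body is not reproduced on this page either. A minimal sketch, assuming the root class holds the runtime path and archetype path discussed above and leaves the error category and description to subclasses; the enum and the OccurrencesError subclass are hypothetical examples of how a subclass adds its own attributes:

```java
import java.util.Objects;

/** Hypothetical categories following the error classification on this page. */
enum ErrorCategory { STRUCTURAL, LEAF_DATA_VALUE, ONTOLOGICAL }

/** Root of all validation errors: where it happened and which constraint was applied. */
abstract class ValidationError {
    private final String runtimePath;   // location in the data instance
    private final String archetypePath; // location of the constraint in the archetype

    protected ValidationError(String runtimePath, String archetypePath) {
        this.runtimePath = Objects.requireNonNull(runtimePath);
        this.archetypePath = Objects.requireNonNull(archetypePath);
    }

    String runtimePath() { return runtimePath; }
    String archetypePath() { return archetypePath; }

    abstract ErrorCategory category();

    /** Human-readable description for error reports. */
    abstract String describe();
}

/** Example subclass for section 1.3, carrying the extra information this error needs. */
final class OccurrencesError extends ValidationError {
    private final int actual;
    private final int lower;
    private final Integer upper; // null = unbounded

    OccurrencesError(String runtimePath, String archetypePath, int actual, int lower, Integer upper) {
        super(runtimePath, archetypePath);
        this.actual = actual;
        this.lower = lower;
        this.upper = upper;
    }

    boolean tooMany() { return upper != null && actual > upper; }

    @Override ErrorCategory category() { return ErrorCategory.STRUCTURAL; }

    @Override String describe() {
        String range = lower + ".." + (upper == null ? "*" : upper);
        return (tooMany() ? "Occurrences too many" : "Occurrences too few")
                + ": found " + actual + ", expected " + range + " at " + runtimePath();
    }
}
```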
Implementation
Initial implementation of this design is hosted at the SANDBOX area of the openEHR Java project.
http://www.openehr.org/svn/ref_impl_java/SANDBOX/rm-validator/