h1. *Purpose*

This page is dedicated to the design and discussion of archetype/template-based data validation. It is crucial to implement archetype-based validation correctly and consistently across different products and platforms to ensure interoperability and data quality. To enable platform-independent validation across different implementations of the archetype formalism, a common API for archetype-based data validation is needed. Hopefully this page will serve as a starting point for such a fine-grained data validation API. Note that the openEHR template formalism is not yet finalized at the time of writing, so the focus here is archetype-based data validation.

h2. *Related Documents*

*1.* *{_}open{_}EHR Archetype Object Model*, see [AOM|http://www.openehr.org/svn/specification/TAGS/Release-1.0.1/publishing/architecture/am/aom.pdf]

*2.* *{_}open{_}EHR Archetype Definition Language*, see [ADL 1.4|http://www.openehr.org/svn/specification/TAGS/Release-1.0.1/publishing/architecture/am/adl.pdf]

*3.* *{_}open{_}EHR Archetype Profile*, see [OAP|http://www.openehr.org/svn/specification/TAGS/Release-1.0.1/publishing/architecture/am/openehr_archetype_profile.pdf]

h1. *Archetype-based Validation*


h2. *Prerequisites*

Remember that archetypes are always created as constraints on an underlying reference model (RM), e.g. the openEHR RM or the 13606 RM. The data to be validated must therefore already be valid according to the underlying RM. Normally this is achieved by parsing a transport format, e.g. XML or dADL, into in-memory object form using a valid RM implementation.

h2. *Type of Validation Errors*

The possible errors from archetype-based validation are as follows:

h3. *1.* Structural Errors

This category covers violations of structural constraints in the archetypes.

h4. *1.1 Existence Error*

*1.1.1 Attribute Missing*

Existence is 1..1, but the required attribute is missing

*1.1.2 Attribute Not Allowed*

Existence is 0..0, but the attribute is present
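
The two existence errors above can be sketched as a single check. This is only an illustration: the names {{ExistenceError}} and {{ExistenceCheck}} are hypothetical, and the existence interval is assumed to be given as two plain integers rather than an AOM interval object.

```java
import java.util.Optional;

/** Hypothetical error codes mirroring 1.1.1 and 1.1.2 above. */
enum ExistenceError { ATTRIBUTE_MISSING, ATTRIBUTE_NOT_ALLOWED }

final class ExistenceCheck {
    private ExistenceCheck() {}

    /**
     * @param lower lower bound of the existence interval (0 or 1)
     * @param upper upper bound of the existence interval (0 or 1)
     * @param value attribute value taken from the data instance, or null if absent
     * @return the existence error, or empty if the constraint is satisfied
     */
    static Optional<ExistenceError> check(int lower, int upper, Object value) {
        if (value == null && lower >= 1) {
            return Optional.of(ExistenceError.ATTRIBUTE_MISSING);
        }
        if (value != null && upper == 0) {
            return Optional.of(ExistenceError.ATTRIBUTE_NOT_ALLOWED);
        }
        return Optional.empty();
    }
}
```

An optional attribute (0..1) never produces an existence error in this sketch, whether or not a value is present.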

h4. *1.2 Cardinality*

*1.2.1 Items Too Many*

The total number of member items exceeds the upper limit specified by the cardinality constraint

*1.2.2 Items Too Few*

The total number of member items is less than the lower limit required by the cardinality constraint

*1.2.3 Items Not Ordered*

The member items are not ordered, although the cardinality constraint requires ordering

*1.2.4 Items Not Unique*

Some of the member items are not unique, although the cardinality constraint requires uniqueness
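
A minimal sketch of how the four cardinality errors might be detected together. All names here are hypothetical. Two assumptions are made for illustration only: an unbounded upper limit is passed as {{null}}, and a container counts as "ordered" when it is a {{java.util.List}}.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical error codes mirroring 1.2.1 to 1.2.4 above. */
enum CardinalityError { ITEMS_TOO_MANY, ITEMS_TOO_FEW, ITEMS_NOT_ORDERED, ITEMS_NOT_UNIQUE }

final class CardinalityCheck {
    private CardinalityCheck() {}

    /**
     * @param items   member items of a container attribute in the data instance
     * @param lower   lower limit of the cardinality interval
     * @param upper   upper limit, or null if unbounded (assumption of this sketch)
     * @param ordered true if the constraint requires ordered items
     * @param unique  true if the constraint requires unique items
     */
    static Set<CardinalityError> check(Collection<?> items, int lower, Integer upper,
                                       boolean ordered, boolean unique) {
        Set<CardinalityError> errors = EnumSet.noneOf(CardinalityError.class);
        if (items.size() < lower) {
            errors.add(CardinalityError.ITEMS_TOO_FEW);
        }
        if (upper != null && items.size() > upper) {
            errors.add(CardinalityError.ITEMS_TOO_MANY);
        }
        // Assumption: only List implementations carry a meaningful order.
        if (ordered && !(items instanceof List)) {
            errors.add(CardinalityError.ITEMS_NOT_ORDERED);
        }
        // Duplicates collapse in a set, so a smaller set means non-unique items.
        if (unique && new HashSet<Object>(items).size() != items.size()) {
            errors.add(CardinalityError.ITEMS_NOT_UNIQUE);
        }
        return errors;
    }
}
```

Returning a set rather than a single value lets one container report several violations at once, e.g. too many items that are also not unique.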

h4. 1.3 Occurrences

*1.3.1 Occurrences Too Many*

The number of occurrences of a type exceeds the upper limit of the occurrences constraint

*1.3.2 Occurrences Too Few*

The number of occurrences of a type is below the lower limit of the occurrences constraint
\\

h3. 2. Leaf Data Value Errors

This category covers violations of leaf-level constraints on data types.

h4. 2.1 Primitive Data Type Constraints

Validation errors caused by unsatisfied constraints for primitive data types.

h5. *2.1.1 C_BOOLEAN*

*2.1.1.1 True Invalid*

Value true not allowed

*2.1.1.2 False Invalid*

Value false not allowed

h5. *2.1.2 C_STRING*

*2.1.2.1 Mismatch Pattern*

String value does not match regular expression pattern

*2.1.2.2 Unknown String*

String value is not included in the value list and the list is exhaustive
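
As an illustration, the two C_STRING errors could be checked as follows. The parameters are loosely modelled on a pattern, a value list and an "open list" flag. The class and enum names are hypothetical, and the sketch assumes the pattern is a plain Java regular expression that must match the whole value.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical error codes mirroring 2.1.2.1 and 2.1.2.2 above. */
enum StringError { MISMATCH_PATTERN, UNKNOWN_STRING }

final class StringConstraintCheck {
    private StringConstraintCheck() {}

    /**
     * @param value    string value from the data instance
     * @param pattern  regular expression, or null if no pattern is constrained
     * @param allowed  list of allowed values, or null if no list is constrained
     * @param listOpen true if values outside the list are also acceptable
     */
    static Optional<StringError> check(String value, String pattern,
                                       List<String> allowed, boolean listOpen) {
        // Assumption: the whole value must match, not just a substring.
        if (pattern != null && !Pattern.matches(pattern, value)) {
            return Optional.of(StringError.MISMATCH_PATTERN);
        }
        // Only an exhaustive (closed) list can reject a value.
        if (allowed != null && !listOpen && !allowed.contains(value)) {
            return Optional.of(StringError.UNKNOWN_STRING);
        }
        return Optional.empty();
    }
}
```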

h5. 2.1.3 C_INTEGER

*2.1.3.1 Integer Too Large*

Integer value is above the upper limit of the given range

*2.1.3.2 Integer Too Small*

Integer value is below the lower limit of the given range

*2.1.3.3 Unknown Integer*

Integer value is not included in the value list of the constraint

h5. *2.1.4* *C_REAL*

*2.1.4.1 Real Too Large*

Real value is above the upper limit of the given range

*2.1.4.2 Real Too Small*

Real value is below the lower limit of the given range

*2.1.4.3 Unknown Real*

Real value is not included in the value list of the constraint
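
The "too large" and "too small" checks for C_INTEGER and C_REAL have the same shape, so one generic helper can serve both. The sketch below uses hypothetical names and assumes inclusive bounds, with {{null}} standing for an unbounded side. A real implementation would also need to respect whether each bound is included in the interval.

```java
import java.util.Optional;

/** Hypothetical error codes covering the Too Large / Too Small cases above. */
enum RangeError { TOO_LARGE, TOO_SMALL }

final class RangeCheck {
    private RangeCheck() {}

    /**
     * @param value value from the data instance
     * @param lower inclusive lower limit, or null if unbounded (assumption)
     * @param upper inclusive upper limit, or null if unbounded (assumption)
     */
    static <T extends Comparable<T>> Optional<RangeError> check(T value, T lower, T upper) {
        if (lower != null && value.compareTo(lower) < 0) {
            return Optional.of(RangeError.TOO_SMALL);
        }
        if (upper != null && value.compareTo(upper) > 0) {
            return Optional.of(RangeError.TOO_LARGE);
        }
        return Optional.empty();
    }
}
```

For example, {{RangeCheck.<Integer>check(11, 0, 10)}} reports {{TOO_LARGE}}, while a value equal to either limit passes.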

h5. *2.1.5* *C_DATE*

*2.1.5.1 Month Invalid*

Month value is not allowed

*2.1.5.2 Day Invalid*

Day value is not allowed

*2.1.5.3 Timezone Invalid*

Timezone value is not allowed

*2.1.5.4 Date Out Of Range*

Date value is out of the specified range in the constraint

h5. *2.1.6* *C_TIME*

*2.1.6.1 Minute Invalid*

Minute value is not allowed

*2.1.6.2 Second Invalid*

Second value is not allowed

*2.1.6.3 Millisecond Invalid*

Millisecond value is not allowed

*2.1.6.4 Timezone Invalid*

Timezone value is not allowed

*2.1.6.5 Time Out Of Range*

Time value is out of the range specified by the constraint

h5. *2.1.7* *C_DATE_TIME*

*2.1.7.1 Month Invalid*

Month value is not allowed

*2.1.7.2 Day Invalid*

Day value is not allowed

*2.1.7.3 Hour Invalid*

Hour value is not allowed

*2.1.7.4 Minute Invalid*

Minute value is not allowed

*2.1.7.5 Second Invalid*

Second value is not allowed

*2.1.7.6 Millisecond Invalid*

Millisecond value is not allowed

*2.1.7.7 Timezone Invalid*

Timezone value is not allowed

*2.1.7.8 Datetime Out Of Range*

Datetime value is out of the range specified by the constraint

h5. *2.1.8* *C_DURATION*

*2.1.8.1 Years Invalid*

Years are not allowed in the constrained duration

*2.1.8.2 Months Invalid*

Months are not allowed in the constrained duration

*2.1.8.3 Weeks Invalid*

Weeks are not allowed in the constrained duration

*2.1.8.4 Days Invalid*

Days are not allowed in the constrained duration

*2.1.8.5 Hours Invalid*

Hours are not allowed in the constrained duration

*2.1.8.6 Minutes Invalid*

Minutes are not allowed in the constrained duration

*2.1.8.7 Seconds Invalid*

Seconds are not allowed in the constrained duration

*2.1.8.8 Fractional Seconds Invalid*

Fractional seconds are not allowed in the constrained duration

*2.1.8.9 Duration Out Of Range*

Duration value is out of the range specified by the constraint
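
One possible way to detect the "part not allowed" duration errors (2.1.8.1 to 2.1.8.8) is to find which parts are present in an ISO 8601 duration string and subtract the allowed ones. The sketch below is an assumption-laden illustration: the names are hypothetical, the regular expression accepts only a simple subset of ISO 8601 durations, and the range check (2.1.8.9) is left out.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical names for the duration parts listed above. */
enum DurationPart { YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS, FRACTIONAL_SECONDS }

final class DurationPartsCheck {
    private DurationPartsCheck() {}

    // Groups 1-7: years, months, weeks, days, hours, minutes, seconds (optionally fractional).
    private static final Pattern ISO_DURATION = Pattern.compile(
            "P(?:(\\d+)Y)?(?:(\\d+)M)?(?:(\\d+)W)?(?:(\\d+)D)?"
            + "(?:T(?:(\\d+)H)?(?:(\\d+)M)?(?:(\\d+(?:\\.\\d+)?)S)?)?");

    private static final DurationPart[] BY_GROUP = {
        DurationPart.YEARS, DurationPart.MONTHS, DurationPart.WEEKS, DurationPart.DAYS,
        DurationPart.HOURS, DurationPart.MINUTES, DurationPart.SECONDS
    };

    /** Returns the parts present in the duration that the constraint does not allow. */
    static Set<DurationPart> disallowedParts(String isoDuration, Set<DurationPart> allowed) {
        Matcher m = ISO_DURATION.matcher(isoDuration);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a supported ISO 8601 duration: " + isoDuration);
        }
        Set<DurationPart> present = EnumSet.noneOf(DurationPart.class);
        for (int i = 0; i < BY_GROUP.length; i++) {
            if (m.group(i + 1) != null) {
                present.add(BY_GROUP[i]);
            }
        }
        String seconds = m.group(7);
        if (seconds != null && seconds.contains(".")) {
            present.add(DurationPart.FRACTIONAL_SECONDS);
        }
        present.removeAll(allowed);
        return present;
    }
}
```

Each element of the returned set would map to one of the errors above, e.g. {{YEARS}} to "Years Invalid".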

h4. 2.2 Domain Data Type Constraints

Validation errors caused by unsatisfied constraints on openEHR domain data types.

h5. *2.2.1 C_DV_STATE*

*2.2.1.1 Unknown State*

The state is unknown to the specified state-machine.

*2.2.1.2 Unknown Transition*

The transition between two states is not supported by the specified state-machine.

h5. *2.2.2* *C_CODE_PHRASE*

*2.2.2.1 Unknown Terminology*

The terminology id does not match the given terminology id in c_code_phrase

*2.2.2.2 Unknown Code*

The code is not included in the code_list of c_code_phrase

h5. *2.2.3* *C_DV_ORDINAL*

*2.2.3.1 Unknown Ordinal*

The ordinal value is not included in the ordinal value list specified by c_dv_ordinal

h5. *2.2.4* *C_DV_QUANTITY*

*2.2.4.1 Invalid Magnitude*

The magnitude is outside the range specified by c_dv_quantity

*2.2.4.2 Invalid Precision*

The precision is outside the range specified by c_dv_quantity

*2.2.4.3 Invalid Units*

The units do not match the units specified by c_dv_quantity

h3. 3. Ontological Errors

This category covers errors related to term definitions and constraint definitions in the archetypes.

h4. 3.1 Term Definition Errors

*3.1.1 Incorrect Name*

Incorrect name according to the term definition for given language.

h4. 3.2 Term Constraint Errors

*3.2.1 Incorrect Term*

The term cannot satisfy the specified terminology query in the term constraint definition.

h2. Design of the Validation API

The entry point of such a validation API needs two inputs: the data instance, already expressed in the RM, and the archetype used to validate it. Since the data instance is a hierarchical tree of objects starting from a root object, validation should always start from the top level. Consequently, structural errors are likely to be the first errors the validator encounters. As validation proceeds, errors caused by violations of leaf-level data value constraints or ontological constraints will arise. Any structural error should prevent further validation of the child objects of the object where the error occurs.

When an error occurs, it is important to know 1) where in the data instance it happens; 2) which constraint of the archetype the validation is based on; and 3) the type of the error and any further specific information about it, e.g. whether the value was too large or too small according to the constraint. The location of any item in the data instance can be expressed as a runtime path. Similarly, the location of an archetype node can be expressed as an archetype path. Different errors need different structures to hold the relevant information, but they should be organized in a way that facilitates error reporting. Based on these assumptions and this reasoning, the following validation API is proposed (in near-Java syntax for clarity):

*Validation API*

h4. Interface Validator

This is the entry point of the validation API. The return type is a list of validation error instances. If the list is empty after validation, the data instance is valid according to the given archetype.

interface Validator
{
      List<ValidationError> validate(RMObject dataInstance, Archetype archetype);
}\\

h4. Class ValidationError

This is the root class for all concrete validation error subclasses. Subclasses are free to add extra attributes to hold information relevant to the specific validation error.

abstract class ValidationError
{
      String pathToItem; // runtime path to the data item where the error occurs
      String pathToArchetypeNode; // archetype path to the node where the constraint is specified
}
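
To make the intended behaviour more concrete, here is a toy, self-contained sketch of a top-down validator that collects errors carrying both paths. It is not the proposed API itself. {{Constraint}} is a heavily simplified, hypothetical stand-in for an archetype node, data objects are represented as nested maps instead of RM objects, and the runtime path and archetype path are identical in this toy, which would not hold in general.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Root of the error hierarchy, following the proposal above. */
abstract class ValidationError {
    final String pathToItem;          // runtime path to the data item
    final String pathToArchetypeNode; // archetype path to the constraint node
    ValidationError(String pathToItem, String pathToArchetypeNode) {
        this.pathToItem = pathToItem;
        this.pathToArchetypeNode = pathToArchetypeNode;
    }
}

/** Hypothetical structural error subclass. */
final class AttributeMissing extends ValidationError {
    AttributeMissing(String item, String node) { super(item, node); }
}

/** Hypothetical leaf error subclass holding extra, error-specific information. */
final class IntegerTooLarge extends ValidationError {
    final int value;
    final int upper;
    IntegerTooLarge(String item, String node, int value, int upper) {
        super(item, node);
        this.value = value;
        this.upper = upper;
    }
}

/** Hypothetical, simplified stand-in for an archetype constraint node. */
final class Constraint {
    final boolean required;
    final Integer maxValue; // null means no leaf constraint
    final Map<String, Constraint> attributes = new LinkedHashMap<>();
    Constraint(boolean required, Integer maxValue) {
        this.required = required;
        this.maxValue = maxValue;
    }
    Constraint with(String name, Constraint child) {
        attributes.put(name, child);
        return this;
    }
}

final class SketchValidator {
    private SketchValidator() {}

    /** An empty result means the data is valid against the constraint tree. */
    static List<ValidationError> validate(Object data, Constraint root) {
        List<ValidationError> errors = new ArrayList<>();
        walk(data, root, "", errors);
        return errors;
    }

    private static void walk(Object data, Constraint c, String path, List<ValidationError> errors) {
        if (c.maxValue != null && data instanceof Integer && (Integer) data > c.maxValue) {
            errors.add(new IntegerTooLarge(path, path, (Integer) data, c.maxValue));
        }
        if (!(data instanceof Map)) {
            return; // leaf value: nothing to descend into
        }
        Map<?, ?> object = (Map<?, ?>) data;
        for (Map.Entry<String, Constraint> e : c.attributes.entrySet()) {
            String childPath = path + "/" + e.getKey();
            Object child = object.get(e.getKey());
            if (child == null) {
                if (e.getValue().required) {
                    errors.add(new AttributeMissing(childPath, childPath));
                }
                continue; // structural error or absent optional: skip the subtree
            }
            walk(child, e.getValue(), childPath, errors);
        }
    }
}
```

Validating a map that contains only {{magnitude = 42}} against a constraint requiring {{magnitude}} (at most 10) and {{units}} yields two errors in this sketch: an {{IntegerTooLarge}} at {{/magnitude}} and an {{AttributeMissing}} at {{/units}}.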

h1. Implementation

An initial implementation of this design is hosted in the SANDBOX area of the openEHR Java project.


http://www.openehr.org/svn/ref_impl_java/SANDBOX/rm-validator/
\\