DV_EHR_URI related issues

 

Work in progress - This page is a placeholder for some analysis on DV_EHR_URI

Goal: present some coherent suggestions to the openEHR SEC.

Log:

  • Work started by @Sebastian Iancu and @Erik Sundvall during Zoom meeting 2020-06-23

  • Work continued by @Sebastian Iancu and @Erik Sundvall during Zoom meeting: 2020-07-21 and chat etc afterwards

  • Updated on 2022-05-16 by @Erik Sundvall to cover changes from “demographic” to “registry” based on forum discussions. Also added urn:openehr:class etc.

  • Currently, as the name suggests, EHR_URI only points to EHR object hierarchies, not to other classes (for example in the Demographic package or Task Planning), although the initial intention was to point to an openEHR (archetyped) object. Also, the descriptions in the architecture overview are very EHR-centric.

  • The RM package “Common” is supposed to be common for many things, including the “Demographic” package. Currently it is too ‘focused’ on EHRs by using DV_EHR_URI (for example in ATTESTATION and LOCATABLE)

  • The current ehr:-scheme is not registered at IANA. If somebody else later registers it for other services/purposes, then those openEHR systems that need to use that new service will face the problem of having the same scheme meaning two different things. Registering top level URI schemes like ehr: at IANA might work, but might be blocked by others due to being seen as a bit too generic, especially if we try to introduce and register new schemes like demographic: , registry: or tasks:.

  • The design of the current ehr:-scheme has some extensibility problems if it is to be used for wider purposes (such as referencing Demographic objects or Task Planning) – it is currently designed to primarily have EHR IDs as top level objects (and assuming the current EHR ID and system as default if omitted):

    • ehr://system_id/ehr_id/top_level_structure_locator/path_inside_top_level_structure

    • Potentially suffixing it like ehr:demographic/, ehr:registry/ or similar might work but may also get messy in implementation.

  • LOCATABLE.links can only point to openEHR objects, via LINK.target: DV_EHR_URI, not allowing external URIs, e.g. OWL ontology nodes and other semantic web content. Using RDF-style “predicate” (link type) and “object” (link target) URIs, it would be easier to reuse such resources when detailing LOCATABLE nodes in archetypes, templates and at runtime than going through the hoops of DV_TEXT.mappings.target.code_string constructs or similar. Some of the suggestions below automatically define URNs for any openEHR LOCATABLE node, which can thus serve as an RDF “subject” (origin) in outgoing RDF links (and implicitly also as a target “object” for RDF triples pointing to it in other systems). Example: a certain ENTRY, COMPOSITION or FOLDER instance could represent or be related to (have LINK relations to) a https://contsys.org/concept/health_issue or other more local things modelled at the template level, without messing with the internationally designed and shared archetypes.
    [No solution for this last problem point is provided below; it may need to come after release 1.1.]

Use case examples

  • Code24 via @Sebastian Iancu

    • uses LINKs to demographic objects for locations like hospital rooms or beds and to Admission compositions

    • example demographic://e23ccf0f-8dfb-43c2-beb6-b3f5acf14aef

  • Better LINK URI examples, via @Matija Polajnar:

    • ehr:work-plans/4ff64a96-7800-4fab-9076-776dc55cb743

    • ehr:tasks/380daa09-028f-4beb-9803-4aef91644c2a

    • tp:m-tasks/b726cd9f-aa1e-454f-93fd-575c4b05e30c

  • DIPS LINK URI via @Bjørn Næss:

    • ehr:compositions/4ff64a96-7800-4fab-9076-776dc55cb743??

    • Runtime TP Work Plans - not in EHR space.

Hypothesis: URNs might be a good fit for many openEHR URI targets

One option to make registration at IANA easier is to prefix schemes with openehr to make them more specific and reduce the risk of conflicting names, so openehr+ehr: and openehr+demographic: could be registered and

  • either used instead of the current ehr: scheme - N.B.: breaking change in data requiring special handling!

  • or ehr: and demographic: could be considered for internal use only, but would then need to be converted to openehr+ehr: and openehr+demographic: for system-external use, such as invoking locally registered apps to handle these links.

A second option is to properly register an openehr: scheme and make it more extensible by letting the path begin with a namespace, for example openehr:ehr - breaking change in data.

A third option (main suggestion) is to use URNs. A good thing with the current ehr scheme is that it mainly identifies content by more persistent IDs rather than by system- or location-dependent addresses, which is what an http-scheme is usually used for. URNs (see https://tools.ietf.org/html/rfc8141) are designed for similar identification purposes. The syntax of a URN is <URN> ::= "urn:" <NID> ":" <NSS>
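As a minimal sketch of the syntax above, the following Python splits a URN into its NID and NSS. It only covers the "urn:" NID ":" NSS part (no r-, q- or f-components) and does not validate the characters of the NSS. The function name split_urn is hypothetical and not part of any openEHR specification.

```python
import re
from typing import Tuple

# NID per RFC 8141: 2 to 32 characters, alphanumeric at both ends,
# letters, digits and hyphens in between. "urn" and the NID are case-insensitive.
_URN = re.compile(
    r"^urn:(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):(?P<nss>.+)$",
    re.IGNORECASE,
)


def split_urn(urn: str) -> Tuple[str, str]:
    """Return (NID, NSS); the NID is lowercased, the NSS is left untouched."""
    match = _URN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    return match.group("nid").lower(), match.group("nss")


if __name__ == "__main__":
    print(split_urn("urn:openehr:ehr:compositions/4ff64a96-7800-4fab-9076-776dc55cb743"))
```

Everything after the second colon is treated as the NSS, so further colons (as in urn:openehr:ehr:) stay under openEHR governance once the NID is registered.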

Shifting over to URNs (at least for external use) could have some advantages:

  • Handling URNs may feel more familiar than new schemes to developers/integrators and others.

  • Once openEHR is registered as governing the Namespace Identifier (NID) anything under urn:openehr: can be used and governed by openEHR without requiring further registration

  • URNs can be fairly short and still recognized as globally unique identifiers by people (without having to learn about a new internet scheme)

  • Several other standardization organisations (SDOs) have registered URN namespaces and there is somewhat of a “fast-track” for SDOs, and I believe openEHR qualifies as one in this case.

If URNs can be designed to cover the same use cases as the current ehr:-scheme and simple two-way conversions can be done, then there is no need for openEHR-specific schemes to be used or exposed outside openEHR systems.

An openEHR urn NID can also be used for identifying many other openEHR things like specifications, archetypes etc e.g. by mapping to other openEHR naming schemes. So we may have things like:

  • urn:openehr: the “root” of the suggested openEHR “namespace”

  • urn:openehr:ehr: = prefix for paths to data instances currently using DV_EHR_URI and the current ehr:-scheme (below suggested new class DV_OPENEHR_URI )

  • urn:openehr:registry: = prefix for paths to data instances outside the “EHR”-hierarchy and thus not covered by the current DV_EHR_URI. The ‘registry’ name is based on forum discussions (widened scope from the previous suggestion urn:openehr:demographic: )

  • urn:openehr:tp: or urn:openehr:proc:task: or similar

  • urn:openehr:archetype:… to define a specific archetype using the specification.

    • Example urn:openehr:archetype:openEHR-EHR-OBSERVATION.laboratory_test_result.v1 (taken from discussion in )

  • urn:openehr:template:…

Suggestions

Figure 1: Condensed UML-ish sketch of changes to classes. Classes and methods marked (?) are not necessarily needed, especially not in release 1.1, and thus open for discussion. (Scroll further down for diagram yuml-source code)

Introduce new DV_OPENEHR_URI class in the hierarchy between DV_URI and DV_EHR_URI (as suggested in Thomas Beale's comment in ) see figure 1 above.

  1. Embrace the fact that current openEHR URIs are not syntactically valid URIs and instead introduce an as_urn method that can produce valid URIs in the form of suitably encoded URNs.

    • The invariant condition of DV_EHR_URI.value can still require the string to start with “ehr:“.

    • DV_OPENEHR_URI and DV_EHR_URI could both allow unencoded spaces, square brackets etc. in the value attribute to make paths more readable. But note that the “URI” value property can then only be used inside openEHR systems, since it will often not be a legal URI according to the official URI specification.

    • Perhaps there should not even be an inheritance relation between DV_URI and DV_OPENEHR_URI (and thus implicitly by transitivity not between DV_URI and DV_EHR_URI either) since the internal openEHR ones don’t follow all the official URI rules? Perhaps that is a breaking change and should just be noted and not changed until 2.0? Then also perhaps hint to implementers that using a programming language’s built in URI class as base class for DV_OPENEHR_URI and DV_EHR_URI often may be a bad idea if it contains syntax checks.

  2. Pick one of the following behaviors for the value attribute and add to specification (please add any more alternatives, pros and cons you may find):

    1. Alternative A (main suggestion): Internally in openEHR systems, pretend that there are (not officially registered) schemes for everything (ehr:, registry: etc.) and put that form in the value attribute as now (plus new registry: scheme etc), but in future external APIs and serialization formats that may also need to refer to system-external URIs, use the as_urn method for serialization. (Future semantic web formats like RDF-variants and OWL-related representations of openEHR content are examples where the urn:openehr-form should be used.)
      In documentation clearly point out that DV_OPENEHR_URI and DV_EHR_URIs are only meant for system internal use.

      1. Pros:

        • Having internal schemes can lead to shorter more readable strings in AQL and other formalisms and openEHR internal representations.

      2. Cons:

        • To not cause future name-clashing problems with other (possibly officially registered) schemes, we need to be disciplined regarding usage of classes and aware that a DV_OPENEHR_URI and a DV_URI with the same scheme (and possibly entire URI) in the value attribute may refer to completely different things.

        • We may need a growing number of classes like OPENEHR_registry_URI if we want to be consistent with DV_EHR_URI in design?

    2. Alternative B: put the officially registered URN-form of references in the value attribute of DV_OPENEHR_URI. Encourage use of DV_OPENEHR_URI for new development.
      In DV_EHR_URI still use the ehr: scheme-form in the value attribute (for legacy reasons) but in future external APIs and serialisation formats use the as_urn method for serialisation and upcast class to DV_OPENEHR_URI (?). And in documentation point out that DV_EHR_URI is only meant for system internal use. (Also possibly

      1. Pros:

        • DV_OPENEHR_URI values are just like any DV_URI, with the addition of an invariant condition that checks that the value attribute starts with “urn:openehr:“.

        • Very clean in serialization/deserialization process

        • No new schemes need to be added and maintained

        • DV_EHR_URI can be deprecated and later removed

      2. Cons:

        • Can lead to longer less readable strings in EHR data, AQL and other formalisms and openEHR internal representations (urn:openehr:registry: is longer than registry:).

        • Due to inheritance we probably need invariant conditions to allow value strings of DV_OPENEHR_URI to start with both “ehr:“ and with “urn:openehr”, right? (Hope not.) That means we risk getting two syntactic ways to point to the same EHR content in the DV_OPENEHR_URI.value field. This likely means extra code and conversions for handling two forms of value for URIs to EHRs.

        • Requires disciplined use of class information and a string conversion of the value field when casting between DV_EHR_URI and DV_OPENEHR_URI

  3. We need to decide if the invariant condition of DV_OPENEHR_URI.value should require strings to

    1. start only with one of a list of defined schemes like “ehr:“ and “registry:” (or, in Alternative B, one of a list of “urn:openehr”-suffixes) and update spec as new uses are found.

    2. or if it instead should be left unrestricted.
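The two options for the invariant can be compared with a small sketch. The prefix list and the function name value_satisfies_invariant are illustrative assumptions; which prefixes (if any) end up in the specification is exactly what this item leaves open.

```python
from typing import Iterable, Optional

# Assumed example list for Alternative A; Alternative B would instead use
# something like ("urn:openehr:ehr:", "urn:openehr:registry:").
ALLOWED_PREFIXES = ("ehr:", "registry:")


def value_satisfies_invariant(
    value: str, allowed_prefixes: Optional[Iterable[str]] = ALLOWED_PREFIXES
) -> bool:
    """Check a DV_OPENEHR_URI.value candidate against the prefix invariant.

    Passing allowed_prefixes=None models the "left unrestricted" option.
    """
    if allowed_prefixes is None:
        return True
    return value.startswith(tuple(allowed_prefixes))


if __name__ == "__main__":
    print(value_satisfies_invariant("registry:party/4ff64a96-7800-4fab-9076-776dc55cb743"))
    print(value_satisfies_invariant("tp:m-tasks/b726cd9f-aa1e-454f-93fd-575c4b05e30c"))
```

With the restricted option, a value such as the tp:m-tasks example from the use cases would be rejected until the specification is updated with that scheme.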

  4. (Optional) Mark DV_EHR_URI for deprecation now in v 1.1, then remove or make read-only in 2.0 (since it may be a breaking change)?

  5. Change references to DV_EHR_URI into references to DV_OPENEHR_URI in the following classes:

    1. ATTESTATION: DV_EHR_URI change to DV_OPENEHR_URI

    2. LINK: DV_EHR_URI change to DV_OPENEHR_URI (possibly also allow DV_URI)
      [ TODO: discuss pointers to external non-URIs]

    3. Add a method/function for retrieving the (canonical?) URN (see below) from DV_OPENEHR_URI objects (and subclasses)

  6. Start the process of formally registering URN namespace identifier (NID) for openEHR, NID = “openehr” (lowercase recommended in the RFC). This would enable URNs starting with urn:openehr

  7. Define paths within registry

    1. If Alternative A was chosen above:
      possibly design a scheme modeled after the ehr: one
      ehr:[//system_id/]top_level/path_inside_top_level

      1. defaulting to current system if system_id is omitted

    2. Design URI NSS for registry

      1. registry:[//system_id/]top_level/path_inside_top_level
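A minimal sketch of how the pattern scheme:[//system_id/]top_level/path_inside_top_level could be taken apart. The names InternalRef and parse_internal_ref are hypothetical. The sketch does not try to split top_level from path_inside_top_level, since where that boundary lies is not settled here, and a system_id of None stands for "default to the current system".

```python
from typing import NamedTuple, Optional


class InternalRef(NamedTuple):
    scheme: str               # e.g. "ehr" or "registry"
    system_id: Optional[str]  # None means: default to the current system
    locator: str              # top_level/path_inside_top_level, left unsplit


def parse_internal_ref(value: str) -> InternalRef:
    """Split an internal openEHR reference into scheme, system_id and locator."""
    scheme, colon, rest = value.partition(":")
    if not colon or not scheme:
        raise ValueError(f"missing scheme in {value!r}")
    system_id = None
    if rest.startswith("//"):
        # The optional //system_id/ part ends at the next slash.
        system_id, slash, rest = rest[2:].partition("/")
        if not system_id or not slash:
            raise ValueError(f"malformed system_id in {value!r}")
    return InternalRef(scheme, system_id, rest)


if __name__ == "__main__":
    print(parse_internal_ref("registry:party/4ff64a96-7800-4fab-9076-776dc55cb743"))
    print(parse_internal_ref("ehr://rmh.nhs.net/347a5490-55ee-4da9-b91a-9bba710f730e/"))
```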

  8. Define, publish and manage a set of namespace-specific strings (NSS) and string patterns for suitable openEHR targets/resources, for example something like the following (but just start with urgent needs like “party” for release 1.1; the rest can come later):

    1. urn:openehr:ehr as discussed above

    2. urn:openehr:registry as discussed above

    3. urn:openehr:terminology: (or urn:openehr:term: or similar?) openEHR support terminology etc

    4. urn:openehr:spec for specification documents/files (inspired by https://tools.ietf.org/html/rfc2648) optionally including f-components pointing to specific sections using the same #fragment-names as in the current html specs.

    5. urn:openehr:class: prefix used as a way to identify specific RM and AM classes and optionally their contained attributes when they need to be in URI form. The fact that the class names are unique makes URIs short. The classes (CAPITALISED) and attributes (snake_case) should be written exactly as in the specification and a period (.) used to separate attribute from class.

      • Example: in a FHIR profile of the Patient resource we may want to use an openEHR ehr_id as one of the patient identifiers, and to separate it from other types of identifiers we can set the FHIR Identifier.system attribute to urn:openehr:class:EHR.ehr_id

      • Possible later extension to be discussed: How to refer to nested complex objects if needed, for example the ehr_id above is of type HIER_OBJECT_ID that inherits the attribute value from OBJECT_ID, so should we also allow deeper paths like urn:openehr:class:EHR.ehr_id.value or a syntax like urn:openehr:class:EHR.ehr_id/value
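The naming rule for urn:openehr:class: can be sketched as a small builder. The function name class_urn and the two regular expressions are assumptions that simply encode "CAPITALISED class, snake_case attribute, period as separator"; deeper paths such as EHR.ehr_id.value are deliberately not handled, since they are still under discussion.

```python
import re
from typing import Optional

_CLASS_NAME = re.compile(r"[A-Z][A-Z0-9_]*")      # e.g. EHR, VERSIONED_OBJECT
_ATTRIBUTE_NAME = re.compile(r"[a-z][a-z0-9_]*")  # e.g. ehr_id


def class_urn(class_name: str, attribute: Optional[str] = None) -> str:
    """Build urn:openehr:class:CLASS or urn:openehr:class:CLASS.attribute."""
    if _CLASS_NAME.fullmatch(class_name) is None:
        raise ValueError(f"class name must be CAPITALISED: {class_name!r}")
    urn = "urn:openehr:class:" + class_name
    if attribute is not None:
        if _ATTRIBUTE_NAME.fullmatch(attribute) is None:
            raise ValueError(f"attribute must be snake_case: {attribute!r}")
        urn += "." + attribute
    return urn


if __name__ == "__main__":
    # Value that could go into FHIR Identifier.system in the example above.
    print(class_urn("EHR", "ehr_id"))  # urn:openehr:class:EHR.ehr_id
```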

    6. Event types could be represented by URNs for example either

      1. something like urn:openehr:event:...

      2. or something based on methods of existing class definitions like urn:openehr:class:VERSIONED_OBJECT.commit_original_version

    7. urn:openehr:am or urn:openehr:archetype (or something else) prefixing the naming rules in https://specifications.openehr.org/releases/AM/latest/Identification.html. Examples:

      1. urn:openehr:am:org.openehr::openEHR-EHR-EVALUATION.diagnosis.v1.1.7 possibly with paths pointing to parts inside an archetype

    8. Do we need a uri (urn like urn:openehr:system?) to identify systems (that e.g. contain EHR and registry objects)? Maybe not? But if so:

      1. We likely do want to separate (more permanent) logical system identifiers from physical http-addresses (e.g. REST instance endpoints)

      2. What would a system URI (URN?) look like? urn:openehr:system:rmh.nhs.net for hierarchical and urn:openehr:system:f7877ee0-cb4b-11ea-87d0-0242ac130003 for uuid-based system names?

      3. In addition to the system itself, what (if anything) should be possible to identify within a system and thus possibly append to such uris?

        1. system specific instances of EHR content, Parties, Task plans etc?

          1. urn:openehr:system:rmh.nhs.net:ehr:/347a5490-55ee-4da9-b91a-9bba710f730e/compositions/87284370-2D4B-4e3d-A3F3-F303D2F4F34B::rmh.nhs.net::2
            …is kind of duplicating the current ehr-scheme ehr://rmh.nhs.net/347a5490-55ee-4da9-b91a-9bba710f730e/

          2. urn:openehr:system:rmh.nhs.net:party:...

  9. Create (bidirectional?) encoding algorithms between existing openEHR “ehr:”-scheme and urn:openehr:ehr URNs, at least for pointing out specific VERSIONs and VERSIONED_OBJECTs and paths within those. Thus the ehr: scheme and possible other future schemes can be used inside openEHR systems.

    1. Scheme → URN can probably be done by simply

      1. prefixing the scheme with urn:openehr,

      2. convert possible URI queries (?...) to URN q-components
        [URI queries are not used in the ehr:-scheme currently]

      3. convert fragments (#... ) to URN f-components
        [URI fragments are not used in the ehr:-scheme currently]

      4. Encode URI-invalid characters

      5. Simple example (without queries and fragments):
        urn:openehr:ehr:/347a5490-55ee-4da9-b91a-9bba710f730e/compositions/87284370-2D4B-4e3d-A3F3-F303D2F4F34B::rmh.nhs.net::2

    2. IF we need to point out that we mean the instance stored in a specific system (often a bad idea in DV_EHR_URI links), then the double-slash system-id can be used

    3. TODO: finish investigation of when we need to encode /, [, ] and spaces etc to get valid URNs

      1. OpenEHR paths are potentially full of characters that are forbidden in a URN NSS. Examples:

        1. /content[openEHR-EHR-SECTION.vital_signs.v1 and name/value='Vital signs']/items[openEHR-EHR-OBSERVATION.heart_rate-pulse.v1 and name/value='Pulse']/data/events[at0003 and name/value='Any event']/data/items[at1005]

        2. /data/events[at0001, 'standing']

        3. /data/events[at0007 AND time >= '24-06-2005T09:30:00']

        4. ehr:/347a5490-55ee-4da9-b91a-9bba710f730e/compositions/87284370-2D4B-4e3d-A3F3-F303D2F4F34B/content[openEHR-EHR-SECTION.vital_signs.v1]/items[openEHR-EHR-OBSERVATION.heart_rate-pulse.v1]/data/events[at0006, 'any event']/data/items[at0004]

        5. ehr:compositions/87284370-2D4B-4e3d-A3F3-F303D2F4F34B/content[openEHR-EHR-SECTION.vital_signs.v1]
          For description and more examples see: https://specifications-test.openehr.org/releases/BASE/latest/architecture_overview.html#_paths_and_locators
          TODO: report/fix error in Architecture Overview examples above: 'any event' vs 'Any event'

      2. The NSS part of a URN has the following character rules
        https://tools.ietf.org/html/rfc8141#section-2
        NSS = pchar *(pchar / "/")
        and “pchar” is described in https://tools.ietf.org/html/rfc3986#appendix-A as
        pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
        unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
        pct-encoded = "%" HEXDIG HEXDIG
        sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

      3. Good news: /-._',= are allowed

      4. Bad news: [ ]>< are forbidden (note that space is one of the forbidden characters, it needs to be encoded)

      5. Can we assume that

        1. the only place where any unicode character (e.g. non-English ones) will surface in an openEHR path is within single quotes?

        2. comparisons with < or > like in [at0007 AND time >= '24-06-2005T09:30:00'] will only be present in AQL etc, not in DV_OPENEHR_URI and thus not in the openehr URNs?

        3. + is never used in paths outside single-quote delimited strings

      6. If the assumptions above hold true then one encoding option is

        1. Percent encode anything found inside single quotes

        2. replace [ and ] with ( and ) (this has been discussed earlier in the SEC; TODO: find link to that discussion)

        3. replace spaces found outside single quotes with + (possibly first define and enforce a rule regarding how space is used in paths; for example, are the following equivalent, and what should the canonical form used for URNs be?). If alternative 3 below is used as the canonical form then we avoid some extra + signs in URNs and paths, but alternative 1 may be easier to read in deserialised form.

          1. /data/events[at0001, 'standing'] (used in example below)

          2. /data/events[at0001, 'standing'] (extra spaces)

          3. /data/events[at0001,'standing']

      7. If all the serialization rules above are applied to the previous examples, the resulting encoded versions will look like below. For the last two examples (4 & 5) that are valid DV_EHR_URI values the urn scheme and proposed openEHR NID have also been added to form a complete URN

        1. /content(openEHR-EHR-SECTION.vital_signs.v1+and+name/value='Vital%20signs')/items(openEHR-EHR-OBSERVATION.heart_rate-pulse.v1+and+name/value='Pulse')/data/events(at0003+and+name/value='Any%20event')/data/items(at1005)

        2. /data/events(at0001,+'standing')

        3. /data/events(at0007+AND+time+>=+'24-06-2005T09%3A30%3A00')

        4. urn:openehr:ehr:/347a5490-55ee-4da9-b91a-9bba710f730e/compositions/87284370-2D4B-4e3d-A3F3-F303D2F4F34B/content(openEHR-EHR-SECTION.vital_signs.v1)/items(openEHR-EHR-OBSERVATION.heart_rate-pulse.v1)/data/events(at0006,+'any%20event')/data/items(at0004)

        5. urn:openehr:ehr:compositions/87284370-2D4B-4e3d-A3F3-F303D2F4F34B/content(openEHR-EHR-SECTION.vital_signs.v1)
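The three encoding rules and the "prefix with urn:openehr" step can be sketched in Python as follows. This is a sketch under the assumptions listed above (no escaped quotes inside single-quoted strings, no + or parentheses outside them), not a finished algorithm. The function names are hypothetical, and quote/unquote come from Python's standard urllib.parse. The NSS check follows the pchar grammar quoted earlier.

```python
import re
from urllib.parse import quote, unquote

URN_PREFIX = "urn:openehr:"  # proposed NID, not registered yet

# A single-quoted string; assumes no escaped quotes inside it.
_QUOTED = re.compile(r"('[^']*')")

# pchar (RFC 3986 appendix A) and NSS = pchar *(pchar / "/") (RFC 8141).
_PCHAR = r"[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2}"
_NSS = re.compile("(?:" + _PCHAR + ")(?:" + _PCHAR + "|/)*")


def encode_path(path: str) -> str:
    """Apply the three encoding rules to a path or an ehr: value."""
    out = []
    # re.split with a capturing group puts the quoted parts at odd indices.
    for i, part in enumerate(_QUOTED.split(path)):
        if i % 2 == 1:
            # Rule 1: percent-encode everything inside the single quotes.
            out.append("'" + quote(part[1:-1], safe="") + "'")
        else:
            # Rules 2 and 3: brackets to parentheses, spaces to plus signs.
            out.append(part.replace("[", "(").replace("]", ")").replace(" ", "+"))
    return "".join(out)


def decode_path(encoded: str) -> str:
    """Reverse encode_path (only safe if the stated assumptions hold)."""
    out = []
    for i, part in enumerate(_QUOTED.split(encoded)):
        if i % 2 == 1:
            out.append("'" + unquote(part[1:-1]) + "'")
        else:
            out.append(part.replace("(", "[").replace(")", "]").replace("+", " "))
    return "".join(out)


def as_urn(value: str) -> str:
    """Scheme form (e.g. ehr:...) to URN form: prefix and encode."""
    return URN_PREFIX + encode_path(value)


def is_valid_nss(urn: str) -> bool:
    """Check that the part after urn:openehr: only uses characters legal in an NSS."""
    if not urn.startswith(URN_PREFIX):
        return False
    return _NSS.fullmatch(urn[len(URN_PREFIX):]) is not None


if __name__ == "__main__":
    value = "ehr:compositions/87284370-2D4B-4e3d-A3F3-F303D2F4F34B/content[openEHR-EHR-SECTION.vital_signs.v1]"
    print(as_urn(value))
    # urn:openehr:ehr:compositions/87284370-2D4B-4e3d-A3F3-F303D2F4F34B/content(openEHR-EHR-SECTION.vital_signs.v1)
```

Note that the encoded form of example 3 still contains >, which is not a legal NSS character, so is_valid_nss rejects it. This illustrates why the assumption that comparisons never occur in DV_OPENEHR_URI values matters.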

  10. (Optional) Possibly create a DV_REGISTRY_URI class and a registry:-scheme for openEHR system internal use, similar to the ehr-scheme, and corresponding (bidirectional?) translation algorithms between scheme and URN. (Perhaps not in the 1.1 release unless we find a specific need for it?)

  11. Complete the process of formally registering URN namespace identifier (NID) for openEHR

  12. (Optional) Possibly extend current demographic (future “registry”?) package to include Resources such as rooms, equipment, etc

    1. (TODO: check if/how this is done in openEHR task planning, also check https://www.hl7.org/fhir/location-definitions.html and descendants and other relations to https://www.hl7.org/fhir/domainresource.html)

  13. (Optional) Possibly write an academic paper describing new ways to do semantic web stuff and DL-reasoning with the new identifiers

Specification documents etc to update/change

[DATA_VALUE {bg:white}]^[DV_URI]
[DV_URI|+value:String|+scheme();+path();+fragment_id();+query()]
[DV_URI]^(?)[DV_OPENEHR_URI|..|*+as_urn()*;+namespace()(?);+system_id() (?);+top_level_structure_locator() (?);+path_inside_top_level_structure() (?); //value starts_with: urn:openehr (or ehr: or maybe registry:) ]
[DV_OPENEHR_URI]^[DV_EHR_URI|..|+ehr_id(): HIER_OBJECT_ID (?); //value starts_with: ehr:]
[DV_OPENEHR_URI]^[DV_REGISTRY_URI (?) {bg:pink}|..|+party_id(): HIER_OBJECT_ID (?); //value starts_with: registry: ]
[LINK|+meaning: DV_TEXT;+type: DV_TEXT;*+target: DV_OPENEHR_URI*]++-1 target>[DV_OPENEHR_URI]
[AUDIT_DETAILS]^[ATTESTATION|...;*+items: DV_OPENEHR_URI*]++- items>[DV_OPENEHR_URI]
[LOCATABLE]++-* links>[LINK]

Unfinished previous text to possibly be moved into other sections or deleted:

  1. common RM is too ‘focused’ on EHR URIs (in ATTESTATION and LOCATABLE)
    ehr://system_id/ehr_id/top_level_structure_locator/path_inside_top_level_structure
    ehr:[[//system_id/]ehr_id/]top_level/path_inside_top_level

    1. ehr:compositions/4ff64a96-7800-4fab-9076-776dc55cb743
      ehr:work-plans/4ff64a96-7800-4fab-9076-776dc55cb743
      ehr:tasks/380daa09-028f-4beb-9803-4aef91644c2a
      ehr:party/380daa09-028f-4beb-9803-4aef91644c2a

    2. ehr:compositions/4ff64a96-7800-4fab-9076-776dc55cb743
      ehr:work-plans/4ff64a96-7800-4fab-9076-776dc55cb743
      tp:tasks/380daa09-028f-4beb-9803-4aef91644c2a
      party:380daa09-028f-4beb-9803-4aef91644c2a

      registry:[//system_id/]top_level_structure_locator/path_inside_top_level_structure

      1. registry:party/4ff64a96-7800-4fab-9076-776dc55cb743

      2. registry:party/87284370-2D4B-4e3d-A3F3-F303D2F4F34B::rmh.nhs.net::2

      3. registry:party/4ff64a96-7800-4fab-9076-776dc55cb743/contacts/addresses/details[openEHR-registry-ADDRESS.electronic_communication.v0]/…

      4. for the future we could also support other registry:resource/4ff64a96-7800-4fab-9076-776dc55cb743 or registry:location/4ff64a96-7800-4fab-9076-776dc55cb743

      5. no support for typeless uris like party:4ff64a96-7800-4fab-9076-776dc55cb743 ???

TODO: Note Matija's comment