ADL 2 ARCHETYPE_HRID syntax: conflicting definitions

Description

According to the Archetype Identification specification, the format of the concept ID in an ARCHETYPE_HRID is:

concept_id = V_SEGMENTED_ALPHANUMERIC_NAME ;
V_SEGMENTED_ALPHANUMERIC_NAME = ? [a-zA-Z][a-zA-Z0-9_-]+ ? ; (* allows hyphens *)

https://specifications.openehr.org/releases/AM/latest/Identification.html#_human_readable_identifier_hrid

This allows '-' or '_' as the last character of the concept ID. According to the ADL 2 specification, however, the concept ID is a LABEL, which must be:

fragment LABEL : ALPHA_CHAR ( NAME_CHAR* ALPHANUM_CHAR )? ;

which cannot end with a '-' or '_'.

These are two conflicting definitions. Which one is the correct definition?
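
To make the difference concrete, here is a minimal sketch that expresses both rules as Python regular expressions and checks a few candidate concept IDs. The first pattern is taken directly from the Identification specification quoted above. The second is a translation of LABEL that assumes ALPHA_CHAR is [a-zA-Z], ALPHANUM_CHAR is [a-zA-Z0-9] and NAME_CHAR is [a-zA-Z0-9_-]; those fragments are not quoted in this issue, so treat them as assumptions. The names HRID_SPEC, ADL2_LABEL and compare are made up for this illustration.

```python
import re

# Identification spec: V_SEGMENTED_ALPHANUMERIC_NAME, as quoted above.
HRID_SPEC = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]+")

# ADL 2: LABEL : ALPHA_CHAR ( NAME_CHAR* ALPHANUM_CHAR )?
# Assumed fragment definitions (not quoted in this issue):
#   ALPHA_CHAR = [a-zA-Z], ALPHANUM_CHAR = [a-zA-Z0-9], NAME_CHAR = [a-zA-Z0-9_-]
ADL2_LABEL = re.compile(r"[a-zA-Z](?:[a-zA-Z0-9_-]*[a-zA-Z0-9])?")


def compare(concept_id):
    """Return (accepted by Identification spec, accepted by ADL 2 LABEL)."""
    return (
        HRID_SPEC.fullmatch(concept_id) is not None,
        ADL2_LABEL.fullmatch(concept_id) is not None,
    )


for candidate in ["blood_pressure", "blood_pressure_", "blood-pressure-", "x"]:
    print(candidate, compare(candidate))
```

Under these assumptions, "blood_pressure_" and "blood-pressure-" are accepted only by the Identification pattern. The single-character "x" is accepted only by LABEL, because the `+` in the Identification pattern requires at least two characters.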

Also, in the ADL 2 grammar, the LABEL fragment is used in the lexer for both the URL and the ARCHETYPE_HRID. That is a problem, especially since the URL parsing grammar has some issues: for example, the LABEL used in a URL should allow escape characters.
I would suggest splitting it into two separate fragments, one for URL parsing and one for ARCHETYPE_HRID, as done in https://github.com/openEHR/archie/blob/master/grammars/src/main/antlr/BaseLexer.g4#L83
That grammar also changes the URL parsing to correspond more closely to the URL RFC format.

Environment

None

Activity

Thomas Beale
August 19, 2020, 9:58 AM

I think what we should do is:

  • move the ARCHETYPE_HRID spec to the Base Types spec, Identification Section.

  • define the grammar of the id once, there. Probably, although trailing underscore and hyphen don’t make much sense, we should not actually prevent them?

  • we update the formal ANTLR grammar and lexer spec for AOM2 to be a cleaned-up copy of what is currently in Archie

  • we then include a link to the ARCHETYPE_HRID bit of that from where ARCHETYPE_HRID is specified in the Base types.

Reporter

Pieter Bos

Labels

None

Priority

Low