According to the Archetype Identification specification, the concept ID part of an ARCHETYPE_HRID is defined as:
concept_id = V_SEGMENTED_ALPHANUMERIC_NAME ;
V_SEGMENTED_ALPHANUMERIC_NAME = ? [a-zA-Z][a-zA-Z0-9_-]+ ? ; (* allows hyphens *)
This allows '-' or '_' as the last character of the concept ID. According to the ADL 2 specification, however, the concept ID is a LABEL, which is defined as:
fragment LABEL : ALPHA_CHAR ( NAME_CHAR* ALPHANUM_CHAR )? ;
This cannot end with '-' or '_'.
These two definitions conflict. Which one is correct?
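To make the conflict concrete, here is a minimal sketch that translates both rules into Python regular expressions and checks a few candidate concept IDs. It assumes that NAME_CHAR expands to letters, digits, '_' and '-', and that ALPHANUM_CHAR expands to letters and digits; those fragment definitions are not quoted above, so treat them as an assumption. The names ID_SPEC, ADL2_LABEL and accepted_by are purely illustrative.

```python
import re

# Archetype Identification spec: V_SEGMENTED_ALPHANUMERIC_NAME
ID_SPEC = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]+")

# ADL 2 lexer: LABEL : ALPHA_CHAR ( NAME_CHAR* ALPHANUM_CHAR )?
# Assumption: NAME_CHAR = [a-zA-Z0-9_-], ALPHANUM_CHAR = [a-zA-Z0-9]
ADL2_LABEL = re.compile(r"[a-zA-Z](?:[a-zA-Z0-9_-]*[a-zA-Z0-9])?")


def accepted_by(name: str) -> dict:
    """Report which of the two rules accepts the whole string."""
    return {
        "identification_spec": ID_SPEC.fullmatch(name) is not None,
        "adl2_label": ADL2_LABEL.fullmatch(name) is not None,
    }


if __name__ == "__main__":
    for candidate in ["blood_pressure", "blood_pressure_", "blood-pressure-", "a"]:
        print(candidate, accepted_by(candidate))
```

Under these assumptions, "blood_pressure_" and "blood-pressure-" are accepted only by the Identification rule. The check also shows a second difference: a single-letter ID such as "a" is accepted by LABEL but rejected by the Identification rule, whose '+' requires at least two characters.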
Also, in the ADL 2 grammar, the lexer uses LABEL for both the URL and the ARCHETYPE_HRID. That is a problem, especially since the URL parsing grammar has some issues: for example, a LABEL used in a URL should allow escape characters.
I would suggest splitting it into two separate fragments, one for URL parsing and one for ARCHETYPE_HRID, as done in https://github.com/openEHR/archie/blob/master/grammars/src/main/antlr/BaseLexer.g4#L83
There, the URL parsing has also been changed to correspond better to the URL RFC format.
I think what we should do is:
Move the ARCHETYPE_HRID spec to the Base Types spec, Identification section.
Define the grammar of the ID once, there. Although a trailing underscore or hyphen doesn’t make much sense, we probably should not actually prevent them?
Update the formal ANTLR grammar and lexer spec for AOM2 to be a cleaned-up copy of what is currently in Archie.
Then include a link to the ARCHETYPE_HRID part of that grammar from where ARCHETYPE_HRID is specified in the Base Types spec.