Update grammar to support documented openEHR ISO8601 variant using '??'

Description

In the current ODIN spec, a variant of ISO 8601 is documented that uses '??' characters instead of spaces to express partial dates, times and date/times. The grammar should be updated to reflect this.

Activity

Sebastian Garde
August 29, 2018, 7:37 AM

I agree that this is unavoidable and that the change has been properly executed.

May I suggest adding a comment before or after the formal references to ISO8601_DATE indicating that this is a deviation (a restriction as well as an extension) from the ISO8601 standard, especially given its name? (The total number of deviations from ISO makes me wonder whether the name is actually a good choice.)

Also, in direct relation to this:

In 7.1.6.1 in the ODIN spec, it says: "The Support IM provides a full explanation of the ISO 8601 semantics supported in openEHR." -> Has this been moved to the Foundation types spec?

fragment YEAR : [1-9][0-9]* ;
-> This does not allow any zero-padded year before 1000 (only unpadded oddities such as year 10). Or should the lower bound be 1583 (Gregorian calendar), as ISO8601 sometimes/usually assumes? For healthcare this may be irrelevant, but if ODIN's ambition is to be universal it may matter.
More importantly, this does not formalise the required 4 digits. Should this be e.g. [0-9][0-9][0-9][0-9]?

fragment MONTH : ( [0][0-9] | [1][0-2] ) ; // month in year
-> Should this be ( [0][1-9] | [1][0-2] ) instead? A 00 month is very confusing, since some environments would also use it to indicate January (with months formally ranging from 0..11).

fragment DAY : ( [012][0-9] | [3][0-2] ) ; // day in month
-> This could also be more complex (here 00 is a valid day as well as 32 - not sure if this has a reason)
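
To make the points above concrete, here is a minimal sketch that transcribes the three fragments as posted into Python regexes and checks a few sample strings. The names YEAR_ORIG, MONTH_ORIG, DAY_ORIG, MONTH_SUGGESTED and accepts() are made up for this illustration and are not part of the ODIN grammar.

```python
import re

# Fragments as originally posted in this ticket, transcribed to Python regexes.
# All names here are illustrative only.
YEAR_ORIG = re.compile(r"[1-9][0-9]*")
MONTH_ORIG = re.compile(r"0[0-9]|1[0-2]")
DAY_ORIG = re.compile(r"[012][0-9]|3[0-2]")

# Month rule as suggested in this comment (01..12 only).
MONTH_SUGGESTED = re.compile(r"0[1-9]|1[0-2]")


def accepts(pattern, text):
    """Return True if the whole string matches the fragment."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    checks = [
        ("YEAR_ORIG", YEAR_ORIG, ["2018", "352", "0352", "10"]),
        ("MONTH_ORIG", MONTH_ORIG, ["00", "01", "12", "13"]),
        ("DAY_ORIG", DAY_ORIG, ["00", "31", "32", "33"]),
        ("MONTH_SUGGESTED", MONTH_SUGGESTED, ["00", "01", "12"]),
    ]
    for name, pattern, samples in checks:
        for sample in samples:
            print(name, repr(sample), accepts(pattern, sample))
```

Running it shows which of the sample strings each fragment lets through, including the 00 month, the 00 and 32 days, and the rejected zero-padded year.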

Diego Bosca
August 29, 2018, 8:14 AM

The real ISO 8601 DateTime regex is really complex. The one we are currently using looks like this:

^\d{4}((0[1-9]|1[0-2])((0[1-9]|[12]\d|3[01])(T?([01]\d|2[0-3])([0-5]\d([0-5]\d([,.]\d+)?)?)?(Z|([+-]((0\d)|(1[0-2]))(00|30)?))?)?)?)?|\d{4}((0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01])(T([01]\d|2[0-3])(:[0-5]\d(:[0-5]\d([,.]\d)?)?)?(Z|([+-]((0\d)|(1[0-2]))(00|30))?))?)?)?)?$

"\d" is an alias for [0-9]

This gets a little more complicated if you also allow the yyyy-MM-dd form. An example of an ISO date regex supporting this:
^\d{4}((((0[1-9])|(1[0-2]))((0[1-9])|([12]\d)|(3[01]))?)?|(-(((0[1-9])|(1[0-2]))(-((0[1-9])|([12]\d)|(3[01])))?)?)?)$
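
As a rough check of the date-only regex above, the following sketch compiles it unchanged with Python's re module and tries it on a few strings. ISO_DATE and is_iso_date() are names made up for this example. It is only a lexical check and says nothing about month lengths.

```python
import re

# The date-only regex from this comment, copied verbatim.
# ISO_DATE and is_iso_date are illustrative names.
ISO_DATE = re.compile(
    r"^\d{4}((((0[1-9])|(1[0-2]))((0[1-9])|([12]\d)|(3[01]))?)?"
    r"|(-(((0[1-9])|(1[0-2]))(-((0[1-9])|([12]\d)|(3[01])))?)?)?)$"
)


def is_iso_date(text):
    """Return True if the string matches the regex quoted above."""
    return ISO_DATE.match(text) is not None


if __name__ == "__main__":
    samples = [
        "2018", "201808", "20180829",   # compact forms
        "2018-08", "2018-08-29",        # hyphenated forms
        "2018-13", "2018-08-32",        # out-of-range month / day
        "2018-0829", "18-08-29",        # mixed form, two-digit year
    ]
    for sample in samples:
        print(repr(sample), is_iso_date(sample))
```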

Thomas Beale
August 29, 2018, 8:22 AM

Good catch on the reference. I've fixed that.

For the Regexes:

  • the year one is intended to prevent leading 0s, so years like 352 but not 0352 can be stated. It doesn't try to force 4 digits, but I am inclined to think that higher level specifications (e.g. archetypes with ISO8601 patterns) and tools will enforce that.

  • the month one should be as you state - good catch!

  • I've changed DAY to ( [0][1-9] | [12][0-9] | [3][0-1] )

These changes should now be visible online.
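
For illustration, a small sketch of what the revised MONTH and DAY rules accept, again transcribed to Python regexes. It also shows why a purely lexical rule cannot catch dates such as 31 February: that needs a calendar check at a higher level, done here with the standard library's datetime.date. The helper names lexically_valid() and calendar_valid() are assumptions for this example only.

```python
import re
from datetime import date

# Revised fragments as described in this comment, as Python regexes.
MONTH = re.compile(r"0[1-9]|1[0-2]")
DAY = re.compile(r"0[1-9]|[12][0-9]|3[0-1]")


def lexically_valid(month, day):
    """True if both two-digit strings pass the revised fragments."""
    return MONTH.fullmatch(month) is not None and DAY.fullmatch(day) is not None


def calendar_valid(year, month, day):
    """True if the combination is a real calendar date (hypothetical higher-level check)."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(lexically_valid("00", "10"))   # month 00 no longer passes
    print(lexically_valid("12", "32"))   # day 32 no longer passes
    print(lexically_valid("02", "31"))   # passes the lexer rules...
    print(calendar_valid(2018, 2, 31))   # ...but is not a real date
```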

Sebastian Garde
August 29, 2018, 8:46 AM

Thanks. Re the year regex: In my understanding, it is ISO8601 that enforces the (minimum of) 4 digits to avoid the year 2000 problem (hear, hear) and also presumably more generally for 2 digits to avoid confusion for year "99" - Is this 99 or 1999 or 2099, etc.

Accordingly, year 352 is in fact expressed as 0352 in my understanding... otherwise it is another slight deviation from the standard.

This also makes sense when you consider e.g. what related profiles do: https://tools.ietf.org/html/rfc3339#page-4 : "It is possible that a program using two digit years will represent years after 1999 as three digits. This occurs if the program simply subtracts 1900 from the year and doesn't check the number of digits. Programs wishing to robustly deal with dates generated by such broken software may add 1900 to three digit years." -> So, if you follow this recommendation (not saying that it is a good one), then 325 is really 1900+325 = year 2225.

Either way, I very much hope that this is not very relevant for us in 2018!

Thomas Beale
August 29, 2018, 10:31 AM

ISO8601:2004 has:
~~~~~~~~~~~~~
3.5 Expansion
By mutual agreement of the partners in information interchange, it is permitted to expand the component
identifying the calendar year, which is otherwise limited to four digits. This enables reference to dates and
times in calendar years outside the range supported by complete representations, i.e. before the start of the
year [0000] or after the end of the year [9999].
3.6 Leading zeros
If a time element in a defined representation has a defined length, then leading zeros shall be used as
required
~~~~~~~~~~~~~

So it allows for non-4-digit years between 2 agreeing parties, but generally I think ISO8601 software is going to expect 0-filled 4-digit years. I have now made that further adjustment.
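
The exact rule after this adjustment is not quoted in the thread, so the following is only a sketch that assumes a fixed four-digit year of the form [0-9][0-9][0-9][0-9], as suggested earlier in the discussion. YEAR_4DIGIT and format_year() are hypothetical names.

```python
import re

# Assumption: the adjusted rule is exactly four digits with leading zeros
# allowed. The real grammar rule may differ.
YEAR_4DIGIT = re.compile(r"[0-9]{4}")


def format_year(year):
    """Zero-fill a year to four digits, as ISO 8601 software generally expects."""
    if not 0 <= year <= 9999:
        # Years outside 0000..9999 need the 'expansion' described in
        # section 3.5, which requires mutual agreement between the parties.
        raise ValueError("year outside the four-digit range")
    return f"{year:04d}"


if __name__ == "__main__":
    padded = format_year(352)
    print(padded, YEAR_4DIGIT.fullmatch(padded) is not None)
    print("352", YEAR_4DIGIT.fullmatch("352") is not None)
```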

Reporter

Thomas Beale

Raised By

Claude Nanjo
