ADL Grammar supports only one string in combination with Regexpr

Description

Grammar supports this:
string_attr2 matches {/this|that|something else/}
and this
string_attr2a matches {/this|that|something else/; "and"}

(both OK)

but it should also support this:
string_attr2a matches {/this|that|something else/; "and", "something", "else"}

according to
http://www.openehr.org/releases/AM/latest/docs/AOM2/AOM2.html#_c_string_class

Please take a position soon, so that the grammar can conform to the specs.

Thanks
Bert

Environment

None

Activity

Bert Verhees
March 16, 2016, 10:36 AM

Hi Thomas, I checked the grammars on GitHub; they have not been changed for a few months:
https://github.com/openEHR/adl-antlr/commits/master/src/main/antlr

I must have missed the mentioned change quite some time ago. Stupid me.
Seems there is some work waiting for me.

Have a nice day, also for Diego, and sorry for the trouble.

Bert

Bert Verhees
March 16, 2016, 6:16 PM
Edited

There is only one small problem left.

According to the grammar only one regular expression is possible, and that is OK in my opinion, because a regular expression can itself contain a list of alternatives if a list is wanted. But the specs
http://www.openehr.org/releases/AM/latest/docs/AOM2/AOM2.html#_c_string_class
say that the constraint list can contain a list of regular expressions.

I think the constraint can be a list of strings or a single regular expression, but not a list of regular expressions.
Also, according to the grammar a regular expression can be delimited by ^, whereas the specs only mention / as delimiter.

Thomas Beale
March 16, 2016, 10:06 PM

Bert, you are right that there is an ambiguity re: whether we allow multiple strings versus a single one, since if it is a regex, you can always just do {/this|that|other/} to get the same effect as {"this", "that", "other"}. For now, I don't know what the right answer is, so I'll just put a TBD in the spec at that point, so that at least we remember it needs to be resolved.
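The equivalence mentioned here can be checked with a small sketch. It is only an illustration: the helper names `matches_list` and `matches_regex` are hypothetical, and it assumes that a regex constraint must match the whole value, which this thread does not settle.

```python
import re

def matches_list(value: str, allowed: list[str]) -> bool:
    """Validate against a string-list constraint such as {"this", "that", "other"}."""
    return value in allowed

def matches_regex(value: str, pattern: str) -> bool:
    """Validate against a regex constraint such as {/this|that|other/}.

    Assumption: the pattern has to match the entire value (fullmatch).
    """
    return re.fullmatch(pattern, value) is not None

ALLOWED = ["this", "that", "other"]
PATTERN = "this|that|other"

# Both forms accept and reject the same candidate values.
for candidate in ["this", "that", "other", "those", "this one"]:
    print(candidate, matches_list(candidate, ALLOWED), matches_regex(candidate, PATTERN))
```

Under a substring-matching interpretation the two forms would no longer agree ("this one" would pass the regex), so the matching semantics are part of what needs to be decided.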

Bert Verhees
April 7, 2016, 8:12 AM

This is the grammar:
cString : ( stringValue | stringListValue | regexConstraint ) assumedStringValue? ;
stringListValue : stringValue ( ( ',' stringValue )+ | ',' SYM_LIST_CONTINUE ) ;

We have three possible constraints in the grammar, but only two recognizable in the processing code (single string and stringlist).

Constraints can be
1) "this" -> string value
2) "this",... -> stringlist with one element
3) "this","that" -> string list
4) "/this/" -> regular expression
5) "/cardio.*/" -> regular expression
6) "cardio.*" -> string value

Solution below comes first to mind, but first a remark:
In fact the CString class is a weak class, because it does not make clear whether a single-string constraint is a regular expression or a string. The software using the AOM must do extra processing to distinguish the two possibilities, which I don't think is desirable. Software should encapsulate the functionality it needs, and should not expect tricky knowledge in client software. One solution would be to add a property to CString to hold a regular expression, but I guess that is undesirable for other reasons (conformance to the structure of the other primitives), and it is also not necessary.

But, now the most obvious solution:
Constraints 1, 4 and 5 are represented by a single string, and constraints 2 and 3 by a string list.
Therefore it is impossible for processing code to distinguish a single string (1) from a regular expression (4 and 5), because both end up in the same property, called constraint, as a single string.
It is no problem if 1 is regarded as a regular expression, because the effect is the same: "this" as a regular expression does the same as "this" as a constraint string. But 6 is a problem: the result differs depending on whether it is processed as a regular expression or as a single string.
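To make the difference concrete, here is a minimal sketch comparing the two readings of a single-string constraint. The function names are hypothetical, and whole-value matching is an assumption, not something the grammar or the spec states.

```python
import re

def matches_literal(value: str, constraint: str) -> bool:
    """Treat the constraint as a plain string: only an exact copy is valid."""
    return value == constraint

def matches_as_regex(value: str, constraint: str) -> bool:
    """Treat the same constraint text as a regular expression.

    Assumption: the expression must match the whole value.
    """
    return re.fullmatch(constraint, value) is not None

# "this": both readings accept exactly the same values.
print(matches_literal("this", "this"), matches_as_regex("this", "this"))

# "cardio.*": the readings diverge.
print(matches_literal("cardiology", "cardio.*"))   # literal reading rejects it
print(matches_as_regex("cardiology", "cardio.*"))  # regex reading accepts it
```

So a constraint without regex metacharacters behaves identically either way, while one containing characters such as "." or "*" changes meaning depending on how the processing code interprets it.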

My idea is to always regard a single string as a regular expression
(we also have a list of strings as a list constraint, which can also be one string by using the SYM_LIST_CONTINUE)

In this way we have maximum flexibility.
So 1, 4, 5 and 6 will always be processed as regular expressions.
Also, since the properties of CString do not let us tell the two apart, we have no choice other than to do so.

This makes the slashes enclosing the regular expressions superfluous, because a single string is always a regular expression.
Then we must also remove the escaping of the forward slash "/" inside a single string.

I hope you will come to a decision about this soon, because it is quite necessary to have this resolved in a stable way.

Thomas Beale
April 7, 2016, 9:03 AM
Edited

I also thought about a single regex as the form of the constraint. The downside is that a) when creating a constraint, e.g. in an editor, you have to create a regex string, and if your literal string contains regex special characters, they have to be quoted and b) when validating data against the constraint at runtime, you must at least have a regex library.

The problem with a) is that if, say, your string is "a[124]" for some strange reason, then you can't just make the constraint "/a[124]/"; instead it must be "/a\[124\]/". And so on.
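The quoting burden from point a) can be illustrated with Python's standard `re` module. This is only a sketch of what an editor would have to do when turning a literal string into a regex constraint; the helper name `literal_to_regex` is hypothetical, and whole-value matching is assumed.

```python
import re

def literal_to_regex(literal: str) -> str:
    """Quote regex special characters so the pattern matches only the literal text."""
    return re.escape(literal)

literal = "a[124]"

# Used unescaped, the brackets form a character class: the pattern matches
# "a1", "a2" or "a4", but not the literal string itself.
print(re.fullmatch(literal, "a1") is not None)
print(re.fullmatch(literal, "a[124]") is not None)

# After escaping, only the literal string matches.
escaped = literal_to_regex(literal)
print(escaped)
print(re.fullmatch(escaped, "a[124]") is not None)
print(re.fullmatch(escaped, "a1") is not None)
```

An archetype editor that stores every single-string constraint as a regex would need a step like this whenever the user types a plain literal, which is the cost described in a).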

But it is still a good question. I will try to get wide input on this.

Reporter

Bert Verhees

Labels

None

Components

Priority

Major