a string matching the regex [a-zA-Z][a-zA-Z0-9_-:/&+?]*
The given regex pattern appears to be wrong, at least in its format. IntelliJ (possibly the SonarLint plugin) and another tool (https://www.regextester.com/) both give me an error when I use [a-zA-Z][a-zA-Z0-9_-:/&+?]* as the pattern. They say the _-: part is parsed as a range, and an invalid one, because _ comes after : in character order.
Is this meant to mean “… or _ or - or : or …”? If so, I would simply propose changing the pattern to [a-zA-Z][a-zA-Z0-9-_:/&+?]*. This pattern passes both tools without an error.
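For anyone who wants to reproduce this outside the IDE, here is a minimal sketch using java.util.regex. The class and method names are made up for illustration and are not taken from Archie or the specification. It only checks whether each variant compiles; putting the hyphen last or escaping it are two other common ways to make it unambiguously literal.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of any openEHR or Archie code.
final class HyphenInCharClassDemo {

    // Returns true if java.util.regex accepts the pattern, false if it rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // As published: "_-:" is read as a range from '_' (0x5F) to ':' (0x3A),
        // which runs backwards and is therefore rejected.
        System.out.println(compiles("[a-zA-Z][a-zA-Z0-9_-:/&+?]*"));

        // Variant proposed in this thread: hyphen directly after the 0-9 range.
        System.out.println(compiles("[a-zA-Z][a-zA-Z0-9-_:/&+?]*"));

        // Hyphen moved to the end of the class, where it can only be a literal.
        System.out.println(compiles("[a-zA-Z][a-zA-Z0-9_:/&+?-]*"));

        // Hyphen escaped in place (backslash doubled for the Java string literal).
        System.out.println(compiles("[a-zA-Z][a-zA-Z0-9_\\-:/&+?]*"));
    }
}
```

Other engines may treat a hyphen directly after a range differently, so the last two forms are probably the safer ones to put in a specification.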
That’s true, but the regex does not include the slashes; the slashes are delimiters used by pattern-matching languages (sed etc.), not part of the regex. The slightly confusing thing is that if you type a regex into an expression like ‘s/old/new/g’ in vi, you need to backslash-escape any slash between the first // pair. But you don’t in pure regex.
Hm, I just tested on regextester, it also wants an escaped forward slash.
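My guess (an assumption, I have not checked how regextester is implemented) is that the tool wraps the pattern in /.../ delimiters, which is why it asks for the escape. In an engine that takes the pattern as a plain string, such as java.util.regex, the forward slash is an ordinary character. A small sketch, with a made-up class name and made-up sample input:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the sample pattern and input are illustrative only.
final class SlashDelimiterDemo {

    // Thin wrapper so both variants are checked the same way (whole-input match).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped '/' is just a literal character in a Java regex.
        System.out.println(matches("[a-z]+/[a-z]+", "local/term"));

        // The escaped form "\/" is also accepted and means the same thing;
        // the backslash is doubled only because this is a Java string literal.
        System.out.println(matches("[a-z]+\\/[a-z]+", "local/term"));
    }
}
```

So the escape is harmless where it is accepted, but it is only required by syntaxes that use the slash as a delimiter.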
I remember putting like 6 in a row for a regex parser in Java…
Maybe we should list somewhere which regexp syntax we use? Because all these engines have different rules, including what should be escaped and what should not be escaped.
And Java strings indeed complicate this further, with the extra '\' characters.
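To illustrate the extra layer that Java string literals add on top of the regex syntax, here is a small sketch. The class and constant names are invented for this example. The point is that the Java compiler consumes one level of backslashes before the regex engine ever sees the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing the two escaping layers: Java literal, then regex.
final class BackslashLayersDemo {

    // Two characters in source, ONE backslash character at runtime.
    static final String ONE_BACKSLASH = "\\";

    // Four characters in source, TWO at runtime; as a regex, those two
    // characters mean "match one literal backslash".
    static final String REGEX_FOR_BACKSLASH = "\\\\";

    // "\\d+" in source is the three-character regex \d+ at runtime.
    static final String REGEX_DIGITS = "\\d+";

    public static void main(String[] args) {
        System.out.println(ONE_BACKSLASH.length());
        System.out.println(REGEX_FOR_BACKSLASH.length());
        System.out.println(Pattern.matches(REGEX_FOR_BACKSLASH, ONE_BACKSLASH));
        System.out.println(Pattern.matches(REGEX_DIGITS, "2024"));
    }
}
```

This is one reason a specification is easier to read when it gives the bare regex and names the syntax, and leaves host-language escaping to implementers.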
We have in some places specified PCRE, i.e. Perl-compatible regex, the one that allows \s, \w, \d etc. But the ‘/’ thing is just wrong: that’s a delimiter, as we specify here: https://specifications-test.openehr.org/releases/AM/latest/AOM2.html#_c_string_class
But we should probably update that C_STRING documentation as well, to be clearer about which regex syntax is meant.
What regex syntax do you use in Archie's C_STRING?