Can you raise an issue on SPECPR? That's the issue tracker we use to feed specification work. If you just paste most of this post in as the description, that will be enough to get back to this when more people can get involved (which will be fairly soon).
On 26/09/2013 11:21, Bert Verhees wrote:
>>> In my system it is not useful to preload archetypes, because, archetypes are only parsed once in my system.
>>> That is when they are saved in the system. They are parsed in order to create a RNG/Schematron definition.
>> ok, so the downstream form of an archetype you are using is a Schematron schema - so that's the thing that needs to be stored.
> OK, I misunderstood that part of the discussion. A form of XML schema is a representation of an archetype which, for specific purposes like validation, can be more efficient than the archetype object, depending on the technical architecture of the kernel.
> It seems that we agree on that.
>>> That is used to validate the data, and if new data are entered, then they will be checked against that RNG/Schematron definition, not against the parsed archetype.
>>> The schema is loaded in microseconds and the validation takes one second.
>>> After the data are validated, they are stored in an XML-database, and they will never be validated again. They are ready for XPath-queries and XQueries, and all kind of complicated handling without even looking at an archetype.
>> right - that sounds like all other archetype-based systems I know of.
>>> So the refusal to specify an "archetype_id" in the specs is, in my architecture, bad for performance, because it forces extra archetype parsing. So I have that property without consensus with the specs, and I do not see it as a waste. I make sure that when I have to export data to an OpenEHR system, I will put the archetype_id in the archetype_node_id property.
>> but the specs already specify archetype_details, which contains the archetype id. And you can detect that easily in a schematron schema I guess. So you can easily figure out that you are on one of those nodes. Is the real problem simply that the syntax of what is in archetype_node_id on one of those nodes - an archetype_id rather than an at-code - causes some problem in your processing? I am not clear on what though... are you trying to use the at-code texts at runtime? Are they also in the Schematron schema?
> We are not talking about the OpenEHR reference model, but about archetyped data-handling.
> I have two arguments, the first one is most simple to explain, so I start with that.
> ----------------------
> 1)
> A golden rule in design is that attribute-names should indicate what they are there for.
> We are not writing obfuscated code, but readable code, because the cold war is finished, and we do not need to confuse the Russians anymore, so we can safely honor this rule.
> This means, an attribute (in the ADL common notation) which contains the archetypeNodeID should be called archetype_node_id and an attribute containing an archetypeId should be called archetype_id and it is confusing to use the attribute archetype_node_id to store both, and even, which makes it worse, without indication about what is in it.
> ----------------------
> 2)
> The second argument is a more technical issue and a bit difficult to explain, but I try with an example:
> Imagine you have extracted an XML-path in your datastorage which says
> Say your client software wants to build a GUI, and uses the ontology information to create the GUI control indicators and help information. I think it is possible to do it that way. It makes dynamic GUI building possible.
> This example-path above is easy to find and will not cause any complicated handling.
> But in the current situation, the path can look like:
> First Step: Now the GUI software wants to have a container-control which contains the items, and it looks in the ontology of the containing data-set-archetype to find the archetype_node_id: "openEHR-EHR-ITEM_LIST.address.v1"
> It does not find it, because it is not there.
> Second Step: Now you suggest that the software should look if there is an archetypeDetails attribute, to see if there is another archetype to be used for ontology search. This is one step extra the software needs to do.
> Should it do this at every archetypeNodeId, or only if the search did not give a result? That is a statistical question: which workaround will be applied more often and cost more in the long term. Maybe some tricks may help, and then we get tricky software.
> Third Step: Then, the archetype_node_id to search for in that archetype is invisible to the software, because it is not in the path. So this step is more complicated: the software needs to know which archetype_node_id belongs to the root of that archetype, and only then can it find the description in the ontology section.
> This all could be so much easier, and efficient when the extracted path looked like:
> ....../details[@archetype_node_id="at0001"]/items[@archetype_id="openEHR-EHR-ITEM_LIST.address.v1" @archetype_node_id="at0000"]/.........
> The software would know in one step what to do to build its dynamic GUI. It would see in one step that there is another archetype/ontology-section involved, and it would know in the same step which archetypeNodeId to look for.
> It seems to me that the golden rule in my first argument is there for good reason. It makes code not only more readable but also more efficient: it forces short code paths to solutions for information handling.
> ----------------------
> I hope my arguments are clear now.
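
The one-step lookup argued for in point 2 can be sketched roughly as follows. This is only an illustration under assumptions, not anyone's actual implementation: the path notation is the one used in the example above (it is not standard XPath), and the names parse_step, ontology_lookups and the root archetype id openEHR-EHR-PERSON.example.v1 are hypothetical. The point is that when a step carries both attributes, a single pass over the path tells the software which archetype's ontology to search and which node id to search for.

```python
import re

# One path step: an element name, optionally followed by [ ...predicates... ].
STEP_RE = re.compile(r'^(?P<name>[\w.-]+)(?:\[(?P<preds>.*)\])?$')
# One attribute predicate in the notation used above: @key="value"
ATTR_RE = re.compile(r'@(?P<key>\w+)="(?P<value>[^"]*)"')


def parse_step(step):
    """Split one path step into (element name, attribute dict)."""
    match = STEP_RE.match(step)
    if match is None:
        raise ValueError(f"unrecognised path step: {step!r}")
    attrs = dict(ATTR_RE.findall(match.group("preds") or ""))
    return match.group("name"), attrs


def ontology_lookups(path, root_archetype_id):
    """Return (element, archetype_id, node_id) for every step with a node id.

    Assumption: an archetype_id attribute on a step switches the archetype
    whose ontology applies to that step and to the steps below it.
    """
    current_archetype = root_archetype_id
    lookups = []
    for step in path.strip("/").split("/"):
        if not step:
            continue
        name, attrs = parse_step(step)
        current_archetype = attrs.get("archetype_id", current_archetype)
        node_id = attrs.get("archetype_node_id")
        if node_id is not None:
            lookups.append((name, current_archetype, node_id))
    return lookups


# The example path from the message; the elided parts are left as they are.
example_path = (
    '....../details[@archetype_node_id="at0001"]'
    '/items[@archetype_id="openEHR-EHR-ITEM_LIST.address.v1" '
    '@archetype_node_id="at0000"]/.........'
)
for lookup in ontology_lookups(example_path, "openEHR-EHR-PERSON.example.v1"):
    print(lookup)
```

With only archetype_node_id available, the same function would have to fall back to the archetype_details side-path to learn that the archetype had changed, which is the extra step described above.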
> openEHR-technical mailing list
Ocean Informatics Thomas Beale
Chief Technology Officer
+44 7792 403 613 Specification Program, openEHR
Honorary Research Fellow, UCL
Chartered IT Professional Fellow, BCS
Size not only matters on Tb/Pb of records, but for app servers pulling back data to screens of e.g. 1,500 workers in a hospital. The original design of the RM tried to be fairly parsimonious in terms of space while retaining strong typing. So that does of course lead to some compromises.
Above Bert said:
"To store an archetypeId in the attribute archetype_node_id, because the attribute archetype_id does not exist is wrong. Here is refactoring necessary. "
But in fact, the archetype ID is the archetype node id on the root nodes.
To be discussed!
Heath, it is very hard to use the ARCHETYPED class, as it appears in archetype_details of the LOCATABLE, in a path-based query where you want the way to a leaf node, because archetype_details is always a side-path. Side-paths should always be avoided in path-based queries: they slow down a path-based query system because it has to do an extra query.
Efficiency should indeed be a goal, and having the archetype_id as an attribute makes querying much more efficient.
Anyway, as it is now, and this is where the original question comes from, storing an archetype_id in an archetype_node_id is wrong for two reasons:
1) it goes against all rules of having meaningful names,
2) it makes it impossible to have both an archetype_id and an archetype_node_id in a path notation.
Thomas, you write: "But in fact, the archetype ID is the archetype node id on the root nodes."
This is not true in archetype slots, where both an archetype_node_id and an archetype_id can contain valid, but different, information.
I also have another reason to support the idea of having an archetype_id as an attribute in the XSD, because that is what the discussion boils down to. It is not about the RM, not about the AOM, but about the way in which the archetype_id is allowed to be part of an XML path.
The AOM has node_id in C_OBJECT; this gives the AOM the possibility of representing all the information necessary to define data instances over archetypes.
The node_id, as the SPECS say, must be represented in the ontology-section, this is never the case for an archetype_id which, as it is now, also is stored in the node_id, so that is wrong use of the AOM.
This, IMHO, clearly indicates that archetype_id should not be stored in node_id!
In fact, there is also an attribute archetype_node_id in LOCATABLE; the node_id in C_OBJECT (in the OpenEHR context) is what the archetype_node_id in LOCATABLE contains. In the XSD, archetype_node_id is defined as an attribute of the LOCATABLE.
So, IMHO we must agree that the AOM spec implicitly indicates that the node_id (archetype_node_id in the XSD) may not contain an archetype_id. This is to respond to the statement of Thomas, a few comments back.
We can say that the XSD has no room to store an archetype_id as attribute. And using the archetype_details as information-source in queries is inefficient.
So adding an archetype_id to the XSD, and in effect to the LOCATABLE, and maybe also to the AOM, solves problems and avoids misuse of the specs. The archetype_details (ARCHETYPED) property can be used to give additional information about the reference model, such as the RM version. And it contains the archetype_id as an ARCHETYPE_ID, which can be queried as an identifier with special features.
I admit the problem is more complex than it seems at first, but that should not keep us from solving it.
IMHO it would be good to have two separate attributes for node_id and archetype_id. The problem I faced is about terminology lookup:
if archetype_node_id is an at code, I use that to search the ontology/terminology for the term
else, I need to go to the archetype/opt, get the "concept" code, then use the concept code to get the term.
Having the node_id for the root nodes in data instances helps to avoid the archetype/opt lookup. So my argument is more about simplifying things. But also I agree that using the same "thing" for two purposes is not ideal.
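
To make the two branches concrete, here is a minimal sketch of that lookup. It is an illustration under assumptions only: the archetypes are plain dicts with hypothetical "concept" and "terms" keys, the at-code pattern is my own approximation, and the term texts are invented for the example.

```python
import re

# Assumed shape of an at-code: "at" plus digits, optionally specialised (at0000.1).
AT_CODE_RE = re.compile(r"at\d+(\.\d+)*")


def is_at_code(value):
    """True if the value looks like an at-code rather than an archetype id."""
    return AT_CODE_RE.fullmatch(value) is not None


def lookup_term(archetype_node_id, current_archetype_id, archetypes):
    """Resolve the term text for a node, following the if/else described above.

    `archetypes` is a hypothetical in-memory store: archetype id ->
    {"concept": <at-code of the root>, "terms": {<at-code>: <text>}}.
    """
    if is_at_code(archetype_node_id):
        # Ordinary node: search the ontology of the archetype we are already in.
        archetype = archetypes[current_archetype_id]
        code = archetype_node_id
    else:
        # Root node: the value is an archetype id, so the archetype/opt has to
        # be fetched first, then its concept code, then the term.
        archetype = archetypes[archetype_node_id]
        code = archetype["concept"]
    return archetype["terms"][code]


# Invented example data, only to exercise both branches.
archetypes = {
    "openEHR-EHR-PERSON.example.v1": {
        "concept": "at0000",
        "terms": {"at0000": "Person", "at0001": "Details"},
    },
    "openEHR-EHR-ITEM_LIST.address.v1": {
        "concept": "at0000",
        "terms": {"at0000": "Address"},
    },
}
print(lookup_term("at0001", "openEHR-EHR-PERSON.example.v1", archetypes))
print(lookup_term("openEHR-EHR-ITEM_LIST.address.v1",
                  "openEHR-EHR-PERSON.example.v1", archetypes))
```

If the data instance carried the root at-code in its own attribute next to the archetype id, the else branch would reduce to a single dictionary access with no concept-code detour.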