So far, we’ve had a few attempts to describe the nature of the cartesian product problem, as it is called by many members of the SEC and the openEHR community. This page on Discourse has useful pointers to these past discussions.

I don’t want to produce yet another attempt at formally describing the problem. Instead, I’m writing this page to clarify why we’ve been unable to provide a formal description of the problem as per SPECQUERY-9 and maybe suggest a way forward so that we can overcome this obstacle.

What’s the problem?

Users of AQL are getting query results with some extra rows, which they do not expect. The users then do one of these two things:

Responding to both of these actions requires that an implementer can offer a well-defined model of query semantics and query execution semantics. Ideally, all openEHR implementations should offer the same pair of models, but that’s not where we are now.

What makes this problem a tricky one to solve is that whether or not the problem exists is subjective. It is entirely possible that the user is getting incorrect results according to the query semantics the user has in mind, which may not be the query semantics the implementer has in mind. Without a shared and agreed semantics of queries and their execution, how can we know if the user is right or wrong?

Therefore, the only description of the so-called “cartesian product problem” we can offer is “the user is getting rows in a result set that they did not expect”.

However, it is worth noting that the name of the problem, which is accepted at least among vendors, hints at a shared perception of its nature: a cartesian product emerging as a result of the query execution. It is not a great stretch of imagination to suggest that we are comfortable with a relational algebra based view of query semantics, at least to some extent, because we’re borrowing a concept from this view to name this problem.
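Purely as an illustration of the borrowed concept, and not as a description of AQL semantics or of any implementation, the following minimal Python sketch shows how a relational-style reading can multiply rows. The data, the field names (`systolic`, `pulse`) and the function `rows_relational` are all hypothetical.

```python
from itertools import product

# Hypothetical data: one composition holding two independent lists of values.
composition = {
    "id": "c1",
    "systolic": [120, 135],
    "pulse": [70, 82, 90],
}


def rows_relational(comp):
    """Relational-style reading: pair every systolic value with every pulse value."""
    return [
        (comp["id"], s, p)
        for s, p in product(comp["systolic"], comp["pulse"])
    ]


rows = rows_relational(composition)
print(len(rows))  # 2 systolic values x 3 pulse values = 6 rows
```

Whether six rows is the correct answer or an excess of rows depends entirely on the query semantics one has in mind, which is the difficulty described above.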

Rather than attempting to define a formal model based on this vague sign of sympathy for relational algebra, I’ll just present it as strong evidence that all of us are subconsciously looking for a formal model.

How can we solve the problem?

We define a formal model of query semantics and execution. Such a model should satisfy the following criteria:

TODO: any other suggestions to solve the problem are welcome, if anybody can think of one.

What are the risks?

The following are the risks associated with defining a formal model as suggested above.

Next steps

We would probably like to have a discussion in the SEC and attempt to identify the expectations, goals and scope based on input from SEC members. If the previous attempts and discussions aimed at solving this problem delivered anything, I think it is the realisation that we do not seem to have a shared view or vision of AQL. That’s what we need to establish first.