A-path proposal (Zilics)

Summary

Design

Introduction

The A-path language is the result of an effort to provide syntax and semantics for querying objects of the openEHR reference model, especially archetypes, templates and compositions. A-path is based entirely on the XPath language (the query language for XML documents). It also provides basic facilities for manipulating lists, strings, numbers and booleans.

A-path is part of the zilics-models project, currently hosted in the openEHR SVN sandbox area (SVN URL: http://www.openehr.org/svn/ref_impl_java/SANDBOX/zilics-models).

Basic Types

The primary syntactic construct in A-path is the expression (syntactic rule: expr). Syntactically, an expression is a list of single expressions (syntactic rule: exprSingle) separated by commas (',').

Grammatically speaking, expr is defined as follows:

expr: exprSingle ( ',' exprSingle ) *
;

Each single expression is evaluated to yield a ListValue. A ListValue is a list of SingleValues, and a SingleValue has one of the following five basic types:

RMObject (object of the reference model)
Boolean
Integer
Double precision
String

When an RMObject occurs in a ListValue, all of its elements are RMObjects too. That is, a ListValue cannot contain both an RMObject and a non-RMObject.

Finally, the ListValues produced by the single expressions are concatenated, in order, into another ListValue, which is the value of the expression. For example, the expression '1, 2, 3, 4, 5' consists of five single expressions separated by commas. The full expression yields a ListValue containing five SingleValues of basic type Integer, because each single expression yields an atomic ListValue consisting of only one SingleValue. On the other hand, the expression '1, (2, 3)' is syntactically composed of two single expressions, but when it is evaluated it yields a ListValue containing three SingleValues. As another example, the expression '/' yields a ListValue containing one SingleValue of basic type RMObject (the root value of the evaluation context).
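To make the flattening rule concrete, here is a minimal Java sketch. It assumes a deliberately simplified model in which a SingleValue is a plain Object and a ListValue is a List<Object>; CommaSketch, comma(), isWellFormed() and the RMObject marker interface are hypothetical names, not the zilics-models API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: SingleValue = Object, ListValue = List<Object>.
final class CommaSketch {
    // Hypothetical marker for reference-model objects.
    interface RMObject {}

    // "e1, e2, ..., en": concatenate, in order, the ListValues yielded by each single expression.
    static List<Object> comma(List<? extends List<?>> singleResults) {
        List<Object> result = new ArrayList<>();
        for (List<?> part : singleResults) {
            result.addAll(part);
        }
        return result;
    }

    // A ListValue holds either only RMObjects or no RMObjects at all.
    static boolean isWellFormed(List<?> values) {
        long rmObjects = values.stream().filter(v -> v instanceof RMObject).count();
        return rmObjects == 0 || rmObjects == values.size();
    }
}
```

For instance, '1, (2, 3)' corresponds to comma(List.of(List.of(1), List.of(2, 3))), which yields [1, 2, 3].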

The context

Every evaluation occurs with respect to a context (called PathEvaluationContext). The PathEvaluationContext consists of:

a SingleValue (the context root, which never changes)
a SingleValue (the context item)
a pair of positive integers, the context position and the context size, where the context position is always less than or equal to the context size
a map of variable bindings
a function library

The variable bindings map variable names (strings) to variable values (SingleValues). A variable never holds a value of type ListValue, and every variable name begins with a dollar sign ('$'). For example, in the expression 'for $n in (1 to 10) return ($n * $n)', the variable '$n' ranges over the SingleValues '1' to '10'.

The function library maps function names to functions. Each function takes zero or more arguments (each argument being a ListValue) and returns a ListValue. For example, in the expression 'myfunc ((1, 2), 4)', the function 'myfunc' is invoked with two arguments: the first is the ListValue '(1, 2)' and the second is the ListValue '4' (which contains only one SingleValue).
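The record below is a hypothetical, immutable simplification of the context described above, written only to illustrate its contents and the position/size invariant. EvalContext, bind() and focus() are illustrative names rather than the real PathEvaluationContext API, and the function library is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification of PathEvaluationContext (function library omitted).
record EvalContext(Object root, Object item, int position, int size, Map<String, Object> variables) {
    EvalContext {
        // Invariant from the text: positive position and size, with position <= size.
        if (position < 1 || position > size) {
            throw new IllegalArgumentException("expected 1 <= position <= size");
        }
        variables = Map.copyOf(variables);
    }

    // Returns a new context with one more binding; variable names begin with '$'
    // and are bound to a single value, never to a whole list.
    EvalContext bind(String name, Object singleValue) {
        if (!name.startsWith("$")) {
            throw new IllegalArgumentException("variable names begin with '$': " + name);
        }
        Map<String, Object> vars = new HashMap<>(variables);
        vars.put(name, singleValue);
        return new EvalContext(root, item, position, size, vars);
    }

    // Returns a new context focused on another item; the root never changes.
    EvalContext focus(Object newItem, int newPosition, int newSize) {
        return new EvalContext(root, newItem, newPosition, newSize, variables);
    }
}
```

Returning a new context from bind() and focus(), instead of mutating one, is just one possible design; it keeps outer bindings intact while a for or quantified expression binds its variable.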

Some expressions

One important kind of expression is the path expression (syntactic rule: pathExpr). A path expression selects a set of values relative to the context item, and evaluating it yields the ListValue containing the SingleValues selected by the path. Path expressions can recursively contain filter expressions (syntactic rule: filterExpr), which are used to filter ListValues, or axis steps (syntactic rule: axisStep), which select values.

The grammar for pathExpr is:
pathExpr: '/' relativePathExpr
        | '//' relativePathExpr
        | relativePathExpr
        | '/'
        ;

relativePathExpr: stepExpr ( ( '/' | '//' ) stepExpr ) *
;

stepExpr: filterExpr
| axisStep
;

The slash '/' operator is called the PathFollow operator. In the expression 'A/B', where 'A' and 'B' are expressions, we first evaluate 'A' (which yields a ListValue). Then, for each SingleValue in 'A', we perform the following steps:

Set the context size to the size of 'A'
Set the context position to the position of the SingleValue inside the ListValue 'A'
Set the context item to the SingleValue
Evaluate 'B' (which yields a ListValue)

The final result is the concatenation of the ListValues produced at each step. For example, in '(1 to 3)/(. * .)' we first evaluate '(1 to 3)', and then square each SingleValue of this ListValue and append it to the final ListValue.

The correct precedence of 'A/B/C' is 'A/(B/C)' and not '(A/B)/C' .
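A sketch of the PathFollow steps above, under the same simplified model (ListValue as List<Object>). PathFollowSketch, Focus and follow() are hypothetical; the right-hand expression 'B' is represented as a Java function of the focus (context item, position and size).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the PathFollow ('/') operator; ListValue = List<Object>.
final class PathFollowSketch {
    // The part of the context that changes per step: item, 1-based position, size.
    record Focus(Object item, int position, int size) {}

    // "A/B": evaluate B once per SingleValue of A, with that value as the context item,
    // and concatenate the resulting ListValues in order.
    static List<Object> follow(List<?> a, Function<Focus, List<?>> b) {
        List<Object> result = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            Focus focus = new Focus(a.get(i), i + 1, a.size());
            result.addAll(b.apply(focus));
        }
        return result;
    }
}
```

Here follow(List.of(1, 2, 3), f -> List.of((Integer) f.item() * (Integer) f.item())) mirrors '(1 to 3)/(. * .)' and yields [1, 4, 9].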

The double slash '//' operator is an abbreviated form of '/descendant-or-self::*/', which is explained below.

Basic navigation

An axis step is an expression that operates on RMObjects. To operate on them, we first define a standard name for each RMObject. For objects of the Archetype Model this is easy: we can adopt the rmAttributeName of the parent CAttribute as the standard name of a CObject. For objects of the Reference Model, we can adopt the name of the parent's field that contains the object as its standard name.

Examples of axis step expressions:

'child::data' selects the children of the context item named 'data'
'parent::*' selects the parent of the context item
'..' abbreviated form of the above
'self::*' selects the current context item
'.' abbreviated form of the above
'descendant::item' selects all descendants of the context item named 'item'
'metadata::jj' selects the 'jj' metadata of the context item
'@jj' abbreviated form of the above
'parent::abc' selects the parent of the context item only if its name is 'abc'
'descendant-or-self::*' selects all the descendants of the context item and the context item itself

Suppose the current context item is a Composition object and the expression being evaluated is 'child::content'. This path expression returns a ListValue with one SingleValue for each element (of type ContentItem) in the 'content' field of the Composition.
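To illustrate how the axes relate to each other, the following sketch navigates a tiny tree of named nodes. AxisNode and its methods are hypothetical; each node's name stands in for the standard name described above, the ancestor order (nearest first) is an assumption, and the metadata axis is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree of named objects; `name` plays the role of the standard name.
final class AxisNode {
    final String name;
    final AxisNode parentNode;
    final List<AxisNode> children = new ArrayList<>();

    AxisNode(String name, AxisNode parentNode) {
        this.name = name;
        this.parentNode = parentNode;
        if (parentNode != null) {
            parentNode.children.add(this);
        }
    }

    // valueTest: '*' matches any node, an identifier matches nodes with that name.
    private static boolean matches(AxisNode n, String valueTest) {
        return valueTest.equals("*") || valueTest.equals(n.name);
    }

    // child::valueTest
    List<AxisNode> child(String valueTest) {
        List<AxisNode> out = new ArrayList<>();
        for (AxisNode c : children) {
            if (matches(c, valueTest)) out.add(c);
        }
        return out;
    }

    // parent::valueTest (at most one node)
    List<AxisNode> parent(String valueTest) {
        return parentNode != null && matches(parentNode, valueTest) ? List.of(parentNode) : List.of();
    }

    // descendant::valueTest, in pre-order (each node before its own descendants)
    List<AxisNode> descendant(String valueTest) {
        List<AxisNode> out = new ArrayList<>();
        for (AxisNode c : children) {
            if (matches(c, valueTest)) out.add(c);
            out.addAll(c.descendant(valueTest));
        }
        return out;
    }

    // descendant-or-self::valueTest
    List<AxisNode> descendantOrSelf(String valueTest) {
        List<AxisNode> out = new ArrayList<>();
        if (matches(this, valueTest)) out.add(this);
        out.addAll(descendant(valueTest));
        return out;
    }

    // ancestor::valueTest, nearest ancestor first
    List<AxisNode> ancestor(String valueTest) {
        List<AxisNode> out = new ArrayList<>();
        for (AxisNode a = parentNode; a != null; a = a.parentNode) {
            if (matches(a, valueTest)) out.add(a);
        }
        return out;
    }
}
```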

The grammar for axisStep is:

axisStep: (reverseStep | forwardStep ) predicateList
;

forwardStep: forwardAxis '::' valueTest
| abbrevForwardStep
;

abbrevForwardStep: ('@')? valueTest
;

forwardAxis: 'child'
           | 'descendant'
           | 'self'
           | 'metadata'
           | 'descendant-or-self'
           ;

reverseStep: reverseAxis '::' valueTest
| abbrevReverseStep
;

reverseAxis: 'parent'
           | 'ancestor'
           | 'ancestor-or-self'
           ;

abbrevReverseStep: '..'
;

valueTest: Identifier
| '*'
;

Filtering results

Filter expressions are used to filter the elements of ListValues, removing unwanted ones. Filtering is done with the '[' ']' operator. In the expression 'A[B]', where 'A' and 'B' are expressions, we first evaluate 'A' (which yields a ListValue). Then, for each SingleValue in 'A', we perform the following steps:

Set the context size to the size of 'A'
Set the context position to the position of the SingleValue inside the ListValue 'A'
Set the context item to the SingleValue
Evaluate 'B'; if the result is true (the meaning of true for ListValues is explained later), add the context item to the final result

For example, in '(1 to 5)[. mod 2 = 1]' we first evaluate '(1 to 5)', and then keep each SingleValue of this ListValue that is odd.

The correct precedence of 'A[B][C]' is '(A[B])[C]' .
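The filtering steps can be sketched the same way. FilterSketch and filter() are hypothetical names, and the conversion of B's ListValue into a boolean (described in the section on Boolean values) is abstracted into a Java Predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the filter ('[ ]') operator; ListValue = List<Object>.
final class FilterSketch {
    // Context item, 1-based context position and context size seen by the predicate.
    record Focus(Object item, int position, int size) {}

    // "A[B]": evaluate B once per SingleValue of A, with the focus set to that value,
    // and keep the value only when B is true.
    static List<Object> filter(List<?> a, Predicate<Focus> b) {
        List<Object> result = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            Focus focus = new Focus(a.get(i), i + 1, a.size());
            if (b.test(focus)) {
                result.add(focus.item());
            }
        }
        return result;
    }
}
```

Chaining two calls, filter(filter(a, b), c), corresponds to '(A[B])[C]': the positions seen by the second predicate refer to the already filtered list.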

Examples of filter expressions:

'value[@node_id = "at0001"]' keeps each SingleValue of 'child::value' whose 'node_id' metadata equals "at0001"
'value[at0001]' abbreviated form of the above

The grammar of filterExpr expressions is:

filterExpr: primaryExpr predicateList
;

predicateList: ( predicate ) *
;

predicate: '[' ATCode ']'
| '[' expr ']'
;

The Boolean value

A ListValue is considered "true" if it is not empty and either:

It is a list of RMObjects, or
It contains exactly one SingleValue and:
- if the basic type is String, the string is not empty
- if the basic type is Integer, the integer is not zero
- if the basic type is Boolean, the boolean is true
- if the basic type is Double, the double is not zero

Atomic: An atomic ListValue contains only one SingleValue.

Examples of true ListValues :

'(true)'
'(/, /items[at0001])'
'(2)'

Examples of false ListValues :

'()'
'(2, 3)'
'(false)'
'(0)'
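The rules above can be summarized in a few lines of Java. This is a sketch under assumptions: the basic types are taken to map to String, Integer, Boolean and Double, RMObject is a hypothetical marker interface, and TruthSketch is not a class of zilics-models.

```java
import java.util.List;

// Hypothetical sketch of the truth value of a ListValue (modelled as a List<?>).
final class TruthSketch {
    // Hypothetical marker for reference-model objects.
    interface RMObject {}

    static boolean isTrue(List<?> values) {
        if (values.isEmpty()) return false;
        // RMObjects are never mixed with other values, so checking one element suffices.
        if (values.get(0) instanceof RMObject) return true;
        // Any other list must be atomic to be true.
        if (values.size() != 1) return false;
        Object v = values.get(0);
        if (v instanceof String s) return !s.isEmpty();
        if (v instanceof Integer i) return i != 0;
        if (v instanceof Boolean b) return b;
        if (v instanceof Double d) return d != 0.0;
        throw new IllegalArgumentException("not a basic type: " + v);
    }
}
```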

Primary expressions

There are five kinds of primary expressions: literals, variable references, parenthesized expressions, the context item expression ('.') and function calls.

The grammar for primaryExpr is:

primaryExpr: literal
           | varRef
           | parenthesizedExpr
           | contextItemExpr
           | functionCall
           ;

literal: IntegerLiteral
       | DoubleLiteral
       | StringLiteral
       ;

varRef: '$' varName
;

varName: Identifier
;

contextItemExpr: '.'
;

parenthesizedExpr: '(' expr ? ')'
;

functionCall: Identifier '(' ( exprSingle (',' exprSingle )* )? ')'
;

Examples of primary expressions:

'$var' - a reference to a variable 'var'
'12' - an integer literal
'"string"' - a string literal
'.' - the context item expression
'fatorial(5)' - a function call

Single expressions

A single expression (syntactic rule: exprSingle) can be a for expression, a quantified expression, an if expression or a logical expression (orExpr).

Grammar for exprSingle :

exprSingle: forExpr
          | quantifiedExpr
          | ifExpr
          | orExpr
          ;

For expression

A for expression looks like 'for $var1 in E1, $var2 in E2, ..., $varn in En return E' , where 'E' , 'E1' , 'E2' , ... , 'En' are single expressions and '$var1' , '$var2' , ..., '$varn' are variables.

Grammar for forExpr :

forExpr: 'for' '$' varName 'in' exprSingle (',' '$' varName 'in' exprSingle ) * 'return' exprSingle
;

When a for expression has two or more variables, the innermost (last) variable is the one that changes fastest. For example, 'for $i in (1 to 3), $j in (1 to 3) return ($i + 10 * $j)' yields '(11, 21, 31, 12, 22, 32, 13, 23, 33)' and not '(11, 12, 13, 21, 22, 23, 31, 32, 33)'.
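The iteration order can be pictured as nested loops, as in this hypothetical two-variable sketch (ForSketch and for2() are illustrative names). It simplifies A-path in two ways: the second domain cannot depend on the first variable, and the body returns a single value rather than a ListValue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch of "for $a in E1, $b in E2 return E" with two variables.
final class ForSketch {
    static <R> List<R> for2(List<Integer> e1, List<Integer> e2, BiFunction<Integer, Integer, R> body) {
        List<R> result = new ArrayList<>();
        for (Integer a : e1) {        // first variable: outer loop, changes slowest
            for (Integer b : e2) {    // last variable: inner loop, changes fastest
                result.add(body.apply(a, b));
            }
        }
        return result;
    }
}
```

Calling for2(List.of(1, 2, 3), List.of(1, 2, 3), (i, j) -> i + 10 * j) reproduces the '(11, 21, 31, 12, ...)' ordering described above.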

If expression

An if expression has the simple form 'if (C) then ET else EF', where 'ET' and 'EF' are single expressions and 'C' is an expression (called the condition). The value of an if expression is the value of 'ET' when 'C' yields a true ListValue, and the value of 'EF' when 'C' yields a false ListValue.

The grammar for ifExpr is:

ifExpr: 'if' '(' expr ')' 'then' exprSingle 'else' exprSingle
;

Quantified expressions

There are two types of quantified expressions: some expressions and every expressions. They have similar semantics: they test whether a condition holds for some (or every) value of the variables.

The syntactic form for quantifiedExpr is:

quantifiedExpr: ('some' | 'every') '$' varName 'in' exprSingle (',' '$' varName 'in' exprSingle ) * 'satisfies' exprSingle
;

For example, in the quantified expression 'every $x in (1 to 4) satisfies ($x < 5)', every value of '$x' in '(1 to 4)' satisfies the condition '($x < 5)', so the value of the every expression is true. On the other hand, the quantified expression 'some $x in (1 to 4) satisfies ($x > 5)' yields false, because none of the values of '$x' satisfies '($x > 5)'.
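Over a list of integers, some and every behave like Java's anyMatch and allMatch. The sketch below is hypothetical (QuantifiedSketch is not part of zilics-models) and handles only a single variable.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch of single-variable quantified expressions over integers.
final class QuantifiedSketch {
    // "some $x in domain satisfies condition"
    static boolean some(List<Integer> domain, IntPredicate condition) {
        return domain.stream().mapToInt(Integer::intValue).anyMatch(condition);
    }

    // "every $x in domain satisfies condition" (allMatch returns true for an empty domain)
    static boolean every(List<Integer> domain, IntPredicate condition) {
        return domain.stream().mapToInt(Integer::intValue).allMatch(condition);
    }
}
```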

Logical operators

Logical expressions can be built with the operators and and or, as described below:

orExpr: andExpr ( 'or' andExpr ) *
;

andExpr: comparisonExpr ( 'and' comparisonExpr ) *
;

Common expressions

A comparison expression (syntactic rule: comparisonExpr) is used to compare values such as numbers (integers or doubles) or strings (as in '@node_id = "at0001"'). Its syntactic form is:

comparisonExpr: rangeExpr ( ( '=' | '!=' | '<' | '<=' | '>' | '>=' ) rangeExpr) ?
;

The range operator "to" builds a list of numeric values ranging from one value to another, both inclusive, as in the expression '10 to 100'. The grammar is:

rangeExpr: additiveExpr ( 'to' additiveExpr ) ?
;

Basic arithmetic is supported with the operators '+', '-', '*', 'div' (division) and 'mod' (remainder).

additiveExpr: multiplicativeExpr ( ('+' | '-') multiplicativeExpr ) *
;

multiplicativeExpr: unionExpr ( ( '*' | 'mod' | 'div' ) unionExpr) *
;

To merge two or more ListValues, use the union operator ('union', or its abbreviated form '|'). To intersect the values of two or more ListValues, use the intersect operator.

unionExpr: intersectExceptExpr ( ('union' | '|') intersectExceptExpr ) *
;

intersectExceptExpr: instanceofExpr ( ( 'intersect' | 'except' ) instanceofExpr ) *
;
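The grammar does not spell out how duplicates and ordering are handled. The sketch below assumes set semantics that keep first-occurrence order and compares values with equals(); the real implementation may differ (for example by comparing RMObjects by identity). SetOpsSketch is a hypothetical name, and except() corresponds to the "/ except /items" query in the Examples section.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of union / intersect / except under assumed set semantics.
final class SetOpsSketch {
    // Values of a, then values of b not already present.
    static List<Object> union(List<?> a, List<?> b) {
        Set<Object> s = new LinkedHashSet<>(a);
        s.addAll(b);
        return new ArrayList<>(s);
    }

    // Values of a that also occur in b, in the order of a.
    static List<Object> intersect(List<?> a, List<?> b) {
        Set<Object> s = new LinkedHashSet<>(a);
        s.retainAll(b);
        return new ArrayList<>(s);
    }

    // Values of a that do not occur in b, in the order of a.
    static List<Object> except(List<?> a, List<?> b) {
        Set<Object> s = new LinkedHashSet<>(a);
        s.removeAll(b);
        return new ArrayList<>(s);
    }
}
```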

To check whether a given object is an instance of some class, use:

instanceofExpr: unaryExpr ( 'instance' 'of' Identifier ) ?
;

unaryExpr: ('-' | '+') valueExpr
| valueExpr
;

valueExpr: pathExpr
;

Internal functions:

All internal functions are implemented in the StandardFunctions class:

position(): the current context position of the context item
last(): the context size
toXml(): transforms the context value into XML
define(): defines a function. Example: 'define(("fatorial", "n"), if ($n > 0) then $n * fatorial($n - 1) else 1)'
new(): instantiates a class that implements br.com.zilics.archetypes.models.rm.utils.path.model.ObjectValue.
Example: '(1 to 5)/new("SomeClass", .)', where:

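        import br.com.zilics.archetypes.models.rm.utils.path.model.ObjectValue;

        public class SomeClass implements ObjectValue {
                private final Integer i;

                // The second argument of new() ('.', an Integer here) is passed to this constructor
                public SomeClass(Integer i) {
                        this.i = i;
                }

                public Integer getI() {
                        return this.i;
                }
        }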

User defined functions

To define a new function, implement the PathFunction interface:

import java.util.List;

public interface PathFunction {
        // Arguments arrive unevaluated, as syntax-tree nodes
        public ListValue evaluate(List<TreeNode> nodes, PathEvaluationContext context) throws PathEvaluationException;
}

The evaluate() method receives the function's arguments as a list of TreeNodes. Because a TreeNode represents the abstract syntax tree of an argument, arguments can be evaluated lazily. The utility class PathEagerFunction can be used to evaluate all arguments before the function's own evaluate() is called:

import java.util.ArrayList;
import java.util.List;

public abstract class PathEagerFunction implements PathFunction {
        // Implemented by subclasses; receives the already evaluated arguments
        public abstract ListValue evaluate(PathEvaluationContext context, List<ListValue> params) throws PathEvaluationException;

        /** {@inheritDoc} */
        @Override
        public final ListValue evaluate(List<TreeNode> nodes, PathEvaluationContext context) throws PathEvaluationException {
                List<ListValue> params = new ArrayList<ListValue>();
                for (TreeNode node : nodes) {
                        // Evaluate each argument eagerly, in order
                        params.add(node.evaluate(context));
                }
                return evaluate(context, params);
        }
}
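The difference between lazy and eager argument evaluation can be shown without the zilics-models classes. In this hypothetical sketch a Supplier stands in for a TreeNode, LazyFunction for PathFunction and EagerFunction for PathEagerFunction; LazyIf is an invented example of a function that benefits from laziness because it only evaluates the branch it needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for PathFunction: each argument is evaluated on demand.
interface LazyFunction {
    List<Object> evaluate(List<Supplier<List<Object>>> nodes);
}

// Hypothetical stand-in for PathEagerFunction: evaluate every argument, then delegate.
abstract class EagerFunction implements LazyFunction {
    protected abstract List<Object> evaluateValues(List<List<Object>> params);

    @Override
    public final List<Object> evaluate(List<Supplier<List<Object>>> nodes) {
        List<List<Object>> params = new ArrayList<>();
        for (Supplier<List<Object>> node : nodes) {
            params.add(node.get());
        }
        return evaluateValues(params);
    }
}

// Hypothetical lazy function: evaluates the condition, then only one branch.
final class LazyIf implements LazyFunction {
    @Override
    public List<Object> evaluate(List<Supplier<List<Object>>> nodes) {
        boolean condition = !nodes.get(0).get().isEmpty();  // simplistic truth test for the sketch
        return condition ? nodes.get(1).get() : nodes.get(2).get();
    }
}
```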

Examples

Query: "/"

Result: Returns the context root object. For example, if the query is executed on a Composition instance, it returns a reference to that instance.

Query: "/." or "/self::*"

Result: Returns the same result as the preceding query. "." is the abbreviated form of "self::*".

Query: "/items" or "/child::items"

Result: Returns the list of child objects inside the items attribute of the root object. For example, if this query is performed on an ItemList object, it returns all objects inside its items attribute. "items" is the abbreviated form of "child::items".

Query: "/*" or "/child::*"

Result: Returns the list of all child objects of all attributes of the root object.

Query: "/items[at0001]"

Result: Returns the child object inside the items attribute whose node_id is "at0001". This query is exactly the same as '/items[@node_id = "at0001"]'.

Query: "/@node_id" or "/metadata::node_id"

Result: Returns the "node_id" metadata of the root object. "@xxx" is the abbreviated form of "metadata::xxx".

Query: "/items[at0001]/value"

Result: Returns all child objects inside the value attribute of all results of "/items[at0001]". Generally, "part1/part2" executes the query "part2" for each result of "part1".

Query: "/items/value/.." or "/items/value/parent::*"

Result: Returns the parent object of all results of "/items/value". This query is NOT equivalent to the query "/items", because not every object of "/items" necessarily has a value attribute.

Query: "/descendant::*"

Result: Returns all descendant objects of the root object.

Query: "/descendant::nnn"

Result: Returns all descendant objects of the root object with name "nnn".

Query: "/descendant-or-self::*" or "//."

Result: Returns all descendant objects of the root object, including the root itself. "//" is the abbreviated form of "/descendant-or-self::*/".

Query: "XXX/ancestor::*"

Result: Returns all the ancestors of "XXX".

Query: "XXX/ancestor-or-self::nnn"

Result: Returns all the ancestors of "XXX" named "nnn", including "XXX" itself if its name is "nnn".

Query: "/descendant::*[at0008]"

Result: Returns all the descendant objects of "/" whose "node_id" is "at0008".

Query: "//@node_id"

Result: Returns the list of all "node_id" metadata of all descendant objects of the root object (including the root itself).

Query: "/items[2]" or "/items[position()=2]"

Result: Returns the second child of the root object inside the items attribute. "items[2]" is the abbreviated form of "items[position()=2]".

Query: "/items[position()=last()]"

Result: Returns the last child of the root object inside the items attribute.

Query: "/toXml()"

Result: Returns the root object in XML form (as a string).

Query: "/items[at0001]/toXml()"

Result: Returns "/items[at0001]" in XML form.

Query: "/|/items" or "/ union /items"

Result: Returns the union of the results of "/" and "/items". "|" is the abbreviated form of "union".

Query: "/ intersect /items"

Result: Returns the intersection of the results of "/" and "/items".

Query: "/ except /items"

Result: Returns the set difference of the results of "/" and "/items".

Query: "1"

Result: Returns the Integer "1"

Query: "1+1"

Result: Evaluates the sum 1+1, which is 2.

Query: "2*3"

Result: Evaluates the product 2*3, which is 6.

Query: "10.0 div 3.0"

Result: Evaluates the quotient of 10.0 divided by 3.0.

Query: "3>2"

Result: Returns true.

Query: "1=0"

Result: Returns false.

Query: "1=0 or 3>2"

Result: Returns true.

Query: "1=0 and 3>2"

Result: Returns false.

Query: "(1, 2, 3, 4, 5)"

Result: Returns a list with the integers 1, 2, 3, 4 and 5 in that order.

Query: "(1 to 5)"

Result: Returns the same result as the above query.

Query: "(1 to 5)[. mod 2 = 1]"

Result: Returns all odd integers between 1 and 5 (inclusive). Generally speaking, "part1[part2]" evaluates "part2" for each result of "part1" and lets that result pass to the final result only if "part2" is true.

Query: "(1 to 5)/(. * 2)"

Result: Returns the list "2, 4, 6, 8, 10"

Query: "if (2> 1) then (3) else (4)"

Result: Returns "3"

Query: "for $n in (1 to 10) return ($n * $n)"

Result: Returns the list "1, 4, 9, 16, 25, 36, 49, 64, 81, 100".

Query: "for $x in (1 to 10), $y in (1 to 10) return ($x + $y)"

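Result: Returns a list of 100 integers. Because $y changes fastest, the list starts with 2, 3, ..., 11 (for $x = 1), followed by 3, 4, ..., 12 (for $x = 2), and so on up to 11, 12, ..., 20 (for $x = 10).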

Query: "every $x in (1 to 4) satisfies ($x < 5)"

Result: Returns true

Query: "every $x in (1 to 4) satisfies ($x> 1)"

Result: Returns false

Query: "some $x in (1 to 4) satisfies ($x> 1)"

Result: Returns true

Query: "function(1, 2)"

Result: Evaluates the function "function" with the arguments 1 and 2.

Related Classes/Test Cases

Classes:

br.com.zilics.archetypes.models.rm.utils.path.*;
br.com.zilics.archetypes.models.rm.utils.path.context.*;
br.com.zilics.archetypes.models.rm.utils.path.model.*;
br.com.zilics.archetypes.models.rm.utils.path.parsed.*;
br.com.zilics.archetypes.models.am.utils.path.*;
The grammar br.com.zilics.archetypes.models.rm.utils.path.ArchPath.g

Test cases:

br.com.zilics.archetypes.models.rm.utils.path.*;
br.com.zilics.archetypes.models.rm.utils.path.model.*;
br.com.zilics.archetypes.models.rm.utils.path.parsed.*;
br.com.zilics.archetypes.models.am.utils.path.*;