ITS Versioning and Releasing

This page is for discussing schemes of versioning ITS artefacts.

What is an ITS?

Firstly, 'ITS' (Implementation Technology Specification, a term we re-used from HL7) indicates derivative (but generally hand-built) artefacts based on 'upstream' specifications or 'pure models'. We might reasonably say that UML or BMM are pure, lossless expressions of the specs. ITS artefacts are usually somewhat (or even very) lossy w.r.t. the upstream models/specs, e.g. they may only carry the data view.

Practically speaking, ITS artefacts are the concrete means of using or implementing openEHR in some particular kind of component. Someone who wants to build components using openEHR XML will need canonical XSDs, Schematron, or some other kind of XML schema. Someone who wants to do FHIR-ish things might want openEHR-flavoured FHIR resources.

The number of ITS artefacts (taking a set of linked files as a single artefact) is roughly (no. of specification components) x (no. of extant releases) x (no. of downstream technologies), where:

  • 'specification component' means a component as defined by the Specifications project, i.e. what you see here. Not all components have an ITS; those that do / will are RM, AM, BASE, ?TERM, QUERY (indirectly), CDS, SM (see below).
  • 'extant release' means a release currently in use, in the sense that different implementer companies / groups are on different releases at any one time (early adopters vs. late adopters).
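A minimal sketch of the count above, assuming a hypothetical inventory built from releases mentioned on this page (it is not a real openEHR release list). It enumerates one (component, release, technology) triple per artefact. Because each component has its own number of extant releases, the true count is a sum over components, which is why the product formula is only "roughly" right. The sketch also assumes every component has every technology, which will not hold in practice (e.g. SM would carry REST/SOAP rather than XSD).

```python
from itertools import product

# Hypothetical inventory, loosely based on releases mentioned on this page.
# A real list would come from the Specifications project.
extant_releases: dict[str, list[str]] = {
    "RM": ["1.0.2", "1.0.3"],
    "AM": ["1.4.2", "2.0.6"],
}
technologies = ["XSD", "Schematron"]


def its_artefacts(releases: dict[str, list[str]], techs: list[str]) -> list[tuple[str, str, str]]:
    """Enumerate one (component, release, technology) triple per ITS artefact."""
    return [
        (component, release, tech)
        for component, rels in releases.items()
        for release, tech in product(rels, techs)
    ]


artefacts = its_artefacts(extant_releases, technologies)
print(len(artefacts))  # 2 components x 2 releases x 2 technologies = 8
```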

Are REST APIs ITS artefacts?

When it comes to APIs, my original conception was that any API, such as REST, SOAP, or a native language API (e.g. Java, C#, ...), should also be seen as an ITS, based on a properly specified and abstractly (but formally) modelled API, expressed in, say, UML and/or IDL (an Ecore-ish form would also do). In other words, if we build a 'COMPOSITION' REST API with the logical function 'get composition by version', we imply that there is an underlying native service containing a function like get_composition_by_id (version: VERSION): COMPOSITION. In this approach, SOAP, REST, Java, etc. APIs should all do the same thing, each in its own idiom.
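To illustrate the abstract-API-first view, here is a minimal sketch. `CompositionService` stands in for the abstract (e.g. UML/IDL-defined) API, and `RestCompositionAdapter` is one concrete ITS exposing the same logical function in REST idiom. All class names, the simplified `Version`/`Composition` types and the `/composition/{uid}` path are hypothetical and do not reflect the published openEHR REST API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical, heavily simplified stand-ins for the VERSION and COMPOSITION types.
@dataclass(frozen=True)
class Version:
    uid: str


@dataclass(frozen=True)
class Composition:
    uid: str
    content: str


class CompositionService(ABC):
    """Abstract, technology-neutral API: the 'gold standard' semantics."""

    @abstractmethod
    def get_composition_by_id(self, version: Version) -> Composition: ...


class InMemoryCompositionService(CompositionService):
    """A native implementation of the abstract API (illustration only)."""

    def __init__(self, store: dict[str, Composition]):
        self._store = store

    def get_composition_by_id(self, version: Version) -> Composition:
        return self._store[version.uid]  # raises KeyError if unknown


class RestCompositionAdapter:
    """A REST-flavoured ITS: same logical function, expressed in REST idiom.

    The path shape is an assumption for illustration only.
    """

    def __init__(self, service: CompositionService):
        self._service = service

    def handle_get(self, path: str) -> tuple[int, dict]:
        prefix = "/composition/"
        if not path.startswith(prefix):
            return 404, {}
        try:
            comp = self._service.get_composition_by_id(Version(path[len(prefix):]))
        except KeyError:
            return 404, {}
        # Same result as the native call, serialised in REST/JSON style.
        return 200, {"uid": comp.uid, "content": comp.content}


service = InMemoryCompositionService({"v1": Composition("v1", "blood pressure")})
rest = RestCompositionAdapter(service)
print(rest.handle_get("/composition/v1"))  # (200, {'uid': 'v1', 'content': 'blood pressure'})
print(rest.handle_get("/composition/v2"))  # (404, {})
```

A SOAP or Java adapter would wrap the same `CompositionService`, which is what makes the abstract API the reference for transactional semantics.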

Others here may see each kind of API as a primary artefact that is semantically self-defining, and in some sense this is true right now, since we are building the REST APIs before having specified the abstract APIs. I believe that the right approach would be to reverse engineer abstract APIs from concrete APIs, for various reasons, mainly to provide a gold standard of transactional semantics that all kinds of concrete APIs need to respect.

If you agree with this, then although we don't have an upstream API spec today, we should still consider the REST APIs to be ITS artefacts, whose abstract definitions we will build later. If you don't agree, then you may argue that the REST APIs are not ITS artefacts at all.

Versioning ITS Artefacts

Problem Statement

Organise the ITS repo(s) in such a way that:

  • ITS artefacts are easily findable by the release id of the relevant upstream artefact
    • i.e. I want to easily find the RM Release-1.0.2 XSDs, the AM Release-2.0.6 XSDs, etc.
  • ITSs for all releases (or, say, those of the last x years) of each upstream component are available in parallel
    • i.e. company A is using the RM Release-1.0.2 XSDs and AM Release-1.4.2 XSDs, while company B uses the RM Release-1.1.0 XSDs and AM Release-2.0.6 XSDs
  • any given ITS artefact can receive patch-level fixes that correct errors relating to its formalism (e.g. wrong XSD namespaces), and the latest patch version of any ITS artefact for any upstream release of a component is easily findable.
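The last requirement (finding the latest patch of each artefact) can be made concrete with a small sketch. It assumes a hypothetical tag scheme `<component>/<upstream release>/<technology>/p<patch>`; nothing on this page defines such a scheme. The point is only that, whatever scheme is chosen, patches must be grouped by upstream release and compared numerically (p10 comes after p2).

```python
import re

# Hypothetical tag scheme: <component>/<upstream release>/<technology>/p<patch>
TAG_RE = re.compile(
    r"^(?P<component>[A-Z]+)/(?P<release>\d+\.\d+\.\d+)/(?P<tech>\w+)/p(?P<patch>\d+)$"
)


def latest_patches(tags: list[str]) -> dict[tuple[str, str, str], int]:
    """Map each (component, upstream release, technology) to its highest patch number."""
    latest: dict[tuple[str, str, str], int] = {}
    for tag in tags:
        m = TAG_RE.match(tag)
        if m is None:
            continue  # ignore tags that don't follow the scheme
        key = (m["component"], m["release"], m["tech"])
        latest[key] = max(latest.get(key, -1), int(m["patch"]))
    return latest


tags = ["RM/1.0.2/XSD/p0", "RM/1.0.2/XSD/p2", "RM/1.0.2/XSD/p10",
        "RM/1.1.0/XSD/p0", "AM/2.0.6/XSD/p1", "not-a-release-tag"]
print(latest_patches(tags)[("RM", "1.0.2", "XSD")])  # 10
```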

We have had various conversations about this in the past; in brief, the proposals are:

Textbook versioning - linear

This is one approach we discussed at last year's SEC meeting (wiki notes here), where each component (RM etc) has its own ITS repo, and successive releases (1.0.0, 1.0.1, 1.0.2, 1.0.3 etc) are on a single line of development and essentially replace each other.

pros:

  • easy-ish

cons:

  • doesn't reflect the real-world use of ITS artefacts - in fact, ITS artefacts of multiple component releases are in use at any one time
  • doesn't allow for patch level fixes to ITS artefacts in a clean way.

Textbook versioning - branching

This is the usual approach applied to primary artefacts, particularly software, that are continually maintained in multiple extant versions. Concretely, here it would involve a repo per upstream component (RM, AM, SM, etc.) and the use of formal branches inside each repo. Pablo also provided a diagram here.

Repos:

  • RM-ITS
    • branch 1.0.1 (RM release id)
      • XSD
        • some xsd files
    • branch 1.0.2
      • XSD
        • some xsd files
    • branch 1.1.0
      • XSD
        • some xsd files
      • Schematron
        • some schematron files
  • AM-ITS
    • branch 1.4.2
      • etc
  • SM-ITS
    • branch ??
      • REST APIs
        • some apib files
      • SOAP APIs
        • some wsdl files
    • branch ??
      • REST APIs
        • some apib files
      • SOAP APIs
        • some wsdl files
    • etc

pros:

  • it's the standard approach so people understand it
  • earlier or later patch versions of each ITS artefact in its upstream release are visible on a dedicated line of development (a branch)

cons:

  • it's harder to get at the artefacts - they are buried in different branches in different repos, and a lot of checkouts are required to obtain them. Diffing, say, the RM 1.0.2 and 1.1.0 XSDs is fairly annoying, although some Git UI tools make it a bit easier.
  • the arrangement of REST and other APIs is probably open to question

Single repo

This is the approach I believe some of us arrived at in later conversations last year. It is a pragmatic approach, and results in a repo whose directory structure is based on upstream release ids. One such structure is:

  • Spec-ITS
    • RM
      • Release-1.0.2
        • XSD
          • some XSD files
        • schematron
          • etc
        • other
          • etc
      • Release-1.0.3
        • XSD
          • some XSD files
        • schematron
          • etc
        • other
          • etc
    • AM
      • Release-1.4.2
        • XSD
        • schematron
        • other
      • Release-2.0.6
        • etc
    • SM
      • Release-???
        • REST
        • SOAP
        • other
      • etc
    • etc

Other structures are possible, e.g. factored by ITS technology rather than upstream component:

  • Spec-ITS
    • XSD
      • RM
        • 1.0.2
          • some xsd files
        • 1.0.3
          • some xsd files
      • AM
        • 1.4.2
          • some xsd files
    • WSDL
      • RM
      • etc
    • REST
      • SM?
        • ???
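Either layout is a deterministic mapping from (component, upstream release, technology) to a directory, so tooling can locate artefacts the same way in both. A minimal sketch of the two mappings, assuming the folder names shown above (`Spec-ITS`, `Release-x.y.z`) are the ones actually used; the function names are hypothetical.

```python
from pathlib import PurePosixPath

ROOT = "Spec-ITS"


def path_by_component(component: str, release: str, tech: str) -> PurePosixPath:
    """Layout factored by upstream component: Spec-ITS/<component>/Release-<release>/<tech>"""
    return PurePosixPath(ROOT, component, f"Release-{release}", tech)


def path_by_technology(component: str, release: str, tech: str) -> PurePosixPath:
    """Layout factored by ITS technology: Spec-ITS/<tech>/<component>/<release>"""
    return PurePosixPath(ROOT, tech, component, release)


print(path_by_component("RM", "1.0.2", "XSD"))   # Spec-ITS/RM/Release-1.0.2/XSD
print(path_by_technology("RM", "1.0.2", "XSD"))  # Spec-ITS/XSD/RM/1.0.2
```

Since both are mappings over the same triples, switching layouts later is a mechanical move rather than a content change.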

Sebastian's preferred structure (9 July 2018):

Sebastian Iancu: one thing we discussed was that each technology (XSD, etc.) will actually have to be released/implemented for each release of the underlying artefact (RM/SM/etc.), and all of them have to be maintained in parallel for a while (so XSD 1.0.2, 1.0.3, 1.0.4 should be maintained for a while). Besides that, we might have typos or implementation-specific issues that we want to correct - these are not RM/SM/etc. related. My impression is that overall we'll be better off with one single big repo (which is still manageable) - but if I'm somehow wrong, I think we can always split it into several small ones later.

To avoid confusion, we could release the next version as v2.0.0 of ITS, since the whole structure is going to change. Or, even better, we could name it 2018.8 (i.e. yyyy.m) to be sure nobody will confuse it.

The particular structure probably doesn't matter that much, since either way, it's an everything-at-once approach.

pros:

  • In this approach, a single checkout gets you everything in all extant upstream releases, in the latest patch version of each concrete ITS artefact. It's easy to see what's there, compare, diff, etc.
  • a patch-level fix to anything, e.g. the RM/1.0.2 XSDs, is just a normal new version of the repo
  • tags on this repo correspond to a system-wide release, e.g. "2017-03", "2017-06".

cons:

  • if we think that we want patch level fixes to have individual lines of development, rather than being mixed together, this approach will be less convenient.

Multiple Repos, one per technology (TB)

Separate repos for

  • ITS-REST
  • ITS-XML
  • ITS-other

This could make sense if we think these technology expressions are relatively independent of each other. Also, REST Release-1.0.0 would then be the release of the whole repo, and we avoid the confusion of ITS 1.1.0 = REST APIs 1.0.0.

pros:

  • 1 repo = 1 technology = 1 release history, whereas the single giant repo contains N release histories mixed together.

Conclusions

Sebastian Iancu:

I generally agree with the principle 1 repo = 1 tech; it will indeed make it easier to maintain and own.
But I'm still not sure it is the best answer here. We should not forget that the release history of one technology is not always in sync with RM releases. As an exercise, let's look at the current state:

  • we have RM 1.0.3 which is reflected in current REST 0.9.2
  • we might change/fix/correct things in REST and eventually make it 1.0.0
  • we'll have a RM 1.0.4 which might or might not (unlikely) be compatible with REST 1.0.0
  • we might find new issues and we'll release REST 1.0.1 - based (compatible) on RM 1.0.3 and 1.0.4
  • we'll have RM 1.0.0 which might require additions (REST 1.1.0) or even breaking changes in REST (so that's 2.0.0)
    • REST 1.1.0 or 2.0.0 does not make REST 1.0.1 obsolete - they have to be maintained in parallel (branches) for a while, because they are based on two different RM versions
    • independently of any new RM release, we might want to correct things or add new endpoints, etc. - at that point we'll have to decide whether the change should be made in both release branches or only in the latest
    • ... so we may end up having to release a REST 1.0.2 (still based on RM 1.0.3) - but what if we want a new endpoint? REST 1.1.0 might already be taken for RM 1.0.4.

My point is that any new release of RM / AQL / AOM / etc. might require a new release of REST, but we might also need new releases that are not "triggered" by RM / AQL / AOM. And we might need to maintain these releases as (parallel) branches, as not all implementations are using the latest RM / AQL / AOM / etc.

Another point is that REST requires data models to be described by a schema (JSON Schema or XSD); so yes, they are different technologies (in two or more different repos), but sometimes they might need to be released together.

How are we going to manage all this? We can have one big repo where we have to structure things in a smart way, or, if we go for different repos, don't we then need something like a dependency map (something like the npm / composer *.json file)?
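A minimal sketch of what such a dependency map could look like, in the spirit of an npm `package.json` or a `composer.json` but expressed here as a Python dict. The REST-to-RM entries follow the scenario above (REST 0.9.2 on RM 1.0.3, REST 1.0.1 on RM 1.0.3 and 1.0.4); REST 1.0.0 on RM 1.0.3 is an assumption, and the map format and function names are hypothetical.

```python
from typing import Optional

# Hypothetical dependency map: REST release -> {upstream component: compatible releases}.
REST_DEPENDENCIES: dict[str, dict[str, list[str]]] = {
    "0.9.2": {"RM": ["1.0.3"]},
    "1.0.0": {"RM": ["1.0.3"]},  # assumed
    "1.0.1": {"RM": ["1.0.3", "1.0.4"]},
}


def version_key(v: str) -> tuple[int, ...]:
    """Compare dotted versions numerically, so 1.0.10 sorts after 1.0.9."""
    return tuple(int(part) for part in v.split("."))


def rest_releases_for(component: str, release: str) -> list[str]:
    """All REST releases declaring compatibility with the given upstream release."""
    return sorted(
        (rest for rest, deps in REST_DEPENDENCIES.items()
         if release in deps.get(component, [])),
        key=version_key,
    )


def latest_rest_for(component: str, release: str) -> Optional[str]:
    """Newest compatible REST release, or None if there is none."""
    candidates = rest_releases_for(component, release)
    return candidates[-1] if candidates else None


print(rest_releases_for("RM", "1.0.3"))  # ['0.9.2', '1.0.0', '1.0.1']
print(latest_rest_for("RM", "1.0.4"))    # 1.0.1
```

With a map like this, an implementer on RM 1.0.4 can ask which REST releases apply without the REST version number itself having to encode the RM version.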