Better Packaging draft spec

NOTE: This document is a years-old draft that was written internally for use within Better, so it may mention non-standard terms, proprietary use cases and even some Better teams or people.

About packages

We will call the files that hold artefacts for the EHR Server "artefact content packages". (Can this be called "content", though?)

Each package has a human-readable string ID, a version, a description, a specification of dependencies on other packages, and the engines field with version specs of ehrServer, formRenderer and terminologyServer (absence means "any version will do"). All of this is part of a standard NPM package.json. In addition, packages declare provided and required content, i.e. template IDs and their semantic versions, terminology IDs and their semantic versions, event IDs, view names etc. When building a package, the packaging tool should generate the list of provided content automatically. When validating a set of content packages for installation, the governance tool should check whether all the required content is provided by other packages. This way, a content meta-package can declare that it requires a certain terminology without depending on a specific package to provide it, which leaves room for client customisation. On top of that, required variables (see the section about Events) should also be listed in package.json, possibly assembled by the packaging tool.
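As an illustration only, the manifest described above could be modelled as follows. The field names providedContent, requiredContent and requiredVariables, the content identifier format and the package names are assumptions made for this sketch; only name, version, description, dependencies and engines are standard package.json fields. The check treats content as provided if any package in the set provides it.

```python
from typing import Dict, List

# Hypothetical manifests: the three *Content / *Variables keys are invented here.
cardiology = {
    "name": "cardiology-content",
    "version": "1.2.0",
    "description": "Cardiology templates and forms",
    "dependencies": {"vital-signs-content": "^2.0.0"},
    "engines": {"ehrServer": ">=3.1.0"},
    "providedContent": ["template:Cardiology report:1.0.0"],
    "requiredContent": ["terminology:ICD10"],
    "requiredVariables": [],
}
icd10 = {
    "name": "icd10-terminology",
    "version": "1.0.0",
    "providedContent": ["terminology:ICD10"],
    "requiredContent": [],
}


def missing_required_content(packages: List[dict]) -> Dict[str, List[str]]:
    """Map each package name to the required content that no package provides."""
    provided = {c for p in packages for c in p.get("providedContent", [])}
    missing: Dict[str, List[str]] = {}
    for p in packages:
        absent = [c for c in p.get("requiredContent", []) if c not in provided]
        if absent:
            missing[p["name"]] = absent
    return missing
```

Because cardiology-content only names the terminology it needs, any package that provides "terminology:ICD10" satisfies it, including a customer-specific one.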

Multiple versions of the package with the same ID can be installed simultaneously.

Installing a package to the EHR Server means uploading all the templates, forms, views and events that it contains to the EHR Server. They will all be tagged with the name and version of the package that installed them, like "source:(packageId):(packageVersion)". (Manually uploaded artefacts should be tagged with "source:manual-upload".)

Package uninstallation means removing all of those from the EHR Server, except:

  • artefacts that were installed by another still installed package or manually,

  • templates for which compositions exist,

  • forms which are referenced in form_id/form_version tags on compositions. (Although customers can use different tag names, we will assume and support only those.)

Importantly, those templates and forms need to be logically removed to prohibit new contributions through them. So we need a concept of a logically deleted form. (We have it for templates already.)
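The uninstallation rules above can be sketched as a per-artefact decision. This is a minimal sketch, not the server's implementation: the Artefact class, the in_use flag and the three action names are hypothetical, and only the tag formats come from this document.

```python
from dataclasses import dataclass, field
from typing import Set

MANUAL_TAG = "source:manual-upload"


def source_tag(package_id: str, version: str) -> str:
    """Build the internal tag that marks which package installed an artefact."""
    return f"source:{package_id}:{version}"


@dataclass
class Artefact:
    kind: str  # "template", "form", "view" or "event"
    name: str
    tags: Set[str] = field(default_factory=set)
    # True if compositions exist for a template or reference a form.
    in_use: bool = False


def uninstall_action(artefact: Artefact, package_id: str, version: str) -> str:
    """Return 'keep', 'logical-delete' or 'remove' for one artefact."""
    own = source_tag(package_id, version)
    if own not in artefact.tags:
        return "keep"  # this package never installed the artefact
    remaining = artefact.tags - {own}
    if any(tag.startswith("source:") for tag in remaining):
        return "keep"  # another installed package or a manual upload owns it
    if artefact.kind in ("template", "form") and artefact.in_use:
        return "logical-delete"  # hide it, but keep existing data readable
    return "remove"
```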

Package (un)installation to other parts of the platform (terminology server, studio) TBD.

TBD: The governance tool will already have ensured that the packages being installed do not conflict with each other. However, to avoid conflicts at the point of installation, I see two solutions:

  • I prefer the one where, during (or after) installation of a bundle, all other packages are uninstalled automatically. This means that the bundle must contain everything a particular setting needs. The huge advantage is that the governance tool has oversight of the entire installation, not just a partial view of the content that operations is currently focused on. Note that some artefacts might not be removable because they are in use, e.g. a template with a particular semantic version might be used by some compositions while the bundle installs a new template with the same version but different content (this should not happen, but still!). In this case we should keep the old variant, but hidden, and still install the new version, just like we do now with new templates with the same ID.

  • The other option is to keep existing artefacts and somehow detect and react to conflicts on the EHR Server (and the other servers). I see this as problematic because customers would start taking shortcuts, such as taking an EHR Server installation with some packages already installed and adding a bundle that contains only the content they are interested in right now. That bundle might conflict with what is already installed, and resolving the conflict at that point will be impossible.

Artefacts

Packages can contain templates, forms, views, events and terminologies. Studio artefacts TBD.

All 4 types of artefacts deployable to the EHR Server should be taggable at runtime on the server, and have a notion of 'internal' tags like templates have. Then, as mentioned previously, we should add tags like 'source:(packageId):(packageVersion)' to record the source(s) of each artefact, and also have a 'source:manual-upload' tag so that we leave an artefact in place even when removing the last package that installed it, in case it was also manually uploaded or modified. Currently, only templates support internal tags, and forms support tags (but not internal ones). Tagging support for events and views does not seem generally useful, so we might implement only the notion of internal tags for them (which means the external API is not aware of them and need not be changed).

Templates

OPT files in directory templates. Each OPT can be accompanied by a ZIP fileset with the same name (apart from the extension).

Only templates with semantic versioning in the template_id (i.e. a suffix of "-N-X-Y") or in template/details/other_details[id="sem_ver"] are supported in packages. When exporting a template to a package, a fileset should be used whenever available. It is not mandatory to use a fileset in a package, though.

Multiple versions of the same template ID need to be stored and exposed on the administration console and over the REST APIs, so this needs to be analysed and the implementation effort estimated.

The governance tool rejects combinations of packages that contain templates with equal ID+semver and different contents. (This should never happen, of course.) Manual upload to the EHR Server would overwrite the template, but should be prohibited on production servers.

On composition upload, a template ID without a version specifier is always resolved to the template instance with the greatest semver. However, semver resolution also needs to be implemented so that the client can pass a template ID with a version specifier suffix after a colon, like "My template:>=5.0.6".
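A minimal sketch of this resolution, assuming that template IDs contain no colon and that a specifier is a single comparator (>=, <=, >, <, = or a bare version). The function names and the shape of the installed map are hypothetical, and the full NPM range syntax is deliberately left out.

```python
import re
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int]

_SPEC = re.compile(r"^(>=|<=|>|<|=)?\s*(\d+\.\d+\.\d+)$")


def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def matches(version: Version, spec: str) -> bool:
    """Check one comparator such as '>=5.0.6'; a bare version means equality."""
    found = _SPEC.match(spec.strip())
    if not found:
        raise ValueError(f"unsupported version specifier: {spec!r}")
    op = found.group(1) or "="
    wanted = parse_version(found.group(2))
    return {
        ">=": version >= wanted,
        "<=": version <= wanted,
        ">": version > wanted,
        "<": version < wanted,
        "=": version == wanted,
    }[op]


def resolve_template(reference: str, installed: Dict[str, List[str]]) -> Optional[str]:
    """Return the greatest installed version matching 'ID' or 'ID:specifier'."""
    template_id, _, spec = reference.partition(":")
    candidates = [parse_version(v) for v in installed.get(template_id, [])]
    if spec:
        candidates = [v for v in candidates if matches(v, spec)]
    if not candidates:
        return None
    return ".".join(str(number) for number in max(candidates))
```

Versions are compared as integer tuples, so 5.10.0 correctly sorts above 5.9.0, which plain string comparison would get wrong.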

ADL2 also has semver, so this way we'll be able to retain this logic if we upgrade.

Forms

ZIP files in directory forms.

Those are already versioned and semver-resolved, so not much work to be done on the EHR server.

The governance tool rejects a package combination if there is a conflict (same ID+version, different actual form).

Views

JSON files in directory views.

Views are not versioned. The governance tool rejects package combinations with conflicting views (same name, different specification). To avoid conflicts between different versions of the same package, developers are encouraged to include a semantic version in the view name.

Events

JSON file in directory events.

Events are different because we do not want multiple versions of those installed due to unwanted duplication of external effects.

The package must not contain an event if it contains anything else (including other events: only one event per package is supported). Event name must be equal to package ID (excluding version).

When resolving dependencies, the governance tool must be aware that only a single version of such a package can be installed.

Push event "destinations" and queue event "topic names" need to be able to resolve some variables, e.g. http://${the.lab.system.host}/event_handler for a push event destination. The required variables should be listed in the package.json. Actual values for those variables should be given to the governance tool alongside the full package list, and the installation bundle should only be prepared if all the required variables have a value set.
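A small sketch of the variable handling, assuming the curly-bracket placeholder form ${name} that the paragraph above describes. The function names are hypothetical; a regular expression is used because the variable names contain dots.

```python
import re
from typing import Dict, List

# Matches ${some.dotted.name}; the captured group is the variable name.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def required_variables(text: str) -> List[str]:
    """List the distinct variable names referenced in a destination or topic."""
    return sorted(set(_VARIABLE.findall(text)))


def resolve_variables(text: str, values: Dict[str, str]) -> str:
    """Substitute all variables; refuse to produce a partially resolved string."""
    missing = [name for name in required_variables(text) if name not in values]
    if missing:
        raise KeyError(f"no value assigned for: {', '.join(missing)}")
    return _VARIABLE.sub(lambda found: values[found.group(1)], text)
```

The packaging tool could use something like required_variables to assemble the list for package.json, while the governance tool would run the substitution when preparing the bundle.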

Event packages are prohibited from having any dependencies.
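The event package rules in this section could be checked along these lines. This is a sketch under assumptions: the contents map (artefact kind to artefact names) and the error wording are invented for the example, and the manifest is a plain package.json dictionary.

```python
from typing import Dict, List


def event_package_errors(manifest: dict, contents: Dict[str, List[str]]) -> List[str]:
    """Return rule violations for a package; an empty list means it is valid."""
    events = contents.get("events", [])
    if not events:
        return []  # not an event package, so none of these rules apply
    errors: List[str] = []
    if len(events) > 1:
        errors.append("only one event per package is supported")
    others = sorted(kind for kind, names in contents.items() if kind != "events" and names)
    if others:
        errors.append(f"event package also contains: {', '.join(others)}")
    if any(event != manifest["name"] for event in events):
        errors.append("event name must be equal to the package ID")
    if manifest.get("dependencies"):
        errors.append("event packages must not have dependencies")
    return errors
```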

Terminologies

CSV files in directory terminologies.

Terminologies are not versioned and are packaged similarly to views. They can be in the same package as other content, but usually they would be packaged separately and not even depended on directly, but rather referenced via a content requirement. The governance tool rejects combinations of packages that contain a terminology with equal name but different contents.
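The "equal name but different contents" rule, which this document also states for templates, forms and views, can be sketched by comparing content hashes. The input shape (package name mapped to content name mapped to raw bytes) is an assumption for the example, and byte-for-byte equality is only one possible definition of "same contents".

```python
import hashlib
from typing import Dict, List, Tuple


def find_conflicts(packages: Dict[str, Dict[str, bytes]]) -> List[Tuple[str, str, str]]:
    """Return (content name, first package, conflicting package) triples."""
    seen: Dict[str, Tuple[str, str]] = {}  # content name -> (package, digest)
    conflicts: List[Tuple[str, str, str]] = []
    for package_name, contents in packages.items():
        for content_name, data in contents.items():
            digest = hashlib.sha256(data).hexdigest()
            if content_name not in seen:
                seen[content_name] = (package_name, digest)
            elif seen[content_name][1] != digest:
                conflicts.append((content_name, seen[content_name][0], package_name))
    return conflicts
```

Identical copies of the same terminology in two packages are not reported, since only differing contents count as a conflict.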

Operational Procedures

TBD: What follows is an initial draft, but this needs to be rethought in cooperation with the Studio team.

  • Administrators can still manually load whatever onto the EHR Server. (We might make an option to prohibit that on production servers.) We should also support fileset uploads when uploading a template. We should store the entire fileset, but extract and use only the template.

  • Then, using some tool (probably CLI, with an option to do it graphically on the admin console as well), the admin will be able to export stuff from EHR Server to packages: either a single event, or a collection of templates, forms and views (specific versions of all of those); the user will have to declare dependencies at this point as well.

  • The user will then upload those packages to an NPM repository.

  • The user will prepare a list of required packages for some setup (possibly including a list of "negative" dependencies to prohibit certain versions of packages), and input it into the governance tool along with a value assignment for the variables needed by those packages. The governance tool will then resolve those dependencies and, if there are no errors and any warnings have been dismissed, prepare a bundle file containing all required package versions (including dependencies) and the variable assignment.

  • This bundle will be uploaded to EHR Server and other servers that will each install the artefacts that concern them, without much validation because that will already have been done by the governance tool.

Dependency Resolution Algorithm

This algorithm resides within the governance tool.

Input: a list of requested package IDs with version specifiers (just like a dependency definition), a list of prohibited package/version combinations, a map with variable assignment (to string values)

Output: a list of concrete package ID/version pairs, and a list of external variables that are required by this set of packages.

# Initialisation
0. packagesToInstall = [], effectiveEventPackageDependencies = {}
1. dependencyQueue.addAll(initialDependencies);

# Normal package dependency resolution
2. for each dependency declaration D in dependencyQueue:
   1. if D is not an event package:
      1. in the repository, find the maximal package version V satisfying D (obeying prohibited versions); fail if not found
      2. if V not in packagesToInstall, add it, and also add its dependencies to the dependencyQueue
      else:
      1. if D.name in effectiveEventPackageDependencies:
         1. replace effectiveEventPackageDependencies[D.name] with an intersection of existing element and D; fail if empty
         else:
         1. effectiveEventPackageDependencies[D.name] = D

# Event package dependency resolution
3. for each event package dependency E in effectiveEventPackageDependencies:
   1. in the repository, find the maximal package version V satisfying E (obeying prohibited versions); fail if not found
   2. add it into packagesToInstall

# Required content validation
4. provided_content = set of all provided content identifiers (template ID+semvers, form ID+semvers, view names, event names, terminology IDs, ...)
5. missing_content = multimap of entries (package, contentId) for each contentId in package.required_content that is absent from provided_content
6. if missing_content is not empty, fail with an error, describing what is missing

# Required variables validation
7. missing_variables = multimap of entries (variable, package) for each variable in package.required_variables that is absent from the variable assignment map
8. if missing_variables is not empty, fail with an error, describing what is missing
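The steps above can be sketched in Python as follows. Several things here are assumptions rather than part of the spec: the repository is an in-memory map of package name to version to manifest, the manifest keys isEvent, providedContent, requiredContent and requiredVariables are hypothetical, and version specifiers are simplified to space-separated comparators. The intersection of event package specifiers is modelled by concatenating comparators, so an empty intersection surfaces as "no version found" rather than as a separate failure.

```python
import operator
from typing import Dict, List, Set, Tuple

_OPS = [(">=", operator.ge), ("<=", operator.le), (">", operator.gt),
        ("<", operator.lt), ("=", operator.eq)]


def parse(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Simplified specs: space-separated comparators, '*' matches anything."""
    for comparator in spec.split():
        if comparator == "*":
            continue
        for symbol, compare in _OPS:
            if comparator.startswith(symbol):
                wanted = comparator[len(symbol):]
                break
        else:  # a bare version means an exact match
            compare, wanted = operator.eq, comparator
        if not compare(parse(version), parse(wanted)):
            return False
    return True


def best_version(repo, name: str, spec: str, prohibited: Set[Tuple[str, str]]) -> str:
    candidates = [v for v in repo.get(name, {})
                  if (name, v) not in prohibited and satisfies(v, spec)]
    if not candidates:
        raise LookupError(f"no version of {name} satisfies {spec!r}")
    return max(candidates, key=parse)


def resolve(repo, requested: Dict[str, str], prohibited: Set[Tuple[str, str]],
            variables: Dict[str, str]) -> List[Tuple[str, str]]:
    to_install: List[Tuple[str, str]] = []      # step 0
    event_specs: Dict[str, str] = {}
    queue = list(requested.items())             # step 1
    while queue:                                # step 2
        name, spec = queue.pop(0)
        if any(m.get("isEvent") for m in repo.get(name, {}).values()):
            # Intersect with earlier specs by requiring all comparators at once.
            event_specs[name] = f"{event_specs[name]} {spec}" if name in event_specs else spec
            continue
        version = best_version(repo, name, spec, prohibited)
        if (name, version) not in to_install:
            to_install.append((name, version))
            queue.extend(repo[name][version].get("dependencies", {}).items())
    for name, spec in event_specs.items():      # step 3
        to_install.append((name, best_version(repo, name, spec, prohibited)))
    manifests = {(n, v): repo[n][v] for n, v in to_install}
    provided = {c for m in manifests.values() for c in m.get("providedContent", [])}
    missing_content = [(n, c) for (n, _), m in manifests.items()   # steps 4-6
                       for c in m.get("requiredContent", []) if c not in provided]
    if missing_content:
        raise LookupError(f"missing content: {missing_content}")
    missing_variables = [(var, n) for (n, _), m in manifests.items()  # steps 7-8
                         for var in m.get("requiredVariables", []) if var not in variables]
    if missing_variables:
        raise LookupError(f"missing variables: {missing_variables}")
    return to_install
```

Note that normal packages may end up installed in several versions, while each event package is resolved exactly once against the combined specifier, which matches the single-version rule for events.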

TODOs

  • If OPTs are in filesets, what would the language be? Can we support multi-language templates so that we choose the correct OPT given the input language?

Package format

The only sensible one I found is NPM, already proposed by Boštjan. Installation and dependency resolution would have to be implemented by us anyway (and we would not follow NPM's latest dependency resolution approach, which chooses only one version of each package, like Maven does for Java), but it's nice to have a common format because you need less documentation and you can simply use existing package repositories, from trivial ones like Verdaccio to the complex ones we already use, like Nexus.
