PBS XML Design Principles

Design Principles of the PBS XML Document.

Structure

The three (3) physical structures in the XML document are:

Element Structure

The hierarchy of elements is the backbone of the PBS XML document.

Context – to identify Parent/Child relationships.
Purpose – to navigate up and down the hierarchy to get the information required.

Linking Structure

The PBS XML document does not repeat data; instead, it relies on linkages to keep the data normalised.

Purpose – to prevent repeating the same data throughout the PBS XML document.
Context – links are one-directional, from the source to the target. The xlink:href attribute on the source is therefore always paired with an xml:id attribute on the target.

Ontology Structure

An ontology is possible only once terms have been identified and a meaning has been assigned to each term.

Purpose – to identify a set of terms and their meanings.
Context – rdf:resource identifies the term and is always paired with rdf:about to identify the meaning of the term.

Element Groups

The element hierarchy is the backbone of the PBS XML file, because XML is a hierarchical data format. Parent-child relationships are present throughout the data file; they are important because they provide the context for each data element.

To avoid duplicating data at different levels, the document relies on the context of each XML element to supply the information required. For example, the Program Code of a Prescribing Rule is held by the Prescribing Rule's parent element, the program element. Structuring the data this way minimises file size, because the Program Code does not have to be repeated for each Prescribing Rule.

This structure requires consumers to traverse the hierarchical parent-child relationships to gather all necessary data.
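The traversal described above can be sketched in Python. This is a minimal illustration only: the element names (programs-list, program, code, prescribing-rule, name) and the value "P1" are hypothetical placeholders, not names taken from the PBS schema. The standard library's ElementTree does not keep parent pointers, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names and values are illustrative only.
SAMPLE = """
<programs-list>
  <program>
    <code>P1</code>
    <prescribing-rule><name>rule-1</name></prescribing-rule>
    <prescribing-rule><name>rule-2</name></prescribing-rule>
  </program>
</programs-list>
"""


def build_parent_map(root):
    """Map every element to its parent, since ElementTree stores no parent links."""
    return {child: parent for parent in root.iter() for child in parent}


def program_code_for(rule, parents):
    """Walk up from a rule until the enclosing program element is found."""
    node = rule
    while node is not None:
        if node.tag == "program":
            return node.findtext("code")
        node = parents.get(node)
    raise LookupError("prescribing-rule has no enclosing program element")


root = ET.fromstring(SAMPLE)
parents = build_parent_map(root)
codes = [program_code_for(rule, parents) for rule in root.iter("prescribing-rule")]
print(codes)
```

Both rules report the same code even though it is stored only once, on their shared parent.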

Linking

The PBS XML is comprehensive and therefore complex. To minimise file size, elements within the PBS XML version 3 data file are normalised: an element is defined once and linked to from wherever else in the document it is needed. XLinks provide this functionality. Every XLink in the PBS XML version 3 data file gives the uniform resource locator (URL) of the target element in its xlink:href attribute, and the xml:id attribute provides the unique identifier of the target element within the document.
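As a rough sketch of how a consumer might follow such a link, the Python below indexes every element that carries an xml:id and then resolves an xlink:href against that index. It assumes the href is a same-document fragment of the form "#id"; the element names (drug, drug-reference, item, title) are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified attribute names as ElementTree reports them.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: element names are illustrative only.
SAMPLE = """
<root xmlns:xlink="http://www.w3.org/1999/xlink">
  <drug xml:id="d1"><title>example drug</title></drug>
  <item><drug-reference xlink:href="#d1"/></item>
</root>
"""


def index_by_id(root):
    """Build a lookup table from xml:id value to the element that carries it."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}


def resolve(link, targets):
    """Follow a same-document xlink:href (assumed to look like '#some-id')."""
    href = link.get(XLINK_HREF)
    if href is None or not href.startswith("#"):
        raise ValueError(f"unsupported or missing xlink:href: {href!r}")
    return targets[href[1:]]


root = ET.fromstring(SAMPLE)
targets = index_by_id(root)
link = root.find("item/drug-reference")
target = resolve(link, targets)
print(target.findtext("title"))
```

Building the index once keeps each lookup cheap, which matters when a document contains many links to the same normalised elements.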

Ontology

Built on the Resource Description Framework (RDF), ontologies are a formal mechanism for describing networks of taxonomies and classifications. An ontology defines the structure of knowledge for a domain; for the purposes of this document, the ontology defines the structure of the PBS data.

An ontology resembles a class hierarchy in object-oriented software development, with nouns representing classes of objects and verbs representing the relationships between the objects. Unlike class diagrams, however, ontologies remain very flexible because they represent information that is diverse and potentially fluid.
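The pairing of rdf:resource with rdf:about described under Ontology Structure can be illustrated with a short Python sketch. The concept IRI (under example.org), the wrapper elements (root, item, type) and the use of rdfs:label to carry the meaning are all assumptions made for the example; the actual PBS ontology may describe its terms differently.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_ABOUT = f"{{{RDF}}}about"
RDF_RESOURCE = f"{{{RDF}}}resource"

# Hypothetical fragment: the IRI and element names are illustrative only.
SAMPLE = f"""
<root xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdf:Description rdf:about="http://example.org/concept#safety-net">
    <rdfs:label>Safety Net</rdfs:label>
  </rdf:Description>
  <item><type rdf:resource="http://example.org/concept#safety-net"/></item>
</root>
"""

root = ET.fromstring(SAMPLE)

# Index every description of a term by the IRI given in rdf:about.
concepts = {
    el.get(RDF_ABOUT): el for el in root.iter() if el.get(RDF_ABOUT) is not None
}

# rdf:resource names the term; the matching rdf:about entry supplies its meaning.
term_iri = root.find("item/type").get(RDF_RESOURCE)
label = concepts[term_iri].findtext(f"{{{RDFS}}}label")
print(term_iri, "->", label)
```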

Design Goals

The following are goals of the PBS XML Schema:

Complete and accurate representation of the Pharmaceutical Benefits Scheme Schedule of Benefits.
Use Australian Government, Web and international standards wherever possible.
Support use of complementary standards, such as AMT and ATC.
Minimise the interpretation required for processing data.
Support incremental and absolute processing models.

Success Criteria

The following are measures of success of the PBS XML Schema:

Minimal code development. This may be measured by the size of programs.
Developers have a good understanding of the schema. This may be measured by the number of support requests.
Good take-up of XML. Ideally, take-up would be complete so that the text extracts can be retired.

Principles

These are the general principles that are applied to the design of the Schema.

Represent the whole PBS Schedule.

The data should be a comprehensive representation of the PBS Schedule. That is, it should be a 'closed world' view of the PBS. It should not be necessary to have knowledge of any other data or concepts outside of the PBS XML in order to be able to correctly administer the PBS.

This is not to say that the PBS XML exists as a stand-alone document. Other data, such as AMT, may enhance the PBS Schedule data or interact with it.

Minimise interpretation.

Provide data in such a way that it can be used directly; ie. the PBS XML shall be declarative. If a value must be derived from source values, then provide the derived value as a data item. This is possible only where the derivation does not depend on a variable that is known only at the point of prescribing or dispensing; eg. the DPMQ (Dispensed Price for Maximum Quantity) can be determined, but not the DPDQ (Dispensed Price for Dispensed Quantity), since the dispensed quantity is known only at the point of dispensing.

Separate fixed data from variable data.

The Schedule is published monthly and most of the data is subject to change from month to month. However, some data applies to the Schedule as a whole and is constant; eg. concepts such as "Safety Net" or "Brand Substitution". This data constitutes an ontology for the PBS. The data describing concepts shall be encoded using RDF and OWL, which allows Semantic Web technology to be used for reasoning over PBS data.

The PBS ontology shall be specified as a separate document from the monthly Schedule document, and may be made available for download. A copy of the ontology will also be supplied in the monthly Schedule document.

To support the use of an ontology both locally and remotely, the IRI for each concept in the ontology must be fixed, and may be distinct from the XML Namespace URI for XML elements. Because the IRIs are identical in both places, a local copy of the ontology can be included in the monthly Schedule document.

Use elements rather than attributes.

All data shall be encoded as elements. This provides the most extensibility in the Schema. Attributes are only used for linking and referencing.

Where an element has a subtype or classification, the element is linked to the Semantic Web object that defines the classification.

Eliminate optionality.

All data items are encoded and are always present.

This principle also applies to boolean values and groups. Rather than expressing a boolean as present for 'true' and absent for 'false', the element is always present and contains a value: 'true' or 'false'. For groups, an object is always a member of a group, and the group is structured to have 'does apply' and 'does not apply' subgroups.
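One practical consequence of always-present booleans is that a consumer can treat a missing element as an error rather than as an implicit 'false'. The following is a minimal sketch of such a reader; the element names item and brand-substitution are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names are illustrative only.
SAMPLE = "<item><brand-substitution>false</brand-substitution></item>"


def read_bool(parent, name):
    """Read a mandatory boolean child; absence is an error, not an implied false."""
    text = parent.findtext(name)
    if text is None:
        raise KeyError(f"mandatory element <{name}> is missing")
    value = text.strip()
    if value not in ("true", "false"):
        raise ValueError(f"<{name}> must contain 'true' or 'false', got {value!r}")
    return value == "true"


item = ET.fromstring(SAMPLE)
substitutable = read_bool(item, "brand-substitution")
print(substitutable)
```

Raising on absence keeps a malformed or truncated document from being silently read as 'false'.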

Provide alternative references as a convenience.

The PBS XML uses XLink to cross-reference elements. If the target of a link has a stable code, that code can be used as an alternative cross-reference. In these situations, provide both mechanisms as a convenience.

Conventions

These are the conventions that are used in the development of the element vocabulary and content models.

All element and attribute names are lower-case. Where a name consists of more than one word then the words are separated by a hyphen; eg. dispensing-rule.
Names should not be terse. Acronyms should not be used unless the full name of the term is long and the acronym is well known; eg. "DPMQ", "MRVSN".
Elements that are containers for lists (ie. one-or-many child elements of the same type) are named as the plural of the contained element with "-list" appended. For example, if the contained element is dispensing-rule then the container element is named dispensing-rules-list.
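The naming conventions above are mechanical enough to check in code. The sketch below is an illustration under stated assumptions: the regular expression is one possible reading of "lower-case words separated by hyphens", and the plural is formed by simply appending "s", which matches the dispensing-rule example but would not handle irregular plurals.

```python
import re

# One reading of the convention: lower-case words joined by single hyphens.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(-[a-z0-9]+)*")


def is_conventional_name(name):
    """Return True if the name is lower-case and hyphen-separated."""
    return NAME_PATTERN.fullmatch(name) is not None


def list_container_name(element_name):
    """Name the container for a list of elements.

    Assumption: the plural is formed by appending "s".
    """
    return f"{element_name}s-list"


print(is_conventional_name("dispensing-rule"))
print(is_conventional_name("DispensingRule"))
print(list_container_name("dispensing-rule"))
```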

Engineering

There are a number of engineering constraints on the creation of the PBS XML document.

File size.

For the single-document rendering of the PBS XML (ie. for bulk loading), the file size should be minimised as much as possible. The following techniques may help to achieve this:
- XML Namespace prefix.
  
  For the main PBS namespace, http://schema.pbs.gov.au, use the default namespace. A monthly Schedule PBS XML document contains over a million elements in this namespace, so removing the need to specify the prefix reduces the file size by approximately 4 MB.
- White space.
  
  Don't "pretty print" the document. Removing whitespace reduces the file size by approximately 30 MB.
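The effect of both techniques can be observed on a small synthetic document. The Python sketch below is only an illustration: schedule and item are placeholder element names, the document is tiny compared with a real Schedule, and the sizes it prints say nothing about the 4 MB and 30 MB figures quoted above. It requires Python 3.9 or later for ET.indent.

```python
import xml.etree.ElementTree as ET

NS = "http://schema.pbs.gov.au"

# Hypothetical content: "schedule" and "item" are placeholder element names.
root = ET.Element(f"{{{NS}}}schedule")
for i in range(1000):
    ET.SubElement(root, f"{{{NS}}}item").text = str(i)

# With no default namespace requested, ElementTree writes a prefix (ns0) on every tag.
prefixed = ET.tostring(root)

# default_namespace (Python 3.8+) serialises the same tree without prefixes.
compact = ET.tostring(root, default_namespace=NS)

# ET.indent (Python 3.9+) adds the line breaks and indentation of a pretty print.
ET.indent(root)
pretty = ET.tostring(root, default_namespace=NS)

print(len(prefixed), len(compact), len(pretty))
```

The compact, default-namespace serialisation is the smallest of the three, because it carries neither a prefix on each tag nor indentation between elements.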