Technology

Proposing a data format

A data format specifies the representation of data as it’s exchanged.

It’s important to strike a balance between a format that fits your exchange patterns and one that your community can support.

Choosing your preference

What format suits your needs

While all formats can represent any type of information, they have different characteristics that affect how well they suit different types of problems.

A much more detailed comparison appears further down this page. Here is a very simplified comparison of some widely used formats, and how they perform in different circumstances:

| | JSON | XML | CSV | XBRL |
| --- | --- | --- | --- | --- |
| Large messages | Not really. Use high volume individual messages instead | Quite good if streamed, or with dedicated hardware | Quite good | Large in-memory overheads and is very verbose |
| High volume | Very low processing overhead. JSON is used by internet giants like Facebook, Google & Twitter | Moderate processing overhead, but can be streamed. Specific hardware is available to process very large volumes | File based structure. Not suitable | High processing overhead |
| Industry support | The industry’s preferred data format | Globally supported | Available everywhere, but very rigid and inflexible | Very small and niche community. Very little commercial or open source tooling available |
| Self describing | Through alternate design-time mechanisms like JSON Schema and Swagger | Capable of describing complex, simple and primitive types with metadata | Very limited, column based | Capable of describing attributes, facts, units and types to a very granular level |
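
To illustrate what "quite good if streamed" can mean for XML, here is a minimal sketch using `iterparse` from Python's standard-library `xml.etree.ElementTree`. The `employee` element name and the `count_employees` helper are assumptions made for illustration, not part of any particular exchange.

```python
import io
import xml.etree.ElementTree as ET


def count_employees(stream):
    """Count <employee> elements while reading the document incrementally."""
    count = 0
    # iterparse yields each element once its closing tag has been read
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "employee":
            count += 1
            # Drop the element's children and text now that it has been handled;
            # the emptied element itself stays attached to the root
            elem.clear()
    return count


# A small in-memory stand-in for what would normally be a large file or socket
sample = io.BytesIO(
    b"<employees>"
    b"<employee><firstName>John</firstName><lastName>Doe</lastName></employee>"
    b"<employee><firstName>Anna</firstName><lastName>Smith</lastName></employee>"
    b"</employees>"
)
employee_count = count_employees(sample)
```

Because each record is processed and cleared as it arrives, memory use depends mostly on the size of one record rather than the size of the whole message.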

What formats can your tools support

It is important to also consider what formats are supported by the tools you have.

Some tools support multiple import and export formats, others only support one.

Most tools allow some customisation of their formats, but generally only for widely used formats.

Choosing a format that doesn’t integrate well into your tools can mean very long and costly bespoke development.

What direction are you going in

Citizens and businesses are increasingly interacting online, and from mobile devices. Choosing a format that was expressly designed for simplicity with low overheads, like JSON, might make it easier for people to develop innovative solutions.

On the flip side, choosing a format that expressly prevents online, mobile or cloud use might inhibit innovation.
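
One rough way to look at the "low overheads" point is to serialise the same records in several formats and compare the sizes. The sketch below uses only Python's standard library. The `records` sample mirrors the employee example used on this page, and the helper names (`as_json`, `as_xml`, `as_csv`) are hypothetical. Message size is only one kind of overhead, so treat this as an indication rather than a benchmark.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

records = [
    {"firstName": "John", "lastName": "Doe"},
    {"firstName": "Anna", "lastName": "Smith"},
    {"firstName": "Peter", "lastName": "Jones"},
]


def as_json(rows):
    # Compact separators leave out optional whitespace
    return json.dumps({"employees": rows}, separators=(",", ":"))


def as_xml(rows):
    root = ET.Element("employees")
    for row in rows:
        employee = ET.SubElement(root, "employee")
        for name, value in row.items():
            ET.SubElement(employee, name).text = value
    return ET.tostring(root, encoding="unicode")


def as_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Size in bytes of each serialisation of the same three records
sizes = {
    "json": len(as_json(records).encode("utf-8")),
    "xml": len(as_xml(records).encode("utf-8")),
    "csv": len(as_csv(records).encode("utf-8")),
}
```

For this flat sample, CSV is the smallest, followed by JSON, then XML. Real messages with nested or optional data may compare differently, so it is worth repeating the measurement with your own data.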

A very detailed comparison

Here is a breakdown of how each format handles a range of commonly needed tasks.

Just because a technology isn't 100% supported doesn't mean it's not suitable, but it might cost you and your community more.

This list isn't an authoritative source, but it is based on our experiences with the technologies. Let us know if you disagree with our findings, and we can talk about changing things.

| | JSON | XML | CSV | XBRL |
| --- | --- | --- | --- | --- |
| Business Rules | Not Supported | Supported | Not Supported | Varies |
| - Standardised rule specification | No standardised support | Several technologies, such as XSLT | No standardised support | XBRL Formula isn't well supported. Moderate support with other XML languages like XSLT |
| - Rules can be generated into executable code | No standardised support | Several technologies, such as XSLT | No standardised support | XBRL Formula isn't well supported. Moderate support with other XML languages like XSLT |
| - Phased execution of rules (e.g. errors, then warnings, then info) | No standardised support | Yes, using ordered XSLT files | No standardised support | Yes, if using XSLT |
| Message Correctness | Supported | Supported | Not Supported | Supported |
| - Documents can be checked as well formed | JSON Schema | XSD or DTD | No standardised support | XBRL Taxonomies |
| Versioning | Varies | Supported | Not Supported | Varies |
| - Can compare between versions | Text-based diff will work depending on the structure. JSON native diffs are available | XML differencing tools are available | Standard diff tools | No standardised tools. XBRL Versioning is not widely implemented |
| - Versioning can identify breaking changes | No standardised support | Yes, but requires SemVer | No standardised support | No standardised support |
| Schema | Supported | Supported | Not Supported | Supported |
| - Can specify a schema for the document | JSON Schema supports this | XSD supports this | No standardised support | XBRL Taxonomies support this |
| - Can specify all valid combinations | JSON Schema supports this | XSD supports this | No standardised support | XBRL Taxonomies support this |
| - Schemas are machine readable | JSON Schema supports this | XSD supports this | No standardised support | XBRL Taxonomies support this |
| - Schemas can be validated | JSON Schema supports this | XSD supports this | No standardised support | XBRL Taxonomies support this |
| Automation | Supported | Supported | Not Supported | Not Supported |
| - Can generate code from the document | Swagger / OpenAPI specification | Very mature | No standardised support | No standardised support |
| Transformation | Varies | Supported | Not Supported | Varies |
| - Can be automatically transformed into other formats using generic transformations without customisation | Yes, but isn't as mature as XML | Yes | No | Possible, but not very mature |
| Open Standards | Supported | Supported | Supported | Varies |
| - Open and not proprietary | Yes | Yes | Yes | Yes |
| - Driven by the community | Explosive popularity with software industry | Waning, but still very popular | Yes | Driven by tool vendors. Limited support of standards outside of vendors who define them |
| Tooling | Supported | Supported | Supported | Varies |
| - Tools are available to create and consume | Yes. Also simple enough to use with standard text processing | Yes | Yes | Very limited tooling exists |
| - Free and commercial options | Yes | Yes | Yes | Very limited tooling exists |
| - Available on a range of platforms | Yes | Yes | Yes | Limited support. Not available for mobile |
| - Alternatives are available if tooling can't be used | Yes | Yes, with difficulty | Yes | Very difficult to work with if not using specific tooling |
| Industry Support | Supported | Supported | Supported | Varies |
| - People with skills are available | Very | Very | Very | Very limited |
| - Free training is available | Lots | Lots | Lots | Very little |
| - A community body drives best-practice | Yes | Yes | Yes | Efforts have been made, but haven't progressed |
| Frameworks | Supported | Supported | Varies | Not Supported |
| - Well integrated into software frameworks | Very | Very | Yes, but with limited support | Not at all |
| Large Messages | Not Supported | Varies | Supported | Not Supported |
| - Can support creating and consuming large messages | Yes, but was designed to be fast, lightweight and high volume | Yes, with streaming | A de-facto standard for large messages | Very poor, with huge memory overheads. A draft recommendation has been prepared to allow for streaming, but has no available implementations |
| Self Describing | Supported | Supported | Varies | Supported |
| - The data can be interpreted without needing to read other information | Yes | Yes | Yes, if using column headings | Yes |
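
The "Message Correctness" entries name schema languages, but the most basic check, whether a document parses at all, needs only a parser. Here is a minimal sketch using Python's standard library. The helper names are hypothetical, and a well-formed document is not necessarily valid against a schema: checking against JSON Schema or XSD needs a separate validator, which Python's standard library does not include.

```python
import json
import xml.etree.ElementTree as ET


def is_well_formed_json(text):
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_well_formed_xml(text):
    """Return True if the text parses as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# An unclosed JSON object and a mismatched XML tag both fail the check
json_ok = is_well_formed_json('{"firstName": "John"}')
json_bad = is_well_formed_json('{"firstName": "John"')
xml_ok = is_well_formed_xml("<employee><firstName>John</firstName></employee>")
xml_bad = is_well_formed_xml("<employee><firstName>John</employee>")
```

CSV has no equivalent notion of well-formedness, which is consistent with the "No standardised support" entry for it.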

Some example formats

Some formats are simple, others are more verbose.

As you can see in the examples below, all of these data formats can represent the same information.

You should choose the option that was designed to solve the types of problems you have.
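
To make the point about equivalent information concrete, here is a minimal sketch that reads a one-record JSON, XML and CSV sample into the same Python structure, using only the standard library. The sample strings are cut-down versions of the employee samples on this page, and the function names (`from_json`, `from_xml`, `from_csv`) are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

JSON_TEXT = '{"employees":[{"firstName":"John","lastName":"Doe"}]}'
XML_TEXT = (
    "<employees><employee>"
    "<firstName>John</firstName><lastName>Doe</lastName>"
    "</employee></employees>"
)
CSV_TEXT = '"firstName","lastName"\n"John","Doe"\n'


def from_json(text):
    return json.loads(text)["employees"]


def from_xml(text):
    root = ET.fromstring(text)
    # Each child element of <employee> becomes a key/value pair
    return [{field.tag: field.text for field in employee} for employee in root]


def from_csv(text):
    # The header row supplies the keys for every following row
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


records = from_json(JSON_TEXT)
same = records == from_xml(XML_TEXT) == from_csv(CSV_TEXT)
```

All three parsers produce the same list of records here. The differences between the formats show up in what surrounds the data: typing, nesting, schema support and tooling.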

JSON

{"employees":[
    { "firstName":"John", "lastName":"Doe" },
    { "firstName":"Anna", "lastName":"Smith" },
    { "firstName":"Peter", "lastName":"Jones" }
]}

XML

<employees>
    <employee>
        <firstName>John</firstName> <lastName>Doe</lastName>
    </employee>
    <employee>
        <firstName>Anna</firstName> <lastName>Smith</lastName>
    </employee>
    <employee>
        <firstName>Peter</firstName> <lastName>Jones</lastName>
    </employee>
</employees>

CSV

"firstName","lastName"
"John","Doe"
"Anna","Smith"
"Peter","Jones"

XBRL

  <xbrli:context id="c0001">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.abr.gov.au/abn">1234567890</xbrli:identifier>
      <xbrli:segment>
        <xbrldi:explicitMember dimension="RprtPyType.02.06:ReportPartyTypeDimension">RprtPyType.02.06:ReportingParty</xbrldi:explicitMember>
      </xbrli:segment>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2013-07-01</xbrli:startDate>
      <xbrli:endDate>2013-07-01</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <prsnstrcnm1.02.00:PersonNameDetails>
    <pyde.02.00:PersonNameDetails.FamilyName.Text contextRef="c0001">Doe</pyde.02.00:PersonNameDetails.FamilyName.Text>
    <pyde.02.00:PersonNameDetails.GivenName.Text contextRef="c0001">John</pyde.02.00:PersonNameDetails.GivenName.Text>
  </prsnstrcnm1.02.00:PersonNameDetails>

  <xbrli:context id="c0002">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.abr.gov.au/abn">1234567890</xbrli:identifier>
      <xbrli:segment>
        <xbrldi:explicitMember dimension="RprtPyType.02.06:ReportPartyTypeDimension">RprtPyType.02.06:ReportingParty</xbrldi:explicitMember>
      </xbrli:segment>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2013-07-01</xbrli:startDate>
      <xbrli:endDate>2013-07-01</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <prsnstrcnm1.02.00:PersonNameDetails>
    <pyde.02.00:PersonNameDetails.FamilyName.Text contextRef="c0002">Smith</pyde.02.00:PersonNameDetails.FamilyName.Text>
    <pyde.02.00:PersonNameDetails.GivenName.Text contextRef="c0002">Anna</pyde.02.00:PersonNameDetails.GivenName.Text>
  </prsnstrcnm1.02.00:PersonNameDetails>

  <xbrli:context id="c0003">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.abr.gov.au/abn">1234567890</xbrli:identifier>
      <xbrli:segment>
        <xbrldi:explicitMember dimension="RprtPyType.02.06:ReportPartyTypeDimension">RprtPyType.02.06:ReportingParty</xbrldi:explicitMember>
      </xbrli:segment>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2013-07-01</xbrli:startDate>
      <xbrli:endDate>2013-07-01</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <prsnstrcnm1.02.00:PersonNameDetails>
    <pyde.02.00:PersonNameDetails.FamilyName.Text contextRef="c0003">Jones</pyde.02.00:PersonNameDetails.FamilyName.Text>
    <pyde.02.00:PersonNameDetails.GivenName.Text contextRef="c0003">Peter</pyde.02.00:PersonNameDetails.GivenName.Text>
  </prsnstrcnm1.02.00:PersonNameDetails>
