A little while ago (almost three years), Mark Nottingham wrote his excellent article JSON or XML? Just Decide, and it seems it's time to update that a little. But it is also a good opportunity to put it in context, and to see how the context has changed at the surface, but not in nature.
At the time, XML was already trending downwards, and most people liked JSON better. To me, the main reason was always easy to see: XML is a document-oriented language and pretty good at being one, but its metamodel is a bit hard to grasp. JSON, on the other hand, maps very directly to most developers' mental model of what structured data should look like, and therefore it makes them happy and productive.
During the transition from XML, the first big language for structured data on the web, to JSON, many API designers were unsure which to choose: XML because of its established status, or JSON because it was trending and seemed a better fit for most API data models?
The worst possible decision (and the one Mark was blogging about) was to avoid a decision and assume that you can do both. Mark nicely describes the problems, and it's worth noticing that they have nothing to do with XML's or JSON's inherent strengths or weaknesses; they simply originate in the metamodel mismatch. Every metamodel has idioms and built-in bias, and when you try to make two masters happy, you end up in one of the bad situations Mark pointed out.
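To make the mismatch concrete, here is a small, purely hypothetical record in JSON, followed by two equally plausible XML mappings a generic converter might produce. The choices (attributes versus elements, how to wrap the array) are exactly the kind of idiom where the two metamodels pull in different directions:

```
{ "point": { "lat": 48.2, "lon": 16.4, "tags": ["park", "playground"] } }

<!-- one plausible XML mapping -->
<point lat="48.2" lon="16.4">
  <tag>park</tag>
  <tag>playground</tag>
</point>

<!-- another, equally plausible one -->
<point>
  <lat>48.2</lat>
  <lon>16.4</lon>
  <tags>
    <tag>park</tag>
    <tag>playground</tag>
  </tags>
</point>
```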
This gets much more pronounced when you are not just defining one relatively stable API under your control, but something that is supposed to be open and extensible. The reason is the same: metamodels have their own ways of defining, allowing, and encouraging extensibility, and these are often even more idiomatic and specific than the metamodels themselves.
In the end, Mark's message was that you'd better not use XML, because it is tricky, and developers try to avoid it anyway and throw data binding tools at it, resulting in brittle code on their end (but they will still blame you when their code breaks).
But: it seems we're heading down a similar path now with JSON and RDF, with people claiming that you can safely do both at the same time. This time, people seem to think that JSON-LD is the magic that allows them to avoid deciding on their metamodel. That is too much of a burden for JSON-LD, and more than it has been designed for or can live up to.
Simply put, JSON-LD is data binding for people dealing with JSON data who prefer to have an RDF view of that data. It does a good job of allowing people to perform this mapping, in the same way that XML data binding tools isolated developers from the XML they did not want to process themselves.
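To illustrate what that binding looks like, here is a minimal JSON-LD sketch (the document content is invented; the schema.org terms are real): the @context maps plain JSON member names onto RDF terms, so a JSON consumer can ignore it entirely, while an RDF consumer can expand the document into triples.

```json
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": { "@id": "http://schema.org/url", "@type": "@id" }
  },
  "name": "An example resource",
  "homepage": "http://example.com/"
}
```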
In the end, if you want to create a robust model, you cannot avoid deciding which metamodel to build it on. If you try to ignore openness and extensibility hard enough, you may get away with it for a little while. But it will catch up with you eventually, particularly in situations (such as open standards) where people will take your foundation and stretch it to its limits in all possible directions.
Given our experience with how metamodel trends change over time, and with how APIs get used and extended, the clean approach I would recommend is the following:
- Start with JSON as your foundation, defining your model in ways that are easy for developers to read, understand, and implement. The upcoming work on GeoJSON will be a good exercise in this. In particular, be clear about openness and extensibility, so that implementations know what to expect now and in the future. This also allows them to drop data or fail when those rules are not followed.
- Your extension model must clearly say how extensions are supposed to be exposed to applications, so that implementations can be built appropriately.
- If there are developer communities invested in other metamodels, have a completely separate layer for them, where they can build data binding in any way they please. For GeoJSON, that's GeoJSON-LD (which happens to use JSON-LD to map into RDF), and if somebody felt the need to bind GeoJSON to XML, they would be more than welcome to use, for example, JSONiq to map GeoJSON into some XML model. (A small sketch of an extension member and such a binding context follows this list.)
- Have plenty of test cases that explore the limits of your JSON model and of your extension model. Document clearly for each test case how it uses the extension model, and what an application is supposed to see.
- Have generic applications consume your test cases, including ones that implement the data binding layer(s). Have those clients serialize what they see through the data binding layer back into JSON, and compare that JSON to the original test data: nothing should get dropped. (A sketch of what such a test case description could look like is shown below.)
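To make the first points a bit more concrete, here is a hypothetical GeoJSON Feature carrying a made-up extension member (x-example-rating); the base specification's extension model is what tells an implementation whether to expose such a member to applications, ignore it, or fail:

```json
{
  "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [102.0, 0.5] },
  "properties": { "name": "A place worth mapping" },
  "x-example-rating": 4
}
```

A separate GeoJSON-LD layer could then supply a context along these lines (the vocabulary URI here is invented), without the base model having to know anything about RDF:

```json
{
  "@context": {
    "geojson": "http://example.org/geojson/vocab#",
    "Feature": "geojson:Feature",
    "geometry": "geojson:geometry",
    "properties": "geojson:properties"
  }
}
```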
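And as a sketch of what a test case description could look like (all member names here are invented), each case pairs an input document with a statement of how it exercises the extension model, what an application is supposed to see, and the round-trip expectation:

```json
{
  "id": "extension-member-on-feature",
  "input": "tests/extension-member-on-feature.json",
  "exercises": "an unknown top-level member on a Feature",
  "applicationSees": "the member, unchanged, through the extension interface",
  "roundTrip": "JSON re-serialized from the data binding layer must equal the input; nothing dropped"
}
```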
This probably sounds a bit complicated. But it really isn't all that hard to do, and if you decide on a clean layering structure such as the one proposed here, this kind of testing is necessary to avoid specification errors anyway.
But whatever you do, do not try to serve two masters at the same time. Layering is a well-established pattern in software and protocol design, and there are good reasons for that. Trying to make two communities happy in one model will make neither of them very happy, and will very likely end up with a product that's neither robust nor evolvable.