bob ducharme's recent post about the WWW2009 tutorial on the route from SOA to REST (to be repeated in june at ICWE 2009) made me think about the fundamental tension between RDF-based semantic web approaches and RESTful architectures.
it seems to me that the biggest problem of RESTful RDF is the lack of a well-defined notion of a document. one of the strengths of RDF is that you harvest triples from wherever you find them, take the graph, compute the closure of this graph under whatever schemas or ontologies you might have, and then process the resulting graph. on the semantic level this means you take whatever you can get as data, whatever you can get as reasoning mechanisms, and extend the data using inference. it is this general model of raw data and derived information that makes RDF so powerful.
the price you pay, however, is the loss of provenance and document boundaries. of course this can be overlaid using various mechanisms, but it is not a fundamental part of the RDF model, and it's easy to forget it or get it wrong.
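to make the merge-and-infer model and its cost concrete, here is a minimal sketch in plain python. everything in it is an assumption made for illustration: the two toy "documents", the `ex:` names, and a naive closure that only knows two RDFS-style rules (subclass transitivity and type inheritance). it is not how any particular RDF library does it.

```python
# Toy model: a "document" is a set of (subject, predicate, object) tuples.
# All names below (ex:rex, ex:Dog, ...) are hypothetical example data.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

doc_a = {("ex:rex", TYPE, "ex:Dog")}            # harvested from one source
doc_b = {("ex:Dog", SUBCLASS, "ex:Animal")}     # harvested from another source


def closure(triples):
    """Naive fixpoint over two RDFS-style rules.

    Rule 1: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A) and (A subClassOf B)       => (x type B)
    """
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in graph:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= graph:          # nothing new was derived: fixpoint reached
            return graph
        graph |= new


def provenance(triple, sources):
    """Return the names of the source documents that literally contain the triple."""
    return sorted(name for name, triples in sources.items() if triple in triples)


sources = {"doc_a": doc_a, "doc_b": doc_b}
merged = doc_a | doc_b                 # set union: document boundaries are gone
inferred = closure(merged) - merged    # triples that exist in no source document
```

the merged set carries no trace of which document a triple came from, and the inferred triple (`ex:rex` being an `ex:Animal`) belongs to no document at all. answering "where did this come from?" requires keeping the `sources` mapping around as a separate, overlaid structure, which is exactly the part that is easy to forget.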
this leads to problems when it comes to the rather document-centric model of REST, which talks about representations that need to be exchanged for enabling interactions. how do you use RDF, when the data you're working with for your application purpose is a mix of retrieved RDF and inferred information?
XML, on the other hand, has a rather static idea of a document. well, that's actually not entirely true, because XSD introduced the concept of the Post-Schema Validation Infoset (PSVI), which in a way is the same mix of source data (the original XML document) and inferred data (validation information, type annotations, and default values). type annotations can be regarded as a bonus that can make data processing much more robust and reliable. default values, ironically, are widely frowned upon in XSD best practices, because they tightly bind document processing to document validation, and that is largely perceived to be a Bad Idea.
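the coupling that default values introduce can be sketched in a few lines of python. this only imitates the effect: python's standard library does not perform XSD validation, so the `DEFAULTS` table stands in for a schema, and the `order` element and `currency` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: the currency attribute is not present.
SOURCE = '<order id="42"/>'

# Stand-in for schema-declared attribute defaults (an assumption, not real XSD).
DEFAULTS = {"order": {"currency": "USD"}}


def parse_plain(text):
    """What a consumer sees when it only parses well-formed XML."""
    return ET.fromstring(text)


def parse_with_defaults(text):
    """Imitates a PSVI-style view: source data plus schema-supplied defaults."""
    root = ET.fromstring(text)
    for elem in root.iter():
        for name, value in DEFAULTS.get(elem.tag, {}).items():
            if name not in elem.attrib:   # never override what the document says
                elem.set(name, value)
    return root


plain = parse_plain(SOURCE)
augmented = parse_with_defaults(SOURCE)
```

here `plain.get("currency")` is `None` while `augmented.get("currency")` is `"USD"`: the same representation yields different data depending on whether the consumer ran the schema-aware step, which is why an application relying on defaults can no longer blindly process what it GETs.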
and this is where it gets interesting: XML's very idea is that of enabling applications to GET XML and blindly process it by simply processing well-formed markup. RDF, on the other hand, is more rooted in the idea that the RDF you find blends into an existing set of other triples and schemas or ontologies. this latter model makes working with RDF more seamless and more convenient in an RDF-only world, because essentially, applications can ignore the fact that data had different origins, and they can work on what sometimes has been referred to as the giant global graph. it seems to me that this replicates the pattern of trying to hide distribution that underlies many of the middleware-inspired approaches to web services in a rather interesting way.
XML, on the other hand, makes it almost impossible to get this seamless view of the world, and requires applications to handle one XML document at a time, often steered by overlaid mechanisms such as XInclude at a very low level, or by following hyperlinks when it comes to RESTful data formats. what this means is that XML and this RESTful picture force you to work on the document level, and this also forces developers to explicitly deal with distribution, which means dealing with failures at the network or representation level.
so it seems as if RDF's virtues, the ability to transcend document boundaries and to seamlessly include data from all kinds of sources, become problematic when moving towards RESTful services, where coarse granularity and well-defined document concepts are an essential part of the architectural style.
this does not imply that RDF cannot be used for RESTful services and document-oriented scenarios; it simply means that XML makes it more natural (or you might say: less avoidable) for developers to deal with strict document boundaries, whereas in RDF, this requires special attention. i am really curious to find out whether somebody has already come up with a set of best practices and design guidelines for building RESTful RDF applications, and this becomes particularly interesting when looking at issues such as SPARQL (and its basic model of accessing an RDF triple store through just one URI) and updates of RDF data.
it seems to me that the road towards a RESTful semantic web (if we take the RDF-centric view of that concept) is still not fully clear, and i am really curious to explore in more depth the fundamental tension between REST's "things are diverse and distributed and you have to deal with it" and RDF's "in the end it's all just triples" world views.