"RPC and SemWeb have the same implicit goal of hiding distribution that somehow goes against the grain of RESTful loose coupling" was an observation i recently voiced on twitter, and that might need a bit of explanation. for now, i'll focus on the semantic web part of that, but it seems to me that this general approach of trying to hide distribution is amazingly similar in both the WS-* world and the semantic web. in both cases, this makes it hard to think of how to get them better aligned with basic web principles. to stay a bit more focused, i will concentrate on RDF for the purpose of this post.
RDF is great at making statements about URI-identified things, because this is what it is all about: you build graphs of things that attach, by reference, to anything that can be identified by URI. how to identify RDF triples and graphs themselves is an entirely different issue, though. RDF's abstract model leaves them dangling in the void, and while SPARQL introduces the concept of named graphs (effectively turning triples into quadruples), these names are basically namespace URIs, which means they identify named graphs, but there is no standardized way to interact with them (they are never dereferenced). so this means that in a typical RDF store, triples are not identified at all, and even if they are (let's say somebody manages their RDF store in a way which guarantees that every triple is associated with exactly one named graph URI), there is no well-defined interaction with those triples.
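to make the "triples become quadruples" point a bit more concrete, here is a minimal sketch in plain python. it is not how any particular RDF store works; the `Quad` type, the `triples_by_graph` helper, and all the example.org URIs are made up for illustration. the only thing it tries to show is that the graph name is the one handle a triple can get, and that triples outside of any named graph have no handle at all.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Quad(NamedTuple):
    """A triple plus the name of the graph it belongs to (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    graph: Optional[str]  # None means "not in any named graph"


def triples_by_graph(quads: Iterable[Quad]) -> Dict[Optional[str], Set[Triple]]:
    """Group triples under their graph name; the name is the only handle they get."""
    grouped: Dict[Optional[str], Set[Triple]] = defaultdict(set)
    for q in quads:
        grouped[q.graph].add((q.subject, q.predicate, q.obj))
    return dict(grouped)


# hypothetical example data; the example.org URIs do not point at a real store
PEOPLE = "http://example.org/graphs/people"
NAME = "http://xmlns.com/foaf/0.1/name"
quads = [
    Quad("http://example.org/alice", NAME, '"Alice"', PEOPLE),
    Quad("http://example.org/bob", NAME, '"Bob"', PEOPLE),
    Quad("http://example.org/carol", NAME, '"Carol"', None),
]

grouped = triples_by_graph(quads)
print(len(grouped[PEOPLE]))  # triples that can be addressed via the named graph URI
print(len(grouped[None]))    # triples that have no identifier of their own
```

note that even for the triples in the named graph, the sketch only gives you a key to look them up locally; nothing here says what should happen when somebody dereferences that URI.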
this all starts with the basic approach of many semantic web projects: people want to have and use information about things on the web, or they start with the assumption that others have information about things on the web, and then they can "harvest" that information. this terminology is quite interesting, and if we want to get all metaphorical, then this is what harvesting is all about: you cut stuff, separating it from its roots, and you end up with a lot of useful harvested stuff, but it's sort of dead and you will never be able to reconnect it with where it came from. it may give you useful seeds for new things as well, but again, those things will only be very loosely connected to where they came from.
so, on the semantic web, it's all about getting many RDF triples, dropping them into one big silo (which is why you see all these statements about "we have converted such-and-such into x billion triples!"), and then working with that silo (maybe providing one tiny hole in that silo through which SPARQL queries can be funneled). this approach makes it easy to abstract from the fact that all the information started out from different sources (and probably still lives there and may be exposed via live services), but it also makes it hard to manage things such as provenance and authenticity (which are critically important for distributed and federated systems). as patrick murray-john pointed out in his comment on my recent post about "Data, Models, Metamodels, Cosmologies", RDF's biggest strength is its excellent capability to mix data together. however, this seamless blending of data from different sources gets problematic when you look at REST, which is built around the general idea of coarse-grained interactions with resources identified by URI.
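a small sketch of what gets lost when triples are dropped into one silo, again in plain python and under made-up assumptions: the two source URIs, the `harvest` and `harvest_with_provenance` helpers, and the data values are all hypothetical. the first function blends everything the way a set union does; the second one keeps track of which source asserted which triple.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def harvest(sources: Dict[str, Set[Triple]]) -> Set[Triple]:
    """Merge everything into one silo; the source URIs are thrown away."""
    silo: Set[Triple] = set()
    for triples in sources.values():
        silo |= triples
    return silo


def harvest_with_provenance(sources: Dict[str, Set[Triple]]) -> Dict[Triple, Set[str]]:
    """Same merge, but remember which source(s) asserted each triple."""
    silo: Dict[Triple, Set[str]] = {}
    for source_uri, triples in sources.items():
        for triple in triples:
            silo.setdefault(triple, set()).add(source_uri)
    return silo


# hypothetical sources that disagree about the same thing
CITY = "http://example.org/city"
POP = "http://example.org/population"
SOURCE_A = "http://a.example/data"
SOURCE_B = "http://b.example/data"
sources = {
    SOURCE_A: {(CITY, POP, "100")},
    SOURCE_B: {(CITY, POP, "120"), (CITY, "http://example.org/mayor", "http://example.org/dana")},
}

silo = harvest(sources)                    # conflicting values sit side by side
tracked = harvest_with_provenance(sources)  # each triple still points back to its origin

for triple in sorted(silo):
    print(triple, "from", sorted(tracked[triple]))
```

in the plain silo, the two conflicting population values are just two triples about the same subject, and there is nothing left that would tell you whom to ask, or where to send a correction.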
so when you are treating semantic web work as the process of harvesting data, and then using it locally in some form that allows read access only and loses track of the identity and the boundaries of the original data sources, then you're not working with a web of data, but you are just working with data you harvested from the web. this assumes that the harvested data is static, and that any updates to the harvested data will not have to be propagated to where it originally came from. essentially, such an approach treats the web, in an idealized world view, as one huge RDF store, assuming that the process of collecting all the data is taken care of, and that there is no channel for interacting with the original data sources. and this idealized world view is reflected in many projects around the semantic web, where it is assumed that problems of distribution are taken care of, and can be regarded as solved. as a side note, i think this is why there recently has been such a huge uptake of semantic web work in the government data space, because the government can be conveniently viewed as one homogeneous data source, providing one big collection of RDF data that can be accessed through centralized services. many of the problems of federated data sources, conflicting data, changing data, data authenticity, data integrity, and heterogeneous data sources and services can be ignored in such a centralized scenario.
the current work around SPARQL Update does not look all that different. it looks at how to allow the full spectrum of CRUD operations for SPARQL implementations (in the same way as XQuery progressed from a read-only language to one with updates), but it still follows the same approach of looking at RDF data as living in one big silo, and standardizing a language for how to interact with that silo. it's important that such a language exists and i think SPARQL and the Update extension to it are very important specs, but this language has as much to do with the web as SQL or XQuery: not all that much. it's back-end stuff, and for me as a client of a web service, all i want is a way to interact with that service through HTTP so that i can use that service effectively; whether the back-end of that service uses SQL or XQuery or SPARQL or a column store does not really interest me all that much.
so what i am currently doing (collaborating with linked data expert michael hausenblas) is trying to figure out a way to make SPARQL more REST-friendly, so that SPARQL can be easily used as a back-end to provide RESTful services on the web. i am looking forward to this work, but it is interesting to me how it turns into an exercise of ignoring the subject URIs of triples as actionable URIs, and instead using the named graph URIs to give triples an identity on the web, so that they can be exposed in a RESTful way. in such a RESTful SPARQL world, it will be possible to GET a bunch of triples from somewhere, and then to PUT the exact same bunch of triples to a URI that identifies a named graph that needs to be updated. it seems to me that this kind of RESTful approach will make it much easier to expose meaningful resources on the semantic web, and to allow RESTful interactions with them.
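the GET/PUT interaction described above can be sketched without any network at all. the following is only an in-memory stand-in, not the design we are working on and not an existing API: the `GraphStore` class, its `get` and `put` methods, and the returned status numbers (which loosely mirror HTTP's 201 and 204) are assumptions made for this example. the point is that the named graph URI is the unit of interaction, and that PUT replaces the whole graph, so a GET followed by a PUT of the same triples changes nothing.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class GraphStore:
    """Hypothetical in-memory stand-in for a service exposing named graphs as resources."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = {}

    def get(self, graph_uri: str) -> FrozenSet[Triple]:
        """Return a full representation of the graph (a real service would answer 404 if missing)."""
        if graph_uri not in self._graphs:
            raise KeyError(graph_uri)
        return frozenset(self._graphs[graph_uri])

    def put(self, graph_uri: str, triples: Iterable[Triple]) -> int:
        """Replace the whole graph with the given triples; returns an HTTP-like status."""
        created = graph_uri not in self._graphs
        self._graphs[graph_uri] = set(triples)
        return 201 if created else 204


# hypothetical graph URI and triples
GRAPH = "http://example.org/graphs/people"
store = GraphStore()
first_status = store.put(GRAPH, [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/name", '"Bob"'),
])

# GET a bunch of triples, then PUT the exact same bunch back
before = store.get(GRAPH)
second_status = store.put(GRAPH, before)
after = store.get(GRAPH)

print(first_status, second_status, before == after)
```

the subject URIs inside the triples play no role in this interaction; the client only ever talks to the graph URI, which is the shift in perspective mentioned above.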