
Tuesday, May 03, 2011



Ryan Shaw

This sounds right to me. Every time I have done something non-trivial with Linked Data I have ended up working with local copies. Lately I've been wondering whether I even need to bother with RDF and SPARQL. Often all I really want to do is build graphs and query and analyze them in various ways. I can do this in a triplestore, but then I have to convert to RDF if I have some non-RDF data. Often data is in the form of a graph but isn't RDF. So why not just stick RDF and non-RDF-yet-graph-structured data alike in a graph store? And then I can write code to do the queries / analysis I want and don't have to worry about SPARQL.
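A minimal sketch of that idea, assuming a plain in-memory store: the `GraphStore` class, its method names, and the example identifiers are all hypothetical and only illustrate holding RDF-style triples and non-RDF edges side by side and querying them with ordinary code instead of SPARQL.

```python
from collections import defaultdict


class GraphStore:
    """Tiny in-memory labeled directed graph (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps each source node to a list of (edge label, target node) pairs.
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def neighbors(self, node, label=None):
        """Targets one hop away, optionally restricted to a single edge label."""
        return [t for (l, t) in self._out.get(node, ()) if label is None or l == label]

    def reachable(self, start):
        """All nodes reachable from start, found by depth-first traversal."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for target in self.neighbors(node):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


store = GraphStore()
# An RDF-style statement: URIs for both nodes and for the edge label.
store.add_edge("http://example.org/alice",
               "http://xmlns.com/foaf/0.1/knows",
               "http://example.org/bob")
# Graph-structured data that is not RDF: plain identifiers, no conversion step.
store.add_edge("http://example.org/bob", "cites", "paper-42")
store.add_edge("paper-42", "cites", "paper-7")

print(store.reachable("http://example.org/alice"))
```

The trade-off in a sketch like this is that every query becomes custom code, which is exactly what a shared query language is meant to avoid.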


Three short comments:
- This blog post is the first place I have seen business intelligence & analytics properly linked with the linked data environment. The key point for me is that you can apply traditional BI techniques to linked data contents (quite easily, or at least "not too far away" in terms of timing). The more exciting part is applying broader analytics, which would mean "generate new information based on linked data contents".

- Maybe a key to opening up linked data contents to traditional BI systems would be to create a LinkedDataOLAP-Adapter (an idea I had when looking at http://www.simba.com/olap-sdk-features.htm ): this would instantly enable much higher adoption rates, as business users could access linked data contents without even leaving their desktop & application environment (and without knowing that they were using 'rdf-coded' information ;-)

- You cited @cygri's statement re. local copies of triples: that's today's state of technology. I still hope that we get to something like "federated sparqling", where the person querying does not have to know where statements about a resource are made. Today the infrastructure effort needed to recreate the most important triple sources for your own use is too high, and the publicly & freely available infrastructure does not yet match the requirements of production environments.
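The OLAP-adapter idea from the second point above could look roughly like this. It is only a sketch under stated assumptions: the triples, the `http://example.org/` URIs, and the helper names `to_fact_rows` and `roll_up` are invented for illustration and do not describe any existing adapter or the Simba SDK.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace

# Invented sample data: three sales, each with a region and an amount.
triples = [
    (EX + "sale/1", EX + "region", "EMEA"),
    (EX + "sale/1", EX + "amount", 120.0),
    (EX + "sale/2", EX + "region", "EMEA"),
    (EX + "sale/2", EX + "amount", 80.0),
    (EX + "sale/3", EX + "region", "APAC"),
    (EX + "sale/3", EX + "amount", 50.0),
]


def to_fact_rows(triples):
    """Pivot (subject, predicate, object) triples into one dict per subject."""
    rows = defaultdict(dict)
    for s, p, o in triples:
        rows[s][p] = o
    return dict(rows)


def roll_up(rows, dimension, measure):
    """Sum a measure grouped by a dimension, like a one-dimensional OLAP roll-up."""
    totals = defaultdict(float)
    for row in rows.values():
        # Skip subjects that lack either the dimension or the measure.
        if dimension in row and measure in row:
            totals[row[dimension]] += row[measure]
    return dict(totals)


totals = roll_up(to_fact_rows(triples), EX + "region", EX + "amount")
print(totals)
```

A BI front end would only ever see the pivoted rows and totals, which is the sense in which users would not need to know the data was 'rdf-coded'.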

A wonderful blog post, indeed,

(re. Ryan: basically right, and I believe in the technical advantages & beauty of SPARQL. But I do not believe that e.g. a large share of business analysts will ever learn SPARQL, so we should make connecting to linked data sources easy in today's (web) applications, and we'll likely have to hide SPARQL queries behind nice UIs to get acceptance.)


Business Intelligence and Linked Data are connected subjects in our world view re. the Giant Global Graph of Linked Data. Intelligence is a function of being able to access and make sense of data across disparate data sources. This applies to individuals and their social networks just as it applies to the same individuals within enterprise intranets. Of course, this also applies to enterprises (organizations are Agents too).

No matter what moniker we apply to the subject matter in question, the fundamental value boils down to increased agility gained by surmounting the inertia of data silos. This is indeed the essence of the matter re. Linked Data, since it facilitates data virtualization across heterogeneous data sources via a mesh of distributed data objects.

Key things to note:

1. None of what I state is unique to RDF. RDF is simply an option. The power comes from URIs and an EAV-based (entity-attribute-value) linked data graph representation.

2. Reasoning is important and mandatory, but you need the linked data substrate in place first for the "sense making" prowess of reasoning to surface.

Some links to posts and resources that cover Linked Data and Business Intelligence (BI) from the past:

1. http://www.openlinksw.com/weblog/public/search.vspx?blogid=oerling-blog-0&q=business%20intelligence&type=text&output=html -- collection of posts from Orri Erling's blog data space

2. http://www.delicious.com/kidehen/virtuoso_sparql_tutorial -- collection of tutorials oriented towards making sense of data

3. http://ods.openlinksw.com/wiki/main/Main/VOSArticleBISPARQL2 -- business intelligence extensions for SPARQL (SPARQL-BI)



@ryan, if all you want is graph storage and processing, RDF and SPARQL indeed may be a bit of an inconvenience. but if your to-be-graphified data has nodes that are identified by URIs, and might reuse certain relationships that can also be identified by a common set of identifiers (such as URIs), and if you want to have some built-in mechanism to separate namespaces and subgraphs in the graph, then RDF and SPARQL may already look much more appealing.

so i think for general graph processing you're right that RDF/SPARQL might not be such an excellent model and tool set, but for datasets that are derived from (more or less well-designed) web-centric datasets and services, i think it does have quite a bit of appeal. it still is painful to use for data that has more of an ordered tree angle to it or a regularly structured table angle, and i think these are the areas where both RDF and SPARQL might need a bit of evolution (or layered technologies) to make life a bit easier.


@dakoller: federated SPARQLing is an active research area, but because this is very hard to get right and efficient (as decades of research in federated databases have demonstrated, which basically never produced anything functioning, so that reality turned to data warehousing as an ugly but practical approach), i wouldn't wait for it. you might get it working when expanding the "let's SPARQL a single RDF silo" approach to a "let's SPARQL a couple of RDF silos" approach, but if it should really work at web scale, you essentially need to build real-time crawling into the infrastructure you're building, and that is, to put it mildly, maybe not all that easy to get working.

but the point is that, as with data warehousing, SPARQLing silos already delivers a lot of value, and as long as the data has some backlinks to its origin (conveniently provided by RDF), i am sure there is a lot that could be done even in this slightly less elegant setting. i really like the RDF/OLAP idea, maybe these are two communities that really should talk to each other a whole lot more.
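A rough sketch of the "couple of silos" approach with backlinks to origin, assuming each silo is just a local list of triples keyed by the URI it was copied from. The `describe` function and all URIs and values are hypothetical, and nothing here attempts real federation or live crawling.

```python
EX = "http://example.org/"  # hypothetical namespace

# Each silo is a local copy of triples, keyed by the URI it was fetched from.
silos = {
    EX + "silo-a": [
        (EX + "widget", EX + "price", 10),
        (EX + "gadget", EX + "price", 25),
    ],
    EX + "silo-b": [
        (EX + "widget", EX + "maker", EX + "acme"),
    ],
}


def describe(silos, subject):
    """Collect statements about a subject from every silo, tagged with their origin."""
    results = []
    for origin, triples in silos.items():
        for s, p, o in triples:
            if s == subject:
                # Keep the origin so each answer can be traced back to its source.
                results.append((p, o, origin))
    return results


for predicate, value, origin in describe(silos, EX + "widget"):
    print(predicate, value, "from", origin)
```

Because every result carries its origin, stale or conflicting statements can at least be traced to the silo they came from, which is the practical benefit of the backlinks.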


re. Linked Data converging on a small set of vocabularies - although reuse of terms is desirable where appropriate (e.g. dc:title instead of my:title), to work in a global environment covering millions of different subject domains of varying levels of specialization, a large set of vocabularies is essential. I reckon in this context there is a definite role for some level of inference, like subclass/subproperty reasoning to tie those vocabularies together down through specializations. Otherwise we either compromise the potential utility of the data (by sticking with over-generalizations) or risk continuing the disconnect that the current one-API-per-site setup incurs.
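To make the subproperty point concrete, here is a minimal sketch, assuming a simplified mapping in which each property has at most one direct superproperty. The specialised property `MY_TITLE`, the example document URI, and the function name are hypothetical; only the dc:title URI is a real vocabulary term.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
# Hypothetical specialised property, declared as a subproperty of dc:title.
MY_TITLE = "http://example.org/vocab#workingTitle"

# Simplification: each property maps to at most one direct superproperty.
sub_property_of = {MY_TITLE: DC_TITLE}


def infer_superproperties(triples, sub_property_of):
    """Return the triples plus one extra triple per superproperty up the chain."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = set()  # guards against cycles in the mapping
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


data = [("http://example.org/doc1", MY_TITLE, "Draft")]
expanded = infer_superproperties(data, sub_property_of)
print(sorted(expanded))
```

A consumer that only understands dc:title can then still find the document, while the publisher keeps the more specific term.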

re. local stores, federation etc - although a lot of the infrastructure-type specifications are in place, we're still a way off being able to *fluidly* obtain, manipulate and republish data in practice. Even if everyone decided to publish RDF overnight (as distributed LD or in SPARQL stores), there'll still be a gaping void out there which needs filling with intermediary services and end-user tools, to make the stuff really useful. This stuff takes time :)
