when i recently tweeted that the semantic web or linked data (choose your preferred brand name) has this troubling attitude of one metamodel to rule them all, i got some feedback saying that this was wrong. it's not, and i guess the reason for the confusion in that space is that data, models, metamodels, and cosmologies are somewhat hard to grasp, so here is my brief attempt at displaying all of them in one handy overview. the selection of items in that overview is definitely not complete and is simply provided for illustrative purposes.

| Cosmology | Key/Value Pairs | Unordered Tables | | Ordered Trees | | Generalized Graphs |
|---|---|---|---|---|---|---|
| Metamodel | ? | Entity-Relationship (ER) | Relational (SQL) | SGML | XML Infoset, XDM | RDF |
| Model | ? | ER Graph | SQL DDL | DTD | DTD, XSD, RELAX-NG | RDFS, OWL |
| Data | CSV, Bigtable, SimpleDB | ? | RDBMS | SGML | XML, EXI | RDF/XML, N3, Turtle |

what does this mean, in particular the weird cosmology part? it starts with the observation that people usually begin with an application domain and with certain recurring features and patterns in that domain that they want to see supported in their models and in the tools they work with. this is why the hierarchical databases of the early days were replaced by the more appropriate tabular structures provided by ER and SQL. for the same reason, document processing has always had its own world: documents cannot be handled effectively in unordered tabular structures, so this is where the trees come in. they exist in various flavors, but the main idea is that you have a tree and that the tree is ordered, which maps well to document structures.
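to make the ordered/unordered distinction a bit more tangible, here is a minimal sketch in python. it is only an illustration under my own assumptions: a python set of tuples stands in for an unordered table, and the element and row names are made up for the example.

```python
import xml.etree.ElementTree as ET

# unordered table: a set of rows, where row order carries no meaning.
# the row ids "p1"/"p2" are hypothetical and only serve the example.
rows_a = {("p1", "first paragraph"), ("p2", "second paragraph")}
rows_b = {("p2", "second paragraph"), ("p1", "first paragraph")}
same_table = rows_a == rows_b  # listing rows in a different order changes nothing

# ordered tree: the sequence of child elements is part of the data.
doc_a = ET.fromstring("<doc><p>first paragraph</p><p>second paragraph</p></doc>")
doc_b = ET.fromstring("<doc><p>second paragraph</p><p>first paragraph</p></doc>")
texts_a = [p.text for p in doc_a]  # children are iterated in document order
texts_b = [p.text for p in doc_b]
same_document = texts_a == texts_b  # swapping paragraphs yields a different document

print(same_table, same_document)
```

the two row sets compare equal, while the two documents do not: in the tree, order is data.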
as long as you stay within the same cosmology, mappings usually work reasonably well, and working with the data can be done in at least comparable ways. there may be differences in features, tools, and languages, but the big picture of how the data is structured is the same. sometimes metamodels have special features, for example whether a model is even required: SQL data always needs a database (and thus a schema) to live in, whereas XML and RDF can happily live without a specific model (i.e., schema) and can still be used.
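the "is a model required?" difference can be sketched with python's standard library. this is just an illustration, assuming an in-memory SQLite database as a stand-in for any RDBMS; the `person` table and the `<person>` element are hypothetical names.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# SQL: without a declared table (the model), there is nowhere to put the data.
try:
    conn.execute("INSERT INTO person (name) VALUES ('alice')")
    inserted_without_schema = True
except sqlite3.OperationalError:
    # sqlite reports that the table does not exist
    inserted_without_schema = False

# once the model exists, the same insert works.
conn.execute("CREATE TABLE person (name TEXT)")
conn.execute("INSERT INTO person (name) VALUES ('alice')")
sql_names = [row[0] for row in conn.execute("SELECT name FROM person")]

# XML: a well-formed document can be parsed and queried with no DTD or schema at all.
person = ET.fromstring("<person><name>alice</name></person>")
xml_name = person.findtext("name")

print(inserted_without_schema, sql_names, xml_name)
```

the point is not that one behavior is better, only that the metamodels differ in whether they insist on a model up front.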
things become much more complicated if you want to take data from one cosmology and map it to another cosmology. this is always possible, but for non-trivial data it often results in a severe mismatch between the application model and the underlying assumptions of the metamodel. it also often means that the tools and languages in the new environment are not appropriate anymore. imagine representing a complex XML document in RDF (i am sure somewhere there is some RDFS for representing XML). it can be done, sure, but compared to the tools provided with XSD, XQuery, and XSLT, working with that data will become much more complicated and very ineffective (and probably also very inefficient in terms of processing time for large document collections), because the inherent ordered tree structure of the data is not supported by the underlying metamodel anymore.
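here is a rough sketch of what such a cross-cosmology mapping looks like. it does not use a real RDF library or any existing RDFS vocabulary; plain python tuples stand in for triples, and the node ids (`n0`, `n1`, ...) and the predicates `name`, `text`, `child`, and `position` are made up for this example. mixed content (text following a child element) is ignored to keep it short.

```python
import xml.etree.ElementTree as ET
from itertools import count

def xml_to_triples(root):
    """flatten an element tree into a set of (subject, predicate, object) triples."""
    ids = count()
    triples = set()

    def visit(elem):
        node = f"n{next(ids)}"  # hypothetical node identifier
        triples.add((node, "name", elem.tag))
        if elem.text and elem.text.strip():
            triples.add((node, "text", elem.text.strip()))
        for position, child in enumerate(elem):
            child_node = visit(child)
            triples.add((node, "child", child_node))
            # a set of triples has no order, so document order must be stated explicitly
            triples.add((child_node, "position", position))
        return node

    visit(root)
    return triples

def children_in_order(triples, node):
    """recover the ordered children of a node by sorting on the explicit position."""
    kids = [o for (s, p, o) in triples if s == node and p == "child"]
    position = {s: o for (s, p, o) in triples if p == "position"}
    return sorted(kids, key=position.__getitem__)

doc = ET.fromstring("<doc><p>first</p><p>second</p><p>third</p></doc>")
triples = xml_to_triples(doc)
text_of = {s: o for (s, p, o) in triples if p == "text"}
ordered_text = [text_of[kid] for kid in children_in_order(triples, "n0")]
direct_text = [p.text for p in doc]  # the same question asked of the tree itself

print(len(triples), ordered_text == direct_text)
```

the mapping works, but what the tree gives you for free (child order) has to be encoded as extra statements and reassembled by sorting at query time.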
some people may think that because it is always possible to define mappings between any two metamodels, there must be one overarching metamodel that spans all of the above cosmologies (as a layer on top of them). since RDF is the metamodel with the fewest built-in constraints, the semantic web community often claims that RDF can be that one metamodel. however, this ignores the fact that there is a reason why there are different cosmologies, metamodels, model languages, and tools: these have evolved over time to deal with certain classes of data, and they usually do a good job in their domains. RDF has had success because it, too, deals with a certain class of data: it is typically used for metadata. metadata, even though it's only loosely defined, typically is not data with complex structures, and thus using a simple model such as RDF works well.
the motivation for this post is simply to show that RDF is not the metamodel to end all metamodels. it is one metamodel, simply a new addition to the existing multitude of cosmologies, metamodels, and models that have been around for a long time, and it has found an application area for which it is well-suited. claiming that its simplicity (i.e., the ability to map other cosmologies to the RDF cosmology) means that all data out there can be appropriately represented and handled as RDF ignores the fact that in the end, it's not important whether it's possible to do something, but only how effective you can be when you're doing it: if your job is processing large collections of documents, the question is which models, languages, and tools make you most productive at getting that job done.