Saturday, March 22, 2008

Models and Schemas

what is the relationship between a model and schema(s)? models should capture the logical structures of an application scenario, whereas schema(s) describe the physical structures of how data is serialized, with the obvious limitations of XML, for example that everything has to be a tree.

here i want to discuss a certain aspect of how to map models to schemas: how to map the depth of a model to XML. let's assume a model has customers and employees, and both are derived from an abstract class person. this means there will never be instances of the person class, but there will be customers and employees which share the fact that both are persons. how can this be encoded in XML? in various ways of course, here are some variations (and i have seen all of these in real applications):

  • encode the common ancestry in the element name, so that there are <person.customer> and <person.employee> elements. this makes it necessary to parse element names, and the model information is encoded in a rather awkward way. this gets very cumbersome for more than two levels of hierarchy, and does not work at all for multiple inheritance.
  • encode the common ancestry in an attribute, for example using <customer type="person"> elements. this treats model information as metadata and does not work well for more than one superclass.
  • encode the common ancestry in a subelement, for example using <customer><type name="person"/>...</customer>, which again treats model information as metadata. this works with multiple inheritance and can even handle multiple levels, if lists of names can be used.
  • encode the common ancestry in the element name and the specialization in an attribute, such as <person type="employee">. this makes validation in many schema languages very hard, and again treats model information as metadata. this does not work for multiple inheritance or more than two levels.
  • create some model information XML which can be used to augment instance data and has to be consulted if applications want to find model relationships between instances.
  • use schema types, for example XML Schema (XSDL) allows type derivation, so the schema could define a type personType, and use this to derive types employeeType and customerType. this works, but only within the pretty narrow limitations of what type derivation allows in XSDL.
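to make the four markup conventions from the list concrete, here is a minimal sketch in python that writes one customer instance per convention and checks that each is well-formed. the instance content (the name child and its value) and the dictionary labels are assumptions made up for illustration, they are not taken from any real application.

```python
import xml.etree.ElementTree as ET

# one hypothetical customer instance per markup convention;
# the <name> child and its value are invented for illustration
VARIANTS = {
    "ancestry in element name": "<person.customer><name>Ann</name></person.customer>",
    "ancestry in attribute": '<customer type="person"><name>Ann</name></customer>',
    "ancestry in subelement": '<customer><type name="person"/><name>Ann</name></customer>',
    "specialization in attribute": '<person type="customer"><name>Ann</name></person>',
}


def root_tags(variants):
    """Parse each variant and return the name of its root element."""
    return {label: ET.fromstring(xml).tag for label, xml in variants.items()}


if __name__ == "__main__":
    for label, tag in root_tags(VARIANTS).items():
        print(f"{label}: <{tag}>")
```

all four variants carry the same instance data; they differ only in where the fact "a customer is a person" ends up in the markup.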

i agree that some of these examples look weird, but the question is: what is the best solution? the answer of course depends on how quality is measured. if the scenario assumes that instances are just serializations that will be used in environments where the full model is known, then why bother at all?

if, on the other hand, the schema should support XML-based applications which might not be able to use the full model, it might be useful to expose the type relationship on the XML level. in all but the last two cases, the markup uses conventions which must be documented, so that users of the XML understand that these markup constructs represent model information. in the second-to-last case an ad-hoc model representation language is created which describes model-level information which should be available to XML tools. let's assume that users of the XML understand the markup conventions or the model representation language.

if they do understand this model information, how easy is it to write code that works with the XML? how easy is it to write code finding all persons? all employees? how easy is it to find out which other specializations have been derived from the person? how easy is it to find out what content is allowed for an employee? it is really interesting to look at these questions in terms of the most common ways of accessing XML data, for example DOM and XPath.
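as a rough illustration of how the "find all persons" and "find all employees" questions play out, here is a sketch using python's xml.etree.ElementTree and its limited XPath subset. the sample documents, the id attributes, and the function names are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# hypothetical instance documents, one per encoding convention
NAME_ENCODED = """<people>
  <person.customer id="c1"/>
  <person.employee id="e1"/>
  <order id="o1"/>
</people>"""

ATTR_ENCODED = """<people>
  <customer type="person" id="c1"/>
  <employee type="person" id="e1"/>
  <order id="o1"/>
</people>"""

SPECIALIZATION_ATTR = """<people>
  <person type="customer" id="c1"/>
  <person type="employee" id="e1"/>
  <order id="o1"/>
</people>"""


def persons_by_name(root):
    # ancestry in the element name: ElementTree's XPath subset has no
    # string functions, so the names are parsed in application code
    return [e.get("id") for e in root.iter() if e.tag.startswith("person.")]


def persons_by_attr(root):
    # ancestry in an attribute: a plain attribute predicate is enough
    return [e.get("id") for e in root.findall(".//*[@type='person']")]


def employees_by_specialization(root):
    # specialization in an attribute: the name test finds all persons,
    # the predicate narrows the result to employees
    return [e.get("id") for e in root.findall(".//person[@type='employee']")]


if __name__ == "__main__":
    print(persons_by_name(ET.fromstring(NAME_ENCODED)))
    print(persons_by_attr(ET.fromstring(ATTR_ENCODED)))
    print(employees_by_specialization(ET.fromstring(SPECIALIZATION_ATTR)))
```

none of these queries can answer which other specializations of person exist in the model, or what content an employee may have; they only see what happens to occur in a given instance.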

this question of accessibility is interesting: even though the XSDL solution (the last one) uses an XML standard to represent the original model, there are no standard interfaces that allow developers to actually access that information. there is no standardized API for XSDL, and there is no other supported way of accessing the components (i.e., the data model) of an XSDL schema. so even though the last solution at first sight looks like a better solution than the other ones, it actually might be worse, because it encodes the model in a schema which is not accessible through standard XML technologies.
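one workaround is to treat the schema document itself as plain XML and look for the derivation markup by hand. the sketch below does that with python's xml.etree.ElementTree; the schema, the type names' content models, and the derived_from helper are assumptions made up for illustration. it reads the schema's syntax, not its component model: it ignores namespace prefixes in the base attribute, derivation by restriction, indirect derivation, and included or imported schema documents.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# hypothetical schema: employeeType and customerType extend personType
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="personType" abstract="true">
    <xs:sequence><xs:element name="name" type="xs:string"/></xs:sequence>
  </xs:complexType>
  <xs:complexType name="employeeType">
    <xs:complexContent>
      <xs:extension base="personType">
        <xs:sequence><xs:element name="salary" type="xs:decimal"/></xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="customerType">
    <xs:complexContent>
      <xs:extension base="personType"/>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""


def derived_from(schema_root, base_name):
    """Names of top-level complex types that directly extend base_name.

    Compares the raw 'base' attribute string, so it is a syntactic
    shortcut and not a real schema component lookup.
    """
    ns = {"xs": XS}
    derived = []
    for ctype in schema_root.findall("xs:complexType", ns):
        ext = ctype.find("xs:complexContent/xs:extension", ns)
        if ext is not None and ext.get("base") == base_name:
            derived.append(ctype.get("name"))
    return derived


if __name__ == "__main__":
    print(derived_from(ET.fromstring(SCHEMA), "personType"))
```

the number of caveats needed for such a small helper is a fair indication of the problem: every application ends up re-implementing a fragment of schema processing just to read the model back out.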

while this example is very simplistic, it already demonstrates that it can be hard to represent the full depth of a model in a schema. one way to look at that could be to say that a schema only represents the surface of a model, by which i mean the actual data that represents an instance, but not necessarily all the details of the model which put that instance into context.

i am still struggling with how this depth vs. surface issue could be investigated more systematically, but i think that this perspective could be a useful guideline for how applications should expose their data in XML: aim for a well-designed surface view of the model and accept a certain loss of deeper understanding. by making a conscious decision about what should be part of the surface, it is easier to understand how pure XML developers (i.e., those without access to the full depth of the model) will be able to access instance data.
