RFC 6906 defining "The 'profile' Link Relation Type" was published a little while ago, and I discussed it briefly at the time. It has been picked up in a variety of places, and seeing where and how it is being used, it occurred to me that some clarifications and explanations may be in order. Like most link relation types, the type is defined in relatively fuzzy terms, in an attempt to make it generic enough to be reusable, but still specific enough so that there is some commonality in what it is used for.
The idea of profiles originates in the idea of safely evolving formats and services, as explained in more detail in "Patterns for Robust Extensibility". The assumption is to start from a meaningful core model, which is an extensible format or service. A profile then identifies one specific way of extending it, in the sense that the profile-identified extension still adheres to the core semantics, but adds constraints and/or extensions that go beyond it. One important aspect of profiles is that it must always be permissible to ignore the profile and just use the core semantics, and something useful can still happen.
One example used in RFC 6906 is podcasts, i.e., feeds for multimedia content. While you can look at these in a regular (i.e., non-podcast-aware) feed reader and get useful information out of them, a podcast-aware client can be more helpful and, for example, display available media metadata such as media running time.
An argument can be made (and has been made in a variety of ways) that because a podcast is a feed, no profile identification is necessary. Instead, if a client supports podcasts, then it can simply look at a feed and figure out whether it satisfies all the additional podcast constraints, and if it does, then it can be treated as a podcast, without being labeled as one. This is what in programming is called duck typing: If it walks like a duck and quacks like a duck, then in all likelihood it is a duck.
The duck typing argument can be applied to profiles: If a feed can be extended to carry all podcast-specific data, and still be treated as a regular feed, then why go through the additional effort of calling it a "podcast" and making it identifiable?
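To make the duck typing approach concrete, here is a minimal sketch in Python. It assumes an Atom feed and, purely for illustration, treats "every entry has at least one enclosure link" as the podcast constraint; the real set of podcast constraints is richer, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Atom namespace in ElementTree's "{uri}" notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def looks_like_podcast(feed_xml: str) -> bool:
    """Duck typing: inspect the feed itself instead of relying on a label.

    Assumption for this sketch: a feed "quacks like a podcast" if it has
    at least one entry and every entry carries an enclosure link.
    """
    root = ET.fromstring(feed_xml)
    entries = root.findall(f"{ATOM}entry")
    if not entries:
        return False
    return all(
        any(link.get("rel") == "enclosure" for link in entry.findall(f"{ATOM}link"))
        for entry in entries
    )
```

The client never asks what the feed calls itself; it only checks whether the structure it needs is there.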
There is nothing inherently wrong with that argument, because in a well-designed ecosystem of a meaningful and extensible core format and existing extensions to it, the evolution of producers and consumers is loosely coupled (that is the purpose of the whole enterprise after all), and everything can be treated at runtime as it is encountered.
So what are the arguments for going beyond that and using profiles and related mechanisms? There are two main reasons:
- If applications encounter extensions, then how do they find out what those mean, if they do not have built-in knowledge of them? One approach to do this is by using a registry, where extensions can be registered along with some metadata. At the minimum (and often as the only thing), that metadata links to available documentation, so that any extension that is encountered can be traced back to the place where it is defined and explained. This usually is not done at runtime, but is a robust mechanism to be able to track the ongoing evolution of the extension ecosystem, and update applications as required.
- Going one step further: If you define a coherent and meaningful set of extensions, then you might want to give this set a name and thus make it identifiable. The reason for this is quite simple: If you want to have meaningful conversations about what to produce or consume, then you must be able to identify it. By creating a "property bag" and calling it "podcast" (and let's say also make it self-describing via a profile URI), you can talk about feeds being podcasts, for example in conversations where you can then say "I can deal with a variety of media feed formats, but prefer to use the podcast variety".
Essentially, only when you are able to call a duck a "duck" can you ask somebody to please hand you a duck. This might or might not be something that matters in your service scenario. But if it does, then going beyond the duck typing approach and creating a self-describing label for a "duck" may be useful. And in that case, profiles as defined in RFC 6906 might be a useful way of doing this instead of having to reinvent that particular wheel.
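By contrast, a client that relies on the self-describing label could look roughly like the following sketch. It assumes the profile is carried as a feed-level Atom link with the "profile" relation; the profile URI and the function names are hypothetical placeholders, not identifiers taken from any spec.

```python
import xml.etree.ElementTree as ET

# Atom namespace in ElementTree's "{uri}" notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical profile URI; it is compared as an identifier and never fetched.
PODCAST_PROFILE = "http://example.org/profiles/podcast"


def declared_profiles(feed_xml: str) -> set:
    """Collect the profile URIs that the feed declares about itself."""
    root = ET.fromstring(feed_xml)
    return {
        link.get("href")
        for link in root.findall(f"{ATOM}link")
        if link.get("rel") == "profile" and link.get("href")
    }


def choose_handler(feed_xml: str) -> str:
    """Pick podcast handling if the label is present, else plain feed handling."""
    if PODCAST_PROFILE in declared_profiles(feed_xml):
        return "podcast"
    # Ignoring (or not finding) the profile must always remain a valid choice.
    return "feed"
```

The fallback branch is the important design point: a feed without the label, or a client that does not know the label, still ends up with ordinary feed processing.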
Thanks for this posting, Erik. But it leaves me hungry in two ways:
First, the title suggests that the posting would explain when the use of "profile" defined in RFC 6906 is appropriate and when the use of "type" defined in RFC 6903 is. But it doesn't. By the way, given that the "type" attribute has been used since the dark ages to convey MIME type, I am baffled that someone was able to register "type" as a link relation. That's begging for confusion. But that's another story.
Second, I had hoped for an explanation as to why the use of e.g. an XML Schema URI as the Target IRI of a link with the "profile" relation type is or is not appropriate to provide additional information about a resource with media type "application/xml". A recent poll you did via Twitter seemed to suggest it is not appropriate. And, honestly, having read RFC 6903 several times, I can't figure out why that would be. As far as I can tell, a schema expresses a constraint.
I look forward to your feedback, especially because I am involved in a project that really needs solid information re the above questions.
Cheers
Herbert
Posted by: Herbert Van de Sompel | Wednesday, May 04, 2016 at 17:41
Thanks for your comment, @herbert. Here's my response, but as usual, that's just my view of things, and very often it's more productive to discuss concrete scenarios and applications instead of just doing spec exegesis. But since i have written the one in question, that's what i was indulging in with this post...
1) Have you seen "type" being used for indicating a MIME type? That would be terrible, since that's definitely not what it's for. I have not seen it used much, and the spec is very fuzzy and talks about linking to an "abstract semantic type": https://tools.ietf.org/html/rfc6903#section-6
i am not sure what that's supposed to mean, and one of the REST principles is that the "type" of a resource is indicated by its MIME type, period: http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven "A REST API should never have “typed” resources that are significant to the client. Specification authors may use resource types for describing server implementation behind the interface, but those types must be irrelevant and invisible to the client."
The idea behind profiles is that they are optional hints: things continue to work without them, they are just convenience labels you might use to refer to a certain "property bag" that you use or expect to be used. That's very different from a type.
2) A profile says "This animal says you can treat it as a duck". A schema says "here are all the rules an animal needs to follow to be considered a duck". A profile might have a schema, but very often instead of that, the schema governing it will be that of the underlying media type, plus additional constraints which often are extensions.
Consider the podcast: The podcast spec does not provide a schema replicating all feed constraints and adding the podcast fields. It could, but that's not what I would do and frankly, I have never seen it done that way. Instead, the schema governing the podcast is still whatever schema defines a feed (let's assume the RELAX NG for Atom), and the profile simply identifies the set of properties that's used in addition to that schema.
As a result, a podcast, when linked as extensively as possible, could link to its schema, which would be the vanilla feed schema, as well as to its profile, which would be the podcast feed identifier URI (which is just an identifier and not a locator, per spec).
Posted by: dret | Thursday, May 05, 2016 at 10:33