Recently, W3C published the first working draft of a protocol called PubSub. It is on the W3C TR track and thus intended to become a W3C Recommendation. While the name is new, it is the protocol previously known as PubSubHubbub (PuSH). Before adopting it, W3C rightly concluded that the old name was not a great choice. PubSub may not be a great name either, but that's a different issue that was discussed in a previous post.
PubSub is a protocol for a publish/subscribe mechanism, but instead of covering all aspects of this picture, it only specifies the interactions with the PubSub hub. The high-level picture looks like this:
- Some service publishes a topic, which is some sort of feed or stream of entries about that topic. Typically, on the web such a topic would be consumed in a polling fashion, with clients polling for updates, and thus running into the typical problems of polling (balancing polling speed and update speed for the best combination of resource consumption and delay).
- A consumer may be interested in receiving pushed topic updates instead. Such a service may be advertised by the topic publisher (linking to the hub that provides a push service), or the consumer may learn about the hub through other means.
- The consumer subscribes to the topic at the hub, with the subscription being a combination of the subscribed topic and a callback URI where the hub is supposed to push updates. The consumer has to verify the subscription by responding to a challenge sent by the hub.
- When the hub learns about a topic update (either by the topic publisher notifying the hub, or by other means such as through polling), it pushes the update to all subscribers of the topic. This happens in the form of HTTP requests, which are initiated by the hub, and use the registered callback URIs.
- Subscribers may or may not be online when the hub attempts to push updates. It is up to the hub to implement a strategy that will probably involve some retries, but also some condition under which failed delivery attempts are treated as permanent failures.
- Generally speaking, this picture depends a lot on the reachability of the consumers. In order to make this more manageable over time, the assumption is that subscriptions are only valid for a certain time, and then have to be renewed. This allows hubs to manage subscriptions in a more efficient manner.
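The steps above can be sketched as a toy, in-memory model. This is not the wire format of the draft: `ToyHub`, `make_subscriber`, the request dictionaries, the fixed retry count, and the example URIs used with it are all assumptions made for illustration, and a plain function call stands in for each HTTP request. The sketch only shows how challenge verification, fan-out, retries, and lease expiry fit together.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Stand-in for an HTTP endpoint: takes a request dict, returns (status, body).
Endpoint = Callable[[dict], Tuple[int, str]]


@dataclass
class Subscription:
    topic: str
    callback_uri: str
    expires_at: float  # subscriptions are leases, not permanent


class ToyHub:
    MAX_ATTEMPTS = 3  # assumed retry policy; the hub is free to choose its own

    def __init__(self, transport: Dict[str, Endpoint],
                 now: Callable[[], float] = time.time) -> None:
        self.transport = transport  # maps callback URI -> endpoint
        self.now = now
        self.subs: Dict[Tuple[str, str], Subscription] = {}

    def _send(self, uri: str, request: dict) -> Tuple[int, str]:
        endpoint = self.transport.get(uri)
        if endpoint is None:  # subscriber is unreachable
            return 503, ""
        return endpoint(request)

    def subscribe(self, topic: str, callback_uri: str,
                  lease_seconds: float) -> bool:
        # Verify intent: the subscriber must echo a random challenge.
        challenge = secrets.token_hex(8)
        status, body = self._send(
            callback_uri,
            {"mode": "subscribe", "topic": topic, "challenge": challenge},
        )
        if status != 200 or body != challenge:
            return False
        self.subs[(topic, callback_uri)] = Subscription(
            topic, callback_uri, self.now() + lease_seconds)
        return True

    def publish(self, topic: str, content: str) -> List[str]:
        """Push content to live subscribers; return the URIs that accepted it."""
        delivered: List[str] = []
        for key, sub in list(self.subs.items()):
            if sub.topic != topic:
                continue
            if sub.expires_at <= self.now():
                del self.subs[key]  # lease ran out and was not renewed
                continue
            for _ in range(self.MAX_ATTEMPTS):
                status, _ = self._send(
                    sub.callback_uri,
                    {"mode": "update", "topic": topic, "content": content},
                )
                if 200 <= status < 300:
                    delivered.append(sub.callback_uri)
                    break
            # After MAX_ATTEMPTS failures this update is simply dropped here.
        return delivered


def make_subscriber(inbox: List[str], wanted_topic: str) -> Endpoint:
    """Build a callback endpoint that confirms only the topic it asked for."""
    def endpoint(request: dict) -> Tuple[int, str]:
        if request["mode"] == "subscribe":
            if request["topic"] == wanted_topic:
                return 200, request["challenge"]  # echo proves intent
            return 404, ""
        inbox.append(request["content"])
        return 200, ""
    return endpoint
```

Registering `make_subscriber(inbox, topic)` under a callback URI in `transport`, calling `hub.subscribe(topic, uri, 60)`, and then `hub.publish(topic, "entry-1")` appends the entry to `inbox`. Passing a fake `now` function makes it easy to see that a publish after the lease has expired reaches nobody until the consumer subscribes again.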
One aspect that is not covered by the PubSub specification is the actual media types of topics and pushed update messages. Historically, PuSH started with specifically supporting Atom, so the assumption was that both topics and push messages would be Atom feeds.
(The nice side-effect of this was that by using "Atom Tombstones", PuSH was a nicely simple sync protocol, covering the creation, modification, and deletion of topic entries.)
PuSH then moved away from supporting a specific media type, and the same is true for PubSub. It simply says that the media type of the published topic and the media type of the push messages must be the same.
In its first draft, PubSub is a bit vague about how exactly topic updates and push messages are related, as it allows for "diffing" without saying what this is supposed to mean. It also is not entirely clear how "fat push" and "thin push" are covered, since in theory it would be possible to have hubs that translate between these two kinds either way (possibly depending on the preferences of subscribers).
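To make the distinction concrete, here is a minimal subscriber-side sketch, assuming the usual reading of the two terms: a fat push carries the updated content in the notification itself, while a thin push only signals that the topic changed and leaves the subscriber to fetch it. The function `handle_push` and its `fetch` parameter are hypothetical names, not anything defined by the draft.

```python
from typing import Callable, Optional


def handle_push(topic_uri: str, body: Optional[str],
                fetch: Callable[[str], str]) -> str:
    """Return the updated topic content for one pushed notification."""
    if body:
        # Fat push: the content travelled with the notification.
        return body
    # Thin push (no body, or an empty one): the notification is only a
    # signal, so pull the current state of the topic.
    return fetch(topic_uri)
```

A hub that translates between the two kinds would do the equivalent on the subscriber's behalf, for example fetching the topic itself after a thin notification from the publisher and then sending a fat push to subscribers.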
Being a first public working draft, PubSub still needs a bit of work to be as well-defined and stable as it should be for a Web-scale protocol. If you are interested in Web-scale and decentralized PubSub mechanisms, now is a good time to look at "W3C PubSub" and maybe even engage in the ongoing community discussion.