one of the great things about web architecture is that it is about loose coupling and about setting up an environment that focuses on cooperation rather than integration. this is an essential difference: integration always implies giving up specifics for the benefit of bringing things together, whereas cooperation focuses on keeping things separate and only harmonizing what needs to be harmonized (RESTfully speaking, the representations of the resources that are required as the foundation for cooperation).
our work on stimulus feeds, and in particular our recently published report on how to improve the initial stimulus feeds guidance, was an interesting opportunity to think about the specifics of implementing such a feed-oriented architecture in a scenario as big and heterogeneous as the U.S. federal agencies.
it is unlikely that all agencies will be willing or able to set up feeds that not only carry reports in a useful machine-readable representation (most likely XML), but also meet additional requirements: supporting feed paging and archiving, using HTTPS to provide some assurance of authenticity, and possibly (if this really is to become the official channel for recovery reporting) including digital signatures in the authoritative XML version of the data, so that authenticity and origin can be securely verified.
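to make the paging and archiving requirement a bit more concrete: RFC 5005 (Feed Paging and Archiving) lets an archived feed point to older archive documents through `prev-archive` links, so that a client can reconstruct the complete history of a feed. the following Python sketch walks such a chain using only the standard library. the `fetch` callable stands in for an HTTPS GET and is an assumption, as is the function name `collect_entry_ids`; this is an illustration of the idea, not a complete RFC 5005 client.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def prev_archive_href(feed: ET.Element) -> Optional[str]:
    """Return the href of the feed's rel="prev-archive" link, if it has one."""
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == "prev-archive":
            return link.get("href")
    return None


def collect_entry_ids(start_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Walk from a feed document back through its prev-archive chain.

    fetch is a hypothetical placeholder for retrieving a feed document
    (e.g. an HTTPS GET). documents are visited newest first, so the first
    occurrence of an entry id is the one that is kept.
    """
    ids: List[str] = []
    seen_ids = set()
    visited = set()
    url: Optional[str] = start_url
    while url is not None and url not in visited:
        visited.add(url)  # guards against archive chains that loop back
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + "entry"):
            entry_id = entry.findtext(ATOM + "id")
            if entry_id is not None and entry_id not in seen_ids:
                seen_ids.add(entry_id)
                ids.append(entry_id)
        href = prev_archive_href(feed)
        # archive links may be relative, so resolve them against the current URL
        url = urljoin(url, href) if href is not None else None
    return ids
```

because the newest document is read first, an entry that appears both in the current feed and in an archive document is only reported once, with the occurrence from the newest document.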
if the OMB is serious about the underlying idea of recovery.gov (and maybe even beyond it, although recovery.gov would be an ideal test case and platform for deploying and demonstrating innovation in open government approaches), this could mean providing a Federal Feed Cloud. here are the services i envision this feed cloud providing:
- secure and reliable hosting: following the cloud meme, such a feed cloud should be secure and reliable, so that data and services provided by that cloud can be used safely and reliably. whether this cloud should be based on commercial cloud services or implemented on IT hardware of the federal government is a tricky and far-reaching question.
- forms for human data entry: simple web forms allow all data that must be published in feeds to be entered by hand, and simply submitting a form publishes its data in the feed.
- AtomPub for machine-oriented data entry: for publishing entries in feeds through a web service, AtomPub should be used. it is RESTful and specifically designed to provide write access to collections that are published as feeds. AtomPub submissions should only be accepted as XML, and all submissions should be validated before they are accepted.
- transformations for human-readable feeds: even though the primary data format is XML, every feed has transformations (probably using XSLT) attached to it, so that data published in that feed is transformed to HTML, and thus the feed carries human-readable content.
- cloud queries: all the information hosted by the cloud can be consumed using feeds, but it should also be possible to query the cloud, either by querying individual feeds, or by querying sets of feeds. in this scenario, the cloud plays the role of a data warehouse, and feed queries are the language to access it.
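the validation step required for AtomPub submissions could look roughly like the following minimal Python sketch. it assumes the endpoint receives the request's Content-Type header and body, and it only checks structural rules: the body must be well-formed XML, its root must be an Atom entry containing the `atom:id`, `atom:title`, and `atom:updated` elements that every Atom entry must have, and its `atom:content` must carry exactly one inline XML report element. the function names, the chosen status codes, and the report structure are assumptions; a real service would additionally validate the report itself against an agreed-upon schema.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

ATOM = "{http://www.w3.org/2005/Atom}"


def is_xml_media_type(media_type: str) -> bool:
    """True for XML media types such as application/xml or application/foo+xml."""
    mt = media_type.split(";")[0].strip().lower()
    return mt in ("text/xml", "application/xml") or mt.endswith("+xml")


def validate_submission(content_type: str, body: str) -> Tuple[int, str]:
    """Return an (HTTP status, reason) pair for a submitted Atom entry (hypothetical policy)."""
    if content_type.split(";")[0].strip().lower() != "application/atom+xml":
        return 415, "only application/atom+xml entries are accepted"
    try:
        entry = ET.fromstring(body)
    except ET.ParseError as err:
        return 400, "malformed XML: %s" % err
    if entry.tag != ATOM + "entry":
        return 400, "root element must be atom:entry"
    for name in ("id", "title", "updated"):
        if entry.find(ATOM + name) is None:
            return 400, "missing required atom:%s" % name
    content = entry.find(ATOM + "content")
    # the report must be inline XML, not text or HTML
    if content is None or not is_xml_media_type(content.get("type", "")):
        return 400, "atom:content must carry an XML report"
    if len(content) != 1:  # len() counts child elements
        return 400, "atom:content must contain exactly one report element"
    return 201, "accepted"
```

rejecting non-XML and structurally invalid entries at the door keeps every feed in the cloud machine-readable, which is what the transformations and queries depend on.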
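in their simplest form, cloud queries over a set of feeds could be evaluated by filtering the entries of several feeds with a predicate. the Python sketch below assumes that each entry carries its report as inline XML in `atom:content`, in a hypothetical `urn:example:recovery` namespace with an `amount` element; these names are placeholders and not part of any official guidance.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"
REPORT = "{urn:example:recovery}"  # hypothetical namespace for report payloads


def reports_in_feed(feed_xml: str) -> Iterator[Tuple[str, ET.Element]]:
    """Yield (entry id, report element) for every entry carrying an inline report."""
    feed = ET.fromstring(feed_xml)
    for entry in feed.findall(ATOM + "entry"):
        report = entry.find(ATOM + "content/" + REPORT + "report")
        if report is not None:
            yield entry.findtext(ATOM + "id", ""), report


def query_feeds(
    feeds: Dict[str, str], predicate: Callable[[ET.Element], bool]
) -> Iterator[Tuple[str, str]]:
    """Yield (feed name, entry id) for every report in the given feeds that matches."""
    for feed_name, feed_xml in feeds.items():
        for entry_id, report in reports_in_feed(feed_xml):
            if predicate(report):
                yield feed_name, entry_id


def amount_over(threshold: float) -> Callable[[ET.Element], bool]:
    """Example predicate: the report's (hypothetical) amount element exceeds threshold."""
    return lambda report: float(report.findtext(REPORT + "amount", "0")) > threshold
```

querying an individual feed is just the special case of passing a dictionary with a single feed. a feed cloud acting as a data warehouse would most likely index entries instead of re-parsing every feed for every query; the sketch only illustrates what a query over a set of feeds means.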
i am sure such a Federal Feed Cloud will not become reality tomorrow, but i think it would be the right architectural direction. if agencies don't need it, they don't have to use it at all. but agencies with limited resources for IT development could use such a cloud to easily and rapidly publish their data in a loosely coupled way.