after blogging about Stimulus Feeds Done Right, i received a couple of comments saying that the post mostly complained about the Initial Implementing Guidance for the American Recovery and Reinvestment Act guidelines without really saying what specifically to improve. well said, i had to admit, so here is a more concrete take on how to do stimulus feeds right.
- Finding Stimulus Feeds is important, and instead of setting up a central registry (which still might be a good idea), it should be possible to discover stimulus feeds in a predictable way. page 58 requires all agencies to set up recovery web pages at http://agency.gov/recovery, and there are a number of requirements for these pages. feed autodiscovery should be added as a requirement for these pages, so that going to an agency's http://agency.gov/recovery page and looking for a linked feed is a reliable method for discovering that agency's stimulus feed.
- the guidelines seem to require three feeds: the major communications feed, the formula block grant allocation feed, and the weekly report feed. it is not quite clear, though, whether agencies are free to have just one feed carrying three types of entries, or whether three separate feeds are required. from an information dissemination point of view, it would probably be better to require three separate feeds (and maybe provide a fourth one aggregating all three). in that case, feed discovery would have to be more specific and make sure that all three feeds can be discovered from an agency's recovery web page.
- a feed can contain the information (in what format? more on that later) or can point to a file containing the information (in what format? more on that later). if the feed points to a file, how is it supposed to do this? it could do so via atom:link/@href, atom:content/@src, atom:content[@type='xhtml']//html:a/@href, or, god forbid, atom:content[@type='html']//html:a/@href. ideally, links should not be allowed at all, but if they are, they should be required to use atom:content/@src.
- agencies are allowed to provide no feeds at all; they can just publish the files via predefined URI structures. however, since it is unclear if and how a web server will make the directory of such a set of files available, it will be impossible to reliably discover and retrieve files with such an approach. at the very least, agencies incapable of producing feeds should be required to provide machine-readable directories of all posted files, on all levels, at the suggested URIs. there should be a required and machine-readable format for these directories.
- the guidelines specify the feed format as "preferred: Atom 1.0, acceptable: RSS", and do not even mention RSS versions. this means that feeds can use 10 different formats: Atom plus the 9 different RSS variants. while it would be best to only allow Atom, it would be good to at least limit RSS to specific versions, such as RSS 2.0 (which probably should be specified to be RSS 2.01 rev 2, according to mark pilgrim's versioning).
- since the feeds are allowed to contain only links instead of the actual data (essentially turning them into a notification mechanism), there also is a file format for publishing the information. unfortunately, the templates for this file format are not publicly available. it would be important for this file format to be easily accessible with generally available tools, which limits the choices to plain text and XML. if XML is used, it should be plain XML and not some complex format. if more complex formats such as XBRL are required, XSLT transforms for up- and down-translation between these more complex formats and a plain XML format should be provided.
- starting on page 55, the guidelines mention data elements the feeds should include. it is unclear how these should be included (if the feed points to a file, the file format probably contains those data elements, but if the feed is supposed to contain those data elements, there must be a feed-oriented syntax for them). in addition to that, the datatypes seem to be copied straight from a SQL database schema. there should be one required syntax for these data elements, and it should be based on XSD datatypes. implementation variants for this syntax are plain XML or microformats; if microformats are chosen, RDFa should be used.
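to make the autodiscovery recommendation more tangible, here is a minimal python sketch of what a client could do if recovery pages were required to advertise their feeds through link elements with rel="alternate" in the page head. the feed URIs, titles, and the sample page are made up for illustration; nothing in the guidelines prescribes them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# media types commonly used to advertise feeds in <link> elements
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkParser(HTMLParser):
    """collects <link rel="alternate"> elements that advertise a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []  # list of (absolute URI, media type, title)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        media_type = a.get("type", "").strip().lower()
        if "alternate" in rels and media_type in FEED_TYPES and a.get("href"):
            uri = urljoin(self.base_url, a["href"])
            self.feeds.append((uri, media_type, a.get("title", "")))


def discover_feeds(html_text, base_url):
    """return all feeds advertised by the given page, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html_text)
    return parser.feeds


# hypothetical recovery page advertising three separate feeds
SAMPLE_PAGE = """
<html><head>
<title>agency recovery page</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/atom+xml"
      title="major communications" href="/recovery/feeds/communications.atom">
<link rel="alternate" type="application/atom+xml"
      title="formula block grant allocations" href="/recovery/feeds/allocations.atom">
<link rel="alternate" type="application/atom+xml"
      title="weekly reports" href="/recovery/feeds/weekly.atom">
</head><body></body></html>
"""

for uri, media_type, title in discover_feeds(SAMPLE_PAGE, "http://agency.gov/recovery"):
    print(title, uri, media_type)
```

if three separate feeds were required, the guidance would also have to say how a client tells them apart; the title attribute used here is only a stand-in for whatever convention would be chosen.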
this is not all there is to it (and without seeing the templates for the files it is hard to make specific recommendations for that part), but it should point in the right direction. i really hope that the current suboptimal guidelines will not make people believe that feeds necessarily turn information dissemination into a rather unpredictable hodgepodge. it actually does not take that much effort to come up with guidelines that create a robust and predictable landscape, and providing test data, test tools, and validators would help agencies to conform to those guidelines.
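as a rough idea of how small such a validator could start out, the following python sketch checks one of the rules suggested above: every entry either carries its content inline or points to it via atom:content/@src. the function name, the sample feed, and its URIs are hypothetical; a real validator would have to follow whatever the final guidance ends up requiring.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def check_entries(atom_xml):
    """return (entry id, problem) pairs; an empty list means nothing was flagged."""
    problems = []
    root = ET.fromstring(atom_xml)
    for entry in root.iter(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id", default="(no id)")
        content = entry.find(ATOM + "content")
        if content is None:
            # e.g. an entry that only points to its file via atom:link
            problems.append((entry_id, "no atom:content element"))
            continue
        has_src = content.get("src") is not None
        has_inline = bool((content.text or "").strip()) or len(content) > 0
        if has_src and has_inline:
            # atom requires atom:content to be empty when @src is present
            problems.append((entry_id, "atom:content has both @src and inline content"))
        elif not has_src and not has_inline:
            problems.append((entry_id, "atom:content is empty and has no @src"))
    return problems


# hypothetical weekly report feed with one good and two problematic entries
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>weekly reports</title>
  <entry>
    <id>urn:example:entry-1</id>
    <content type="application/xml" src="http://agency.gov/recovery/reports/week-07.xml"/>
  </entry>
  <entry>
    <id>urn:example:entry-2</id>
    <link href="http://agency.gov/recovery/reports/week-08.xml"/>
  </entry>
  <entry>
    <id>urn:example:entry-3</id>
    <content type="text" src="http://agency.gov/recovery/reports/week-09.xml">see file</content>
  </entry>
</feed>
"""

for entry_id, problem in check_entries(SAMPLE_FEED):
    print(entry_id, problem)
```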
the good news is that the current guidelines (published 2/18/2009) specifically mention that more detailed guidance will be published within 30-60 days, which means that some of the above issues may still be addressed in the final guidance. this also means that it makes sense for agencies to wait until the more detailed guidelines become available.