now that we have started Collecting Stimulus Feeds, we quickly realized that not many are available (currently just 25 feeds in total), which of course greatly diminishes our ability to collect useful amounts of data via these feeds.
recovery.gov has a much bigger selection of weekly reports on its weekly reports page, because it depends on the reports being sent to it by email.
we are primarily interested in getting to the point where the feeds are a dependable and robust way of getting Recovery Act spending information, but this is currently not the case. so we started scraping recovery.gov's weekly reports page in order to get all currently available reports.
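for the curious, here is a rough sketch of the kind of scraper this involves; the URL, the page markup, and the .xls link pattern are assumptions for illustration, not recovery.gov's actual layout:

```python
# rough sketch of a weekly-reports scraper; URL and markup are assumed
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

REPORTS_PAGE = "https://www.recovery.gov/weekly-reports"  # hypothetical URL

def collect_report_links(page_url):
    """Fetch the weekly reports page and return absolute links to
    every Excel report it references (assumed to end in .xls)."""
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        if a["href"].lower().endswith(".xls"):
            links.append(urljoin(page_url, a["href"]))
    return links

if __name__ == "__main__":
    for url in collect_report_links(REPORTS_PAGE):
        print(url)
```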
to better understand how reporting via feeds compares to reporting by email (and subsequent publication on recovery.gov), we have dropped both datasets (the feed crawl and the reports scraped from recovery.gov) into SIMILE Timeline, a user-friendly package for visualizing time-based data series. the result not only looks pretty nice, it also shows that feeds may be a bit faster: NSF's newest report is available via the feed, but not yet via recovery.gov. it also shows that the data is sometimes changed: a misdated ED report (2008-02-23) has been republished with a fixed date (2009-02-23) on recovery.gov.
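as a sketch of how the two datasets can be merged into a single Timeline event source (the field names follow SIMILE Timeline's JSON event format; the sample records and file name are purely illustrative):

```python
# merge both report lists into one SIMILE Timeline JSON event source
import json

# illustrative sample records, not our actual data
feed_reports = [
    {"agency": "NSF", "date": "2009-03-03", "url": "http://example.org/nsf.xls"},
]
scraped_reports = [
    {"agency": "ED", "date": "2009-02-23", "url": "http://example.org/ed.xls"},
]

def to_events(reports, source):
    """Turn report records into Timeline events, tagged by source."""
    return [
        {
            "start": r["date"],
            "title": "%s weekly report (%s)" % (r["agency"], source),
            "link": r["url"],
        }
        for r in reports
    ]

timeline = {
    "dateTimeFormat": "iso8601",
    "events": to_events(feed_reports, "feed")
              + to_events(scraped_reports, "recovery.gov"),
}

with open("timeline-events.json", "w") as fh:
    json.dump(timeline, fh, indent=2)
```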
we now have all weekly reports in Excel and XML (extracted from the template-based Excel files), but we have not yet made the XML publicly available. we will do this in the next few days, and then republish all the weekly reports we have as feeds, using a method similar to the one we used for the demo site we produced for our proposed guidelines: HTML embedded in the feed, linking to an XML version of the data.
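a minimal sketch of what such a feed entry could look like, assuming Atom; the element layout, URLs, and sample data are illustrative placeholders rather than our final format:

```python
# build an Atom feed with one entry per weekly report: HTML rendering
# embedded in <content>, plus a <link> to the extracted XML
from xml.etree import ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

def atom_feed(reports):
    feed = ET.Element("{%s}feed" % ATOM)
    ET.SubElement(feed, "{%s}title" % ATOM).text = "Republished weekly reports"
    for r in reports:
        entry = ET.SubElement(feed, "{%s}entry" % ATOM)
        ET.SubElement(entry, "{%s}title" % ATOM).text = r["title"]
        ET.SubElement(entry, "{%s}updated" % ATOM).text = r["date"]
        # HTML rendering of the report data, embedded directly in the feed
        content = ET.SubElement(entry, "{%s}content" % ATOM, type="html")
        content.text = r["html"]
        # machine-readable version: link to the XML extracted from Excel
        ET.SubElement(entry, "{%s}link" % ATOM, rel="alternate",
                      type="application/xml", href=r["xml_url"])
    return ET.tostring(feed, encoding="unicode")

print(atom_feed([{
    "title": "NSF weekly report",
    "date": "2009-03-03T00:00:00Z",
    "html": "<table><tr><td>...</td></tr></table>",
    "xml_url": "http://example.org/reports/nsf-2009-03-03.xml",
}]))
```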