we just updated our scraped data for stimulus reports; this comparison page shows the reports we scraped from recovery.gov as well as the reports we got from agency sites directly. we now have 416 reports we found on recovery.gov, and 92 reports we found at agency sites. please keep in mind that finding agency feeds is not all that easy, so we might have missed some agency feeds that recently became available. it is also important to keep in mind that even though both the february and the april OMB reporting guidelines specify that reporting has to be done via feeds, nobody seems to enforce this, and OMB itself is only accepting reports from agencies via email, which kind of defeats the purpose of having feeds for the agencies.
it is also interesting to point out that there is a new template out there, version 1.4. we usually find out about new templates because our code crashes: some new template is encountered, and then we have to reverse-engineer the new structure and add it to the scraping/parsing/transformation process. the interesting thing about the new template is that it is the first time that there's some location data in the reporting, in this case in the form of a state code (here are examples in HTML and in XML, as well as the original Excel). this is probably not the granularity where things get exciting, but it is definitely a step in the right direction: get your map mashups ready!
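to make the "new template crashes the code" problem a bit more concrete, here is a minimal python sketch of how a parsing step could dispatch on template version and skip unknown versions instead of crashing. this is not our actual code: the registry, the field names (`template_version`, `agency`, `state`), and the `"pre-1.4"` label are all assumptions made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical registry mapping a template version string to a parser.
PARSERS: Dict[str, Callable[[dict], dict]] = {}


class UnknownTemplateError(Exception):
    """Raised when a report uses a template version with no registered parser."""


def parser_for(version: str):
    """Decorator that registers a parser function for one template version."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        PARSERS[version] = func
        return func
    return register


@parser_for("pre-1.4")  # placeholder label for older templates (assumption)
def parse_older(raw: dict) -> dict:
    # Older templates carry no location data, so state stays empty.
    return {"agency": raw["agency"], "state": None}


@parser_for("1.4")
def parse_v1_4(raw: dict) -> dict:
    # Version 1.4 is the first template with a state code; the key name
    # "state" is an assumption, not the real template field name.
    return {"agency": raw["agency"], "state": raw.get("state")}


def parse_report(raw: dict) -> dict:
    """Pick the parser for this report's template version, or fail loudly."""
    version = raw.get("template_version", "")
    if version not in PARSERS:
        raise UnknownTemplateError(version)
    return PARSERS[version](raw)


def parse_all(raws: Iterable[dict]) -> Tuple[List[dict], List[str]]:
    """Parse what we can; collect unknown versions instead of crashing."""
    parsed: List[dict] = []
    skipped: List[str] = []
    for raw in raws:
        try:
            parsed.append(parse_report(raw))
        except UnknownTemplateError as err:
            skipped.append(str(err))
    return parsed, skipped
```

with something like this, a new template shows up as an entry in `skipped` after a run, and supporting it means adding one more registered parser once the new structure has been reverse-engineered.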
with new guidance expected next week (and not this week as i originally thought), i am sure all of this will change again. and playing whack-an-excel with the changing data gets less exciting, now that we have a toolset in place. but this also shows how hard it still is to follow the data flow about stimulus spending, if you really want to have all the data (and ideally from the source, the agency feeds). publishing HTML/Excel on the web is only marginally open and not transparent at all, and even the recently added feeds do not help all that much, because they are neither paged nor archived. (well, our feeds are not paged either, but at least they're complete, and we don't run on an $84 million budget.)
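to illustrate why paging matters for anybody trying to collect all reports, here is a small python sketch of a client walking a paged feed by following `rel="next"` links, the link relation defined for feed paging in RFC 5005. it assumes the feeds are Atom and that `href` values are absolute URLs; the `fetch` callable and the function names are hypothetical, and nothing here describes how the agency feeds actually behave.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

ATOM = "{http://www.w3.org/2005/Atom}"


def next_page_url(feed_root: ET.Element) -> Optional[str]:
    """Return the href of the feed-level rel="next" link, if there is one."""
    for link in feed_root.findall(ATOM + "link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def iter_entry_ids(start_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield the atom:id of every entry, following rel="next" page by page.

    `fetch` is any callable that maps a URL to the feed document as a
    string (hypothetical; plug in whatever HTTP client you use).
    """
    url: Optional[str] = start_url
    seen = set()  # guards against feeds whose next links form a loop
    while url and url not in seen:
        seen.add(url)
        root = ET.fromstring(fetch(url))
        for entry in root.findall(ATOM + "entry"):
            yield entry.findtext(ATOM + "id", default="")
        url = next_page_url(root)
```

without such links, a client only ever sees whatever entries happen to be in the feed document at the time it looks, which is why an unpaged and unarchived feed makes it hard to be sure nothing was missed.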
so we are definitely looking forward to next week's updated stimulus reporting guidance, and to a certain extent, we can probably judge the quality of the new guidance by how much easier it will make it for us to do what we have been doing so far. after all, all we are trying to do is keep track of all the reporting that is published somewhere, and so far, this job has required much more reverse-engineering and continued updating of code than it should have.