we started with Finding Stimulus Feeds, realized that this first requires Finding Stimulus Sites, and are now back to square one, so to speak: figuring out how to even start the search for sites so that we can find the feeds. recovery.gov has a list of available recovery sites, but it is not machine-readable and is suspiciously short.
we are currently trying to compile our own list of eligible agencies, which, according to the guidelines, should set up recovery sites at agency.gov/recovery. it would be great to get some expert opinions on which agencies are actually eligible and would thus have to set up a web site and feeds if they decide to spend stimulus money.
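to make the agency.gov/recovery convention concrete, here is a minimal sketch that maps an agency home page URL to the candidate recovery URL. it assumes the recovery page sits at the root of the agency's host (our reading of the convention, not something the guidelines spell out for every case), and the function name `recovery_url` is a hypothetical choice of ours:

```python
from urllib.parse import urlsplit, urlunsplit


def recovery_url(home_page: str) -> str:
    """Map an agency home page to the conventional /recovery location.

    Assumes the recovery page lives at the root of the agency's host, e.g.
    http://www.epa.gov/some/page -> http://www.epa.gov/recovery
    """
    parts = urlsplit(home_page.strip())
    scheme = parts.scheme or "http"
    netloc = parts.netloc
    if not netloc:
        # A bare "www.epa.gov" has no scheme, so urlsplit puts it in .path.
        netloc = parts.path.split("/")[0]
    # Host names are case-insensitive; normalize them for de-duplication.
    return urlunsplit((scheme, netloc.lower(), "/recovery", "", ""))
```

any path on the listed home page is dropped on purpose, since a home page like .../index.html says nothing about where /recovery lives.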
ironically, we could not find a machine-readable list of government agencies either. usa.gov has an A-Z Index of U.S. Government Departments and Agencies, but it is only available as paged HTML or PDF. we assumed that this would be a good list to start with, so we scraped it and turned it into XML (and here is an HTML version of it). this list contains 587 agencies and their home pages as listed on usa.gov; most of them are .gov sites, but there are also a number of others, such as .mil, .us, and .net.
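the element names of our XML file aren't shown here, so the following sketch assumes a hypothetical layout of `<agencies><agency><name/><url/></agency></agencies>`. it reads such a file with Python's standard `xml.etree.ElementTree` and tallies the top-level domains of the home pages, which is one way to get the .gov/.mil/.us/.net breakdown described above (`tld_counts` is our own, hypothetical name, and the sample input is made up for illustration):

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit


def tld_counts(xml_text: str) -> Counter:
    """Count agency home pages by top-level domain (.gov, .mil, ...).

    Assumes a hypothetical layout in which each <agency> element has a
    <url> child holding the home page URL. URLs without a scheme have no
    parsable host name and are skipped.
    """
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    for agency in root.iter("agency"):
        url = (agency.findtext("url") or "").strip()
        host = urlsplit(url).hostname  # lowercased by urlsplit
        if host:
            counts["." + host.rsplit(".", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    # Illustrative input only; not an excerpt of the real dataset.
    sample = """<agencies>
      <agency><name>Environmental Protection Agency</name><url>http://www.epa.gov/</url></agency>
      <agency><name>NASA</name><url>http://www.nasa.gov/</url></agency>
      <agency><name>U.S. Army</name><url>http://www.army.mil/</url></agency>
    </agencies>"""
    for tld, n in tld_counts(sample).most_common():
        print(tld, n)
```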
the next step will be to ping all of them for recovery pages (at agency.gov/recovery) and see whether they return anything. but before we start doing that, it would be great to get some expert feedback on whether these 587 agencies are what we should really focus on, or whether the Recovery Act guidelines apply to a different set of agencies that includes some we don't have in our dataset.
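a minimal sketch of that probing step, using only the Python standard library: it requests each candidate URL, records the HTTP status and the URL it ended up at after redirects, and applies a rough heuristic (our assumption, not anything from the guidelines) that a hit only counts if it returned 200 without being redirected away from a /recovery path, since many sites bounce unknown paths to their home page. the function names, user-agent string, timeout, and worker count are all hypothetical choices:

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical user agent identifying the survey to site operators.
USER_AGENT = "recovery-site-survey/0.1"

Result = Tuple[Optional[int], Optional[str]]


def probe(url: str, timeout: float = 10.0) -> Result:
    """Fetch url; return (HTTP status, final URL after redirects).

    Returns (None, None) when the host cannot be reached at all
    (DNS failure, refused connection, timeout, garbled response).
    """
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.url
    except urllib.error.HTTPError as err:
        # 404, 403, 500, ...: the server answered, but not with a page.
        return err.code, url
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None, None


def looks_found(status: Optional[int], final_url: Optional[str]) -> bool:
    """Rough heuristic: count a hit only for a 200 response that still
    sits on a /recovery path, i.e. was not redirected to a home page."""
    if status != 200 or final_url is None:
        return False
    return "/recovery" in urlsplit(final_url).path.lower()


def survey(urls: Iterable[str], workers: int = 8) -> Dict[str, Result]:
    """Probe many URLs concurrently; map each URL to its probe result."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(url_list, pool.map(probe, url_list)))
```

a plain GET is used rather than HEAD because some servers handle HEAD poorly; since each of the few hundred URLs is on a different host, a small thread pool keeps the run short without putting much load on any single site.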