
Thursday, February 26, 2009

Comments


George Thomas

Hi Erik, great to find folks like you talking about what recovery.gov is trying to do with feeds. I'd like to hear more about your feed autodiscovery ideas. As it is, doesn't a simple URI convention for GETting an Atom service document get us most of the way there, even without a 'registry'? That said, the combination of the 'guidance' and recovery.gov could serve that function: even if we can't autodiscover feeds, we would at least know where to introspect.
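As a minimal sketch of the URI-convention idea, assume an agency serves a standard AtomPub service document (RFC 5023) at some agreed-upon URI. A client could then list the available collections as follows. The URI and collection names are hypothetical, and the document is parsed from a string so the example runs without network access.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by AtomPub (RFC 5023) and Atom (RFC 4287).
NS = {"app": "http://www.w3.org/2007/app", "atom": "http://www.w3.org/2005/Atom"}

def list_collections(service_xml: str) -> list[tuple[str, str]]:
    """Return (title, href) for each collection in an AtomPub service document."""
    root = ET.fromstring(service_xml)
    return [
        (coll.findtext("atom:title", default="", namespaces=NS), coll.get("href"))
        for coll in root.iterfind("app:workspace/app:collection", NS)
    ]

# Hypothetical service document, as an agency might serve it at an
# agreed-upon URI such as https://agency.example.gov/recovery/service
SAMPLE = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Recovery Reporting</atom:title>
    <collection href="https://agency.example.gov/recovery/awards">
      <atom:title>Awards</atom:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
    <collection href="https://agency.example.gov/recovery/reports">
      <atom:title>Reports</atom:title>
    </collection>
  </workspace>
</service>"""

for title, href in list_collections(SAMPLE):
    print(title, href)
```

A live client would fetch the document first (for example with `urllib.request.urlopen(url).read()`) and pass the result to the same function. The convention only has to fix where the service document lives. Everything after that is standard AtomPub introspection.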

All of the 'use RSS or Atom' and 'can contain or point to' language is more an acknowledgment of what we do or don't expect reporting entities to do now. We know there are many versions of RSS, but most folks seem to be more aware of RSS than Atom, even if they have no idea which RSS flavor they're using.
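Because the guidance allows either RSS or Atom, a consumer has to work out which flavor it has received. The following rough sketch looks only at the root element. It is a heuristic rather than a full feed parser, and it deliberately reports anything it does not recognize (such as RSS 0.90, which uses a different RDF namespace) as "unknown".

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"

def feed_flavor(feed_xml: str) -> str:
    """Guess the syndication format from the document's root element."""
    root = ET.fromstring(feed_xml)
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    if root.tag == "rss":
        # RSS 0.91, 0.92 and 2.0 all use <rss version="...">.
        return "rss " + root.get("version", "unknown")
    if root.tag == f"{{{RDF_NS}}}RDF" and root.find(f"{{{RSS10_NS}}}channel") is not None:
        # RSS 1.0 is an RDF/XML document whose channel is in the RSS 1.0 namespace.
        return "rss 1.0"
    return "unknown"

print(feed_flavor('<rss version="2.0"><channel/></rss>'))
```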

Initial, preliminary feedback, which usually doesn't concern the actual techniques and technologies being suggested (as yours does), indicates that some, if not most, want or expect recovery.gov to do all the work and be the only APP server. Others are more ready to provide their own capabilities and to ask the right questions, as you're doing, which helps us gain clarity. The original vision was to enable 'transparency at the source' and to treat the information providers as the gold source of that data, so the less processing recovery.gov does the better, for a variety of reasons. We may have to take the fork in the road and do both, or at least initially provide everything anyone would need. We also hope to share open source reference implementations of anything we stand up on recovery.gov.

For the former crowd ('do it all for me, because commodity standards-based web infrastructure and skills are too hard, or redundant across government and therefore bad'), the idea so far is to bind a spreadsheet and/or an XForm to XSD datatypes, so that they can PUT or POST either to their own APP services or to ones provided by recovery.gov, which would parse, transform, and publish the data in whatever way is most expedient. Office productivity spreadsheets and XForms can be edited offline and then published to such a service, or XForms can be served online and submit directly to the parse/publish/persist/view services.
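On the client side, the 'bind to XSD datatypes, then PUT/POST to an APP service' path could look roughly like the following. A form's values are wrapped in an Atom entry and POSTed to a collection URI with the `application/atom+xml;type=entry` media type that AtomPub uses for entries. The `rpt` namespace, its element names, and the collection URL are hypothetical stand-ins for whatever the XSDs actually define. The request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
RPT = "urn:example:recovery-report"  # hypothetical namespace for the XSD-defined payload

ET.register_namespace("atom", ATOM)
ET.register_namespace("rpt", RPT)

def build_entry(award_id: str, recipient: str, amount: str) -> bytes:
    """Wrap one form submission in an Atom entry; amount keeps its xsd:decimal lexical form."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = f"urn:example:award:{award_id}"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = f"Award {award_id}"
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    author = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = recipient
    # An XML media type in content/@type means the payload is a single child element.
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "application/xml"})
    award = ET.SubElement(content, f"{{{RPT}}}award")
    ET.SubElement(award, f"{{{RPT}}}recipient").text = recipient
    ET.SubElement(award, f"{{{RPT}}}amount").text = amount
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)

def post_request(collection_url: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the AtomPub POST that creates a new collection member."""
    return urllib.request.Request(
        collection_url, data=body, method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"})

body = build_entry("A1", "Acme & Sons", "125000.00")
req = post_request("https://agency.example.gov/recovery/awards", body)
# urllib.request.urlopen(req) would submit it to a live collection.
```

The same entry could be produced by a spreadsheet export or an XForm submission. The service only sees a standard AtomPub POST, so whether it runs at the agency or at recovery.gov does not change the client.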

For the latter folks ('we've got the chops and the infrastructure, just tell us how you like it'), we're heading toward a set of XHTML+RDFa markup standards. That would let us treat the web page as the web service (consumable by humans with browsers and by parsers extracting triples), the published resource as the public record, and the entries of the feed resource as recorded state changes. This suggests organizing feeds around what we're actually tracking (grants, loans, and contract awards) from the various reporting entities receiving stimulus funds (federal, state, local, and tribal government agencies, and large, medium, and small businesses!) throughout a stabilization/stimulus/recovery/growth lifecycle with cost, schedule, and performance indicators, rather than around milestone- or lifecycle-centric reports. That seems consistent with what you are suggesting, though there's a good bit of information architecture still to be worked out.
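To make 'the web page as the web service' concrete, the following sketch renders one award as an XHTML+RDFa fragment. A browser shows it as ordinary text, while an RDFa parser extracts triples from the `about`, `typeof`, and `property` attributes. The vocabulary URI and property names are hypothetical, since no recovery vocabulary had been settled at the time.

```python
from html import escape

# Hypothetical vocabulary namespace (not an actual recovery.gov vocabulary).
RPT_NS = "http://vocab.example.gov/recovery#"

TEMPLATE = """<div xmlns="http://www.w3.org/1999/xhtml"
     xmlns:rpt="{ns}" xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     about="#award-{award_id}" typeof="rpt:Award">
  <h2>Award <span property="rpt:awardId">{award_id}</span></h2>
  <p>Recipient: <span property="rpt:recipient">{recipient}</span></p>
  <p>Amount: $<span property="rpt:amount" datatype="xsd:decimal">{amount}</span></p>
</div>"""

def award_fragment(award_id: str, recipient: str, amount: str) -> str:
    """Render one award for humans and RDFa parsers alike; all values are XML-escaped."""
    return TEMPLATE.format(ns=RPT_NS, award_id=escape(award_id),
                           recipient=escape(recipient), amount=escape(amount))

out = award_fragment("A1", "Acme & Sons", "125000.00")
print(out)
```

Read by an RDFa processor, this fragment describes a resource `#award-A1` (resolved against the page URI) of type `rpt:Award`, with a plain literal for the recipient and an `xsd:decimal` literal for the amount. The same page then serves as the human-readable public record and as machine-readable data.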

We'd like the XML based on the provided XSDs to be transformed to reflect an overall graph-based (probably RDFS) data model, which is still early in development. That model starts from a large number of existing systems that we'd ultimately like to expose as Linked Open Data (LOD) SPARQL endpoints, easing correlation across disparately owned, operated, and managed graphs (other LOD/SPARQL-exposed databases). Those systems also represent a large number of relational schemas that must be integrated but are unlikely to be normalized without extreme coordination cost. Then there are many existing domain taxonomies and ontologies (like XBRL, with its FM concepts/terms tags; and I think a SIOC-like data model that tracks the dollar instead of the person across disparate web sites is the kind we want) that may be useful in building a federated graph-bridge capability without instantiating yet another 'one ring to rule them all' database, which is what most people seem to assume or recommend that we do.
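The following sketch uses plain Python rather than an RDF library to show one way to picture the 'federated graph-bridge' idea. Each source's XML is mapped to (subject, predicate, object) triples that share URIs, and a query runs across the separate graphs instead of loading them into one central database. The URIs, the vocabulary, and both sample sources are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterable, Iterator

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RPT = "http://vocab.example.gov/recovery#"  # hypothetical vocabulary

Triple = tuple[str, str, str]

def award_to_triples(award_xml: str) -> set[Triple]:
    """Map an <award id="..."> record to triples about a shared award URI."""
    award = ET.fromstring(award_xml)
    subject = "urn:example:award:" + award.get("id")
    triples = {(subject, RDF_TYPE, RPT + "Award")}
    for child in award:
        triples.add((subject, RPT + child.tag, (child.text or "").strip()))
    return triples

def match(graphs: Iterable[set[Triple]], s=None, p=None, o=None) -> Iterator[Triple]:
    """Yield triples matching the pattern (None = wildcard) from each graph in turn,
    leaving every graph with its owner instead of merging them into one store."""
    for graph in graphs:
        for t in graph:
            if all(want is None or want == got for want, got in zip((s, p, o), t)):
                yield t

# Graph 1: derived from an agency's XML report.
agency = award_to_triples('<award id="A1"><recipient>Acme &amp; Sons</recipient>'
                          '<amount>125000.00</amount></award>')
# Graph 2: maintained independently (say, by a state), reusing the same award URI.
state = {("urn:example:award:A1", RPT + "county", "Example County")}

about_a1 = sorted(match([agency, state], s="urn:example:award:A1"))
```

The shared URI `urn:example:award:A1` is what bridges the two graphs. Real deployments would put SPARQL endpoints where this sketch has in-memory sets, but the correlation still depends on agreeing on identifiers rather than on one schema.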

I hope this gives you a better sense of what we're thinking. As you might imagine, the stakeholder set is extraordinarily large, so it's not always easy to progress in an 'agile' fashion. Feel free to reach out to me at george at thomas dot name; I'm sure I could learn from you, and I would be grateful for that.

dret

@george: i just recovered your comment from the spam folder, thanks for pointing me to it, and i am sorry it ended up in there.

i think the general idea of staying away from "one ring to rule them all" centralized architectures is very important. the other important question is whether the overall data model should be based on RDFS or on plain XML. i would argue that outside of the semantic web community, tools and know-how for working with RDF (let alone SPARQL) are not very widespread, a point bob glushko and i tried to make in our article "XML Fever" (http://dret.net/netdret/docs/wilde-cacm2008-xml-fever). personally, i like the picture you're painting here, minus the RDFS/SPARQL part. i would argue that if it can be done with plain XML, it should be done with plain XML. if it requires sophisticated ontologies and advanced reasoning, then it might call for RDFS/OWL, but that is a decision that should not be taken lightly.
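For comparison with the RDF route, the following sketch answers the kind of cross-report question that ordinary XML tooling already handles. It totals award amounts per recipient with Python's standard ElementTree, with no triple store or SPARQL involved. The `<reports>`/`<award>` structure is a hypothetical stand-in for the XSD-based XML.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal

# Hypothetical report data in a plain XML vocabulary.
REPORTS = """<reports>
  <award id="A1"><recipient>Acme &amp; Sons</recipient><amount>125000.00</amount></award>
  <award id="A2"><recipient>Example County</recipient><amount>40000.00</amount></award>
  <award id="A3"><recipient>Acme &amp; Sons</recipient><amount>10000.50</amount></award>
</reports>"""

def totals_by_recipient(xml_text: str) -> dict[str, Decimal]:
    """Sum award amounts per recipient; Decimal avoids binary floating-point rounding."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for award in ET.fromstring(xml_text).iterfind("award"):
        totals[award.findtext("recipient")] += Decimal(award.findtext("amount"))
    return dict(totals)

totals = totals_by_recipient(REPORTS)
```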

