The recently published HTML 5 draft does not change anything regarding HTML fragment identifiers. They are still limited to IDs only (with <a name=""> as an alternative for backwards compatibility). This means that any reference into an HTML page depends on how the page is using IDs.
But wouldn't HTML 5 be a wonderful opportunity to bring a little bit more hypermedia back to the Web? XML had XLink and XPointer. Both were failures for a number of reasons, but I am still a big fan of trying to make the Web more hypermedia-like. So why not learn from XPointer and try to give HTML 5 a more practical and useful set of fragment identification methods than just IDs?
The whole fragment identification idea is a classic chicken and egg problem. Why use them when they're not supported? Why support them when they're not used? We had a lot of remarks like that when we worked on fragment identifiers for plain text files, but I still believe it is good to have mechanisms like that. Assume Firefox had a feature where you just moused over a paragraph, right-clicked, and then you could send an email with a pointer to that paragraph. If the receiver had Firefox, the browser would scroll to and highlight that paragraph. I am still convinced a lot of people would find such a feature pretty useful. And things would not break in another browser; users would simply not get the scroll/highlight behavior.
While I am convinced that HTML 5 would be the right point in time to introduce such an improved fragment identification method and try to fix the fact that few people use HTML fragment identification, I am not really sure how to best do it. My guess is there should be three basic ways of identifying fragments:
- IDs: For backwards compatibility, IDs (and <a name="">) should be supported. It would be what XPointer called barenames or shorthands.
- Child Sequences: Similar to XPointer's child sequence, there should be one in HTML 5, which could either start at the page body, or at an ID. The fragment identifier #warning/2/3 would identify the third child of the second child of the id=warning element.
- Character Pointers: Should there also be a way to point to a position? Maybe defined by counting characters in the page's string value? Hard to tell, but this is where XPointer definitely went over the top and was never finished, because it even tried to define arbitrary ranges, which is really hard to do.
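To make the child sequence idea a bit more concrete, here is a minimal sketch in Python of how a fragment such as #warning/2/3 might be resolved. It is only an illustration under several assumptions: the page is well-formed enough to be parsed as XML without namespaces, only element children are counted (starting at 1), and a fragment with an empty start such as #/1 is taken to begin at the page body. The function name resolve_fragment and the sample page are made up for this example and are not part of any draft.

```python
import xml.etree.ElementTree as ET


def resolve_fragment(root, fragment):
    """Resolve '#id/2/3' or '#/1/2' against a parsed page.

    Returns the identified element, or None if the fragment does not resolve.
    Assumption: an empty start (as in '#/1') means "start at <body>".
    """
    steps = fragment.lstrip("#").split("/")
    start, indexes = steps[0], steps[1:]

    if start:
        # Plain ID lookup: the first element carrying this id value.
        node = next((el for el in root.iter() if el.get("id") == start), None)
    else:
        node = root.find("body")

    for step in indexes:
        if node is None or not (step.isascii() and step.isdigit()):
            return None
        children = list(node)  # element children only, in document order
        position = int(step)   # child sequences count from 1
        if position < 1 or position > len(children):
            return None
        node = children[position - 1]
    return node


# Hypothetical sample page, used only to exercise the sketch.
PAGE = """<html><body>
<p>Intro</p>
<div id="warning">
  <p>First</p>
  <ul><li>a</li><li>b</li><li>c</li></ul>
</div>
</body></html>"""

root = ET.fromstring(PAGE)
target = resolve_fragment(root, "#warning/2/3")
print(target.tag, target.text)  # li c
```

A fragment that no longer matches the page (for example #warning/9) simply returns None here, which corresponds to a browser ignoring a fragment it cannot resolve.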
Maybe just IDs and child sequences could do the trick? There also should be a well-defined behavior for browsers, so that a user instructing a browser to create a fragment identifier could be sure that it will always be rooted at the nearest ID, to make it less likely to break. I am sure there are many more details to figure out, but I am curious whether anybody else thinks this could become a pretty useful addition to how HTML can be used.
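The "rooted at the nearest ID" rule could be sketched roughly as follows. Again, this is just one possible reading and not a proposal for exact spec text: the code walks up from the selected element until it finds an ancestor (or the element itself) with an id, and falls back to the page body if there is none. The name build_fragment, the #/1 notation for body-rooted fragments, and the sample page are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_fragment(root, target):
    """Build a fragment identifier for target, rooted at the nearest ID.

    Returns a string such as '#warning/2/3', or None if target is neither
    inside an element with an id nor inside <body>.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while True:
        if node.get("id"):
            prefix = node.get("id")  # nearest ID found: stop climbing
            break
        if node.tag == "body":
            prefix = ""              # assumption: empty start means <body>
            break
        parent = parent_of.get(node)
        if parent is None:
            return None
        # 1-based position among the parent's element children.
        steps.append(str(list(parent).index(node) + 1))
        node = parent

    return "#" + "/".join([prefix] + steps[::-1])


# Hypothetical sample page, used only to exercise the sketch.
SAMPLE = """<html><body>
<p>Intro</p>
<div id="warning">
  <p>First</p>
  <ul><li>a</li><li>b</li><li>c</li></ul>
</div>
</body></html>"""

page = ET.fromstring(SAMPLE)
third_item = page.find("body/div/ul")[2]
print(build_fragment(page, third_item))  # #warning/2/3
```

Rooting at the nearest ID keeps the numeric part of the identifier short, so changes elsewhere on the page (such as a new paragraph before the id=warning element) would not break it.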
And please don't even ask about how to handle situations where CSS is hiding parts of the document, maybe dynamically, or even worse, where scripting code is changing the document's DOM. It would be necessary to have well-defined behavior for all possible situations, but my guess is that for the majority of static Web pages, fragment identification in a rather simple form would already be pretty useful as a way to better communicate about Web content.
I would be really interested whether this is just another of those ideas that kind of feel right, but where a lot of people think it is not going to work or not worth the effort, or whether this could actually work. I would certainly love to see the Web becoming a better hypermedia system.
[[ this is the first post i am writing for the O'Reilly XML Blog on xml.com, here is the full post on xml.com. i'll keep posting my xml.com posts to my personal blog, just to have everything in one place. but if you are looking for comments on an xml.com post, it is probably a better idea to look at the xml.com post, which is why i will always include a note from my blog pointing to xml.com. like this one here:
also published on xml.com as http://www.oreillynet.com/xml/blog/2008/05/xhtml_fragment_identifiers.html ]]
Hi Erik, This seems worth directly discussing with those participating in work on the HTML5 draft. I encourage you to join the HTML WG and start a discussion about it on the public-html mailing list, and/or to post a message about it to the whatwg discussion list. --Mike
Posted by: Michael(tm) Smith | Sunday, May 18, 2008 at 23:58
To echo Mike's comments, please don't hesitate to bring this kind of thing up on the mailing lists (either the WHATWG one or the HTMLWG -- see http://whatwg.org/mailing-list for details on joining the WHATWG list, the HTMLWG list is a bit harder to join).
The biggest problem I can see is reliability. I don't really see any way to make XPointer-like solutions resilient enough to work on the Web. That, and backwards-compatibility issues, are really the only things blocking progress here.
Posted by: Ian Hickson | Tuesday, May 20, 2008 at 19:55
mike & ian: thanks for your comments! i am currently in the process of joining the HTMLWG list, but currently the system seems to be thoroughly confused by the fact that a while ago i already was a member of another WG. but i hope to get access soon and will then raise this issue on the list.
of course there are resilience issues, but i think the utility of being able to point to arbitrary fragments on an HTML page would offset the risk of these identifiers breaking when the page changes. the spec could actively try to minimize breakage by encouraging page authors to use ids for important structural parts, and by defining how browsers should construct fragments. anyway, we'll see what others think about that...
Posted by: dret | Tuesday, May 20, 2008 at 22:26
Hi Erik,
I caught your post on the public-html mailing list and wanted to let you know about my Firefox extension that does what you're trying to do with existing HTML web pages today using XPointer. There is also a user.js for Opera that supports this.
This is just in case you feel like playing with it.
Regards,
Jeff Schiller
Posted by: Jeff Schiller | Thursday, June 12, 2008 at 04:45
http://www.dlib.org/dlib/july00/wilensky/07wilensky.html
makes the case that robust pointers need more than fragment identifiers.
Posted by: Larry Masinter | Sunday, June 15, 2008 at 19:28
larry, thanks for the link! and i certainly did not want to imply that my (premature and quick and dirty) initial proposal for better fragment identifiers produces robust fragment identifiers. that would be a much more challenging task. all i wanted to propose was that html should mature to a point where fragment identifiers can identify things beyond the current @id-only definition for them.
the article states that "the web develops as an increasingly sophisticated hypertext platform", and i think that (sadly) this is not really true. at best, it becomes a platform for hypertext islands (at least for those web sites which actually care to produce hypertext), but more and more authors go down the "why link when people can search" route, instead of choosing the "if you link, people don't have to search" route. and maybe putting google in charge of html5 is not really the best way to try to improve html as a hypertext format. what happens is that it is being mostly improved as a publishing platform for interactive documents and services - which is fine. but i think the h in html still should be taken seriously.
anyway, there are many ways in which fragment identifiers could be improved for html5 (in html4, they basically are as bad as it can get), and i think the important first step is to understand that html5 hopefully should not just incorporate the most popular things people have solved with scripting over the past years. it should also seize the opportunity to improve the web as a hypermedia platform. after all, that is why it's called the web.
Posted by: | Sunday, June 15, 2008 at 20:24
This has another value. It is arguably a better avenue than RDFa for bridging the GGG to the WWW, ie. RDF to the web.
RDFa is antithetical to RDF's essentially 'additive', always open, graph engineering principle -- to make a predicate you must change the subject. But a good fragment scheme would allow RDF URIs to reference the web more effectively, enabling statements/triples to be added freely and separately.
On robustness, there is a natural temptation to ponder it, but think of basic URLs: they have none, yet the web has succeeded greatly. Totally disregarding robustness would be well founded, even well advised. Not trying too hard makes the web work.
I lean toward just child sequences. It lacks precision in the face of Proustian paragraphs, but would be mostly sufficient. And it grasps a range, which seems essential to the concept.
Posted by: Harrison Ainsworth | Friday, April 10, 2009 at 09:47
Well, no doubt - HTML 5 is a real new and promising thing. It really shows some advantages over flash and java in some ways - hope that it will get its place in web 2.0 and web 3.0 soon.
Posted by: commenting system | Sunday, August 22, 2010 at 16:11