It is interesting to ask how the Web would look if it were
reinvented today. Of course, this question would be
answered very differently by different communities, depending on their
focus and interest. Since Google is such a big player on the Web these
days, and increasingly shapes the way in which the Web is perceived and
develops, it is particularly interesting to think about what The Web
According To Google would look like, and what that would mean.
While The Web means very different things to different
communities these days (the three biggest communities with rather
distinct world-views are the RDF-flavored Semantic Web, Ajaxy Rich Internet Applications, and SOAP-style Web Services), some of the main principles underlying these different perspectives remain the same. The Web at its very core is URIs, HTTP, and HTML; in a slightly extended perspective, CSS and JavaScript (in particular its XMLHttpRequest-based flavor of Ajax) have also become essential for many Web-based scenarios. Looking at these fundamental technologies, what would The Web According To Google look like?
- URIs are brittle and dangerous: URIs have proven to be the
source of much confusion and many problems. People have a hard time
remembering URIs and often forget them; domain names have become a
market of their own, with many problems lurking in the areas of
domain-name-based security concepts and a Unicode-enabled DNS; and
resources regularly become inaccessible because they were removed or
servers were reorganized. So at the very least, the address bar in a
browser has to disappear, so that this source of errors and
insecurities is safely eliminated. A search box is sufficient to find
any resource on the Web, and URIs have introduced more problems than
they have provided value. Instead of URIs, the Web should simply be
organized as one huge distributed hash table, for which efficient
implementations are known and tested, and which generally avoids the
problems and inefficiencies of crawling and deep crawling. Servers
publishing pages have to insert them into the table, and when resources
are deleted, they are removed from the table.
- Search-based Navigation: As mentioned above, URIs and
URI-related security and usability issues have long caused problems on
the Web. Instead of trusting a Web page to link to the correct site (it
could always link to a phishing site), it is much safer to search
instead of navigate, and to rely on the wisdom of the crowds when it
comes to identifying and avoiding phishing and other security risks.
Navigation should thus be restricted to within-domain navigation, and
any navigation outside of a site's domain should be done by search, or
should at least be supplemented by search, so that the search results
could be used to alert users to unintended consequences when following
links to malicious sites.
- Information Brokers: The ideal Web has people interacting
with information brokers constantly, either explicitly by searching or
following ad links, or implicitly by being tracked through cookies
across the vast number of pages somehow affiliated with those brokers.
The more brokers know what users are doing, the better they can serve
them information that's valuable for them. So instead of having to go
through the security and deployment pains of having to deal with
cookies and having to install search bars on millions of computers, the
Web should have a built-in mechanism for tracking people.
Maybe cookies should be moved into HTTP/HTML itself and browsers should
stop being so picky about all this privacy stuff. After all,
standardization's most noble job is to codify common practices, and if
there is one really widespread common practice on the Web, it's user
tracking.
- Finding Content: Instead of trying to make HTML a better
hypermedia language (so that people can more easily navigate the Web, for
example by using more specific links to document fragments), it makes
much more sense to focus on making HTML a better document format.
Linking and navigation are inherently dangerous and prone to link rot
and the organizational difficulties of large information repositories. So
the biggest improvement of HTML would be to simplify its structure so
that it can be more reliably indexed, so that users can enjoy better
search services. Eventually, one should consider getting rid of the
misleading "HT" prefix and merging HTML and PDF into PDML, so that
content can be easily published, indexed, and printed. Why would anybody
want to have structured documents when everything is indexed and can be
searched and found?
- Applications should be Web-based: Desktop-based applications
are a thing of the past, and their slow turnaround time for patches
introduces another large security problem. Web-based applications are
more agile, are available on demand, and can be used to transform the
Web not only into an information cloud, but also into an application
cloud. Browsers need a bit more functionality to make this a reality.
One example of this is the ability to use local storage for offline work
and data access optimization. Google Gears (built into Chrome)
is a first step in this direction and allows applications to persist data. The
second step required is for browsers to persist applications, so that
offline mode can also be used when the application has to be loaded.
It's not too unlikely that one of the future Chrome versions will
provide exactly that feature. After this, all applications can be moved
to the Web, all their data can be moved to the Web as well, and this
will allow information brokers to provide even better recommendation
services.
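
To make the "one huge distributed hash table" from the first point a little more concrete, here is a minimal single-process sketch in Python. It only illustrates the publish, remove, and lookup operations described above, using consistent hashing to decide which node holds which page. Everything named here (`ToyWebTable`, `key_hash`, the node names, and the choice of a plain string as the key) is an assumption made for illustration; the text does not say what would replace the URI as a key, and a real deployment would need networking, replication, and access control that this toy leaves out.

```python
import hashlib
from bisect import bisect_right


def key_hash(text: str) -> int:
    """Map any string to a point on the hash ring (SHA-1, 160 bits)."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class ToyWebTable:
    """Hypothetical single-process stand-in for a distributed hash table."""

    def __init__(self, node_names):
        # Place every node on the ring at the hash of its (unique) name.
        self._ring = sorted((key_hash(name), name) for name in node_names)
        self._points = [point for point, _ in self._ring]
        # Each node keeps its own key -> page store.
        self._stores = {name: {} for name in node_names}

    def node_for(self, key: str) -> str:
        """Return the first node past the key's hash, wrapping around."""
        index = bisect_right(self._points, key_hash(key)) % len(self._ring)
        return self._ring[index][1]

    def publish(self, key: str, page: str) -> None:
        """A server publishing a page inserts it into the table."""
        self._stores[self.node_for(key)][key] = page

    def remove(self, key: str) -> None:
        """A deleted resource is removed from the table."""
        self._stores[self.node_for(key)].pop(key, None)

    def lookup(self, key: str):
        """Return the stored page, or None if nothing is published."""
        return self._stores[self.node_for(key)].get(key)


table = ToyWebTable(["node-a", "node-b", "node-c"])
table.publish("the web according to google", "<p>some page</p>")
print(table.lookup("the web according to google"))
table.remove("the web according to google")
print(table.lookup("the web according to google"))
```

Note what the sketch makes obvious: nothing is found by crawling, and nothing is reachable unless its publisher inserted it, which is exactly the trade the caricature proposes.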
While this may be a bit of a caricature, it is not that far
away from what's happening. Many people seem to have given up on the
idea of the Web as something that should be decentralized and open to
exploration. The Web never was a sophisticated hypermedia system, but its
rudimentary and basic hypermedia features proved to be extremely
powerful. It turns out that the simplicity and fault-tolerant design
of the Web were the killer features, and not the much more advanced
hypermedia features researchers had worked on for decades.
The main point of these considerations is not to say that Google is
doing a bad job at contributing to the standards process. The main
point is to say that Google's contributions are influenced by its
perspective of the Web, and by its business model. After all, Google is
a spectacularly profitable for-profit company and very likely tries to
maintain that status. Google's interest in the Web is a business
interest, and its contributions to the development of the Web are
shaped by this fundamental perspective. Google's Web is a centralized
Web with Google in the middle and everything else revolving around that
central hub. And even though I try to minimize my Google
time, the number of times per day I interact with Google (Search, AdSense,
DoubleClick, iPhone Map App, Google Reader, ...) must be very
impressive.
As a final note on how these things already start showing up in reality: The W3C's Geolocation group is currently specifying a geolocation API, and even though it has not been heavily publicized, this is a very significant activity because it is the first time the W3C ventures into the area of Location-Based Services (LBS), an area with a huge potential for growth. One of the contentious and very important questions is that of privacy: how much control do users have over their location information when they (or their device on their behalf) interact with Web-based services? One of the reasons why the current API has very weak privacy considerations is the pair of claims that (a) these could not be enforced anyway, and (b) by setting too restrictive privacy policies, users might break service functionality. The interesting question in the context of these seemingly technical terms is: what is "service functionality" when you're Google? If your service is to serve ads based on behavioral and location profiles, then anything preventing you from gathering behavioral and location information means breaking service functionality. It is this view of the Web that already starts shaping the way in which the Web will work in the next couple of years.