[This article was originally posted as part of the Rising Star Dream Team Future of Search series on the VortexDNA blog. I'm deeply grateful to Kaila Colbin, VortexDNA's resident blogger, for the opportunity to participate!]
I've been writing about Search technologies for a while now, so when
Kaila Colbin of VortexDNA invited me to participate in a future-of-search marathon, I jumped at the chance!
What
will the search for information look like in the future - in five
years, ten, twenty? Is it just more of the same, or will it look
radically
different?
Looking Back
Before looking to
the future, let us first look at how far we have come. Danny Sullivan
has a great post looking at a decade of search history and the various tribulations of
past and present search engines - AltaVista, Ask Jeeves, Microsoft, Yahoo! and of
course, the early Google. We owe a huge debt of gratitude to the
tremendous contributions of these and other early pioneers of Search;
Google, in particular, deserves a great deal of the credit for making
web search ubiquitous outside the tech community. Indeed, "to Google"
as a verb has become virtually synonymous with the idea of Web Search,
much as the Xerox brand became synonymous with the idea of the
photocopier in a bygone era.
Google's venerable PageRank
algorithm is certainly best-of-breed for the present, and Google keeps tweaking its results continuously. Given this
progress in the Search area, can we still expect to see major
improvements in search in the foreseeable future? As an analogy,
consider the DC-3 airplane - the first
truly modern airliner, it was powerful, safe, reliable and economical
(indeed, some of these are still flying today). It revolutionized air
travel, and with its
introduction, many considered the aviation age to have arrived for the general public. And
yet, jet aircraft were already on the horizon, so to speak; within about two
decades, this reliable workhorse had been overtaken by jet airliners
in the competition for public air travel.
It could easily be the same with
search. The key question that a search engine addresses is: what results do the maximum number of
users find most useful for a given search query? PageRank is simply an
approximation of the Wisdom of Crowds to answer this question. Is there
a richer abstraction? Is Engagement the new black? Whatever the new
approach is - in order to provide accurate results, it must work as implicitly as possible.
We have only to envision the possibilities ...
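To make the "approximation of the Wisdom of Crowds" idea a little more concrete, here is a minimal sketch of the textbook power-iteration form of PageRank, in which every page splits its score among the pages it links to. The toy link graph, the damping factor of 0.85 and the iteration count are illustrative assumptions on my part; this is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link acts as a vote: the page's score is split among its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" collects the most links, "d" has none pointing at it.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
best_page = max(scores, key=scores.get)
```

In this toy graph the page that collects the most incoming links ends up with the highest score, which is the "crowd vote" intuition in miniature.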
Looking Forward
So let us take a speculative look at search, circa 2015. To look at it
systematically, we can separate the search engine into the following
components from a user perspective (to see this breakdown in visual format, check out my earlier post on an abstract architecture for search):
- Query specification
- Base Index
- Relevance Algorithm
- Results Visualization
- Ongoing Interest
Let us consider the possible future directions for each one in turn.
Future Directions
Query Specification (Input)
Google
pioneered the keyword-centric, minimalist approach for specifying the
search query, and all the major search engines follow that lead. But
the search criteria could be so much richer; instead of experimenting with different types of keyword searches to
find the information they need, users could simply provide additional
criteria up front to qualify their request.
This approach could serve every kind of user. The casual user would get reasonable defaults, which would automatically
get updated with regular use to their favorite values; the topical
researcher, on the other hand, would actively tinker with these widgets
in a "power user" mode. (Google already supports this type of
functionality in a limited fashion.)
Some possible advanced features for specifying the query are given below:
- Content Spec:
Enabling the user to dynamically specify the data sources to be
included, based on domain, reputation, social network, and so on
- Scope: Input for seamlessly limiting the scope of the search, to Enterprise or personal data
- Qualifiers:
Allow the user to add more information to disambiguate result matches,
e.g. qualifying if "Java" means the programming language or the island
- Parameter ranges:
Domain-specific parameters can be extremely valuable even to a
general-purpose engine (see Parametric Search in the section on the Relevance Algorithm
below)
- UI paradigms: Text
keywords are a limited form of input. The actual input mechanisms could
be more visual, in the form of sliders, buttons, fields and other UI
widgets. Imagine, for example, that as you move a slider, the search
results change or an increasing number of results appear on the page!
- Multiple Profiles:
Personalization does not always have to be implicit. A user could
explicitly set up profiles to represent different interests -
professional, hobby, personal and so on, so that switching the profile
would quickly change the areas of interest
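As a rough illustration of what such a richer query might look like under the hood, here is a small sketch of a query object that carries a scope, allowed sources, qualifiers and parameter ranges alongside the keywords. Every name in it (SearchQuery, matches, the document fields and the example domains) is a hypothetical of mine, not the API of any real search engine.

```python
from dataclasses import dataclass, field


@dataclass
class SearchQuery:
    """Hypothetical rich query: keywords plus optional up-front criteria."""
    keywords: list
    scope: str = "web"                               # "web", "enterprise" or "personal"
    sources: set = field(default_factory=set)        # allowed domains; empty means any
    qualifiers: dict = field(default_factory=dict)   # e.g. {"java": "programming"}
    ranges: dict = field(default_factory=dict)       # e.g. {"year": (2005, 2008)}


def matches(doc, query):
    """Return True if a document (a plain dict) satisfies every criterion."""
    text = doc["text"].lower()
    if not all(keyword.lower() in text for keyword in query.keywords):
        return False
    if doc.get("scope", "web") != query.scope:
        return False
    if query.sources and doc.get("domain") not in query.sources:
        return False
    # Qualifiers disambiguate a term, e.g. Java the language vs Java the island.
    for term, sense in query.qualifiers.items():
        if doc.get("senses", {}).get(term) != sense:
            return False
    # Parameter ranges are inclusive (low, high) bounds on a numeric field.
    for name, (low, high) in query.ranges.items():
        value = doc.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True


# Made-up documents for illustration only.
docs = [
    {"text": "Java generics tutorial", "domain": "example.org",
     "senses": {"java": "programming"}, "year": 2007},
    {"text": "Java travel guide: volcanoes and beaches", "domain": "example.com",
     "senses": {"java": "island"}, "year": 2006},
]

query = SearchQuery(keywords=["java"],
                    qualifiers={"java": "programming"},
                    ranges={"year": (2005, 2008)})
hits = [doc["text"] for doc in docs if matches(doc, query)]
```

A "profile" in the sense described above could then simply be a saved set of default values for these fields, swapped in when the user changes context.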
Base Index (Content)
This is a core area of concern for search engines: what is the scope of content to be considered when searching for information?
The
standard approach currently is to build web crawlers that continuously
scan as many web sites and web pages as possible; the scanned content
is used to build a master content index, which is updated
regularly and serves as the basis when searching for
information.
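For readers who have not seen one, the "master content index" is usually some form of inverted index: a map from each word to the pages that contain it. The sketch below is a deliberately tiny version, assuming the crawler has already fetched the pages; the URLs and page text are made up, and real engines add ranking, stemming, compression and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index {word: set of URLs} from {url: page text}."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Pretend output of a crawler (hypothetical URLs and content).
crawled = {
    "http://example.com/a": "Search engines crawl the web",
    "http://example.com/b": "Jet engines replaced the DC-3",
    "http://example.com/c": "The web of search results",
}
index = build_index(crawled)
found = search(index, "search web")
```

Updating the index "regularly" then amounts to re-running build_index, or patching individual entries, as the crawler revisits pages.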
For the base index, the big changes in the future are likely to involve both the scope and understanding of the content; here is a short list:
- Rich media search, e.g. true indexing of audio and video content
- Dynamic content search (searching the invisible web)
- Integration of personal, web and corporate information
- Perspective-based search, e.g. conservative vs liberal, hard news vs opinion, and so on
- Subset creation, on-the-fly, e.g. to search for domain-specific data
Relevance Algorithm (Mechanism)
This
is, of course, the most-debated topic when discussing the future of
search engines. Clearly, many different approaches and technologies
show promise; some of these are noted below:
- Personalization (but without storing personal info)
- Social Input / Wisdom-of-Crowds (which has its pitfalls)
- Social Graph: where your selected network of people help improve search results (Robert Scoble has recently gotten religion about this concept; Danny Sullivan rebuts)
- Semantic Processing: of both the query and the content (will this let the Search Engine find answers that we never knew we had?)
- Parametric Search: Vertical
search engines already routinely offer domain-specific parametric
search; for example, job search engines allow the user to specify the
all-important location of the job as a primary criterion. Can
this type of feature be generalized, so that as a user drills down
deeper into search results, an increasing number of parameters can be
offered?
- Human-powered Search, for either the short head or the long tail of search
- Swarm Intelligence: Mimicking biological search, such as ant colony optimization, particle swarm optimization, and so on
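To illustrate the parametric drill-down idea from the list above, here is a small sketch in which the engine looks at the current result set, works out which parameters are worth offering, and narrows the results once the user picks a value. The job listings, field names and function names are invented for the example; a general-purpose engine would first have to extract such parameters from unstructured content, which is the hard part.

```python
from collections import Counter


def facet_counts(results, fields):
    """For each field, count how many results carry each value."""
    return {name: Counter(r[name] for r in results if name in r) for name in fields}


def drill_down(results, **chosen):
    """Keep only the results whose fields equal the chosen parameter values."""
    return [r for r in results
            if all(r.get(name) == value for name, value in chosen.items())]


# Hypothetical job-search results.
jobs = [
    {"title": "Search engineer", "location": "Auckland", "type": "full-time"},
    {"title": "Index engineer", "location": "Auckland", "type": "contract"},
    {"title": "Relevance analyst", "location": "Seattle", "type": "full-time"},
]

# First level: offer location as the primary criterion.
by_location = facet_counts(jobs, ["location"])["location"]

# After the user picks a location, offer the next parameter on the narrowed set.
in_auckland = drill_down(jobs, location="Auckland")
by_type = facet_counts(in_auckland, ["type"])["type"]
```

Each drill-down step shrinks the result set and recomputes the counts, so the parameters on offer always reflect what is left to choose from.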
Results Visualization (Output)
Again,
Google leads the way with its minimalist approach: simple headings,
links and snippets of text. This is slowly changing, with the new "Universal Search" approach from Microsoft, Google and others; Ask.com is a leader in this area.
Search
engines of the future will likely implement completely new paradigms
for users to navigate and view search results. Often, meta-results -
information about the results - are as important
as the results themselves: they let users see where a given result
fits into the overall universe of results, and find other results related
to an item of interest.
Some possibilities for results display are given below:
- Tag clusters are not a new concept, but have yet to gain traction among the majors. Quintura, with its dynamic tag cloud display, has one of the best examples.
- Organizing results information by content type
is something every search engine will have to think about in the
future. For example, should news stories be presented in an "overview
capsule" fashion, or organized as a timeline-based view? Dale Dougherty
at O'Reilly Radar has a brilliant article on this topic: Journalism is burning.
- Follow-up actions - on viewing search engine results, a very common user action (as Greg Linden points out) is to modify the current query, either to drill down further or to
try a different approach to find the required information. Google's "did you mean ..." feature is a step in this direction (although it leaves much to be desired).
- Domain-specific visualization
can significantly enhance the understanding of results. This is similar
to the data organization point above, but focused on the display itself;
results from different vertical domains may require very different
visualization techniques, such as colors,
graphs, images, trend lines, heat maps, topographic charts, and so on.
[For a list of the more exotic variations, check out this amazing list from Smashing Magazine.]
- Dynamic scoping
- enabling users to widen or narrow search results, based on different
criteria - such as geography (local or global), site authority,
timeliness, point of view, domain, and so on - is a powerful feature
that will continue to grow in importance.
Ongoing Interest (Notification)
This
can be best explained as a Reverse search, where it is the content that
finds the user - thus turning the concept of search on its head.
Most
of us have ongoing interests in certain areas; they could be
professional, social or personal. It makes a great deal of sense for
the search engine to keep track of these interests and proactively notify the user at some periodic interval of new items that fit those interests. Google Alerts
is an early example in this direction. But enhancements to its
functionality in the future could significantly boost its utility.
Some day, search engine notifications could support the following features:
- Diverse Mediums:
Many search engines already support email notifications. What's to stop
them from adding support for many additional delivery mechanisms, such
as IM, SMS, widgets, the Twitter API, and so on?
- Levels of Detail: Allowing users to set the scope and organization of information presented.
- Prioritization:
This is a key feature! Once users are able to set priorities for
different types of searches and for different areas, this can be used
to drive the other features. For example, send me the headline about a
breaking news event directly relevant to my blog, as an instant message,
but email me a digest of the day's results for baseball scores.
- Schedules:
Some search results make sense only at certain times of the day; e.g.
traffic search results are only relevant at commute times on work days.
- Dynamic Control:
Finally, empowering users to assert dynamic, granular control over
their search alerts would make this functionality truly powerful. For
example, once I've been notified about a breaking news story, I might
want to artificially boost its priority and delivery method to
continually get updates quickly and efficiently.
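The way prioritization, schedules and delivery mechanisms could fit together can be sketched in a few lines. This is only an illustration under my own assumptions: the Alert fields, the channel names ("im", "sms", "email_digest") and the commute windows are all hypothetical, and no existing alert service is being described.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Alert:
    """Hypothetical standing interest registered with the search engine."""
    topic: str
    priority: int                      # 1 = most urgent
    channel: str                       # e.g. "im", "sms", "email_digest"
    days: tuple = (0, 1, 2, 3, 4, 5, 6)  # weekdays, Monday = 0
    windows: tuple = ()                # (start, end) times; empty means any time


def is_active(alert, now):
    """An alert is active on its chosen days, inside one of its time windows."""
    if now.weekday() not in alert.days:
        return False
    if not alert.windows:
        return True
    return any(start <= now.time() <= end for start, end in alert.windows)


def route(alerts, topic, now):
    """Pick the channel of the most urgent active alert for a topic, or None."""
    active = [a for a in alerts if a.topic == topic and is_active(a, now)]
    if not active:
        return None
    return min(active, key=lambda a: a.priority).channel


commute = ((time(7, 0), time(9, 0)), (time(16, 30), time(18, 30)))
alerts = [
    Alert("blog-news", priority=1, channel="im"),
    Alert("baseball", priority=5, channel="email_digest"),
    Alert("traffic", priority=2, channel="sms", days=(0, 1, 2, 3, 4), windows=commute),
]

monday_morning = datetime(2008, 1, 7, 8, 0)    # a Monday, during the commute
saturday_morning = datetime(2008, 1, 5, 8, 0)  # a Saturday

weekday_traffic = route(alerts, "traffic", monday_morning)
weekend_traffic = route(alerts, "traffic", saturday_morning)
```

Dynamic control, in this picture, would be the user adding a temporary alert with a higher priority (a lower number) for a story they want to follow closely; route would then prefer its channel for as long as that alert exists.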
Power and Responsibility
As
search engines start including a few or many of the features described
above, search will grow increasingly powerful. It will become possible
to find any information we want, quickly and easily. Whether the
information is high-level or detailed, global or local, general or
specific, past or present, in any domain - no nugget of human knowledge
shall escape this relentless spotlight.
Is shining a light on
the darkest corners of the web always a good thing? As a webbed
superhero once told me (and a few billion others) - "With great power
comes great responsibility!" Privacy advocates are rightly concerned
with the growing power of global web search engines; ongoing efforts
from official and community channels are essential in minimizing abuse.
A related issue is that web content can be archived and searched in
perpetuity - the societal effects of this phenomenon have not yet begun
to be understood. A recent New York Times column highlighted this issue (paid content; here's a perspective on it from Slate magazine).
Conclusion
Clearly,
search engines will continue to evolve, and a future engine might well
have many of the improvements described above within the next ten years. But how about
even further out - say, 2020 or 2030? Will disruptive changes in networking,
computing and information technologies radically change the way search
engines operate? A change in the nature of human thinking, interaction
and social customs would be even more dramatic, and could cause a
change in the nature of search itself.
This is, of course, a
fertile area of speculation more in the realm of Science Fiction
(for now): for example, will we one day need a
galactic search engine? Can we create microscopic
information-matching agents, either biological or atomic? Results that suddenly become available to the
user as knowledge in the brain? An
"implicit" search engine that finds information as we need it? Why not?