
November 29, 2007

The Semantic Web is becoming real - slowly

A couple of weeks ago, I attended an event from the SDForum in Palo Alto, featuring a series of project demos showcasing real applications built on the Semantic Web. While I was initially skeptical, I came away amazed at the social and semantic intelligence being built into the latest web applications.


Yahoo!

The most interesting demos came from Dr. Mor Naaman of Yahoo! - these projects were at once the most real and the least relevant to the Semantic Web (at least in its pure form).

TagMaps


Described as "a toolkit to visualize text (tags) geographically on a map", TagMaps lets developers create applications that mash up text and geographical information (such as Flickr images) with Yahoo! Maps; Yahoo!'s sample application, World Explorer, is quite amazing. The most interesting thing about this application is that by combining the geo-tagging information of Flickr images with their corresponding tags, and then displaying those tags on a map, it accurately highlights items of interest on the map. This is semantic information that has been extracted from the underlying raw data.
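To make the idea concrete, here is a minimal sketch of how geotagged photos can be turned into "what's interesting here" labels: snap each photo to a grid cell and keep the most frequent tags per cell. This is not TagMaps' actual algorithm; the `(latitude, longitude, tags)` record format, the grid size and the helper names are all hypothetical, and a real system would probably also discount tags that are common everywhere (like a city name).

```python
from collections import Counter, defaultdict


def cell_of(lat: float, lon: float, cell_size: float) -> tuple[int, int]:
    """Snap a coordinate to an integer grid cell that is cell_size degrees wide."""
    return (int(lat // cell_size), int(lon // cell_size))


def top_tags_per_cell(photos, cell_size: float = 0.01, k: int = 1):
    """Return the k most frequent tags for each grid cell.

    photos: iterable of (latitude, longitude, tags) tuples (hypothetical format).
    Each tag is counted at most once per photo, so a single heavily tagged
    photo cannot dominate a cell on its own.
    """
    counts: dict[tuple[int, int], Counter] = defaultdict(Counter)
    for lat, lon, tags in photos:
        counts[cell_of(lat, lon, cell_size)].update(set(tags))
    return {cell: [tag for tag, _ in counter.most_common(k)]
            for cell, counter in counts.items()}


# Hypothetical sample data: two photos near the Golden Gate Bridge, two near Alcatraz.
photos = [
    (37.8199, -122.4783, ["goldengatebridge", "sf"]),
    (37.8195, -122.4780, ["goldengatebridge", "bridge"]),
    (37.8087, -122.4098, ["alcatraz", "sf"]),
    (37.8089, -122.4095, ["alcatraz", "island"]),
]
labels = top_tags_per_cell(photos)
```

With this sample, the cell around the bridge is labeled "goldengatebridge" and the cell around the island "alcatraz", while the generic "sf" tag loses out in both: the same kind of emergent "point of interest" the World Explorer demo shows, just at toy scale.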

ZoneTag



ZoneTag can automatically tag your photos with geographical information; in addition, it can suggest tags for a photo based on its location. This makes it easy to tag photos taken on a cell phone with both types of information.

FireEagle


FireEagle, currently in closed alpha testing, is billed as "a new way to share your location with friends or with other websites and services". The main idea is to create a new user location platform that any third party can leverage to read and write the location of the user.


Radar Networks

Any set of Semantic Web demos would be incomplete without an entry from Radar Networks. Nova Spivack, CEO of Radar, presented a demo of their offering, twine [tagline: "using information as context"], which is basically a new social network to which Semantic Web concepts have been applied. twine, currently in closed beta, has been getting a lot of press recently as the first true Semantic Web application.

I have to admit, the demo was quite impressive. Mr. Spivack created a new "twine", assigning a series of web pages, articles and other web information to the twine, and the application extracted a whole range of meaning from the content - automatically assigning tags about topics, people, links, locations, even concepts. It was a cool thing to watch!

While this exercise clearly demonstrated that the underlying technology works, and works well (great things surely lie ahead for the Semantic Web), I was less than impressed by the actual application chosen by Radar Networks (maybe I just don't see it yet). Does the world really need another custom home page or social networking application, even one that harnesses the Semantic Web?


SRI

Adam Cheyer from SRI presented a demo of an experimental project named CALO. CALO, which stands for Cognitive Assistant that Learns and Organizes, is a DARPA-funded project that gathers the user's context and supports dynamic decision-making. In effect, the "software assistant" watches everything you do in order to learn, so that it can eventually make intelligent suggestions: for example, acting as a search assistant or suggesting alternate knowledgeable participants for a meeting. A parallel project, CALO Express, is a productized Windows version for commercial use.

An intelligent software assistant is a noble goal, but watching the slides, I wondered if it would get traction commercially - the idea of this virtual assistant watching everything I do was slightly creepy; it's probably a better fit for a more controlled world, such as a defense lab or that perennial Hollywood favorite, a "top-secret government project".


PARC

The folks from the legendary Xerox PARC demonstrated Magitti, a "mobile leisure guide". By implicitly collecting information about the user's behavior on their mobile device, the application learns about the user's interests within a given context; it then uses this knowledge to guide the user, suggesting other activities based on location, time of day and the behavior of social peers. Again, a good idea, and a perfect fit for today's Facebook-fed generation.


Semantic Web or Privacy: Pick one!

The demos were all very cool and worked flawlessly - it is amazing how much meaning an application can glean by combining data about geography, time, context and peer groups. At the same time, it requires participants to willingly share information in order to take advantage of semantic processing. Is that a good trade-off, one that users are willing to accept? That remains to be seen. As early commercial applications of the Semantic Web become widespread and more easily available, the answer is likely to become increasingly obvious.



November 04, 2007

Future Directions in Search

[This article was originally posted as part of the Rising Star Dream Team Future of Search series on the VortexDNA blog. I'm deeply grateful to Kaila Colbin, VortexDNA's resident blogger, for the opportunity to participate!]


I've been writing about Search technologies for a while now, so when Kaila Colbin of VortexDNA offered me a spot in a future-of-search marathon, I jumped at the chance!

What will the search for information look like in the future - in five years, ten, twenty? Is it just more of the same, or will it look radically different?

Looking Back

Before looking to the future, let us first look at how far we have come. Danny Sullivan has a great post looking at a decade of search history and the various tribulations of past and present search engines - AltaVista, Ask Jeeves, Microsoft, Yahoo! and of course, the early Google. We owe a huge debt of gratitude to the tremendous contributions of these and other early pioneers of Search; Google, in particular, deserves a great deal of the credit for making web search ubiquitous outside the tech community. Indeed, "to Google" as a verb has become virtually synonymous with the idea of Web Search, much as the Xerox brand became synonymous with the idea of the photocopier in a bygone era.

Google's venerable PageRank algorithm is certainly best-of-breed for the present, and Google keeps tweaking its results continuously. Given this progress in the Search area, can we still expect to see major improvements in search in the foreseeable future? As an analogy, consider the DC-3 airplane - the first truly modern airliner, it was powerful, safe, reliable and economical (indeed, some of these are still flying today). It revolutionized air travel, and with its introduction, many considered the aviation age to have arrived for the general public. And yet, early jet aircraft had already appeared on the horizon, so to speak; within a decade, this reliable workhorse was obsolete, overtaken by jet aircraft in the competition for public air travel.

It could easily be the same with search. The key question that a search engine addresses is: what results do the maximum number of users find most useful for a given search query? PageRank is simply an approximation of the Wisdom of Crowds to answer this question. Is there a richer abstraction? Is Engagement the new black? Whatever the new approach is - in order to provide accurate results, it must work as implicitly as possible.
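Why call PageRank a "Wisdom of Crowds" approximation? Every link is a vote, and a vote counts for more when it comes from a page that has itself collected many votes. The sketch below is the simplified textbook power-iteration form with a damping factor of 0.85, as described in the original PageRank paper; it is not Google's production ranking, and the tiny link graph is made up for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to. Ranks always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical three-page web: "b" receives links from both other pages.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
```

In this toy graph "b" ends up with the highest rank, since it collects the crowd's votes from both "a" and "c". Any richer abstraction, such as engagement signals, would have to beat this kind of implicit, link-based voting.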

We have only to envision the possibilities ...

Looking Forward

So let us take a speculative look at search, circa 2015. To look at it systematically, we can separate the search engine, from a user perspective, into the following components (to see this breakdown in visual format, check out my earlier post on an abstract architecture for search):

- Query specification
- Base Index
- Relevance Algorithm
- Results Visualization
- Ongoing Interest

Let us consider the possible future directions for each one in turn.

Future Directions

    Query Specification (Input)

Google pioneered the keyword-centric, minimalist approach to specifying the search query, and all the major search engines have followed its lead. But the search criteria could be so much richer: instead of experimenting with different types of keyword searches to find the information they need, users could simply provide additional criteria up front to qualify their request.

Admittedly, this approach does not work for everyone. The casual user would get reasonable defaults, which would automatically adjust to their favorite values with regular use; the topical researcher, on the other hand, would actively tinker with these widgets in a "power user" mode. (Google already supports this type of functionality in a limited fashion.)

Some possible advanced features for specifying the query are given below:

  1. Content Spec: Enabling the user to dynamically specify the data sources to be included, based on domain, reputation, social network, and so on
  2. Scope: Input for seamlessly limiting the scope of the search, e.g. to Enterprise or personal data
  3. Qualifiers: Allow the user to add more information to disambiguate result matches, e.g. qualifying if "Java" means the programming language or the island
  4. Parameter ranges: Domain-specific parameters can be extremely valuable even to a general-purpose engine (see #5 in the section on Relevance Algorithms below)
  5. UI paradigms: Text keywords are a limited form of input. The actual input mechanisms could be more visual, in the form of sliders, buttons, fields and other UI widgets. Imagine, for example, that as you move a slider, the search results change or an increasing number of results appear on the page!
  6. Multiple Profiles: Personalization does not always have to be implicit. A user could explicitly set up profiles to represent different interests - professional, hobby, personal and so on - so that switching the profile would quickly change the areas of interest
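Several of these features amount to replacing the bare keyword string with a structured query object whose defaults come from a profile. The sketch below shows that idea under loudly stated assumptions: `QuerySpec`, `PROFILES` and `build_query` are hypothetical names, and the example domains are placeholders rather than any engine's real configuration.

```python
from dataclasses import dataclass, field, replace


@dataclass
class QuerySpec:
    """Hypothetical structured query: keywords plus optional criteria."""
    keywords: list[str]
    sources: set[str] = field(default_factory=set)            # content spec (empty = any source)
    scope: str = "web"                                         # e.g. "web", "enterprise", "personal"
    qualifiers: dict[str, str] = field(default_factory=dict)   # disambiguation: term -> intended sense
    ranges: dict[str, tuple[float, float]] = field(default_factory=dict)  # parameter ranges


# Hypothetical explicit profiles, each holding default criteria.
PROFILES = {
    "professional": {"sources": {"acm.org", "ieee.org"},
                     "qualifiers": {"java": "programming language"}},
    "hobby": {"sources": {"lonelyplanet.com"},
              "qualifiers": {"java": "island"}},
}


def build_query(text: str, profile: str, **overrides) -> QuerySpec:
    """Start from the profile's defaults, then let explicit user input win."""
    defaults = PROFILES.get(profile, {})
    spec = QuerySpec(
        keywords=text.lower().split(),
        # Copy the defaults so later edits never mutate the shared profile.
        sources=set(defaults.get("sources", set())),
        qualifiers=dict(defaults.get("qualifiers", {})),
    )
    # dataclasses.replace raises TypeError for unknown field names.
    return replace(spec, **overrides)


work = build_query("Java tutorials", "professional")
travel = build_query("Java tutorials", "hobby", scope="personal")
```

The same two keywords resolve to "Java the language" under one profile and "Java the island" under the other, which is exactly the kind of disambiguation a casual user would never type in but would happily get from a sensible default.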

    Base Index (Content)

This is a core area of concern for search engines: what is the scope of content to be considered when searching for information?

The standard approach currently is to build web crawlers that continuously scan as many web sites and web pages as possible; the scanned content is used to build a master content index, which is updated regularly. This index then serves as the basis when searching for information.
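The core data structure behind that master index is usually an inverted index: a map from each term to the set of pages containing it. The following is a minimal in-memory sketch (hypothetical helper names, naive tokenization, no ranking) showing the build step, a simple AND query, and the re-indexing a crawler would trigger when it sees a changed page.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> defaultdict[str, set[str]]:
    """Map each term to the set of URLs whose text contains it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: defaultdict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


def reindex_page(index: defaultdict[str, set[str]], url: str,
                 old_text: str, new_text: str) -> None:
    """Apply a re-crawl: drop the page's old postings, then add the new ones."""
    for term in set(tokenize(old_text)):
        postings = index.get(term)
        if postings is not None:
            postings.discard(url)
            if not postings:
                del index[term]
    for term in tokenize(new_text):
        index[term].add(url)


# Hypothetical crawl results.
pages = {
    "a.example": "Java island travel guide",
    "b.example": "Java programming tutorial",
    "c.example": "Island hopping",
}
index = build_index(pages)
```

Here `search(index, "java island")` narrows to the travel page only; the future directions listed below (rich media, the invisible web, on-the-fly subsets) are largely about what gets fed into a structure like this and how it is partitioned.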

For the base index, the big changes in the future are likely to involve both the scope and understanding of the content; here is a short list:

  1. Rich media search, e.g. true indexing of audio and video content
  2. Dynamic content search (searching the invisible web)
  3. Integration of personal, web and corporate information
  4. Perspective-based search, e.g. conservative vs liberal, hard news vs opinion, and so on
  5. Subset creation, on-the-fly, e.g. to search for domain-specific data

    Relevance Algorithm (Mechanism)

This is, of course, the most-debated topic when discussing the future of search engines. Clearly, many different approaches and technologies show promise; some of these are noted below:

  1. Personalization (but without storing personal info)
  2. Social Input / Wisdom-of-Crowds (which has its pitfalls)
  3. Social Graph: where your selected network of people helps improve search results (Robert Scoble has recently gotten religion about this concept; Danny Sullivan rebuts)
  4. Semantic Processing: of both the query AND the content (will this let the Search Engine find answers that we never knew we had?)
  5. Parametric Search: Vertical search engines already routinely offer domain-specific parametric search; for example, job search engines allow the user to specify the all-important location of the job as a primary criterion. Can this type of feature be generalized, so that as a user drills down deeper into search results, an increasing number of parameters can be offered?
  6. Human-powered Search, for either the short head or the long tail of search
  7. Swarm Intelligence: Mimicking biological search, such as ant colony optimization, particle swarm optimization, and so on
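One way to read the parametric-search question (#5) is that the engine should not hard-code a vertical's parameters, but discover them from whatever results remain after each drill-down. The sketch below assumes a hypothetical document shape (`text` plus a dict of numeric `params`) and a made-up coverage threshold: it filters on the ranges the user has set so far, then offers as the next facets those parameters that most of the remaining results carry.

```python
from collections import Counter


def parametric_search(docs, keyword: str, ranges: dict[str, tuple[float, float]]):
    """Keep docs that mention the keyword and satisfy every parameter range."""
    hits = []
    for doc in docs:
        if keyword.lower() not in doc["text"].lower():
            continue
        params = doc["params"]
        if all(name in params and lo <= params[name] <= hi
               for name, (lo, hi) in ranges.items()):
            hits.append(doc)
    return hits


def available_facets(hits, min_coverage: float = 0.5):
    """Offer a parameter as a facet if enough hits carry it, with its value range."""
    counts = Counter(name for doc in hits for name in doc["params"])
    facets = {}
    for name, count in counts.items():
        if count / len(hits) >= min_coverage:
            values = [doc["params"][name] for doc in hits if name in doc["params"]]
            facets[name] = (min(values), max(values))
    return facets


# Hypothetical job listings; parameter names are illustrative only.
docs = [
    {"text": "Senior Java developer", "params": {"distance_km": 5, "salary_k": 120}},
    {"text": "Java developer, remote", "params": {"distance_km": 0, "salary_k": 95, "remote": 1}},
    {"text": "Java coffee roaster", "params": {"distance_km": 40}},
]
hits = parametric_search(docs, "java", {"distance_km": (0, 10)})
facets = available_facets(hits, min_coverage=0.6)
```

After the user restricts location, the far-away roaster drops out and "salary_k" surfaces as a new slider with its observed range, so the set of offered parameters grows as the results get more specific. The coverage threshold is a design choice: lower it and rarer parameters (like "remote" here) appear too.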

    Results Visualization (Output)

Again, Google leads the way with its minimalist approach: simple headings, links and snippets of text. This is slowly changing, with the new "Universal Search" approach from Microsoft, Google and others; Ask.com is a leader in this area.

Search engines of the future will likely implement completely new paradigms for users to navigate and view search results. Often, meta-results (information about the results) are as important as the results themselves: they let users figure out where a given result fits into the overall universe of results, and find other results related to an item of interest.

Some possibilities for results display are given below:

  1. Tag clusters are not a new concept, but have yet to gain traction among the majors. Quintura, with its dynamic tag cloud display, has one of the best examples.
  2. Organizing results by content type is something every search engine will have to think about in the future. For example, should news stories be presented in an "overview capsule" fashion, or organized as a timeline-based view? Dale Dougherty at O'Reilly Radar has a brilliant article on this topic: Journalism is burning.
  3. Follow-up actions - on viewing search engine results, a very common user action (as Greg Linden points out) is to modify the current query, either to drill down further or to try a different approach to finding the required information. Google's "did you mean ..." feature is a step in this direction (although it leaves much to be desired).
  4. Domain-specific visualization can significantly enhance the understanding of results. This is similar to the data organization point above, but focused on the display itself; results from different vertical domains may require very different visualization techniques, such as colors, graphs, images, trend lines, heat maps, topographic charts, and so on. [For a list of the more exotic variations, check out this amazing list from Smashing Magazine.]
  5. Dynamic scoping - enabling users to widen or narrow search results based on different criteria, such as geography (local or global), site authority, timeliness, point of view, domain, and so on - is a powerful feature that will continue to grow in importance.
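Tag clusters are a simple example of a meta-result. A minimal sketch, assuming each result already carries a list of tags (a hypothetical input; producing good tags is the hard part): group results under each tag, rank the groups by size, and map group size linearly onto a font-size range for a cloud display. This is not how Quintura builds its cloud; the point sizes are arbitrary choices.

```python
from collections import defaultdict


def tag_clusters(results, max_tags: int = 5):
    """Group result URLs by tag; largest clusters first, ties broken alphabetically."""
    clusters: defaultdict[str, list[str]] = defaultdict(list)
    for result in results:
        for tag in result["tags"]:
            clusters[tag].append(result["url"])
    ranked = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return ranked[:max_tags]


def cloud_sizes(ranked, min_pt: int = 10, max_pt: int = 24) -> dict[str, int]:
    """Scale each cluster's size linearly onto a font-size range (in points)."""
    if not ranked:
        return {}
    counts = [len(urls) for _, urls in ranked]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all clusters are equal
    return {tag: min_pt + (len(urls) - lo) * (max_pt - min_pt) // span
            for tag, urls in ranked}


# Hypothetical results for the ambiguous query "java".
results = [
    {"url": "1", "tags": ["java", "jvm"]},
    {"url": "2", "tags": ["java", "coffee"]},
    {"url": "3", "tags": ["java", "jvm"]},
    {"url": "4", "tags": ["coffee", "indonesia"]},
]
ranked = tag_clusters(results)
sizes = cloud_sizes(ranked)
```

Clicking "coffee" or "jvm" in such a cloud is effectively a follow-up action and a dynamic scoping control at the same time, which is why these display ideas tend to blend together in practice.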

    Ongoing Interest (Notification)

This can be best explained as a Reverse search, where it is the content that finds the user - thus turning the concept of search on its head.

Most of us have ongoing interests in certain areas; they could be professional, social or personal. It makes a great deal of sense for the search engine to keep track of these interests and proactively notify the user, at some periodic interval, of new items that fit those interests. Google Alerts is an early example in this direction, and future enhancements to its functionality could significantly boost its utility.

Some day, search engine notifications could support the following features:

  1. Diverse Mediums: Many search engines already support email notifications. What's to stop them from adding support for many additional delivery mechanisms, such as IM, SMS, widgets, the Twitter API, and so on?
  2. Levels of Detail: Allowing users to set the scope and organization of the information presented.
  3. Prioritization: This is a key feature! Once users are able to set priorities for different types of searches and for different areas, this can be used to drive the other features. For example, send me the headline about a breaking news event directly relevant to my blog, as an instant message, but email me a digest of the day's results for baseball scores.
  4. Schedules: Some search results make sense only at certain times of the day; e.g. traffic search results are only relevant at commute times on work days.
  5. Dynamic Control: Finally, empowering users to assert dynamic, granular control over their search alerts would make this functionality truly powerful. For example, once I've been notified about a breaking news story, I might want to artificially boost its priority and delivery method to continually get updates quickly and efficiently.
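Reverse search can be sketched as a set of standing queries that every new item is matched against, with priority deciding the delivery channel and an optional schedule gating delivery. Everything here is hypothetical: the `Alert` fields, the `ROUTES` table mapping priorities to channel names, and the simple "all keywords present" matching rule. Actually sending an IM or email is out of scope; the function only decides where an item should go.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class Alert:
    """A hypothetical standing query: the content is matched against it."""
    name: str
    keywords: set[str]
    priority: str                                  # "high" or "low"
    window: Optional[tuple[time, time]] = None     # deliver only within this daily window
    weekdays_only: bool = False


# Hypothetical routing table: priority -> delivery channel.
ROUTES = {"high": "im", "low": "daily_digest"}


def route(item_text: str, alerts: list[Alert], now: datetime) -> list[tuple[str, str]]:
    """Return (alert name, channel) pairs for every alert this item should trigger."""
    words = set(item_text.lower().split())
    deliveries = []
    for alert in alerts:
        if not alert.keywords <= words:           # every alert keyword must appear
            continue
        if alert.weekdays_only and now.weekday() >= 5:   # 5, 6 = Saturday, Sunday
            continue
        if alert.window and not (alert.window[0] <= now.time() <= alert.window[1]):
            continue
        deliveries.append((alert.name, ROUTES[alert.priority]))
    return deliveries


def boost(alert: Alert) -> None:
    """Dynamic control: escalate an alert, e.g. after a breaking story."""
    alert.priority = "high"


alerts = [
    Alert("blog news", {"semantic", "web"}, "high"),
    Alert("baseball", {"baseball"}, "low"),
    Alert("traffic", {"traffic", "101"}, "high",
          window=(time(7, 0), time(9, 30)), weekdays_only=True),
]
```

A traffic item on a Thursday at 8am goes out as an IM, while the same item on a Saturday or at 6pm is silently dropped; baseball scores land in the daily digest until `boost` escalates that alert. Prioritization really does drive the other features here: channel, timing and level of detail can all hang off that one field.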

Power and Responsibility

As search engines start to include some or many of the features described above, search will grow increasingly powerful. It will become quick and easy to find any information we want. Whether the information is high-level or detailed, global or local, general or specific, past or present, in any domain - no nugget of human knowledge shall escape this relentless spotlight.

Is shining a light on the darkest corners of the web always a good thing? As a webbed superhero once told me (and a few billion others) - "With great power comes great responsibility!". Privacy advocates are rightly concerned with the growing power of global web search engines; ongoing efforts from official and community channels are essential in minimizing abuse. A related issue is that web content can be archived and searched in perpetuity - the societal effects of this phenomenon have not yet begun to be understood. A recent New York Times column highlighted this issue (paid content; here's a perspective on it from Slate magazine).

Conclusion

Clearly, search engines will continue to evolve, and within the next ten years a future engine might well have many of the improvements described above. But what about even further out - say, 2020 or 2030? Will disruptive changes in networking, computing and information technologies radically change the way search engines operate? A change in the nature of human thinking, interaction and social customs would be even more dramatic, and could cause a change in the nature of search itself.

This is, of course, a fertile area of speculation more in the realm of Science Fiction (for now): for example, will we one day need a galactic search engine? Can we create microscopic information-matching agents, either biological or atomic? Results that suddenly become available to the user as knowledge in the brain? An "implicit" search engine that finds information as we need it? Why not?


