May 11, 2008

Powerset Launches Wikipedia Search

Semantic search engine Powerset, which we've written about here before, has just launched its initial release. The current release is limited to indexing Wikipedia content, but it provides a great showcase for their technology and user experience.

For example, my search for "Alexander the Great" provided the following results page:


January 09, 2008

Deconstructing real Google searches - why Powerset matters

I was looking at the log files for my blog today, as I regularly do, and I was suddenly struck by the variety of search queries in Google for which users were getting referred to my posts. I write often about the different flavors of search - including vertical search, parametric search, semantic search, and so on - so users with queries about Search often land here. But do they always find what they're looking for?

Some Real-life Search Results

Let us examine some of the actual Google queries - in the form of referring URLs - that led users to my blog. In most cases, Google did a fine job of matching the content to the query; in some cases, it was a somewhat random match at best; finally, in a few cases, the Google search algorithms are clearly getting confused. It is this third case that is the most interesting.

The Good

In many cases, the match was quite straightforward and very relevant. Some examples are given below.
1.

Query: http://www.google.fr/search?q=Guru+Avinash+Kaushik& ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:fr:official& client=firefox-a
Result: A conversation with Avinash Kaushik, Web Analytics Guru

Well, can't argue with that!

2.

Query: http://www.google.cn/search?sourceid=navclient&aq=t& hl=zh-CN&ie=UTF-8&rlz=1T4XNLA_zh-CNCN246CN247& q=vertical+search
Result: The rise of Vertical Search Engines (VSEs)

Query: http://www.google.com/search?q=wikipedia+to+try+and+compete+with+google& ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official& client=firefox-a
Result: Wikipedia Search to compete with Google

Again, can't argue with those.

3.

Query:  http://www.google.com/search?hl=en& q=search+technology+exits
Result: So You've Built an Alternative Search Engine - Now What?

This is actually pretty awesome: the algorithm has figured out "search technology" and "exits". In fact, this post does talk about exit strategies for search engines, so it's a great match.

The Bad

Some search queries are so vague that the matches you get are bound to be somewhat random. I don't blame Google for the following matches:

4.

Query:  http://www.google.com/search?hl=en& q=conceptual+architecture
Result: A Conceptual Architecture for Search

Is the search string too vague? Although technically this post matches the search query, I'm guessing that this is not what the user intended to look for.

5.

Query: http://www.google.com/search?hl=en&safe=off& client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial& hs=pGP&q=disruptive+technologies+blog& btnG=Search
Result: Disruptive technologies for 2007

While the words match, and possibly this may satisfy the user, I get the sense that the user was looking for a blog dedicated to discussing disruptive technologies, not a single post. But who knows? Again, too vague!

In the future, I wonder how soon Search technology will progress to the point where the UI will automatically ask the user for more information to qualify search terms that are too general or vague. A little while ago, I envisioned a similar scenario (Vertical Search, with authority) when taking a look at the search engine MetaMojo, which has taken some steps in this direction.

The Ugly

In a few cases, though, the proximity of certain keywords fools the search algorithms. Consider the following matches:

6.

Query: http://www.google.com/search? q=best+search+engine+for+directions&ie=utf-8& oe=utf-8&aq=t&rls=org.mozilla:en-US:official& client=firefox-a
Result: Future Directions in Search

A post about "future directions in Search" is not a post about "search engines for directions", although the text itself is undoubtedly a close match.

7.

Query:  http://www.google.com/search?hl=en& q=people+search+software+compared&btnG=Search
Result: Search and the Dumbness of Crowds

Hmm? This is a popular post, but I'm not sure if it helps the user, who is not trying to compare search strategies (as this post does); instead, the user appears to be trying to compare people search engines.

Are these good matches? While the content of the posts bears a superficial resemblance to the text in the respective queries, the results are not relevant to the requested user searches.

The Larger Problem

The samples given above are not that important; the matches from my blog do not always show up at the top of the search results and although these are real referrals, not many users will actually click on these links in the Results page. But these examples point to a deeper underlying issue, one that will be far from easy to fix in the general sense.

All the major search engines currently rely on the proximity of keywords and search terms to match results. But that approach can be misleading, causing the search engine to systematically produce incorrect results under certain conditions.

To demonstrate, let us take a look at three general use cases.

[Note: The examples given below are all drawn from Google. To be fair, all the major search engines use similar algorithms, and all suffer from similar problems. For its part, Google handles billions of queries every day, usually very competently. As the reigning market leader, though, Google is the obvious target - it goes with the territory!]

1. Difficulty of Finding Long Tail Results

Take Britney Spears. Given the current popularity of articles, pictures and videos of the superstar singer, the results for practically any query with the word "spears" in it will be loaded with matches about her - especially if the search involves television or entertainment in any way.

Let's say you're watching the movie Zulu and you start wondering what those large spears that all the extras are waving about are made of. So you go to Google and type in "movie spears material". This is an obviously insufficient description, as the screen shot below shows.




What happens if you expand on the query further - say: "what are movie spears made out of?" - does it help? Here's a screen shot.




The general issue here is that articles about very popular subjects accumulate high levels of PageRank and then totally overwhelm long tail results. This makes it very difficult for a user to find information about unusual topics that happen to lie near these subjects.

2. Keyword Ordering

Since the major search engines focus only on the proximity of keywords without context, a user search that's similar to a popular concept gets swamped with those results, even if the order of keywords in the query has been reversed. For example, a tragic occurrence that's common in modern life is that of a bicycle getting hit by a car. Much less common is the possibility of a car getting hit by a bicycle, although it does happen. How would you search for the latter? Try typing "car hit by bicycle" into Google; here's a screen shot of what you get.  [Note the third result, which is actually relevant to this search!]
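To make the ordering problem concrete, here is a minimal Python sketch of what happens when a query is reduced to an unordered set of keywords. It is an illustration only, not a description of how Google or any real engine works; the stopword list and the keyword_set helper are assumptions made up for this example.

```python
import re

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "by", "of", "for", "to", "in"}


def keyword_set(query: str) -> frozenset:
    """Lowercase the query, split it into words, and drop stopwords and word order."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


common_case = keyword_set("bicycle hit by car")
rare_case = keyword_set("car hit by bicycle")

# Both queries reduce to the same unordered set of keywords, so a matcher
# that sees only this set cannot tell the rare case from the common one.
same_keywords = common_case == rare_case
print(sorted(rare_case), same_keywords)
```

Once "by" is discarded and the order is lost, nothing in the keyword set records which vehicle did the hitting.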



3. Keyword Relationships

Since the major search engines focus only on the keywords in the search phrase, all sense of the relationship between the search terms is lost. For example, users commonly change the meaning of search terms by using negations and prepositions; it is also fairly common to look for the less common members of a set.

This takes us into the realm of natural language processing (NLP). Without NLP, the nuances of these query modifications are totally invisible to the search algorithms.

For example, a query such as "Famous Science fiction writers other than Isaac Asimov" is doomed to failure. A screen shot of this search in Google is given below. Most of the returned results are about Isaac Asimov, even when the user is explicitly trying to exclude him from the list of authors found.
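A rough Python sketch shows why the exclusion backfires under plain keyword matching. The two documents and the overlap_score function are invented for this example, and real engines use far more elaborate scoring; the only point is that the words "Isaac" and "Asimov" in the query add to the score of pages about him.

```python
import re


def tokens(text: str) -> list:
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def overlap_score(query: str, document: str) -> int:
    """Count how many distinct query words also appear in the document."""
    return len(set(tokens(query)) & set(tokens(document)))


query = "famous science fiction writers other than Isaac Asimov"

# Two made-up one-line documents, invented for this example.
docs = {
    "asimov_page": "Isaac Asimov, one of the most famous science fiction writers",
    "clarke_page": "Arthur C. Clarke wrote science fiction novels",
}

scores = {name: overlap_score(query, text) for name, text in docs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The words "other than" carry the whole meaning of the request, yet to this scorer they are just two more tokens that happen not to match anything.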



All of the searches shown above look like gimmicks - queries designed intentionally to mislead Google's search algorithms. And in a sense, they are; these specific queries can be easily fixed by tweaking the search engine. Nevertheless, these queries do point to a real need: the value of understanding the meaning behind both the query and the content indexed.

Semantic Search

That's where the concept of semantic search comes in. I attended a media event earlier this year at stealth search startup Powerset (see: Powerset is Not a Google-killer! ) which showcased a live demo of their search engine, currently in closed alpha, that highlighted solutions to exactly this type of issue.

For example, type "What was said about Jesus" into a major search engine, and you usually get a whole list of results that consist of the teachings of Jesus; this means that the search engine entirely missed the concepts of passive voice and "about". The Powerset results, on the other hand, were consistently on target (for the demo, anyway!).

In other words, when you look at just the keywords in the query, you don't really understand what the user is looking for; by looking at them within context, by taking into account the qualifiers, the prepositions, the negatives, and other such nuances, you can create a semantic graph of the query. The same case can be made for semantic parsing of the content indexed. Put the two together, as Powerset does, and you can get a much better feel for relevance of results.

What about Google? I'm sure the smart folks in Google's search-quality team are busily working on this problem as well. I look forward to the time when the major search engines handle long tail queries more accurately and make Search a better experience for all of us.



December 19, 2007

Search Improvements 2008 - THAT'S IT?

A few days ago, Gord Hotchkiss, President and CEO of Enquiro , moderated a Webinar with the Search 2010 Panel; the panel is a who's who list of stellar participants in the Search space, including representatives from all the major search engines. You can find the actual Webinar and read Gord's post about it here: Search 2010 - A Review.

Gord writes:

I won’t steal the panelists thunder, but the first question I posed to them was what they see as the biggest change to search in the coming year. Most pointed to the continued emergence of blended search results on the page, as well as more advances in disambiguating intent. A few panelists looked at the promise of mobile, driven by advances in mobile technology such as multi touch displays, embodied in the iPhone.

He adds:

[One area]  ... is how search functionality will start showing up in more and more places. Already, we’re seeing search being a key component in many mash ups. The ability to put this functionality under the hood and have it power more and more functional interfaces, combined with other 2.0 and 3.0 capabilities, will drive the web forward.


Charles Knight of AltSearchEngines, in his reaction to the Webinar [ Thomas Jefferson Dines Alone ], writes:

So what did they see as the biggest change coming to Search in 2008?
...

Let’s break it down: 1) the continued emergence of blended search results 2) more advances in disambiguating intent, and 3) the promise of mobile…such as…the iPhone.

That’s it?  That’s what the key major search engine insiders and industry analysts predict for the roller coaster year ahead?  More of the same - and the iPhone?


Now, (disclosure) I'm an occasional contributor to ASE and Charles is a personal friend of mine, so I grant that I'm biased; but I'm with Charles on this one. That's it? Those are the key changes to search predicted by the major search engines for the next year? Is it just me, or do all of these changes seem - evolutionary, not revolutionary?

In a recent article on Future Directions in Search, I highlighted the major areas for potential advances in search: Query specification, Base Index, Relevance Algorithm, Results Visualization and Ongoing Interest (Notification). In that article, I was looking at a much longer time horizon, but I expect that some discontinuous changes will occur in one or more of those areas within the next year.

Search is a highly dynamic field that presently generates a tremendous amount of interest among scientists, engineers and entrepreneurs. (Google's stratospheric market cap has ensured that!) There are so many search startups coming up, many of which are introducing new concepts and technological innovation, such as - Vertical Search: Indeed, Spock, and many, many others; semantic search: Hakia, Powerset; dynamic results visualization: Quintura; ways to add value: Trulia, Zillow; ways to speed up search: vortexDNA; and so on. At least through acquisition, if nothing else, the mainstream search engines should be able to move ahead quickly.

As a specific example, let's look at the Video search space. I recently discovered Mark Robertson's web site, ReelSEO, which is dedicated to SEO/SEM of video content. On his site, Mark hosts the Comprehensive list of video search engines and video sharing sites, which lists over 100 sites dedicated to video sharing and search. With so many players, surely there's someone who will introduce a new concept or significant change in video search?

Finally, let us acknowledge the elephant in the room. What about - discontinuous improvements to the heart of the Search Engine, the PageRank algorithm? After all, reduced to essentials, PageRank is only an approximation of the authority of a web page or site, based on the value and authority of incoming links. It was certainly an amazing insight on the part of Google's founders, and worthy of the success it attained; but just because all the major search engines use it today does not make it the right way or the only way to identify relevant results.
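For readers who have not seen it spelled out, the idea of authority flowing through incoming links can be sketched in a few lines of Python. This is the simplified textbook form of the iteration, not Google's production system; the toy link graph is made up, and the damping value of 0.85 is just the commonly cited default.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Otherwise its rank is split among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph, invented for illustration: three fan pages and one niche page
# all link to a popular page, which links back only to the niche page.
links = {
    "popular": ["niche"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "fan3": ["popular"],
    "niche": ["popular"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

In this toy graph the popular page ends up with the largest share simply because the most links point at it, which is exactly the approximation of authority described above.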

Perhaps there are other approximations which may work as well or better? Examples of alternative algorithms include: swarm intelligence (like Ant Colony Optimization), human algorithms (e.g. people-powered engines for popular searches and breaking news), brand authority (hey, we use it for everything else in life!), social graph, and many others.

Regardless of what Gord's A-list panel says, there's one thing I'm sure of: 2008 will be an exciting year for Search!



September 25, 2007

Can the Semantic Web bring us Trusted Search Results?

Nova Spivack, during his recent talk about the Semantic Web (covered in my previous post), made the point that adding semantic processing to the underlying index of a search engine makes the issue of trust more serious. This was an intriguing statement, and I followed up with him via email to get further clarification; he was kind enough to respond at length. The questions and answers from our exchange are given below.


1. For the Semantic Web to really take off, the information on the web at large needs to become Semantic Web-compatible; i.e. web pages need to provide semantic information in the form of RDF, OWL etc. Do you see this happening in the foreseeable future, given the huge mass of pages that already exist?
    Or is it more likely that technology will have to solve this problem for us, and we'll need to invent algorithms that can interpret currently existing web pages to extract and apply semantic knowledge on top (such as ClearForest Gnosis )?

The DBpedia.org is a good start. Also check out the emerging SPARQL and GRDDL standards at W3C -- they will bring existing data into the RDF world. There is also growing body of RDF already out there in the Dublin Core, FOAF and other ontologies on content on the Web. More will be coming from many big companies, Adobe, Oracle, Yahoo, etc. And of course other startups like Metaweb and Radar Networks will be adding a lot of content to the mix in different ways as well.


2. In your talk, you mentioned Trust - e.g. you referred to Powerset, wondering how they could add some sense of trust to the results they find through semantic processing (NLP) of web pages, because otherwise it would not be useful.
    I'm not sure I understand.

They are mining full text of the web and automatically building a knowledge base from that. So when they see a web page that says "Microsoft is a terrorist organization" or "Microsoft is a software manufacturer" how do they know which statement is true or false? Who do they trust? How do they determine who to trust? This determines what facts or assertions get what level of weight in their knowledge base. It's the crux of the issue really. You can mine in a lot of assertions, but if you have no good way to filter out the garbage, spam, erroneous statements, or deliberate deception, you can't use it for anything real. One solution is to only mine highly trusted sources -- such as encyclopedias and major newspapers for example. That's not a bad way to start. That would generate a decent knowledge base.

But the DBpedia.org might be a better way to start than mining free text. They've already done the heavy lifting of turning the wikipedia into RDF. I'm not sure you need natural language to get good, reasonably trustworthy knowledge, just use the DBpedia.

In the case of Powerset I believe their goals are different than the DBpedia/Wikipedia -- I think they don't just want almanac content, they want specialized vertical knowledge about travel, products, etc. That will require that they either are very selective of their data sources, or they have a sensitive way to measure trust and rank information accordingly.


        a] How does the addition of semantic processing to the underlying index make the issue of trust more serious? Google's PageRank is basically an approximation of the Wisdom of Crowds, using static links to represent votes; if Powerset uses some similar mechanism to rank the information sources in the underlying index, then their results should be no worse than Google's in terms of trust, and better than Google's in terms of relevance. [Incidentally, I've written about Powerset before when I attended their preview event.]

They could perhaps use a pagerank algorithm to attribute more trust to assertions they mine from various sites. That would be one solution. Unfortunately it gets more complex though -- because if their knowledge base has many grades of truth (statements ranked to varying degrees of trust) for a given assertion, then they will have to use modal logic or some other form of fuzzy reasoning to actually do any real reasoning or inferencing. That stuff is hard and uncharted territory to say the least. I don't think they will go there and if they did I think they would not be successful at it. So the question is what can they do without going there?


        b] How will the Semantic Web improve the trust situation for web search results?

First there is the issue of being able to assign a trust rank to each triple. Trust is relative so in fact there may not be a single global measure of trust that applies equally to everyone. I may trust someone that you don't trust and so I may take what they say to have more weight than you would for example. There isn't room within every triple to store that, but triples could be ratified by other triples that express "endorsement" of their content. So if I agree with something I can simply express that and now it is recorded that I (in an authenticated manner) have said I trust it. If lots of people do that with various assertions (triples), records (objects), and sites and people (sources), then we have a network of trust built in RDF. A network of trust can be reasoned on to determine weighted, socially relative trust rankings for triples. It can determine what triples I am likely to trust versus what you are likely to trust versus what everyone is likely to trust.

Second there is the issue of being able to trust reasoning performed by the system. For that the system needs to be able to explain to a human how it reached some logical conclusion and what data it used to do so. Work is being done both on computer-generated explanations and ways to record and show provenance to address these issues.
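The endorsement idea in Nova's first point can be sketched very roughly in Python. This is my own toy reading of it, not an RDF implementation and not anything Nova or Radar Networks has published; the names, the statements and the trust weights are all hypothetical.

```python
# Each pair records that an (authenticated) person endorsed a statement.
# All names and statements here are hypothetical.
ENDORSEMENTS = [
    ("alice", "statement_a"),
    ("bob", "statement_a"),
    ("carol", "statement_b"),
]

# How much each viewer personally trusts each endorser (0.0 to 1.0).
# Trust is relative, so the two viewers hold different weights.
TRUST = {
    "me": {"alice": 0.9, "bob": 0.8, "carol": 0.1},
    "you": {"alice": 0.2, "bob": 0.1, "carol": 0.9},
}


def trust_score(viewer: str, statement: str) -> float:
    """Sum the viewer's trust in everyone who endorsed the statement."""
    weights = TRUST[viewer]
    return sum(weights.get(who, 0.0) for who, what in ENDORSEMENTS if what == statement)


def ranked(viewer: str) -> list:
    """Order all endorsed statements from most to least trusted for this viewer."""
    statements = sorted({what for _, what in ENDORSEMENTS})
    return sorted(statements, key=lambda s: trust_score(viewer, s), reverse=True)


print(ranked("me"), ranked("you"))
```

The same set of endorsements produces opposite rankings for the two viewers, which is the "socially relative" part of the idea.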


Many thanks to Nova for his detailed answers. You can find his blog here:  Minding the Planet .



August 26, 2007

Survey Results: The Future of Web Search

Thank you to everyone who participated in the last Software Abstractions survey! We asked: which features do you see as the most important ones for Web Search in the future? The results were interesting.

Out of a total of 33 votes, the top votes were closely split between a variety of answers.

  • Personalization  [6 votes]
  • Social Input  [5 votes]
  • Semantic Query  [5 votes]
  • Semantic Index  [6 votes]
  • Trusted Sources  [6 votes]

For search engines with advanced linguistic parsing capabilities, it's reasonable to assume that semantic processing will be applied to both the query and to the indexed content as a whole. If you combine those two answers, then Semantic Processing is the clear winner with 33% of the votes!

The high number of votes for the "Trusted Sources" answer was a surprise - it's clear that a stronger focus on quality of the results in the future (and their being spam-free) weighs heavily on users.

The complete picture of results is given below:

 

 


August 16, 2007

So You've Built an Alternative Search Engine - Now What?

What is the exit strategy for low-traffic Internet Search Engines? This is a question I've been secretly wondering about for the past few months as I study the growing number of companies in this popular category.

No Lack of Contenders

This already-crowded space is getting saturated. My friend Charles Knight of Alt Search Engines keeps a running list of the Top 100 engines; the overall number easily exceeds 1000.


Photo Source: Funny Hub

There is no doubt that as the amount of online content explodes, driven by easy low-cost publishing and the popularity of social networks, Search is becoming increasingly important as a strategic solution - both within an Enterprise for tying together all the Web 2.0 tools, and on the Internet, for making relevant content accessible.

On the Internet at large, Search is currently dominated by the 5 top-tier Search Engines: Google, Yahoo!, Live Search, AOL and Ask. There is also a second tier of engines that have captured enough buzz that they are likely to be sustainable for the medium-term: Hakia, Quintura, the yet-to-be-launched Powerset, and others. There are also specific Market segments where niche players are likely to thrive - Shopping, Jobs, Travel, Audio, Video and so on. Apart from these top tier and vertical segment players - what about all the rest?

Exit Strategy

I fully expected that many of these smaller, innovative search engines would get absorbed by the larger ones for their technology [for example, Microsoft acquired  Medstory, and there are ongoing rumors of a simplyHired acquisition by Google]; but recently, a pattern has emerged that suggests a different possible outcome.

Search has always been a critical feature for large content providers; conventional wisdom until now for these sites was to implement this feature in one of two ways: either (i) using a site search widget from one of the mainstream search engines (as this blog does), or (ii) by creating a custom search engine based on Google, Yahoo!, Rollyo, Eurekster or others.

Increasingly however, large content providers want to harness captive search engines to improve the user experience. Here are some of the indicators of this trend:

Conclusion

It's not difficult to envision a future where every major provider of content implements a powerful search capability optimized for their particular set of content. It will be interesting to watch how the major search engines leverage these capabilities to enhance findability and the user experience. For example, should they continue to directly index the actual site content, or is it more effective to delegate search tasks for each of these sites to their particular search engines, to enhance relevance of search results? More important - does this trend somehow lead us back towards walled gardens?

----

Alternative vision

Charles Knight of the Alt Search Engines blog (mentioned above) has been working tirelessly to promote an alternative vision: to band together a bunch of alts to create a Universal Interface. This seems the best strategy for the group as a whole - although they bring to the table innovative approaches, interesting new user interfaces, visionary algorithms and specific data sources, the one thing that most Alternative Search Engines lack is a significant amount of traffic. By working together towards a common interface, they could improve that situation. Or a larger company could acquire several of these engines and put them together, achieving a similar effect.

If this vision gets traction, then Google had better watch out!



July 07, 2007

Powerset - a new model for venture-funded startups?

In his latest blog post, Steve Newcomb, COO of Powerset, talks about the openness displayed by his company at their recent Powerlabbers meet. At the risk of being boring - radically disagreeing with him would have been so much more interesting - I think that's a fair statement. All of the Powerset people who spoke to the assembled group, as well as those with whom I chatted afterwards, were genuine and open when talking about the technology and the company. There were several direct challenges, thinly disguised as questions, directed at the senior Powerset folks; I did not see Newcomb dodge a single question.

This type of open and direct conversation with bloggers and the tech media is refreshing. So far, Powerset has shown a mastery at managing the blogosphere - both to create buzz (see my earlier post for a comparison of search engine traffic vs buzz ) and to disarm the over-heated hype. Coupled with their Powerlabs community for product development (similar to those pioneered by Dell, Omniture et al), does this represent a new approach to building venture-funded startups in Silicon Valley: the "open kimono" approach compared to the "ultra stealth" approach?

July 01, 2007

Looking for Googhoo!

My friend Ashkan Karbasfrooshan recently wrote a fascinating article wondering whether Google should buy Yahoo!. In an email, he wondered about my position on this topic, so I thought I would take a whack at it, speaking as an ordinary investor. [On the other hand, if you're serious about investing, you should ditch my article and read Jeff Matthews' series of long articles about his pilgrimage to Omaha to observe the world's greatest investor - yes, he (Jeff) isn't making it up!]

The Upside

For what it's worth, I think there are good reasons why Google should consider this idea seriously:

1. Stock price: In my opinion, Google is overvalued and Yahoo! is undervalued at this point - making it the perfect time to buy market share (and traffic) at a low price.

 

 

2. Traffic: Yahoo! still outdoes Google in pure traffic. And the quality of the traffic is intrinsically different. The predominant portion of Google's traffic comes from its flagship search engine site. Is this traffic defensible? Since the cost of switching is trivial for individual users, there's no lock-in and little loyalty. Remember Alta Vista? How long before someone comes up with a better mousetrap [i.e. search algorithm]?

At the same time, Yahoo! has really broad, deep content, which makes it far more defensible IMHO. Content really is king!

 

 

3. Mind share: Google and Yahoo! are still neck-and-neck in terms of user mind share. Microsoft is a somewhat distant third; however, the folks from Redmond are far from out of the race. A combined G/Y! company that offers powerful algorithms (Google search), rich applications (Google Apps) and wide-ranging quality content (Yahoo! web properties) would present Microsoft with a formidable competitor indeed!

 

 

The Downside

1. Culture: Looking from the outside, I imagine that the cultures of these two companies are quite different!
2. Anti-trust concerns: Will the DOJ cause hang-ups on issues of search monopoly?
3. Financials: Will the numbers work? I have no idea - I'm no corporate deal-maker. (Ashkan's post throws some light on this issue.)

A rant about Google's search

Is Google peaking in Search? A couple of search events this week made me consider this possibility.

First, Google's Marissa Mayer gave a wide-ranging talk about the Future of Search, highlighting eight initiatives that Google is focusing on to improve search. All of these search improvements seem to be incremental - honestly, which of these eight are truly revolutionary?

Second, I had the opportunity to attend the blogger/tech-media event at much-hyped search startup Powerset. By implementing natural-language processing techniques, Powerset has the potential to improve the relevance of search results radically by matching the semantics of the content in its index with that of the query. This is a great example of an approach that, if effective, will certainly offer more than incremental improvement!

It remains to be seen if a Google-killer will emerge soon in the world of search; if history is any guide, we should see long periods of incremental development, punctuated by sudden, unexpected, discontinuous changes.

Conclusion:

I, for one, would welcome our new search overlords from Googhoo!



[Disclosure: I have no positions in either Yahoo! or Google.]


June 29, 2007

Powerset is Not a Google-killer!

This was one of the many memorable statements from Steve Newcomb, co-founder and COO of Powerset, at a blogger/tech-media bash at the company's San Francisco headquarters tonight, where they publicly unveiled Powerset's capabilities, technology and people for the first time. Powerset, a heavily-anticipated Semantic search engine (see the earlier post for the divergence between traffic and buzz ), has been gathering a lot of press lately as a potential "Google-killer".

Of course, the quote above is taken slightly out-of-context (but it makes a great headline!). According to Newcomb, Powerset owes a debt of gratitude to Google, as do all new Search Engine companies: after all, Google brought tremendous value to Search. It was Google that put Search on the map, it was Google that convinced ordinary users that simply by typing in a query string, they should be able to see and access any information on the web within a page or two. So, explained Newcomb, no one at Powerset hates Google - although, of course, they will happily try to take market share away from Google by offering a better search.

Key Strengths

That they have a better search, at least for a large subset of queries, is something I'm convinced of after tonight's demo. I think Powerset has three key strengths going for it:

1. Innovative Technology
My preconceived notions of Semantic Language Search revolved around providing additional context by applying lexical and semantic algorithms to the query string. Powerset's search technology, it turns out, does far more - it creates a fundamentally richer representation of the content in its index, rather than the proximity-based strings of tokens used by conventional search engines. By creating an enhanced semantic understanding of both the query and the content, Powerset's search engine can provide much better matches and therefore relevance for search results.

This natural-language parsing technology was originally created at Xerox-PARC. Its algorithms can extract the meaning from a given page, although in the absence of any contextual information in the query, the search falls back to keyword-matching algorithms (similar to Google). The combination of natural-language processing of the query and semantic knowledge of the content means that Powerset can return fundamentally different results for the following two queries - (a) What companies did Peoplesoft acquire, and (b) Who acquired Peoplesoft? - results that accurately reflect the different intents of these two questions. Conventional keyword-based search engines like Google and Yahoo! would be hard-pressed to provide different, accurate results for these two queries. Similarly, flipping a query, or replacing specific words with synonyms, does not affect the results found in Powerset.
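The difference between the two Peoplesoft queries is easy to show with a toy Python sketch. To be clear, this is not Powerset's technology: the Fact structure and both matching functions are my own assumptions, the two facts are entered by hand, and the hard part (parsing free text and queries into subject, verb and object roles) is skipped entirely.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    subject: str
    verb: str
    obj: str


# Two hand-entered facts standing in for a parsed index.
FACTS = [
    Fact("Oracle", "acquire", "PeopleSoft"),
    Fact("PeopleSoft", "acquire", "J.D. Edwards"),
]


def keyword_match(keywords: set) -> list:
    """Return facts that mention every keyword, ignoring who did what."""
    return [
        f for f in FACTS
        if keywords <= {f.subject.lower(), f.verb, f.obj.lower()}
    ]


def role_match(subject: Optional[str], verb: str, obj: Optional[str]) -> list:
    """Return facts matching the given roles; None marks the unknown being asked for."""
    return [
        f for f in FACTS
        if f.verb == verb
        and (subject is None or f.subject == subject)
        and (obj is None or f.obj == obj)
    ]


# Keyword view: both queries boil down to {"peoplesoft", "acquire"}.
both = keyword_match({"peoplesoft", "acquire"})

# Role view: (a) what did Peoplesoft acquire, (b) who acquired Peoplesoft.
acquired_by_peoplesoft = [f.obj for f in role_match("PeopleSoft", "acquire", None)]
acquirers_of_peoplesoft = [f.subject for f in role_match(None, "acquire", "PeopleSoft")]
print(len(both), acquired_by_peoplesoft, acquirers_of_peoplesoft)
```

The keyword view returns both facts for either question, while the role view gives each question its own single answer.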

The Powerset demo included many of these rich queries, where the relevance of the results is based on getting a semantic understanding of the queries. Some examples are given below:

  • What politicians were killed by disease?
  • Who mocked Blair?  [This produced results that included the words lampoon, caricature, satirical and impersonators - very cool!]
  • What did Jesus say? [Again the results were quite meaningful; e.g. nothing about what was said to Jesus]

Of course, the catch here is that all this semantic mapping gets quite expensive when building the index. According to Newcomb, Powerset currently relies on Amazon EC2 as well as its own data center.

2. Social Skills
No, I'm not referring to Emily Post. It was clear that Powerset fully understands the value of leveraging social networks and tapping into the Web 2.0 ecosystem. In addition to creating a destination site for Internet Search, Powerset also plans to focus on the underlying platform, eventually making it available for mashups, widgets and APIs.

As an example, Scott Prevost, Director of Products, showed us a demo of a mashup called The Entertainator. According to Prevost, this was an easy integration that was completed in an afternoon, to highlight the use of Structured Data to improve the results of natural language queries. I found this simple demo, based on data pulled from Freebase, to be very impressive! It correctly answered questions such as the following, in the latter two cases coming up with a single, accurate answer:

  • What movies did Al Pacino star in?
  • Who directed The Matrix?
  • Who said "You can be my wingman"?

This type of single-response search is very valuable for Mobile Search, Newcomb pointed out, where precision is critical for search to be effective.

3. An innovative Product Launch model/community
At today's event, Powerset unveiled a community product called Powerlabs - billed as a combination of Facebook, Digg and Google Labs. The idea is to encourage participation from a community of technologists, bloggers, tech journalists and search enthusiasts, utilizing the combined wisdom of the crowd to provide feedback, generate new ideas, suggest features and implement mashups. A higher level of participation enables community members to earn badges, higher levels of rank (a la World of Warcraft) and credit for features that actually get implemented. [There was no mention of cash prizes of any sort.] Participants can vote ideas up or down in Digg-like fashion. At a high level, this approach is similar to Dell's IdeaStorm. Used in this way, though, it's an innovative approach for controlling (and deflating) some of the hype surrounding the much-anticipated availability of Powerset's search engine, and it allows Powerset to gauge market reaction and gain user acceptance prior to launch. Mark Johnson, Product Manager of Powerlabs, showed us a demo of some of the cool community features of this application.

Missing Pieces

Interestingly, it does not appear that Powerset is paying attention to two key areas of search improvement for the future: Personalization and Social Input (aka Wisdom of Crowds). For the latter, the Powerlabs community can provide feedback on the relevance of search results for particular queries, but that is a far cry from implicitly gathering user behavior information on a continuous basis to improve search results. Examples of search engines that provide this type of functionality are BayNote, Collarity and Loomia. Personalization, especially, would be useful input for Powerset searches, since it implicitly provides user-specific context for minimal queries where there is little or no semantic information available.

Happy Employees

Most importantly, Powerset has a highly successful buzz going, not only in the blogosphere, but also within its own ranks. I spoke to several employees, both developers and managers, and each one was actively excited to be there and to be contributing to a solution they see as game-changing in Search - there was a palpable sense of energy and excitement in the place. This kind of excitement can provide a "positive override" that enables a startup to navigate the inevitable shoals and reefs which would sink a lesser company (although, if not carefully managed, it can also cause dashed hopes if expectations are too high).

Conclusion

One thing is for certain: a serious challenge from a well-funded startup offering meaningful semantic search is bound to force Google to improve - this can only be good for search users in general.

Overall, Mr. Newcomb, I must respectfully disagree with you - Powerset is a potential Google-killer, or at the very least, I predict that Powerset will give Google a run for its money!

Further Reading

For additional perspectives on tonight's event, check out the following:


