May 15, 2008

Yahoo! SearchMonkey - Released to Developers

The good folks from Yahoo! unveiled their new open search platform, Yahoo! SearchMonkey, at a developer launch party today at their Sunnyvale headquarters. In some ways, the SearchMonkey platform is revolutionary and a major step forward in search, allowing publishers to participate directly in improving the quality of their own information presented on the Yahoo! search results page (this is also implicitly a push for the bottom-up approach to the Semantic Web, which most industry observers have given up on in favor of a top-down approach). The platform also lets publishers and third-party developers build applications aimed at improving the search experience. Finally, and most important, if enough publishers and app developers participate in the program, it promises to improve the quality of search results for end users.

Features

At the simplest level, you can think of SearchMonkey as a community-powered set of rich information boxes (similar to the Google OneBox) that appear on the Yahoo! search results page. Publishers can provide this rich data to the Yahoo! search index in a variety of ways: through structured data feeds (RSS), through RDF or Microformat markup on web pages, or through simple page extraction. The "Information Bar" shows up underneath the main search results. The Yahoo! search team has also provided tools to enable developers to build search-based applications very simply and easily.
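To make the "Microformat markup" option a little more concrete, here is a minimal Python sketch of reading hCard fields that a publisher has embedded in ordinary HTML. To be clear, this is not SearchMonkey's extraction code or API; the `HCardParser` class, the sample page and the choice of fields are my own assumptions, used only to illustrate the idea.

```python
# Hypothetical sketch: pull a few hCard microformat fields out of a page.
# This is NOT SearchMonkey code; the class name and sample page are invented.
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    # hCard class names we look for (a small, assumed subset)
    FIELDS = {"fn", "tel", "locality"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field whose text we expect next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.FIELDS:
                self._current = name

    def handle_data(self, data):
        # Store the first non-blank text that follows a recognized tag
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None


page = """
<div class="vcard">
  <span class="fn">Example Pizza</span>
  <span class="tel">555-0100</span>
  <span class="locality">Sunnyvale</span>
</div>
"""

parser = HCardParser()
parser.feed(page)
print(parser.fields)
```

A search engine that understands this markup could show the name, phone number and city directly in a result box instead of guessing them from the page text.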

Continue reading "Yahoo! SearchMonkey - Released to Developers" »

May 11, 2008

Powerset Launches Wikipedia Search

Semantic search engine Powerset, which we've written about here before, has just launched its initial release. The current release is limited to indexing Wikipedia content, but it provides a great showcase for their technology and user experience.

For example, my search for "Alexander the Great" provided the following results page:

Continue reading "Powerset Launches Wikipedia Search" »

May 07, 2008

Cognition Technologies recognized by KMWorld as one of "100 that matter"

Cognition Technologies, which focuses on Semantic natural language processing technology, was named by KMWorld as one of the top 100 Companies That Matter in Knowledge Management for 2008.

Says Cognition CEO Scott Jarus:

One of the biggest barriers to building a natural language understanding system is to build the semantic map and the dictionary with details of the syntactic behavior of words (i.e. how words behave within context).  Cognition's team has spent more than 20 years building this capability into Cognition’s Semantic NLP for the English language ...  and our technology is commercially available today!

Semantic search and NLP technologies seem to have arrived - they are generating a lot of buzz lately. In addition to mainstays Hakia and Powerset, there is a spate of new entries, including Cognition, BooRah and eeggi. We will be reviewing some of these new alternate search engines on this blog in the near future.

Congratulations, Scott and the Cognition team!



April 29, 2008

Thoughts about Alternative Search Engines Day 2008

I was at the Alternative Search Engines Day event in San Francisco last week. Organized by Charles Knight of the Alt Search Engines blog (and friends), it brought together key people from over 40 alternative search engines. It was an amazing crowd, full of interesting and bright people, and the overall energy was incredible!

At the keynote, Charles gave a pitch for bringing ASEs together that was very well received. He showed us some examples of what a unified User Interface that combined multiple search engines would look like. I contributed a tiny bit (expanding on the idea that complementary ASEs could band together to provide Federated Searches for enhanced traffic and usability, and listing a few ways for the Alts to cooperate even while competing).

Continue reading "Thoughts about Alternative Search Engines Day 2008" »

April 20, 2008

Cooperation of Alternate Search Engines: A Manifesto

(This post is inspired by my discussions with my friend, Charles Knight of AltSearchEngines)

Background

I'll be at the Alternative Search Engines Day tomorrow, a unique event in San Francisco put together by Charles and the AltSearchEngines team. The event is sponsored by SeeqPod, UpTake, Matchpoint, HealthPricer, GoPubMed and Blogdimension. (Unfortunately, it's not open to the general public.) If you're part of an Alternative Search Engine, I hope to see you there!

As I was getting ready for the event, it got me thinking about ASEs and how they can work together.

The Case for the Alts

I love the ASEs - Alts rock! Without them, there would be little innovation in Search, no new frontiers to be explored.

The Alts are the ones that keep pushing the envelope with new directions in search technology, whether it's algorithms, user interface, social search or something else.  Although Google has some fine technology and is synonymous with search, I firmly believe that we're still at Search 1.0, and have a long way to go. Because of all this competition from the Alts, and the resulting innovation, web search continues to improve.

Continue reading "Cooperation of Alternate Search Engines: A Manifesto" »

March 31, 2008

Could You Survive For A Day - Without Google?

Can you spend a whole day without using Google? - that's the challenge issued by my friend Charles Knight over on the Alt Search Engines blog (see also ReadWriteWeb's coverage). To help you out, he's going to publish the latest version of his popular Top 100 Alternative Search Engines list tomorrow.

I think this is a great idea! We have all become addicted to the power (and limitations) of Google search - just like television before the age of the Internet, we cannot imagine life without it. And yet, as Charles' list shows, there are plenty of alternative search engines out there, innovating Search in a variety of different ways.

Personally, I'm going to use this opportunity to learn the latest features of Quintura, an innovative search engine we've covered before on this blog (here and here). Quintura has jumped on board this idea by creating a special destination page for discovering the best hoaxes, pranks, jokes and tricks for April Fools' Day. [Rest assured, this is no joke!]

So how about you - can you do it? Why not give it a shot and try out an alternative search engine? Or two, or five, or all hundred on Charles' list? Can you last a day, an hour, even five minutes? Try it and the results may surprise you!

February 17, 2008

Social Data: Observations from "Search & The Social Graph" Event

Dave McClure moderated an event on Search & The Social Graph at the Yahoo! campus this week, organized by the Search SIG of the Software Development Forum. With the meteoric rise of Facebook and the heightened interest in leveraging the social graph - both Google and Yahoo! have launched new APIs and OpenSocial is gaining momentum - this discussion was timely and attendance was strong.

The panelists represented some of the most interesting players in this space:

  • Kevin Marks from Google
  • Aditya Agarwal from Facebook
  • Kent Brewster of Yahoo!
  • Eve Phillips, CEO of Chirp

It turned out to be an interesting event, with lots of good discussion about the implications of portability, privacy, utility and monetization of social data. No stranger to the social data space, moderator McClure did an outstanding job of keeping things focused and the discussion lively; he was clearly knowledgeable and well-prepared, launching into a series of leading questions that moved the conversation forward.

Key Observations

By grouping together related comments, I've distilled the discussion at this event into the following topics:

1. Relevance of Search Results

- With the explosion of self-publishing and user-generated content on the web, the type of data getting created on the web is changing, and the classic search algorithms are becoming less effective.
- Users are increasingly interested in what their friends and peers are doing online.
- By using a social graph to filter out results during a specific search, you can boost the relevance of search results.
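As a rough illustration of that last point, here is a minimal Python sketch of re-ranking results with a social graph. None of the panelists described an actual algorithm; the `rerank` function, the data shapes and the `boost` weight are all my own assumptions. Note that it boosts results endorsed by friends rather than strictly filtering the others out.

```python
# Hypothetical sketch: boost results that a searcher's friends have endorsed.
# The data shapes and the boost weight are assumptions for illustration.

def rerank(results, friends, endorsements, boost=0.5):
    """Return results sorted by base score plus a per-friend boost.

    results: list of (url, base_score) tuples from a classic ranker
    friends: set of user ids in the searcher's social graph
    endorsements: dict mapping url -> set of user ids who shared or liked it
    """
    def social_score(item):
        url, base_score = item
        friend_count = len(endorsements.get(url, set()) & friends)
        return base_score + boost * friend_count

    return sorted(results, key=social_score, reverse=True)


results = [("a.example", 0.9), ("b.example", 0.7), ("c.example", 0.6)]
friends = {"ann", "bob"}
endorsements = {"b.example": {"ann", "bob"}, "c.example": {"zed"}}

ranked = rerank(results, friends, endorsements)
print([url for url, _ in ranked])
```

Here "b.example" moves ahead of "a.example" because two friends endorsed it, while the endorsement from a stranger does nothing for "c.example".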

2. Monetization

- It is no longer uncommon for a person to become a media source, using tools such as Twitter, blogs and RSS feeds; but this is hard to monetize. A referral model works better in this case than advertising.
- Brand advertising is still big, even for social search, but it works differently than for targeted search.
- Online brand advertising will move into more interactive experiences in the future.
- The key question is: Does membership in a social group signal an intention that can be targeted by advertisers? The panelists felt that, on balance, it did not.
- For a more concrete example: Google's directed search is very monetizable; Facebook has a lot of social data, but user behavior is not very monetizable.

3. Privacy

- There is a clear difference between a publicly-proclaimed graph, such as the friends on Facebook, and a private list, such as Email contacts; application developers will ignore this distinction at their peril
- Yahoo!'s Brewster said it best: "There should never be a privacy surprise for the user!"
- Applications should make it clear to users if they are making data public or private; e.g. Flickr is three-valued in this regard

4. Interaction Levels

- From a monetization perspective, all "friends" are not created equal; some connections in the social graph are stronger than others
- The smallest inner set of friends is the most valuable; the first 25 people have 80% of the value
- The viral rate of promotion in Facebook is incredible
- If users can annotate connections, they can more fully express their network graph
- You can infer relationships from user behavior, such as sites visited and click-throughs
- The most important part of social data is the connections, followed by the profile; eventually, it gives you the ability to answer the question: "Who should you go to, to answer this question?"

5. OpenSocial

- OpenSocial allows application developers to write one application, and then take it to where the users are on diverse other social networks
- The vision: take some of the good parts of Facebook and bring those to a lot of people
- This allows any application to spread through the social graph

6. Social Email

- Email networks have a lot of connection data, which has social data buried in it
- These connections can either be one-way or two-way; the difference signals intent on the part of the user
- Google's Marks made an interesting point: a person's email address and personal URL are opposites - with the former, you can communicate with that person; with the latter, the person communicates with you
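The one-way versus two-way distinction is easy to sketch in code. The Python below assumes a simple log of (sender, recipient) pairs, which is my own invention for illustration; real mail data would need far more care (aliases, mailing lists, privacy).

```python
# Hypothetical sketch: classify email connections as one-way or two-way.
# The (sender, recipient) log format is an assumption for illustration.

def classify_connections(messages):
    """Map each unordered pair of addresses to 'one-way' or 'two-way'."""
    directed = set(messages)  # unique (sender, recipient) pairs
    connections = {}
    for sender, recipient in directed:
        pair = tuple(sorted((sender, recipient)))
        if (recipient, sender) in directed:
            connections[pair] = "two-way"
        else:
            connections[pair] = "one-way"
    return connections


log = [
    ("ann@example.com", "bob@example.com"),
    ("bob@example.com", "ann@example.com"),
    ("ann@example.com", "news@example.com"),
]
connections = classify_connections(log)
print(connections)
```

A two-way pair suggests a real relationship, while a one-way pair is more likely a subscription or a cold contact.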

Facebook

Facebook's Agarwal did a great job of articulating the company's approach to some of these issues. His contributions to the discussion were somewhat Facebook-centric; but given the strong community interest in Facebook lately, this only added to the value of the panel.

In discussing the value of social data for search, Agarwal compared the issues of selecting for relevance among a large number of results for a targeted search, with those of producing Facebook's news feed, which must also present a large amount of data to the user in a format that's easy to consume.

In terms of privacy, Facebook wants to allow users to annotate the social graph, so that they can fully express their network. This will allow users to separate their strong connections from casual friends. The size of a user's graph is another dimension to be considered.

On data portability, Facebook currently has no plans to implement features that enable it. Agarwal clarified that although they support data portability initiatives philosophically, they have not determined it to be the best use of resources at this time.

Finally, although Agarwal did not acknowledge this directly, the panelists agreed that the Facebook-type social network data and searches are far less monetizable than directly targeted activities that display clear intent, such as a Google search.

Chirp

This was the first time I saw a demo of Chirp. Eve Phillips, Chirp's CEO, showed chirpscreen, an interactive screen saver that displays content from your social network, such as pictures from Flickr and status messages from Facebook. On the whole, the audience loved it - a series of photos of her friends kept popping up on the screen - but there were some concerns about being able to control what gets shown. According to Phillips, Chirp is planning to introduce new features soon that will allow users to set preferences of what content is displayed, from which sources, and so on.

Open Questions

McClure asked the panelists some incisive questions, which deserve to be listed in their own right; I hope these lead to a wider discussion about social data and related topics:

  • Is Social Search revolutionary, or evolutionary?
  • Which benefits more from social data: targeted search or discovery?
  • How well does social search monetize?
  • How should we use the social data that's automatically present in Email?
  • If Facebook and other networks encourage lightweight friendships, does it obscure the real social graph?


January 29, 2008

Zvents makes Local Search pop!

There is a class of web search engines that can prove even more useful than Google within a certain context. I'm talking, of course, about Vertical Search engines - the writer and tech strategist Sramana Mitra considers them Google's Achilles heel and Profy.com's Cyndy Aleo-Carreira seems to agree. This blog also has long held the position that vertical search represents a powerful mechanism to find information on the web, and is a key category to watch in the search wars of the future. [see: The rise of Vertical Search Engines from Aug 2006].

Another way of achieving a similar focus, in order to improve the relevance of search results, is by segmenting by location rather than by industry vertical - i.e. create a hyperlocal search engine that limits its search results to a given geographical area.

One such alternate search engine is Zvents, which is relentlessly focused on local information, of any sort. This company, which has been around since early 2005, has just introduced an advanced feature called Federated Local search - basically, its own version of Universal Search (recall that Google introduced its Universal Search feature with much fanfare last May).

Federated Local Search: Multi-Dimensional Results for Local Information

What does Universal Search mean, for a local search engine? Initially this was not very clear to me; an email discussion with Paul O'Brien, Director of Marketing at Zvents, inspired me to draw the following diagram:



The basic idea is to enable the user to perform a general-purpose search within a local context. This allows the user to find local information about a given topic, across many different dimensions. For example, a sports fan living in San Jose, CA who tries a local search for the term "hockey" would get the following different types of results:

  • Upcoming games for the San Jose Sharks, the local hockey team
  • The location of Roosevelt Park Roller Hockey Rink
  • The description and link for a local "Hockey Night" event
  • Results about relevant personalities (what Zvents calls "performers")
  • And other related links ...
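The list above can be sketched as a tiny federated search in Python: one query goes to several per-type indexes, hits are limited to one city, and the results come back grouped by type. This is only my reading of the idea, not Zvents' implementation; the index contents, field names and `federated_local_search` function are invented for illustration.

```python
# Hypothetical sketch of a federated local search: one query is sent to
# several per-type indexes and the hits are grouped by type.
# The index contents and field names are invented for illustration.

INDEXES = {
    "events": [
        {"title": "Sharks hockey game", "city": "San Jose"},
        {"title": "Hockey Night", "city": "San Jose"},
        {"title": "Hockey clinic", "city": "Boston"},
    ],
    "venues": [
        {"title": "Roosevelt Park Roller Hockey Rink", "city": "San Jose"},
    ],
    "performers": [
        {"title": "San Jose Sharks hockey team", "city": "San Jose"},
    ],
}


def federated_local_search(query, city, indexes=INDEXES):
    """Return {result_type: [titles]} for hits matching query within city."""
    term = query.lower()
    grouped = {}
    for result_type, documents in indexes.items():
        hits = [
            doc["title"]
            for doc in documents
            if doc["city"] == city and term in doc["title"].lower()
        ]
        if hits:
            grouped[result_type] = hits
    return grouped


results = federated_local_search("hockey", "San Jose")
for result_type, titles in results.items():
    print(result_type, titles)
```

The Boston clinic matches the keyword but is dropped by the city constraint, which is the whole point of searching within a local context.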

Zvents has already partially implemented this vision, although some of the lower-ranked results could provide a better match. Hopefully these will improve in the future as the search index grows and the algorithm improves. A screen shot of this local Hockey search in Zvents is given below.



Similarly, here's a search for the term "Web 2.0" for Cupertino, CA:



Outcome: Relevance

The big advantage of this type of search, over a general-purpose Google or Yahoo! search, is that the user can obtain the benefits of a broad cross-section of results, while still constraining the search to a limited geographical area.

This is not a significant issue in highly developed, urban, technologically advanced areas like Silicon Valley, Boston or New York; but it could one day make a big difference for someone living in David Letterman's "home office" of Wahoo, NE, or even more important, someone trying to find the Boston Public School located in Boston, Ontario - as we've seen before, highly popular keywords tend to swamp nearby long-tail keywords in the search results for major search engines.

From a business model perspective, hyperlocal searches tend to provide highly qualified prospects for local merchants, so I would guess that this type of search is very easily monetizable in the long run.

From a user interface point-of-view, the NLP-like implementation of time period for the search engine ("when: tonight, this weekend, ...") is a nice touch; I tried different possibilities ("next month"), and it seemed to work just fine.
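For the curious, here is a minimal Python sketch of how a "when" phrase might be turned into a date range. I have no knowledge of how Zvents actually does this; the `parse_when` function, the handful of supported phrases and their exact meanings (for example, "this weekend" as the coming Saturday and Sunday) are my own assumptions.

```python
# Hypothetical sketch: turn a "when" phrase into a (start, end) date range.
# The supported phrases and their exact meanings are assumptions.
from datetime import date, timedelta


def parse_when(phrase, today):
    """Return an inclusive (start, end) date range for a few phrases."""
    phrase = phrase.strip().lower()
    if phrase in ("today", "tonight"):
        return today, today
    if phrase == "tomorrow":
        day = today + timedelta(days=1)
        return day, day
    if phrase == "this weekend":
        # weekday(): Monday is 0, Saturday is 5, Sunday is 6
        if today.weekday() == 6:
            return today, today  # already Sunday: only today is left
        saturday = today + timedelta(days=(5 - today.weekday()) % 7)
        return saturday, saturday + timedelta(days=1)
    if phrase == "next month":
        year = today.year + (1 if today.month == 12 else 0)
        month = 1 if today.month == 12 else today.month + 1
        start = date(year, month, 1)
        # Last day = first day of the following month, minus one day
        after_year = year + (1 if month == 12 else 0)
        after_month = 1 if month == 12 else month + 1
        end = date(after_year, after_month, 1) - timedelta(days=1)
        return start, end
    raise ValueError(f"unsupported phrase: {phrase!r}")


print(parse_when("next month", date(2008, 1, 29)))
```

A real implementation would need many more phrases and a policy for ambiguous ones, but even this small table covers the cases I tried.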

On a more technical note, Zvents has been making waves with the release of Hypertable, its open-source Bigtable clone, which adds a C++ option among Bigtable-style projects.

Going forward, it will be interesting to see how Zvents scales to additional locations, and to additional dimensions within each locality. Will it make inroads into the market share for any of the major search engines, or into that of other locally-focused web sites like topix.com and craigslist?



January 27, 2008

Quintura Launches Site Search Widget

Alternative search engine Quintura, which I've mentioned before on this blog, has launched its site search widget. This widget allows site publishers to provide users with a specialized search limited to that specific site; it joins earlier offerings from Google, Yahoo!, Rollyo and Eurekster swiki in this space.

This blog was an early user of this widget. You can see a customized, Quintura-generated mini-tag cloud in the earlier post; a full-size tag cloud is also available. The widget is hosted by Quintura, so installation was a snap: once the site was indexed, all I had to do was to embed the widget code into my blog pages and provide some styling control.

The biggest benefit of using the Quintura solution, as I've said before, is the dynamic tag cloud that allows the user to navigate the search space; initial feedback from our readers here has been positive, but not enthusiastic.

The real benefits to both users and publishers will come when Quintura search results prove to be better than equivalent results from a mainstream search engine solution, such as Google; as long as the Google site search results are good enough, it will be hard for the Quintura widget to make significant inroads into the market share of the big-G juggernaut.

This widget release is currently in private beta; an invite for this beta is available over on ReadWriteWeb.



January 22, 2008

Disambiguation of Search Results? Yup, Google's got that

Just last week, in an email exchange with another search blogger, I wondered when Google would provide options for disambiguation of search results.

When you think about it, that's an obvious requirement for the Results page of any serious search engine. If I query for the search term "Java" - does it mean that I'm looking for results about the programming language, the coffee, or the island in Indonesia?

There's no way for the search engine to be able to tell, although personalization could provide clues. The easiest solution, as I wrote back in 2006, is for the search engine to just ask - which is why Wikipedia offers this page: Java (disambiguation). Alternatively, the results can be grouped into various categories for the user to choose from, which is another way of doing the same thing.
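The grouping option can be sketched in a few lines of Python. This is a toy keyword matcher of my own, not how Google or any other engine does it; the sense labels, keyword lists and `group_by_sense` function are invented for illustration.

```python
# Hypothetical sketch: group results for an ambiguous query by sense.
# The sense labels and keyword lists are invented for illustration.
from collections import defaultdict

SENSES = {
    "programming language": {"jvm", "compiler", "class", "programming"},
    "coffee": {"coffee", "roast", "espresso"},
    "island": {"indonesia", "jakarta", "island"},
}


def group_by_sense(snippets, senses=SENSES):
    """Assign each snippet to the sense sharing the most keywords with it."""
    groups = defaultdict(list)
    for snippet in snippets:
        words = set(snippet.lower().split())
        best = max(senses, key=lambda sense: len(senses[sense] & words))
        if senses[best] & words:
            groups[best].append(snippet)
        else:
            groups["other"].append(snippet)  # no keyword matched any sense
    return dict(groups)


snippets = [
    "Java programming tutorial for the JVM",
    "Dark roast Java coffee beans",
    "Travel guide to Java island Indonesia",
    "Java",
]
groups = group_by_sense(snippets)
for sense, members in groups.items():
    print(sense, members)
```

The results page can then show one section per sense and let the user pick, instead of betting everything on the most popular meaning.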

Until now, Google has been mostly following a third option, which is to simply pick the most popular category regardless of the user's real preference; this can lead to some strange results, as highlighted in my earlier post on deconstructing real Google searches. But this approach doesn't really cut it, since it ignores all the unpopular search results - it's very possible that the long-tail searches can collectively make up a market share that rivals or exceeds the relatively few "popular" searches.

There has also been a limited amount of disambiguation offered by Google's "related searches" feature.

Well, no more. Google appears to be experimenting with offering disambiguation directly by grouping search results into categories. The screen shot below shows Google search results for the query "freebase". Effectively, the results page seems to be asking: do you mean the free semantic web database, or the other kind, associated with drugs? Or a third alternative: FreeBase - a free Windows software program to configure the Apple AirPort Base Station.

The use of horizontal ruled lines to separate the sections is a nice touch!



Obviously this is some type of test; I certainly hope it's successful. I can't wait to see this feature become mainstream among the major search engines. It will be a big step forward in Search!


