May 15, 2008

Yahoo! SearchMonkey - Released to Developers

The good folks from Yahoo! unveiled their new open search platform Yahoo! SearchMonkey, at a developer launch party today at their Sunnyvale headquarters. In some ways, the SearchMonkey platform is revolutionary and a major step forward in search, allowing publishers to participate directly in improving the quality of their own information presented on the Yahoo! search results page (this is also implicitly a push for the bottom-up approach to the Semantic Web, which most industry observers have given up on in favor of a top-down approach). The platform also lets publishers and third-party developers build applications aimed at improving the search experience. Finally, and most important, if enough publishers and app developers participate in the program, it promises to improve the quality of search results for end users.

Features

At the simplest level, you can think of SearchMonkey as a community-powered set of rich information boxes (similar to the Google OneBox) that appear on the Yahoo! search results page. Publishers can provide this rich data to the Yahoo! search index in a variety of ways: through structured data feeds (RSS), through RDF or Microformat markup on web pages, or through simple page extraction. The "Information Bar" shows up underneath the main search results. The Yahoo! search team has also provided tools to enable developers to build search-based applications very simply and easily.
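To make the "Microformat markup" route a little more concrete, here is a minimal sketch of what page-level extraction could look like. It is not SearchMonkey's actual pipeline, which is not described here; it only assumes a page marked up with hCard-style class names (`fn`, `org`, `tel`, `locality`) and pulls those fields out with Python's standard-library HTML parser. The sample page and the `extract_hcard` helper are made up for illustration.

```python
from html.parser import HTMLParser

# Assumed subset of hCard property class names to collect (illustrative only).
HCARD_PROPERTIES = {"fn", "org", "tel", "locality"}


class HCardExtractor(HTMLParser):
    """Collect the text of elements whose class is one of HCARD_PROPERTIES."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # property name -> extracted text
        self._open = []    # stack of (tag, property-or-None) for open elements

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._open.append((tag, prop))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if any(open_tag == tag for open_tag, _ in self._open):
            while self._open.pop()[0] != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the nearest enclosing property element, if any.
        for _, prop in reversed(self._open):
            if prop:
                self.fields[prop] = f"{self.fields.get(prop, '')} {text}".strip()
                break


def extract_hcard(html):
    """Hypothetical helper: return the hCard-style fields found in html."""
    parser = HCardExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up publisher page used only to exercise the sketch.
SAMPLE_PAGE = """
<div class="vcard">
  <span class="fn">Example Bistro</span>
  <span class="org">Example Foods</span>
  <span class="tel">555-0100</span>
  <span class="locality">Sunnyvale</span>
</div>
"""

fields = extract_hcard(SAMPLE_PAGE)
print(fields)
```

A search engine that already has fields like these could render them in a richer result box without guessing at the page layout, which is the appeal of publisher-supplied structure.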

February 17, 2008

Social Data: Observations from "Search & The Social Graph" Event

Dave McClure moderated an event on Search & The Social Graph at the Yahoo! campus this week, organized by the Search SIG of the Software Development Forum. With the meteoric rise of Facebook and the heightened interest in leveraging the social graph - both Google and Yahoo! have launched new APIs and OpenSocial is gaining momentum - this discussion was timely and attendance was strong.

The panelists represented some of the most interesting players in this space:

  • Kevin Marks from Google
  • Aditya Agarwal from Facebook
  • Kent Brewster of Yahoo!
  • Eve Phillips, CEO of Chirp

It turned out to be an interesting event, with lots of good discussion about the implications of portability, privacy, utility and monetization of social data. No stranger to the social data space, moderator McClure did an outstanding job of keeping things focused and the discussion lively; he was clearly knowledgeable and well-prepared, launching into a series of leading questions that moved the conversation forward.

Key Observations

By grouping together related comments, I've distilled the discussion at this event into the following topics:

1. Relevance of Search Results

- With the explosion of self-publishing and user-generated content on the web, the type of data getting created on the web is changing, and the classic search algorithms are becoming less effective.
- Users are increasingly interested in what their friends and peers are doing online.
- By using a social graph to filter out results during a specific search, you can boost the relevance of search results.
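As a rough illustration of the last point, the sketch below re-ranks ordinary search results using a social graph. Nothing here comes from the panel: the data shapes (`friends_of`, `bookmarks`), the `social_rerank` name and the `boost` weight are all assumptions, and a real engine would blend far more signals.

```python
def social_rerank(results, user, friends_of, bookmarks, boost=0.5):
    """Re-rank (url, base_score) pairs by how many of the user's friends
    have bookmarked each URL. All inputs are hypothetical data shapes:
    friends_of maps a user to a set of friends, and bookmarks maps a
    user to the set of URLs they saved."""
    friends = friends_of.get(user, set())

    def endorsements(url):
        return sum(1 for friend in friends if url in bookmarks.get(friend, set()))

    rescored = [
        (url, score * (1 + boost * endorsements(url)))
        for url, score in results
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Made-up example data.
results = [("a.example", 1.0), ("b.example", 0.8)]
friends_of = {"me": {"ann", "bob"}}
bookmarks = {"ann": {"b.example"}, "bob": {"b.example"}}

ranked = social_rerank(results, "me", friends_of, bookmarks)
print([url for url, _ in ranked])
```

Setting a minimum number of endorsements and dropping everything below it would turn the same idea into the stricter "filter" the panelists described.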

2. Monetization

- It is no longer uncommon for a person to become a media source, using tools such as Twitter, blogs and RSS feeds; but this is hard to monetize. A referral model works better in this case than advertising.
- Brand advertising is still big, even for social search, but it works differently than for targeted search
- Online brand advertising will move into more interactive experiences in the future
- The key question is: Does membership in a social group signal an intention that can be targeted by advertisers? The panelists felt that, on balance, it did not
- For a more concrete example: Google's directed search is very monetizable; Facebook has a lot of social data, but user behavior is not very monetizable

3. Privacy

- There is a clear difference between a publicly-proclaimed graph, such as the friends on Facebook, and a private list, such as Email contacts; application developers will ignore this distinction at their peril
- Yahoo!'s Brewster said it best: "There should never be a privacy surprise for the user!"
- Applications should make it clear to users if they are making data public or private; e.g. Flickr is three-valued in this regard

4. Interaction Levels

- From a monetization perspective, all "friends" are not created equal; some connections in the social graph are stronger than others
- The smallest inner set of friends is the most valuable; the first 25 people have 80% of the value
- The viral rate of promotion in Facebook is incredible
- If users can annotate connections, they can more fully express their network graph
- You can infer relationships from user behavior, such as sites visited and click-throughs
- The most important part of social data is the connections, followed by the profile; eventually, it gives you the ability to answer the question: "Who should you go to, to answer this question?"

5. OpenSocial

- OpenSocial allows application developers to write one application, and then take it to where the users are on diverse other social networks
- The vision: take some of the good parts of Facebook and bring those to a lot of people
- This allows any application to spread through the social graph

6. Social Email

- Email networks have a lot of connection data, which has social data buried in it
- These connections can either be one-way or two-way; the difference signals intent on the part of the user
- Google's Marks made an interesting point: a person's email address and personal URL are opposites - with the former, you can communicate with that person; with the latter, the person communicates with you
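The one-way versus two-way distinction is easy to sketch in code. The example below assumes a simplified message log of (sender, recipient) pairs, which is my own stand-in for real mail headers, and labels each pair of people by whether mail has flowed in both directions.

```python
def classify_connections(messages):
    """Label each pair of correspondents as 'one-way' or 'two-way'.

    messages is an iterable of (sender, recipient) tuples, an assumed
    simplification of real email headers."""
    directed = set(messages)
    kinds = {}
    for sender, recipient in directed:
        if sender == recipient:
            continue  # notes-to-self say nothing about a relationship
        pair = tuple(sorted((sender, recipient)))
        kinds[pair] = "two-way" if (recipient, sender) in directed else "one-way"
    return kinds


# Made-up message log.
log = [
    ("ann", "bob"),
    ("bob", "ann"),
    ("ann", "newsletter"),
    ("ann", "bob"),
]

connections = classify_connections(log)
print(connections)
```

A reciprocated pair is a plausible candidate for a real social tie, while a one-way pair looks more like a subscription or a broadcast.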

Facebook

Facebook's Agarwal did a great job of articulating the company's approach to some of these issues. His contributions to the discussion were somewhat Facebook-centric; but given the strong community interest in Facebook lately, this only added to the value of the panel.

In discussing the value of social data for search, Agarwal compared the issues of selecting for relevance among a large number of results for a targeted search, with those of producing Facebook's news feed, which must also present a large amount of data to the user in a format that's easy to consume.

In terms of privacy, Facebook wants to allow users to annotate the social graph, so that they can fully express their network. This will allow users to separate their strong connections from casual friends. The size of a user's graph is another dimension to be considered.

As for data portability, Facebook currently has no plans to implement features that enable it. Agarwal clarified that although the company supports data portability initiatives philosophically, it has not determined this to be the best use of resources at this time.

Finally, although Agarwal did not acknowledge this directly, the panelists agreed that the Facebook-type social network data and searches are far less monetizable than directly targeted activities that display clear intent, such as a Google search.

Chirp

This was the first time I saw a demo of Chirp. Eve Phillips, Chirp's CEO, gave a demo of chirpscreen, an interactive screen saver that displays content from your social network, such as pictures from Flickr and status messages from Facebook. On the whole, the audience loved it - a series of photos of her friends kept popping up on the screen - but there were some concerns about being able to control what gets shown. According to Phillips, Chirp is planning to introduce new features soon that will allow users to set preferences of what content is displayed, from which sources, and so on.

Open Questions

McClure asked the panelists some incisive questions, which deserve to be listed in their own right; I hope these lead to a wider discussion about social data and related topics:

  • Is Social Search revolutionary or evolutionary?
  • Which benefits more from social data: targeted search or discovery?
  • How well does social search monetize?
  • How should we use the social data that's automatically present in Email?
  • If Facebook and other networks encourage lightweight friendships, does it obscure the real social graph?


January 27, 2008

Quintura Launches Site Search Widget

Alternative search engine Quintura, which I've mentioned before on this blog, has launched its site search widget. This widget allows site publishers to provide users with a specialized search limited to that specific site; it joins earlier offerings from Google, Yahoo!, Rollyo and Eurekster swiki in this space.

This blog was an early user of this widget. You can see a customized, Quintura-generated mini-tag cloud in the earlier post; a full-size tag cloud is also available. The widget is hosted by Quintura, so installation was a snap: once the site was indexed, all I had to do was to embed the widget code into my blog pages and provide some styling control.

The biggest benefit of using the Quintura solution, as I've said before, is the dynamic tag cloud that allows the user to navigate the search space; initial feedback from our readers here has been positive, but not enthusiastic.

The real benefits to both users and publishers will come when Quintura search results prove to be better than equivalent results from a mainstream search engine solution, such as Google; as long as the Google site search results are good enough, it will be hard for the Quintura widget to make significant inroads into the market share of the big-G juggernaut.

This widget release is currently in private beta; an invite for this beta is available over on ReadWriteWeb.



December 23, 2007

Web Poll Results: What is the Most Important Component of a Search Engine?

Our last web poll had asked readers to vote on what they considered to be the most important component of a Search Engine - an indication of the areas a small search startup should focus on to help capture market share away from the major search engines.

27 readers voted (thank you!). The poll results are shown below.



These results are interesting because I had expected a higher percentage of votes for the Results Visualization choice, followed by the Algorithm choice, but the votes did not match my expectations. Part of the reason could be that readers of this blog are predisposed to have a stronger interest in the algorithms and strategies used by various search engines than in their UI paradigms.

As for the size of the Content Index, that's a metric that is slowly declining in importance. There was a time when the major search engines would fall all over themselves in trying to top each other in terms of the amount of data indexed; but as the content on the web explodes and grows progressively richer, it simply does not matter as much, and that is reflected by the votes.

As expected, the Query Spec choice got completely ignored. The search engine input spec could be so much richer than a minimal number of words or a single phrase. However, I fear that we're condemned to using keyword-ese for specifying our needs to the Search Engine for the long-term future, which is a pity; like the QWERTY keyboard, it may stick with us well after its usefulness has waned.



December 19, 2007

Search Improvements 2008 - THAT'S IT?

A few days ago, Gord Hotchkiss, President and CEO of Enquiro, moderated a Webinar with the Search 2010 Panel; the panel is a who's who list of stellar participants in the Search space, including representatives from all the major search engines. You can find the actual Webinar and read Gord's post about it here: Search 2010 - A Review.

Gord writes:

I won’t steal the panelists thunder, but the first question I posed to them was what they see as the biggest change to search in the coming year. Most pointed to the continued emergence of blended search results on the page, as well as more advances in disambiguating intent. A few panelists looked at the promise of mobile, driven by advances in mobile technology such as multi touch displays, embodied in the iPhone.

He adds:

[One area]  ... is how search functionality will start showing up in more and more places. Already, we’re seeing search being a key component in many mash ups. The ability to put this functionality under the hood and have it power more and more functional interfaces, combined with other 2.0 and 3.0 capabilities, will drive the web forward.


Charles Knight of AltSearchEngines, in his reaction to the Webinar [ Thomas Jefferson Dines Alone ], writes:

So what did they see as the biggest change coming to Search in 2008?
...

Let’s break it down: 1) the continued emergence of blended search results 2) more advances in disambiguating intent, and 3) the promise of mobile…such as…the iPhone.

That’s it?  That’s what the key major search engine insiders and industry analysts predict for the roller coaster year ahead?  More of the same - and the iPhone?


Now, (disclosure) I'm an occasional contributor to ASE and Charles is a personal friend of mine, so I grant that I'm biased; but I'm with Charles on this one. That's it? Those are the key changes to search predicted by the major search engines for the next year? Is it just me, or do all of these changes seem - evolutionary, not revolutionary?

In a recent article on Future Directions in Search, I highlighted the major areas for potential advances in search: Query specification, Base Index, Relevance Algorithm, Results Visualization and Ongoing Interest (Notification). In that article, I was looking at a much longer time horizon, but I expect that some discontinuous changes will occur in one or more of those areas within the next year.

Search is a highly dynamic field that presently generates a tremendous amount of interest among scientists, engineers and entrepreneurs (Google's stratospheric market cap has ensured that!). Many search startups are emerging, and many of them are introducing new concepts and technological innovation: vertical search (Indeed, Spock, and many, many others); semantic search (Hakia, Powerset); dynamic results visualization (Quintura); ways to add value (Trulia, Zillow); ways to speed up search (VortexDNA); and so on. At least through acquisition, if nothing else, the mainstream search engines should be able to move ahead quickly.

As a specific example, let's look at the Video search space. I recently discovered Mark Robertson's web site, ReelSEO, which is dedicated to SEO/SEM of video content. On his site, Mark hosts the Comprehensive list of video search engines and video sharing sites, which lists over 100 sites dedicated to video sharing and search. With so many players, surely there's someone who will introduce a new concept or significant change in video search?

Finally, let us acknowledge the elephant in the room. What about - discontinuous improvements to the heart of the Search Engine, the PageRank algorithm? After all, reduced to essentials, PageRank is only an approximation of the authority of a web page or site, based on the value and authority of incoming links. It was certainly an amazing insight on the part of Google's founders, and worthy of the success it attained; but just because all the major search engines use it today does not make it the right way or the only way to identify relevant results.
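For readers who have not seen it spelled out, the idea of "authority based on incoming links" can be sketched in a few lines. This is the textbook power-iteration form of PageRank on a toy link graph, not any search engine's production ranking; the damping factor of 0.85 and the iteration count are conventional choices I have assumed here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to. Rank held by
    pages with no outgoing links is spread evenly over all pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets a base share plus an even cut of the dangling rank.
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: a and b both link to c, and c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page c comes out on top because it has the most incoming links, and a outranks b only because the highly ranked c links to it. That is the whole approximation: a page is as important as the pages that point at it.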

Perhaps there are other approximations which may work as well or better? Examples of alternative algorithms include: swarm intelligence (like Ant Colony Optimization), human algorithms (e.g. people-powered engines for popular searches and breaking news), brand authority (hey, we use it for everything else in life!), social graph, and many others.

Regardless of what Gord's A-list panel says, there's one thing I'm sure of: 2008 will be an exciting year for Search!



November 29, 2007

The Semantic Web is becoming real - slowly

A couple of weeks ago, I attended an event from the SDForum in Palo Alto, featuring a series of project demos showcasing real applications built on the Semantic Web. While I was initially skeptical, I came away amazed at the social and semantic intelligence being built into the latest web applications.


Yahoo!

The most interesting demos came from Dr. Mor Naaman of Yahoo! - these projects were at once the most real and the least relevant to the Semantic Web (at least, in its pure form).

TagMaps


Described as "a toolkit to visualize text (tags) geographically on a map", TagMaps allows the creation of applications that mashup text and geographical information (such as Flickr images) with Yahoo! Maps; Yahoo!'s sample application World Explorer is quite amazing. The most interesting thing about this application is that by combining the geo-tagging information about Flickr images with their corresponding tags and then displaying those tags on a map, the application accurately displays items of interest on the map - this is semantic information that has been extracted from the underlying raw data.
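Here is a minimal sketch of the general idea, under my own assumptions rather than anything Yahoo! described: bucket geo-tagged photos into a coarse latitude/longitude grid and surface the most common tag in each cell. The grid size, the `top_tags_by_cell` name and the sample photos are all made up for illustration; TagMaps itself may work quite differently.

```python
import math
from collections import Counter, defaultdict


def top_tags_by_cell(photos, cell_size=0.01, per_cell=1):
    """Group (lat, lon, tags) records into grid cells of cell_size degrees
    and return the most frequent tags in each cell."""
    counts = defaultdict(Counter)
    for lat, lon, tags in photos:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell].update(tags)
    return {
        cell: [tag for tag, _ in tag_counts.most_common(per_cell)]
        for cell, tag_counts in counts.items()
    }


# Made-up photo records: (latitude, longitude, tags).
photos = [
    (37.8199, -122.4783, ["bridge", "goldengate"]),
    (37.8197, -122.4786, ["goldengate", "fog"]),
    (37.7694, -122.4862, ["park"]),
]

labels = top_tags_by_cell(photos)
print(labels)
```

Even this crude counting recovers a sensible label for each spot on the map, which is the sense in which the semantics are extracted from the raw data.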

ZoneTag



ZoneTag can automatically tag your photos with geographical information; in addition, it can suggest tags for the photo based on the location. This makes it easy to tag photos taken on a cell phone with both types of information.

FireEagle


FireEagle, currently in closed alpha testing, is billed as "a new way to share your location with friends or with other websites and services". The main idea is to create a new user location platform that any third-party can leverage to read and write the location of the user.


Radar Networks

Any set of Semantic Web demos would be incomplete without an entry from Radar Networks. Nova Spivack, CEO of Radar, presented a demo of their offering, twine [tagline: "using information as context"], which is basically a new social network to which Semantic Web concepts have been applied. twine, currently in closed beta, has been getting a lot of press recently as the first true Semantic Web application.

I have to admit, the demo was quite impressive. Mr. Spivack created a new "twine", assigning a series of web pages, articles and other web information to the twine, and the application extracted a whole range of meaning from the content - automatically assigning tags about topics, people, links, locations, even concepts. It was a cool thing to watch!

While this exercise clearly demonstrated that the underlying technology works, and works well - clearly, great things lie ahead for the Semantic Web - I was less than impressed by the actual application chosen by Radar Networks (maybe I just don't see it yet). Does the world really need another custom home page or social networking application, even one that harnesses the Semantic Web?


SRI

Adam Cheyer from SRI presented a demo of an experimental project named CALO. CALO, which stands for Cognitive Assistant Learning and Organizing, is a DARPA-funded project that gathers the user's context and supports dynamic decision-making. In effect, the "software assistant" watches everything you do to learn, so that it can eventually make intelligent suggestions, for example, act as a search assistant or suggest alternate knowledge users for a meeting. A parallel project, CALO Express, is a productized Windows version for commercial use.

An intelligent software assistant is a noble goal, but watching the slides, I wondered if it would get traction commercially - the idea of this virtual assistant watching everything I do was slightly creepy; it's probably a better fit for a more controlled world, such as a defense lab or that perennial Hollywood favorite, a "top-secret government project".


PARC

The folks from the legendary Xerox PARC demonstrated Magitti, a "mobile leisure guide". By implicitly collecting information about the user's behavior within their mobile device, the application learns about your interests within a given context; this is then used to guide the user by suggesting other activities by location, time of day and social peer behavior. Again, a good idea, perfect for today's Facebook-fed generation.


Semantic Web or Privacy: Pick one!

The demos were all very cool and worked flawlessly - it is amazing how much meaning can be gleaned by an application by combining data about geography, time, context and peer groups. At the same time, it requires participants to willingly share information in order to avail of the benefits of semantic processing. Is it a good trade-off, one that users are willing to accept? That remains to be seen. As the early commercial applications of Semantic Web become widespread and more easily available, the answer is likely to become increasingly obvious.



July 29, 2007

Three Bloggers Sound Off: What is a Search Engine?

Over the past couple of weeks, I've been participating in a fascinating email discussion about Web Search with two of the leading bloggers in this space: Charles Knight, editor of the popular Alt Search Engines blog (a member of the Read/WriteWeb blog network) and Kaila Colbin of the VortexDNA blog. We quickly realized that a starting point for any discussion about search engines was to first come up with an answer to the question: What is a Search Engine?

What we found was that although we agree broadly, we have different views about the specifics; the overall debate provides a great framework for any discussion about web search. Charles is planning to run our three posts as a series on the Alt Search Engines blog this week:

  1. First, a rant from me on "What is a Search Engine"?
  2. Next, a great piece by Kaila on "What is Not a Search Engine?"
  3. Finally, Charles writes a definitive piece about "What is an Alternative Search Engine?"

I'll put up a link on this blog once the posts are up. Charles and Kaila are amazing bloggers, so it should be well worth checking out!



Survey: The Future of Web Search

In this latest poll on the Software Abstractions blog, we would like to ask readers about the Future of Search.

Specifically, which features do you see as the most important ones for Web Search in the future? Vote now and let us know!

-------------------------------

Update: This survey is now closed. You can view the results of the survey here.


July 01, 2007

Looking for Googhoo!

My friend Ashkan Karbasfrooshan recently wrote a fascinating article wondering whether Google should buy Yahoo!. In an email, he wondered about my position on this topic, so I thought I would take a whack at it, speaking as an ordinary investor. [On the other hand, if you're serious about investing, you should ditch my article and read Jeff Matthews' series of long articles about his pilgrimage to Omaha to observe the world's greatest investor - yes, he (Jeff) isn't making it up!]

The Upside

For what it's worth, I think there are good reasons why Google should consider this idea seriously:

1. Stock price: In my opinion, Google is overvalued and Yahoo! is undervalued at this point - making it the perfect time to buy market share (and traffic) at a low price.

2. Traffic: Yahoo! still outdoes Google in pure traffic. And the quality of the traffic is intrinsically different. The predominant portion of Google's traffic comes from its flagship search engine site. Is this traffic defensible? Since the cost of switching is trivial for individual users, there's no lock-in and little loyalty. Remember Alta Vista? How long before someone comes up with a better mousetrap [i.e. search algorithm]?

At the same time, Yahoo! has really broad, deep content, which makes it far more defensible IMHO. Content really is king!

3. Mind share: Google and Yahoo! are still neck-and-neck in terms of user mind share. Microsoft is a somewhat distant third; however, the folks from Redmond are far from out of the race. A combined G/Y! company that offers powerful algorithms (Google search), rich applications (Google Apps) and wide-ranging quality content (Yahoo! web properties) would present Microsoft with a formidable competitor indeed!

The Downside

1. Culture: Looking from the outside, I imagine that the cultures of these two companies are quite different!
2. Anti-trust concerns: Will the DOJ cause hang-ups on issues of search monopoly?
3. Financials: Will the numbers work? I have no idea - I'm no corporate deal-maker. (Ashkan's post throws some light on this issue.)

A rant about Google's search

Is Google peaking in Search? A couple of search events this week made me consider this possibility.

First, Google's Marissa Mayer gave a wide-ranging talk about the Future of Search, highlighting eight initiatives that Google is focusing on to improve search. All of these search improvements seem to be incremental - honestly, which of these eight are truly revolutionary?

Second, I had the opportunity to attend the blogger/tech-media event at much-hyped search startup Powerset. By implementing natural-language processing techniques, Powerset has the potential to improve the relevance of search results radically by matching the semantics of the content in its index with that of the query. This is a great example of an approach that, if effective, will certainly offer more than incremental improvement!

It remains to be seen if a Google-killer will emerge soon in the world of search; if history is any guide, we should see long periods of incremental development, punctuated by sudden, unexpected, discontinuous changes.

Conclusion:

I, for one, would welcome our new search overlords from Googhoo!



[Disclosure: I have no positions in either Yahoo! or Google.]

