May 15, 2008

Yahoo! SearchMonkey - Released to Developers

The good folks from Yahoo! unveiled their new open search platform, Yahoo! SearchMonkey, at a developer launch party today at their Sunnyvale headquarters. In some ways, the SearchMonkey platform is revolutionary and a major step forward in search: it allows publishers to participate directly in improving the quality of their own information presented on the Yahoo! search results page (this is also implicitly a push for the bottom-up approach to the Semantic Web, which most industry observers have given up on in favor of a top-down approach). The platform also lets publishers and third-party developers build applications aimed at improving the search experience. Finally, and most important, if enough publishers and app developers participate in the program, it promises to improve the quality of search results for end users.

Features

At the simplest level, you can think of SearchMonkey as a community-powered set of rich information boxes (similar to the Google OneBox) that appear on the Yahoo! search results page. Publishers can provide this rich data to the Yahoo! search index in a variety of ways: through structured data feeds (RSS), through RDF or Microformat markup on web pages, or through simple page extraction. The "Information Bar" shows up underneath the main search results. The Yahoo! search team has also provided tools that let developers build search-based applications easily.


May 11, 2008

Powerset Launches Wikipedia Search

Semantic search engine Powerset, which we've written about here before, has just launched its initial release. The current release is limited to indexing Wikipedia content, but it provides a great showcase for their technology and user experience.

For example, my search for "Alexander the Great" provided the following results page:


May 07, 2008

Cognition Technologies recognized by KMWorld as one of "100 that matter"

Cognition Technologies, which focuses on Semantic natural language processing technology, was named by KMWorld as one of the top 100 Companies That Matter in Knowledge Management for 2008.

Says Cognition CEO Scott Jarus:

One of the biggest barriers to building a natural language understanding system is to build the semantic map and the dictionary with details of the syntactic behavior of words (i.e. how words behave within context). Cognition's team has spent more than 20 years building this capability into Cognition’s Semantic NLP for the English language ... and our technology is commercially available today!

Semantic search and NLP technologies seem to have arrived - they are generating a lot of buzz lately. In addition to mainstays Hakia and Powerset, there is a spate of new entries, including Cognition, BooRah and eeggi. We will be reviewing some of these new alternate search engines on this blog in the near future.

Congratulations, Scott and the Cognition team!



April 29, 2008

Thoughts about Alternative Search Engines Day 2008

I was at the Alternative Search Engines Day event in San Francisco last week. Organized by Charles Knight of the Alt Search Engines blog (and friends), it brought together key people from over 40 alternative search engines. It was an amazing crowd, full of interesting and bright people, and the overall energy was incredible!

At the keynote, Charles gave a pitch for bringing ASEs together that was very well received. He showed us some examples of what a unified user interface that combined multiple search engines would look like. I contributed a tiny bit (expanding on the idea that complementary ASEs could band together to provide federated searches for enhanced traffic and usability, and listing a few ways for the Alts to cooperate even while competing).


February 17, 2008

Social Data: Observations from "Search & The Social Graph" Event

Dave McClure moderated an event on Search & The Social Graph at the Yahoo! campus this week, organized by the Search SIG of the Software Development Forum. With the meteoric rise of Facebook and the heightened interest in leveraging the social graph - both Google and Yahoo! have launched new APIs and OpenSocial is gaining momentum - this discussion was timely and attendance was strong.

The panelists represented some of the most interesting players in this space:

  • Kevin Marks from Google
  • Aditya Agarwal from Facebook
  • Kent Brewster of Yahoo!
  • Eve Phillips, CEO of Chirp

It turned out to be an interesting event, with lots of good discussion about the implications of portability, privacy, utility and monetization of social data. No stranger to the social data space, moderator McClure did an outstanding job of keeping things focused and the discussion lively; he was clearly knowledgeable and well-prepared, launching into a series of leading questions that moved the conversation forward.

Key Observations

By grouping together related comments, I've distilled the discussion at this event into the following topics:

1. Relevance of Search Results

- With the explosion of self-publishing and user-generated content on the web, the type of data getting created on the web is changing, and the classic search algorithms are becoming less effective.
- Users are increasingly interested in what their friends and peers are doing online.
- By using a social graph to filter out results during a specific search, you can boost the relevance of search results.
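
The last point can be illustrated with a minimal sketch. None of this was shown at the event: the `Result` type, the `endorsed_by` field and the `boost` weight are hypothetical names chosen for illustration, and the scoring rule (base relevance plus a fixed bonus per endorsing friend) is just one simple way to let a social graph influence ranking.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    base_score: float  # relevance from the ordinary ranking algorithm (assumed given)
    endorsed_by: set = field(default_factory=set)  # users who shared or bookmarked it


def social_rerank(results, friends, boost=0.5):
    """Sort results by base score plus a bonus for each endorsing friend."""
    def score(result):
        return result.base_score + boost * len(result.endorsed_by & friends)
    return sorted(results, key=score, reverse=True)


results = [
    Result("a.example/review", 1.0, {"carol"}),
    Result("b.example/review", 0.8, {"alice", "bob"}),
    Result("c.example/review", 0.9),
]
ranked = social_rerank(results, friends={"alice", "bob"})
print([r.url for r in ranked])
```

Here the second result starts with a lower base score but moves to the top because two of the searcher's friends endorsed it; endorsements from people outside the searcher's graph are ignored.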

2. Monetization

- It is no longer uncommon for a person to become a media source, using tools such as twitter, blogs and RSS feeds; but this is hard to monetize. A referral model works better in this case than advertising.
- Brand advertising is still big, even for social search, but it works differently than for targeted search
- Online brand advertising will move into more interactive experiences in the future
- The key question is: Does membership in a social group signal an intention that can be targeted by advertisers? The panelists felt that, on balance, it did not
- For a more concrete example: Google's directed search is very monetizable; Facebook has a lot of social data, but user behavior is not very monetizable

3. Privacy

- There is a clear difference between a publicly-proclaimed graph, such as the friends on Facebook, and a private list, such as Email contacts; application developers will ignore this distinction at their peril
- Yahoo!'s Brewster said it best: "There should never be a privacy surprise for the user!"
- Applications should make it clear to users if they are making data public or private; e.g. Flickr is three-valued in this regard

4. Interaction Levels

- From a monetization perspective, all "friends" are not created equal; some connections in the social graph are stronger than others
- The smallest inner set of friends is the most valuable; the first 25 people have 80% of the value
- The viral rate of promotion in Facebook is incredible
- If users can annotate connections, they can more fully express their network graph
- You can infer relationships from user behavior, such as sites visited and click-throughs
- The most important part of social data is the connections, followed by the profile; eventually, it gives you the ability to answer the question: "Who should you go to, to answer this question?"

5. OpenSocial

- OpenSocial allows application developers to write one application, and then take it to where the users are on diverse other social networks
- The vision: take some of the good parts of Facebook and bring those to a lot of people
- This allows any application to spread through the social graph

6. Social Email

- Email networks have a lot of connection data, which has social data buried in it
- These connections can either be one-way or two-way; the difference signals intent on the part of the user
- Google's Marks made an interesting point: a person's email address and personal URL are opposites - with the former, you can communicate with that person; with the latter, the person communicates with you

Facebook

Facebook's Agarwal did a great job of articulating the company's approach to some of these issues. His contributions to the discussion were somewhat Facebook-centric; but given the strong community interest in Facebook lately, this only added to the value of the panel.

In discussing the value of social data for search, Agarwal compared the issues of selecting for relevance among a large number of results for a targeted search, with those of producing Facebook's news feed, which must also present a large amount of data to the user in a format that's easy to consume.

In terms of privacy, Facebook wants to allow users to annotate the social graph, so that they can fully express their network. This will allow users to separate their strong connections from casual friends. The size of a user's graph is another dimension to be considered.

On data portability, Facebook currently has no plans to implement features that enable it. Agarwal clarified that although the company supports data portability initiatives philosophically, it has not determined this to be the best use of resources at this time.

Finally, although Agarwal did not acknowledge this directly, the panelists agreed that the Facebook-type social network data and searches are far less monetizable than directly targeted activities that display clear intent, such as a Google search.

Chirp

This was the first time I saw a demo of Chirp. Eve Phillips, Chirp's CEO, gave a demo of chirpscreen, an interactive screen saver that displays content from your social network, such as pictures from Flickr and status messages from Facebook. On the whole, the audience loved it - a series of photos of her friends kept popping up on the screen - but there were some concerns about being able to control what gets shown. According to Phillips, Chirp is planning to introduce new features soon that will allow users to set preferences for what content is displayed, from which sources, and so on.

Open Questions

McClure asked the panelists some incisive questions, which deserve to be listed in their own right; I hope these lead to a wider discussion about social data and related topics:

  • Is Social Search revolutionary or evolutionary?
  • Which benefits more from social data: targeted search or discovery?
  • How well does social search monetize?
  • How should we use the social data that's automatically present in Email?
  • If Facebook and other networks encourage lightweight friendships, does it obscure the real social graph?


February 07, 2008

WebGuild Web 2.0 Conference: Designing Search Engine Friendly Sites

SEO is currently one of the hottest topics in the world of web sites and web applications, since a high ranking in search engine results can have a tremendous impact on the amount of traffic a site receives. So it was no surprise that the session on Designing Search Engine Friendly Sites was so popular at the WebGuild Web 2.0 Conference and Expo last week.

As a co-founder of Search Engine Marketing firm Bloofusion, moderator Andreas Mueller is no stranger to the topic of SEO; in addition to asking incisive questions, he was able to add to the discussion with the other panel members.


     


The other members of the panel were:

Near the start of the session, Paul O'Brien outlined the most basic three items required for Findability - changes you should complete before even attempting any explicit SEO tactics:

  1. Access: Crawlers from the major search engines have to be able to access your site and get at the content
  2. Structure: You must organize the content on your site so that Google (or other search engine) can understand it
  3. Content: The content must follow the basic requirements of SEO, such as optimizing keywords, using adwords and so on

SEO Tips

Based on the discussion at this session, I've compiled the following list of SEO tips provided by this panel of expert Marketers.

  • Optimize the content that people are searching on, not search terms that you would like to rank for even if no one is searching for those terms
  • Try to articulate explicitly what the expected outcome is - which terms would you like to rank highly for? Which specific page should rank high for which term?
  • Think about SEO early in the web design process and throughout the life-cycle of the product or site; adding it in as an after-thought is less effective and takes a lot more resources
  • Within a company, it is better for the SEO function to live within the Marketing department, rather than within Engineering. At the same time, you need outside validation that the company is going in the right direction.
  • Create a hierarchy of web pages, optimized for both human users and search engine crawlers.
  • One word: Linkbait! Create content that's unique, valuable, and most important, consistent with your brand. Optimize it for keywords that are important within your domain.
  • For SEO purposes, avoid dynamic web sites that rely too heavily on Ajax or Flash; if it can't be crawled, it won't rank highly with the Search engines.
  • Creating a static site that can be crawled, separate from the main dynamic web site, has the effect of diluting PageRank for those web pages.

At one point, Mueller asked a really interesting question: given that resources are finite and constrained, should you focus resources more strongly on inbound links, or on optimizing the content?

The panelists agreed that since link popularity is weighted much more strongly, focusing on getting inbound links is a top priority; optimizing the content by adding keywords in links, using meta tags, etc. remains a distant second.



February 03, 2008

WebGuild Web 2.0 Conference: Issues and Challenges for Crowdsourcing

I attended a really interesting session at the WebGuild Web 2.0 Conference and Expo last week: The Power of Crowdsourcing - moderated by Jeremiah Owyang of Forrester Research.

Participants on the panel were:

This was one of the best panel sessions I attended at the conference, part of the reason being the bang-up job Owyang did as moderator. He took a very active role, bringing up provocative questions, directing those at specific members of the panel and not being shy about treading into the concerns and difficulties of using crowdsourcing and social media - this prevented the session from degenerating into a "Rah, Rah, Crowdsourcing is all good!" type of discussion.

The other reason was that the panel members were all knowledgeable, articulate and open in their remarks; the conversation never flagged, as it did with some of the other sessions I attended.

To kick off the session, Owyang put up a few slides entitled Social Technographics that were intriguing, but more on that in a future post.



One of the panelists, Michael Sikorsky of Cambrian House, listed the three legs of crowdsourcing as follows:

  • Wisdom, which can be explicit (e.g. voting in American Idol) or implicit (e.g. links used for calculating PageRank)
  • Participation, such as item submissions or code check-ins
  • Funding, such as a prize or project funding

The Challenges

Based on the discussion at this session, I've compiled the following list of challenges in implementing crowdsourcing solutions and ways of addressing them.

Wisdom of Crowds: How do you keep the input quality high?

For any crowdsourcing activity, the first step is to pick the right crowd! Equally important, you must ask the right question.

The next step is to use statistical methods to prioritize high-quality input. Finally, a self-policing community (possibly, with some moderation) can help weed out low-quality input and spam.
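
The panel did not name a specific statistical method. As one common possibility, here is a minimal sketch that ranks submissions by the lower bound of the Wilson score interval on their share of positive votes, so that an idea with a single upvote does not outrank one with many votes and a high approval rate. The function name and the vote counts are assumptions made for illustration.

```python
import math


def wilson_lower_bound(up, total, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of positive votes.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if total == 0:
        return 0.0
    p = up / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / denom


# Hypothetical submissions: (title, upvotes, total votes)
ideas = [("idea A", 1, 1), ("idea B", 90, 100), ("idea C", 3, 10)]
ranked = sorted(ideas, key=lambda i: wilson_lower_bound(i[1], i[2]), reverse=True)
print([title for title, _, _ in ranked])
```

With these numbers, idea B (90 of 100) ranks above idea A (1 of 1), even though A has a perfect raw approval rate, because a single vote says very little about quality.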

Is there an inverse relationship between those who have the time to contribute, and the quality of the ideas presented?

One caveat to keep in mind is that the vocal minority may not be representative of the majority of users. But this type of forum may act as a funnel for identifying talented people who have not yet been discovered.

By providing rewards or incentives consistent with the value of the ideas being submitted, you can get greater participation from qualified users and a higher level of confidence in the quality of the ideas being submitted. Another alternative is to use some type of game mechanism; games have built-in rewards that encourage participation.

What if some people don't want to be crowdsourced?

Tara Hunt, of Citizen Agency, recently wrote a blog post titled "Please Stop Crowdsourcing Me", questioning whether crowdsourcing is a good idea. She has a point - some users may not want to contribute or be involved in a crowdsourcing exercise, especially to benefit a large corporation.

The panelists agreed, and pointed out that you should carefully consider which tasks should be outsourced in this way - for example, product users love to help each other out with solving problems and difficulties, but if participants get the feeling that the company is simply using them to reduce customer service costs, then they will stop being helpful.

Any crowdsourcing program has to be thought through and managed carefully; you don't want to risk users having a bad experience.

How do you manage and lead a crowd, to create a positive experience?

For the community to be truly engaged, it is extremely important for the company to be very transparent.

One key point to think about, especially for large companies, is that you have to be careful about what you share with the crowd. On the one hand, the more you share, the better the ideas you will get; on the other hand, you risk letting out corporate proprietary secrets.

Finally, some activities simply may not be amenable to crowdsourcing.

How much control do you want to retain? Do you need a Product Manager as an expert?

A community of users can generate a lot of great ideas, but those don't all necessarily fit together; having an expert in place as a product manager can provide guard rails to keep things on track. The product manager can bring a single, unified vision and - this is critical - can communicate back to the community why a particular idea is not being used.

It's important to find a balance: the community generates the ideas, but the company or organization picks the ones to be used, refines them and implements them. Even the nuggets of ideas can be leveraged to create lots of value.


 


Successful Examples

The panelists also offered examples of actual crowdsourcing implementations:

  • The Longitude Prize - one of the earliest examples, was a reward offered by the British government through an Act of Parliament in 1714 for a simple and practical method for the precise determination of a ship's longitude.
  • Procter & Gamble has raised the level of outside design and significantly increased the success of product-related improvements.
  • Intelpedia, from Intel, is an example of crowdsourcing in the Enterprise space. The idea is to look internally for ideas, share best practices and preserve common knowledge. According to reports, Intelpedia has up to 20K pages already.
  • Ace Hardware created a community for 300 dealers, whose ROI was measured at 500%; as a result, the community was rolled out to all 5000 of its dealers.
  • The Hopelab Foundation created a global competition for kids of all ages and received submissions from 429 teams.
  • American Idol has produced highly successful artists, some of whom have sold over 10M CDs; even the worst idol winner has sold 500K CDs.
  • Innocentive is a well-known example where companies post complex problems and offer rewards for their solutions.

[Update: An alert reader pointed out that the P&G name should be spelled with an "&" - this is now fixed. Thanks, John!]


January 27, 2008

Quintura Launches Site Search Widget

Alternative search engine Quintura, which I've mentioned before on this blog, has launched its site search widget. This widget allows site publishers to provide users with a specialized search limited to that specific site; it joins earlier offerings from Google, Yahoo!, Rollyo and Eurekster swiki in this space.

This blog was an early user of this widget. You can see a customized, Quintura-generated mini-tag cloud in the earlier post; a full-size tag cloud is also available. The widget is hosted by Quintura, so installation was a snap: once the site was indexed, all I had to do was to embed the widget code into my blog pages and provide some styling control.

The biggest benefit of using the Quintura solution, as I've said before, is the dynamic tag cloud that allows the user to navigate the search space; initial feedback from our readers here has been positive, but not enthusiastic.

The real benefits to both users and publishers will come when Quintura search results prove to be better than equivalent results from a mainstream search engine solution, such as Google; as long as the Google site search results are good enough, it will be hard for the Quintura widget to make significant inroads into the market share of the big-G juggernaut.

This widget release is currently in private beta; an invite for this beta is available over on ReadWriteWeb.



January 21, 2008

SPARQL: Query Language for the Semantic Web

The W3C has announced the publication of SPARQL, a language for querying distributed data on the web. Just as SQL is a generic language used to query relational databases regardless of vendor, SPARQL will allow users and applications to create queries that express high-level goals across many different data sources, regardless of the database technology or data format involved.

From the W3C press release:

"Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL," explained Tim Berners-Lee, W3C Director. "SPARQL makes it possible to query information from databases and other diverse sources in the wild, across the Web."

The combination of the SPARQL query language and protocol creates a Web service in its purest sense; running on top of HTTP or SOAP, it provides a standard Web service for anything which asks a question.

"SPARQL's focus on querying the data models saves time for developers; there's no need for a host of little Web services to retrieve different aspects of the state of a system," explained Lee Feigenbaum, Chair of the RDF Data Access Working Group. "This allows the user of the SPARQL endpoint to ask any question -- it is as though they could design their own interface instead of having to work with a limited set of fixed services."

The press release goes on to say that the SPARQL specification defines both a query language and a protocol, and works well with other Semantic Web technologies from the W3C: RDF, RDF Schema, OWL and GRDDL.

InfoWorld has a great article explaining this development in more detail [via Dave Cobley at Altiss]:

Already available in 14 known implementations, SPARQL is designed to be used at the scale of the Web to allow queries over distributed data sources independent of format. It also can be used for mashing up Web 2.0 data.


I see this as a very positive development for the Semantic Web field in general. At its core, the operation of the Semantic Web is composed of the following basic functions:

  • Creating content with meaning (either implicit, like XML, or explicit, like Tags)
  • Understanding or extracting the information from a block of content
  • Classifying the blocks of content (into a hierarchy, taxonomy or folksonomy)
  • Presenting the information in a variety of forms (web, mobile, web services API, mashups, embedded devices and so on)
  • Finding the information of interest; this information may have to be derived from the content provided

The rise of easy-to-use self-publishing tools has led to an explosion in the amount of content available on the Web, and being able to find the answer to a question from this mountain of information is vital.

But first users have to be able to express what they are looking for, in a meaningful way. It is this need that is being addressed by SPARQL, which allows users to formulate intelligent queries. These queries can then be used by agents and applications on our behalf to find us the information we need.
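
To make the idea of "expressing what you are looking for" concrete, here is a minimal sketch of the kind of pattern matching a SPARQL query asks for. It is not a SPARQL engine and does not use any real library: the in-memory triples, the `match` and `query` helpers and the sample data are all hypothetical, and strings starting with `?` play the role of SPARQL variables.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Return every variable binding that satisfies all patterns together."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# Made-up data as (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "name", "Bob B."),
    ("carol", "name", "Carol C."),
    ("dave", "name", "Dave D."),
]

# Roughly: SELECT ?name WHERE { :alice :knows ?friend . ?friend :name ?name }
patterns = [("alice", "knows", "?friend"), ("?friend", "name", "?name")]
names = [b["?name"] for b in query(patterns, triples)]
print(names)
```

The query states the goal ("names of the people Alice knows") without saying how or where the data is stored; a real SPARQL endpoint answers the same kind of pattern over RDF data.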



November 29, 2007

The Semantic Web is becoming real - slowly

A couple of weeks ago, I attended an event from the SDForum in Palo Alto, featuring a series of project demos showcasing real applications built on the Semantic Web. While I was initially skeptical, I came away amazed at the social and semantic intelligence being built into the latest web applications.


Yahoo!

The most interesting demos came from Dr. Mor Naaman of Yahoo! - these projects were at once the most real and the least relevant to Semantic Web (at least, in its pure form).

TagMaps


Described as "a toolkit to visualize text (tags) geographically on a map", TagMaps allows the creation of applications that mash up text and geographical information (such as Flickr images) with Yahoo! Maps; Yahoo!'s sample application World Explorer is quite amazing. The most interesting thing about this application is that by combining the geo-tagging information about Flickr images with their corresponding tags and then displaying those tags on a map, the application accurately displays items of interest on the map - this is semantic information that has been extracted from the underlying raw data.
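
The general idea of deriving "items of interest" from geo-tagged photos can be sketched in a few lines. This is not how TagMaps itself works, which was not described in the demo; it is a simplified illustration that buckets photos into a latitude/longitude grid and picks the most frequent tag in each cell. The function name, the cell size and the sample photos are assumptions.

```python
import math
from collections import Counter, defaultdict


def top_tag_per_cell(photos, cell_deg=0.01):
    """Group (lat, lon, tags) photos into grid cells and return each cell's top tag."""
    cells = defaultdict(Counter)
    for lat, lon, tags in photos:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].update(tags)
    # Skip cells whose photos carried no tags at all.
    return {key: counts.most_common(1)[0][0] for key, counts in cells.items() if counts}


# Made-up photos: two near one landmark, one elsewhere in the city.
photos = [
    (37.8199, -122.4783, ["bridge", "goldengate"]),
    (37.8195, -122.4786, ["goldengate", "fog"]),
    (37.8024, -122.4058, ["coittower"]),
]
labels = top_tag_per_cell(photos)
print(labels)
```

Even this crude version shows how a label for a place can emerge from many individual photos, without anyone having declared the landmark explicitly.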

ZoneTag



ZoneTag can automatically tag your photos with geographical information; in addition, it can suggest tags for the photo based on the location. This makes it easy to tag photos taken on a cell phone with both types of information.

FireEagle


FireEagle, currently in closed alpha testing, is billed as "a new way to share your location with friends or with other websites and services". The main idea is to create a new user location platform that any third-party can leverage to read and write the location of the user.


Radar Networks

Any set of Semantic Web demos would be incomplete without an entry from Radar Networks. Nova Spivack, CEO of Radar, presented a demo of their offering, twine [tagline: "using information as context"], which is basically a new social network to which Semantic Web concepts have been applied. twine, currently in closed beta, has been getting a lot of press recently as the first true Semantic Web application.

I have to admit, the demo was quite impressive. Mr. Spivack created a new "twine", assigning a series of web pages, articles and other web information to the twine, and the application extracted a whole range of meaning from the content - automatically assigning tags about topics, people, links, locations, even concepts. It was a cool thing to watch!

While this exercise clearly demonstrated that the underlying technology works, and works well - clearly, great things lie ahead for the Semantic Web - I was less than impressed by the actual application chosen by Radar Networks (maybe I just don't see it yet). Does the world really need another custom home page or social networking application, even one that harnesses the Semantic Web?


SRI

Adam Cheyer from SRI presented a demo of an experimental project named CALO. CALO, which stands for Cognitive Assistant that Learns and Organizes, is a DARPA-funded project that gathers the user's context and supports dynamic decision-making. In effect, the "software assistant" watches everything you do in order to learn, so that it can eventually make intelligent suggestions; for example, it can act as a search assistant or suggest alternate knowledge users for a meeting. A parallel project, CALO Express, is a productized Windows version for commercial use.

An intelligent software assistant is a noble goal, but watching the slides, I wondered if it would get traction commercially - the idea of this virtual assistant watching everything I do was slightly creepy; it's probably a better fit for a more controlled world, such as a defense lab or that perennial Hollywood favorite, a "top-secret government project".


PARC

The folks from the legendary Xerox PARC demonstrated Magitti, a "mobile leisure guide". By implicitly collecting information about the user's behavior within their mobile device, the application learns about your interests within a given context; this is then used to guide the user by suggesting other activities by location, time of day and social peer behavior. Again, a good idea, perfect for today's Facebook-fed generation.


Semantic Web or Privacy: Pick one!

The demos were all very cool and worked flawlessly - it is amazing how much meaning can be gleaned by an application by combining data about geography, time, context and peer groups. At the same time, it requires participants to willingly share information in order to avail of the benefits of semantic processing. Is it a good trade-off, one that users are willing to accept? That remains to be seen. As the early commercial applications of Semantic Web become widespread and more easily available, the answer is likely to become increasingly obvious.


