Beyond PageRank: Using the Wisdom of Crowds in Enterprise Search
The classic problem in Internet search is how to display the most relevant results in response to a query. One would think this problem would be relatively simple in Enterprise Search, since the searches cover a much smaller body of content and there is little incentive to game the system.
But this is not true, for two reasons. First, as the social principles of Enterprise 2.0 take off, the sheer amount of content is exploding, even within a single organization. Second, relevance is relative, and that is as true of Enterprise Search as it is of Internet search engines: what's relevant to one user may be completely useless to another.
Static solutions, like the PageRank algorithm used by the Google Search Appliance, fail to take this issue of relative relevance into account. Just because a given piece of content is wildly popular does not make it relevant for all users. [For a more general example, try searching for the term "apple" in the Google Internet search engine: all of the hits on the first page are related to Apple, the computer company. How about apple, the fruit?] Arguably, personalization could help bring more relevant results, but that presents its own privacy-related problems.
How then does one improve the relevance of search results? One method is to use the Wisdom of Crowds (WoC): what do other users like, especially the ones whose tastes are similar to mine? Where do they spend their time? What content links do they click on?
I recently had the chance to review a solution in this space: Baynote, a 2-year-old startup based in Cupertino, CA, that addresses this exact problem.

Baynote's approach
How does the Baynote solution work? Essentially, the idea is to find a tag-based "user fingerprint" to associate the user with a like-minded peer group; by creating micro-segments of the community, you can then use WoC to make recommendations to the user. Of course, every user is different, but over a large population, the noise tends to cancel out and the key signal - the core set of interests of the group - is preserved.
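To make the fingerprint idea concrete, here is a minimal sketch. It is not Baynote's actual algorithm, which I have not seen. It assumes a fingerprint is simply a bag of tag weights and that "like-minded" means a high cosine similarity between fingerprints. The function names, the sample users, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse tag-weight vectors (Counters)."""
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def peer_group(user, fingerprints, threshold=0.5):
    """Other users whose fingerprint is at least `threshold` similar to `user`'s."""
    mine = fingerprints[user]
    return sorted(
        other
        for other, theirs in fingerprints.items()
        if other != user and cosine(mine, theirs) >= threshold
    )


def group_interests(user, fingerprints, threshold=0.5):
    """Pool the tag weights of the user's peers (the 'signal' of the micro-segment)."""
    pooled = Counter()
    for peer in peer_group(user, fingerprints, threshold):
        pooled.update(fingerprints[peer])
    return pooled


# Hypothetical fingerprints: tag -> how strongly the user engaged with that tag.
fingerprints = {
    "ann": Counter({"java": 3, "search": 2}),
    "bob": Counter({"java": 2, "search": 3, "lucene": 1}),
    "cat": Counter({"hr": 4, "benefits": 2}),
}

print(peer_group("ann", fingerprints))
print(group_interests("ann", fingerprints).most_common(2))
```

With this sample data, ann and bob land in the same micro-segment, while cat, who shares no tags with ann, does not. The pooled interests of ann's peers are what a recommender would then draw on.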
Baynote's Affinity Engine tracks about 20 different heuristics of user behavior, such as mouse movement, time spent on the page, number of repeat visits, and so on. From these observations it builds up collaborative knowledge: which content matters, what users like and dislike, how the user population divides into micro-segments, and finally, what to recommend. Used for Enterprise Search, the engine is designed to boost the relevance of search results over time, based on user behavior. Baynote also provides solutions in the E-commerce space (where good recommendations can increase the conversion rate) and for Help systems (where navigation to relevant popular results can improve customer satisfaction).
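As an illustration of how several implicit signals might be folded into a single score, here is a rough sketch. It uses only the three signals named above. The weights and the log damping are my own assumptions, not anything Baynote has disclosed, and `PageVisit` and `engagement` are hypothetical names.

```python
from dataclasses import dataclass
from math import log1p


@dataclass
class PageVisit:
    seconds_on_page: float
    mouse_moves: int
    repeat_visits: int


# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"seconds_on_page": 0.5, "mouse_moves": 0.2, "repeat_visits": 0.3}


def engagement(visit):
    """Weighted sum of log-damped signals, so one extreme value cannot dominate."""
    return (
        WEIGHTS["seconds_on_page"] * log1p(visit.seconds_on_page)
        + WEIGHTS["mouse_moves"] * log1p(visit.mouse_moves)
        + WEIGHTS["repeat_visits"] * log1p(visit.repeat_visits)
    )


skim = PageVisit(seconds_on_page=3, mouse_moves=1, repeat_visits=0)
read = PageVisit(seconds_on_page=120, mouse_moves=40, repeat_visits=2)

# A page that was actually read scores higher than one that was skimmed.
print(engagement(skim) < engagement(read))
```

The point of the damping is that no explicit action (a rating, a vote) is needed from the user, yet a page left open for hours does not swamp every other signal.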
I saw a demo of Baynote-powered search results. Compared to the "raw results" obtained without using Baynote, the increase in relevance was indeed impressive. You can try out a live example for yourself by running a search on the Interwoven web site, where Baynote is implemented. A sample screenshot is shown below.

Some of the highlights and concerns with this solution, from my perspective, are noted below.
Highlights:
- Tagging: Automatically builds a virtual folksonomy that grows with time
- Classification: Assigns users to peer groups; this is an implicit classification based upon user behavior
- Seasonality: The algorithm is based on the concept of UseRank rather than PageRank; since the UseRank rises and falls depending upon user interaction, relevance automatically adjusts to the fading of importance for a given piece of content
- Independence: Users have no direct influence on each other
- Hard to game: Since interactions are measured implicitly, it's difficult for a single user to game the results; at the same time, no specific action is required on the part of the user
- Rich content: Since ranking is implemented through the use of tags, it works equally well with binary content, such as images, audio and video
- Easy implementation: According to Baynote, no server-side integration is required; implementation involves the addition of an "observer" script added to the template of the web site, similar to a web analytics tracking bug.
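The Seasonality point is easier to see with numbers. The sketch below is one plausible way a usage-based rank could rise and fall with interaction. It assumes exponential decay with a fixed half-life, which is my assumption and not Baynote's published UseRank formula. The name `use_rank` and the 30-day half-life are hypothetical.

```python
def use_rank(interactions, now, half_life_days=30.0):
    """Sum of interaction weights, each halved for every `half_life_days` of age.

    `interactions` is a list of (day, weight) pairs; `now` is the current day.
    """
    return sum(
        weight * 0.5 ** ((now - day) / half_life_days)
        for day, weight in interactions
    )


# Ten interactions on day 0 versus three interactions on day 85.
old_hit = [(0, 1.0)] * 10
fresh_hit = [(85, 1.0)] * 3

print(use_rank(old_hit, now=0))     # 10.0 on the day of the interactions
print(use_rank(old_hit, now=90))    # 1.25 after three half-lives
print(use_rank(fresh_hit, now=90) > use_rank(old_hit, now=90))
```

Under these assumptions, content that was heavily used three months ago ranks below content with a handful of recent interactions, with no manual cleanup required.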
Concerns:
- Positive feedback: Once a piece of content becomes popular, it is displayed to users more often, which is likely to further increase its popularity. Since display priority is based on a combination of relevance and popularity, there is a danger that less-relevant results will be ranked higher, simply because they are more popular. [Although, with time, their popularity should automatically adjust lower for that particular micro-community.]
- Edge cases: How does one look for obscure information that has low popularity? It's possible that obscure content could be completely overshadowed by highly popular content with similar tags.
- New content: This is directly related to the positive feedback problem mentioned above. How does new content get traction initially? Jack Jia, CEO of Baynote, pointed out that the engine supports a feature to address exactly this problem: a "merchandising" feature that hard-codes a piece of content to specific query results, such that it cannot be overridden by organic results. This allows the content to gain initial popularity.
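The positive feedback and new content concerns can both be seen in a toy ranking function. This is a sketch under my own assumptions: display priority is a linear blend of relevance and popularity controlled by a hypothetical `alpha`, and "merchandising" is modeled as a list of pinned documents that always precede organic results. The document ids and scores are made up.

```python
def rank(candidates, pinned=(), alpha=0.7):
    """Order results for one query.

    `candidates` maps doc id -> (relevance, popularity), both in [0, 1].
    `alpha` is the weight on relevance; (1 - alpha) goes to popularity.
    `pinned` ids are placed first, in the given order, ahead of organic results.
    """
    def blended(doc):
        relevance, popularity = candidates[doc]
        return alpha * relevance + (1 - alpha) * popularity

    organic = sorted(
        (doc for doc in candidates if doc not in pinned),
        key=blended,
        reverse=True,
    )
    return list(pinned) + organic


candidates = {
    "popular-but-off-topic": (0.4, 0.95),
    "obscure-but-exact": (0.9, 0.05),
    "new-policy-doc": (0.8, 0.0),  # brand new, so no popularity yet
}

print(rank(candidates, alpha=0.7))
print(rank(candidates, alpha=0.3))
print(rank(candidates, pinned=["new-policy-doc"], alpha=0.7))
```

With relevance weighted heavily (alpha of 0.7), the obscure but exact match comes first. Shift the weight toward popularity (alpha of 0.3) and the popular, less relevant page takes over, which is the feedback loop described above. In both cases the new document sits last until it is pinned.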
Conclusion
Baynote is one of the more interesting Search engine implementations I have come across. The addition of WoC knowledge, gained by observing implicit user behavior, is certain to improve the relevance of search results and help web site managers discover (and act upon) non-intuitive connections between content areas. The case is even more compelling for E-commerce sites, where the display of related and popular products could have a direct impact on the bottom line.