January 20, 2007

Search and the Dumbness of Crowds

The "Dumbness of Crowds"

Kathy Sierra recently had a fascinating post in the Creating Passionate Users blog: The "Dumbness of Crowds", in which she carefully analyzes the popular notion of The Wisdom of Crowds. Given the technology community's current Web 2.0-startup craze, with its heavy reliance on the concepts of community and crowd-sourcing, this is a very relevant and timely discussion. In her post, Kathy makes a distinction between two related scenarios: on the one hand, aggregating knowledge from a collection of individuals working independently (wisdom), and on the other, a group of people acting together, such as the behavior of a crowd or the consensus decision of a committee (dumbness).

[For more about harnessing collective wisdom, check out these great articles by Tim O'Reilly, Dion Hinchcliffe, Nick Carr and Chris Saad.]


The pitfalls of "Collective Wisdom"

Clearly, specific constraints must be satisfied for the aggregated output of the crowd to amount to collective wisdom. These constraints can be catalogued and analyzed (see the "Failures of Crowd Intelligence" section in this Wikipedia article); violating any of them will severely degrade the quality of the results.

These pitfalls also affect Web 2.0 communities. In order for the aggregated information to represent "collective wisdom", the community must avoid the following common scenarios:

  • "Everyone agrees"

Does the community provide enough diversity in viewpoints? The value of the outliers cannot be overstated. If the size of the community is simply too small, or the community is too homogeneous, then everyone drinks the same kool-aid and there is not enough disagreement.

  • "Ms. Expert says so!"

If strong players within the community have the ability to influence others' votes, then the positions taken by participants are no longer independent.

  • "Gaming"

Is the voting fair - does it reduce (or prevent) malicious votes? If participants have the incentive and the ability to subvert the results for their own personal gain, then the collective solution is not going to be very meaningful.

  • "The rich get richer!"

Are there network effects that affect the outcome? Quite often, the process starts democratically enough, but once a single solution or viewpoint starts to get relative traction, positive feedback pushes it to overwhelming adoption. In other words, the system exhibits unstable equilibrium.

  • "Lack of Participation"

Are users actively participating? This is somewhat different from the earlier point about community size; as a practical matter, for the system to work, it must be in the voter's self-interest to vote and vote fairly, leading to the best results.

  • "Voting Format"

Is the voting format implicit or explicit? An explicit system requires more work from participants and is also more susceptible to spam or gaming.
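
To make the independence constraint concrete, here is a minimal Python sketch. The numbers, the function names (crowd_estimate, apply_influence) and the "pull toward the expert" model are all my own illustrative assumptions, not taken from any of the articles cited above. It compares the average of independent guesses with the average after every participant shifts halfway toward a vocal expert's position.

```python
from statistics import mean

def crowd_estimate(guesses):
    """Aggregate the crowd's guesses with a plain average."""
    return mean(guesses)

def apply_influence(guesses, expert_guess, weight):
    """Pull every guess toward the expert's guess.

    weight = 0 leaves guesses independent; weight = 1 means everyone
    simply copies the expert. This is a hypothetical toy model.
    """
    return [(1 - weight) * g + weight * expert_guess for g in guesses]

# Made-up data: the true value is 100 and individual errors cancel out.
true_value = 100
guesses = [70, 85, 100, 115, 130]
expert_guess = 140  # a confident but wrong "Ms. Expert"

independent = crowd_estimate(guesses)
herded = crowd_estimate(apply_influence(guesses, expert_guess, 0.5))

print("independent crowd:", independent, "error:", abs(independent - true_value))
print("herded crowd:     ", herded, "error:", abs(herded - true_value))
```

In this toy data set the independent errors cancel exactly, while the herded crowd inherits half of the expert's error. Real crowds are messier, but the mechanism is the same: once positions are correlated, averaging no longer cancels individual mistakes.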


Popular Web 2.0 Communities: Search Wisdom or Dumbness?

With these constraints in mind, let us evaluate some of the popular Web 2.0 text search/information findability solutions based on "crowd-sourcing", to see which applies better: Collective Wisdom or Collective Dumbness?

[Note: The solutions examined here are among the leading lights in their respective genres; most of the discussion applies to other similar engines in each space]

Google:

Google has an incredibly efficient algorithm for harnessing Distributed Collective Intelligence from the global community to improve findability (aka search); this is one of the most successful implementations of this concept.

Properties:
- Data Collection: Implicit
- Summary: Google's approach can be summarized simplistically as: "On any topic, the information that most people refer to is the most important, and is what everyone wants to find"
- Approach: Uses static links as a proxy for user votes
- Gaming: Susceptible to spamming and SEO, with no community check on gaming (by design); thus there is a strong incentive to vote unfairly for marketing advantage
- Targeting: Heavily targeted for undue influence, due to the strong financial motivation for participants involved
- Network Effects: Very strong; voting starts off democratically for any new topic, but once these effects kick in, it is really hard for new entrants to gain traction, regardless of their quality

Conclusion:
Wisdom, but converging towards big-budget marketing output; outliers are progressively less likely to see the light of day
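
The "links as votes" idea can be sketched in a few lines of Python. This is a deliberately naive in-link count, not Google's actual ranking algorithm, and the page names and the rank_by_inlinks function are hypothetical. It is only meant to show why an implicit link vote is cheap to collect and, equally, why a link farm can game it.

```python
from collections import Counter

def rank_by_inlinks(links):
    """Rank pages by number of incoming links.

    links: iterable of (source_page, target_page) pairs.
    Each link counts as one vote for its target; self-links are ignored.
    """
    votes = Counter(target for source, target in links if source != target)
    return votes.most_common()

# Made-up link graph among honest pages.
honest_links = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "b"), ("c", "b")]
print(rank_by_inlinks(honest_links)[0])   # page with the most in-links

# A hypothetical link farm: four throwaway pages all pointing at "spam".
link_farm = [("s1", "spam"), ("s2", "spam"), ("s3", "spam"), ("s4", "spam")]
print(rank_by_inlinks(honest_links + link_farm)[0])
```

With the honest links alone, page "c" comes out on top with three votes; adding the four farm links pushes "spam" ahead of it. Production engines weight votes by the standing of the linking page and apply many other defenses, but the incentive to manufacture votes remains.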

Wikipedia:

Wikipedia is also one of the most successful implementations for capturing collective intelligence; the approach itself is not very efficient, since it depends on manual edits, but the collective efforts of a large community largely overcome this limitation.

Properties:
- Data Collection: Explicit
- Simple Summary: "Anyone can edit the information, so that solutions revert to the mean, which is accuracy (since different users make different mistakes)"
- Approach: Relies on direct edits to represent voting and volunteer editors for direct control of content
- Gaming / Network Effects: Much less susceptible to spamming and network effects
- But the process is not democratic enough, since editors exercise significant authority; it's hard for outliers to make their way in
- Targeting: Heavily targeted for undue influence
- Strong incentive to vote correctly "for the good of all"

Conclusion:
Wisdom, but slowly losing its heavily populist, idealized underpinnings

Del.icio.us:

Another highly efficient algorithm for capturing Distributed Collective Intelligence; this is arguably the most successful player in the crowded online bookmarking space.

Properties:
- Data Collection: Implicit
- Simple Summary: "Everyone tags and stores their own links, and everyone benefits from the aggregate knowledge that can be extracted"
- Approach: Relies on users implicitly contributing to the creation of a taxonomy and categorization of content
- Gaming / Targeting: Not very susceptible to spamming, nor heavily targeted (except through good copywriting!)
- Network Effects: Some network effects. But the tagging process is completely democratic, and outliers can easily make their way in
- Users vote for their own self-interest, and votes are likely to be very fair, although individual accuracy may vary widely

Conclusion:
Wisdom, the true "Wisdom of Crowds"
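
As a rough illustration of how aggregate knowledge can fall out of purely self-interested tagging, here is a small Python sketch. The data, the URL and the aggregate_tags function are invented for the example and say nothing about how del.icio.us is actually implemented. Each user tags a bookmark for their own use; counting tags per URL then yields a crowd-sourced description of the page.

```python
from collections import Counter, defaultdict

def aggregate_tags(bookmarks):
    """Count, for every URL, how many bookmarks carry each tag.

    bookmarks: iterable of (user, url, tags) tuples. Tags are
    lower-cased and de-duplicated within a single bookmark, so one
    bookmark contributes at most one vote per tag. The sketch assumes
    each user bookmarks a given URL only once.
    """
    per_url = defaultdict(Counter)
    for user, url, tags in bookmarks:
        for tag in {t.lower() for t in tags}:
            per_url[url][tag] += 1
    return per_url

# Made-up bookmarks from three users.
bookmarks = [
    ("ann", "http://example.com/search", ["Search", "web2.0"]),
    ("bob", "http://example.com/search", ["search", "google"]),
    ("cat", "http://example.com/search", ["search", "search"]),
]

tag_counts = aggregate_tags(bookmarks)
print(tag_counts["http://example.com/search"].most_common(1))
```

Nobody here set out to classify the page for others, yet "search" emerges as its dominant label. Individual tags vary in accuracy, which is fine as long as users tag independently.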

Technorati Search:

Properties for Technorati are very similar to those for del.icio.us, except that users search for blog posts rather than web pages.

Properties:
- Data Collection: Implicit
- Simple Summary: "Tags from blog posts bubble up, and are grouped together to form a folksonomy"
- Approach: Relies on users implicitly contributing to content categorization
- Gaming/Targeting: Not particularly susceptible to spamming or manipulation, nor heavily targeted
- Network Effects: There are some network effects, especially the "echo-chamber" effect
- In general, users vote in their own self-interest and votes are reasonably fair

Conclusion:
Wisdom

Digg:

Digg uses an interesting approach to find articles/web pages of interest. Its algorithm is based on aggregating the active voting patterns of users for harnessing collective intelligence.

Properties:
- Data Collection: Explicit
- Simple Summary: "Everyone votes on whether a given article is interesting"
- Approach: Relies on users to submit articles and mark them positively or negatively; results are rolled up to find the most interesting articles
- Gaming: Very susceptible to spamming/gaming (there have been many articles written about it); collaborative voting and the reputed "gangs of diggers" undermine the independence of votes
- Targeting: Heavily targeted for undue influence
- Network Effects: Very strong network effects, based on both article and author
- A recent change to the algorithm subverted the democratic principle of "one user, one vote"

Conclusion:
Rapidly descending into madness Dumbness?
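
For comparison with the implicit approaches above, an explicit "one user, one vote" roll-up might look like the following Python sketch. It is a toy model under my own assumptions (the roll_up function, the scoring rule of up-votes minus down-votes, and the sample votes are all hypothetical) and is not Digg's actual algorithm.

```python
def roll_up(votes):
    """Roll explicit votes up into a ranked list of articles.

    votes: iterable of (user, article, value), where value is +1 or -1.
    A later vote by the same user on the same article replaces the
    earlier one, so each user counts once per article.
    Returns (article, score) pairs, highest score first.
    """
    latest = {}
    for user, article, value in votes:
        latest[(user, article)] = value

    scores = {}
    for (user, article), value in latest.items():
        scores[article] = scores.get(article, 0) + value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up votes: u1 tries to vote for article "B" three times.
votes = [
    ("u1", "A", +1), ("u2", "A", +1), ("u3", "A", -1),
    ("u1", "B", +1), ("u1", "B", +1), ("u1", "B", +1),
    ("u2", "B", +1),
]
print(roll_up(votes))
```

De-duplicating by user stops the crudest form of ballot stuffing, but it does nothing against a coordinated group of distinct accounts voting together, which is exactly the independence problem described in the properties above.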






January 07, 2007

A conversation with John T. Maloney of Colabria

I recently had the opportunity to discuss Prediction Markets with John T. Maloney of Colabria. The KM and Colabria Clusters® are open, federated action/research networks - you can find more about them here.

[Background about Prediction Markets: Wikipedia reference, confab.yahoo event on PMs, Chris Masse vortal]

Here are some excerpts from our conversation:

Q - Let me start with a basic question. Why should companies be interested in Prediction Markets? What's in it for them?

JTM - Better decision making, improved knowledge management, dramatic cost savings via far better forecasting.

Q - So let's say that I'm convinced that my company should set up some PMs to improve prediction accuracy and tease out common (but hidden) knowledge - how do I convince higher-level management of that?

JTM - Well, you need to make a fairly routine business case. This is the heart of the issue, and the main reason for crafting an industry consortium, the PM Cluster. So far, PMs have been interesting research tools for scholars, academics and corporate research scientists. These particular populations are not equipped to make compelling business cases to management, to create 'best practices', hence the consortium.

Q - How do you quantify the benefits of the Prediction Market?

JTM - I do not assign these sorts of qualitative measures; rather, it is just part of the toolkit. Tools are heavily dependent on context. What is accurate in one setting may collapse in another. This has been the problem since time immemorial -- the flawed focus on tools rather than context. The tricky part is finding the right 'contract' or future. The Web-based tools are simple, easy. You need to 'find the pain' like anything else.
Examples are Microsoft and their rather bogus release schedules, Intel and where/when to make a new chip foundry or HP on how much memory to buy in a month... These are billion dollar questions.

Q - Once you get a prediction market going - how do you maintain/increase participation?

JTM - Focus on intangible benefits, reputation, experience, outcomes.

Q - Could you expand on that?

JTM - Yes, people are motivated and support what they create, value, which isn't always apparent or measurable. They operate in complex social value networks that are hard to see or understand with a conventional mindset... it is why value network analysis is rising so fast. Prediction markets and intangible value are closely linked but only a few have discovered it yet.

Q - What are the best applications initially, within a company, for a Prediction Market? Sales forecasting? Project planning? Supply chain predictions?

JTM - Yes, yes, yes. Competitive intelligence and feature function analysis are HOT too.

Q - Is 2007 the year PMs will hit the broader public consciousness? With confab.yahoo, blog articles and so on, on the rise ...

JTM - Note that the AEI-Brookings Joint Center is sponsoring the worldwide Prediction Markets Conference in Washington DC on January 18, 2007. This is an important, seminal development.

December 14, 2006

Mid-Week Update: confab.yahoo, Oodle, GenieKnows.com, Trulia

Yesterday, I was over at the confab.yahoo event on Prediction Markets. You can read my report about it over on the Read/WriteWeb. (Thanks, Richard MacManus, for the opportunity!)

The big news in vertical search this week was a white paper released by Slack Barshinger and SearchChannel, which projects that revenue from b-to-b vertical search engines will reach $1 billion by 2009.

In other news:

