The "Dumbness of Crowds"
Kathy Sierra recently published a fascinating post on the Creating Passionate Users blog, The "Dumbness of Crowds", in which she carefully analyzes the popular notion of The Wisdom of Crowds. Given the technology community's current Web 2.0 startup craze, with its heavy reliance on the concepts of community and crowd-sourcing, this is a very relevant and timely discussion. In her post, Kathy distinguishes between two related scenarios: on the one hand, aggregating knowledge from a collection of individuals working independently (wisdom), and on the other, a group of people acting together, such as the behavior of a crowd or the consensus decision of a committee (dumbness).
[For more about harnessing collective wisdom, check out these great articles by Tim O'Reilly, Dion Hinchcliffe, Nick Carr and Chris Saad.]
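As a rough illustration of the first scenario (aggregating independent judgments), the sketch below averages a handful of individual guesses and compares the error of the average with the typical individual error. The function name and all of the numbers are hypothetical, chosen only for illustration; the one fact relied on is that the error of the average can never exceed the average of the individual errors.

```python
def crowd_vs_individual_error(estimates, truth):
    """Return (error of the averaged guess, average error of the individual guesses)."""
    crowd_guess = sum(estimates) / len(estimates)
    crowd_error = abs(crowd_guess - truth)
    individual_error = sum(abs(e - truth) for e in estimates) / len(estimates)
    return crowd_error, individual_error


# Hypothetical, independent guesses at a quantity whose true value is 100.
independent = [70, 95, 110, 130, 105]
print(crowd_vs_individual_error(independent, 100))

# The same crowd after three members defer to the loudest voice (130).
herded = [130, 130, 130, 130, 105]
print(crowd_vs_individual_error(herded, 100))
```

When the guesses err in different directions, the errors partly cancel in the average. Once most members copy a single voice, there is little left to cancel, and the crowd is no better than its typical member.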
The pitfalls of "Collective Wisdom"
Clearly, specific constraints must be satisfied for the aggregate results of the crowd to amount to collective wisdom. These constraints can be catalogued and analyzed (see the "Failures of Crowd Intelligence" section in this Wikipedia article); violating any of them will severely degrade the quality of the results.
These pitfalls also affect Web 2.0 communities. For the aggregated information to represent "collective wisdom", the community must have good answers to the following questions:
- Does the community provide enough diversity in viewpoints? The value of the outliers cannot be overstated. If the community is simply too small or too homogeneous, then everyone drinks the same Kool-Aid and there is not enough disagreement.
- Are the votes independent? If strong players within the community have the ability to influence others' votes, then the positions taken by participants are no longer independent.
- Is the voting fair, that is, does it reduce (or prevent) malicious votes? If participants have both the incentive and the ability to subvert the results for their own personal gain, then the collective solution is not going to be very meaningful.
- Are there network effects that affect the outcome? Quite often, the process starts democratically enough, but once a single solution or viewpoint starts to gain relative traction, positive feedback pushes it to overwhelming adoption. In other words, the system exhibits an unstable equilibrium.
- Are users actively participating? This is somewhat different from the earlier point about community size; as a practical matter, for the system to work and produce the best results, it must be in the voter's self-interest to vote, and to vote fairly.
- Is the voting format implicit or explicit? An explicit system requires more work from participants and is also more susceptible to spam or gaming.
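The network-effects point, where an even split is an unstable equilibrium, can be sketched with a toy feedback model. The update rule is an assumption made purely for illustration: in each round, support is re-split in proportion to each option's current share raised to a power greater than one, so the leader attracts more than its proportional share of new support.

```python
def next_share(share, feedback=2.0):
    """One round of voting in which popularity attracts further votes.

    Assumption: support is re-split in proportion to share**feedback,
    so any feedback > 1 rewards the current leader disproportionately.
    """
    lead = share ** feedback
    rest = (1.0 - share) ** feedback
    return lead / (lead + rest)


def simulate(share, rounds=10, feedback=2.0):
    """Apply the feedback rule repeatedly and return the final share."""
    for _ in range(rounds):
        share = next_share(share, feedback)
    return share


print(simulate(0.50))  # an exact tie stays a tie
print(simulate(0.51))  # a one-point lead snowballs toward total adoption
print(simulate(0.49))  # a one-point deficit collapses toward zero
```

The tie at 0.5 is an equilibrium, but any small nudge away from it is amplified round after round, which is what makes it unstable.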
Popular Web 2.0 Communities: Search Wisdom or Dumbness?
With these constraints in mind, let us evaluate some of the popular Web 2.0 text search and information findability solutions based on "crowd-sourcing", to see which label applies better: Collective Wisdom or Collective Dumbness?
[Note: The solutions examined here are among the leading lights in their respective genres; most of the discussion applies to other similar engines in each space.]
Google:
Google has an incredibly efficient algorithm for harnessing Distributed Collective Intelligence from the global community to improve findability (aka search); this is one of the most successful implementations of this concept.
Properties:
- Data Collection: Implicit
- Summary: Google's approach can be summarized simplistically as: "On
any topic, the information that most people refer to is the most
important, and is what everyone wants to find"
- Approach: Uses static links as a proxy for user votes
- Gaming: Susceptible to spamming and SEO, with no community check on
gaming (by design); thus there is a strong incentive to vote unfairly
for marketing advantage
- Targeting: Heavily targeted for undue influence, due to the strong financial motivation for participants involved
- Network Effects: Very strong; voting starts off democratically for
any new topic, but once these effects kick in, it is really hard for
new entrants to gain traction, regardless of their quality
Conclusion:
Wisdom, but converging towards big-budget marketing output; outliers are progressively less likely to see the light of day
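To make "static links as a proxy for user votes" concrete, here is a minimal sketch of a simplified PageRank-style iteration. It is not Google's production algorithm; the function name, the page names, and the graph are hypothetical, and the damping value of 0.85 is simply a commonly cited default.

```python
def link_vote_scores(links, damping=0.85, iterations=50):
    """Score pages by treating each outgoing link as a vote for its target.

    links: {page: [pages it links to]}. Illustrative sketch only.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its voting power evenly among its links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical link graph: three pages link to "c", and "c" links to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(link_vote_scores(graph).items()):
    print(page, round(score, 3))
```

In this toy graph, "c" collects the most votes, and "a" ranks second solely because the top page links to it. That is a small-scale version of the network effect described above: a link from an already popular page is worth far more than a link from an obscure one.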
Wikipedia:

Wikipedia is also one of the most successful implementations for
capturing collective intelligence; the approach itself is not very
efficient, since it depends on manual edits, but the collective efforts
of a large community largely overcome this limitation.
Properties:
- Data Collection: Explicit
- Simple Summary: "Anyone can edit the information, so that solutions
revert to the mean, which is accuracy (since different users make
different mistakes)"
- Approach: Relies on direct edits to represent voting and volunteer editors for direct control of content
- Gaming / Network Effects: Much less susceptible to spamming and network effects
- But the process is not democratic enough, since editors exercise significant authority; it's hard for outliers to make their way in
- Targeting: Heavily targeted for undue influence
- Strong incentive to vote correctly "for the good of all"
Conclusion:
Wisdom, but slowly losing its heavily populist, idealized underpinnings
Del.icio.us:
Another highly efficient algorithm for capturing Distributed Collective Intelligence; this is arguably the most successful player in the crowded online bookmarking space.
Properties:
- Data Collection: Implicit
- Simple Summary: "Everyone tags and stores their own links, and everyone benefits from the aggregate knowledge that can be extracted"
- Approach: Relies on users implicitly contributing to the creation of a taxonomy and categorization of content
- Gaming / Targeting: Not very susceptible to spamming, nor heavily targeted (except through good copywriting!)
- Network Effects: Some network effects. But the tagging process is
completely democratic, and outliers can easily make their way in
- Users vote for their own self-interest, and votes are likely to be very fair, although individual accuracy may vary widely
Conclusion:
Wisdom, the true "Wisdom of Crowds"
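The implicit aggregation described here can be sketched as a simple tally of personal bookmarks. This is an illustration under stated assumptions, not del.icio.us's actual implementation: the function name, users, URLs, and tags are all hypothetical, and each user is counted at most once per tag on a given link.

```python
from collections import Counter, defaultdict


def aggregate_tags(bookmarks):
    """Roll individual bookmarks up into per-URL tag counts.

    bookmarks: iterable of (user, url, tags). Each user counts at most
    once per (url, tag), so re-saving a link does not inflate a tag.
    """
    seen = set()
    tags_by_url = defaultdict(Counter)
    for user, url, tags in bookmarks:
        for tag in tags:
            tag = tag.lower()  # treat "Python" and "python" as the same tag
            key = (user, url, tag)
            if key in seen:
                continue
            seen.add(key)
            tags_by_url[url][tag] += 1
    return tags_by_url


# Hypothetical bookmarks saved by three users for their own purposes.
saved = [
    ("ann", "http://example.com/a", ["Python", "tutorial"]),
    ("bob", "http://example.com/a", ["python"]),
    ("ann", "http://example.com/a", ["python"]),  # duplicate, ignored
    ("cat", "http://example.com/b", ["news"]),
]
result = aggregate_tags(saved)
print(result["http://example.com/a"].most_common())
```

Nobody in this sketch casts an explicit vote; the shared categorization falls out of what each user stored for their own benefit.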
Technorati Search:
Properties for Technorati are very similar to those for del.icio.us, except that users search for blog posts rather than web pages.
Properties:
- Data Collection: Implicit
- Simple Summary: "Tags from blog posts bubble up, and are grouped together to form a folksonomy"
- Approach: Relies on users implicitly contributing to content categorization
- Gaming/Targeting: Not particularly susceptible to spamming or manipulation, nor heavily targeted
- Network Effects: There are some network effects, especially the "echo-chamber" effect
- In general, users vote in their own self-interest and votes are reasonably fair
Conclusion:
Wisdom
Digg:
Digg uses an interesting approach to find articles/web pages of interest. Its algorithm is based on aggregating the active voting patterns of users for harnessing collective intelligence.
Properties:
- Data Collection: Explicit
- Simple Summary: "Everyone votes on whether a given article is interesting"
- Approach: Relies on users to submit articles and mark them positively or negatively; results are rolled up to find the most interesting articles
- Gaming: Very susceptible to spamming/gaming (there have been many articles written about it); collaborative voting and the reputed "gangs of diggers" undermine the independence of votes
- Targeting: Heavily targeted for undue influence
- Network Effects: Very strong network effects, based on both article and author
- A recent change to the algorithm subverted the democratic principle of "one user, one vote"
Conclusion:
Rapidly descending into madness (or rather, Dumbness)?
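The difference between "one user, one vote" and a weighted scheme can be sketched as follows. This is not Digg's algorithm, whose details are not described here; the function name, users, articles, and weights are hypothetical and serve only to show how unequal weights can change the outcome.

```python
def tally(votes, weights=None):
    """Net score per article; each user's latest vote on an article counts once.

    votes: iterable of (user, article, value) with value +1 or -1.
    weights: optional {user: weight}; users not listed weigh 1.0.
    """
    latest = {}
    for user, article, value in votes:
        latest[(user, article)] = value  # repeat votes overwrite, never stack
    scores = {}
    for (user, article), value in latest.items():
        weight = 1.0 if weights is None else weights.get(user, 1.0)
        scores[article] = scores.get(article, 0.0) + weight * value
    return scores


# Hypothetical votes: "u1" votes twice for "x", but is counted only once.
votes = [
    ("u1", "x", 1), ("u2", "x", 1), ("u3", "x", -1), ("u1", "x", 1),
    ("u4", "y", 1), ("u5", "y", 1),
]
print(tally(votes))               # equal weights: "y" leads
print(tally(votes, {"u1": 5.0}))  # one heavily weighted user: "x" leads
```

With equal weights, the article with broader support wins. Giving a single user extra weight is enough to flip the ranking, which is why departing from "one user, one vote" matters.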