April 20, 2008

Cooperation of Alternate Search Engines: A Manifesto

(This post is inspired by my discussions with my friend, Charles Knight of AltSearchEngines.)

Background

I'll be at the Alternative Search Engines Day tomorrow, a unique event in San Francisco put together by Charles and the AltSearchEngines team. The event is sponsored by SeeqPod, UpTake, Matchpoint, HealthPricer, GoPubMed and Blogdimension. (Unfortunately, it's not open to the general public.) If you're part of an Alternative Search Engine, I hope to see you there!

As I was getting ready for the event, it got me thinking about ASEs and how they can work together.

The Case for the Alts

I love the ASEs - Alts rock! Without them, there would be little innovation in Search, no new frontiers to be explored.

The Alts are the ones that keep pushing the envelope with new directions in search technology, whether it's algorithms, user interface, social search or something else. Although Google has some fine technology and is synonymous with search, I firmly believe that we're still at Search 1.0, and have a long way to go. Because of all this competition from the Alts, and the resulting innovation, web search continues to improve.

Continue reading "Cooperation of Alternate Search Engines: A Manifesto" »

April 14, 2008

Web 2.0: The Real Opportunity Lies Ahead of Us

JP Rangaswami wrote an amazing post on his blog a little while ago: Interesting, but of no commercial value, in which he cites a series of examples of new technologies - like email and spreadsheets - that were initially considered simply interesting, rather than useful; now we cannot imagine living or working without those very same technologies. It seems likely that this will happen with today's emerging technologies, like RSS feeds, popular voting, social networking, micro-blogging, crowdsourcing and so on.

History Repeats Itself

We have already seen this happen with Web 1.0. A series of tiny, well-capitalized startups (remember Webvan?) gained early traction online in a variety of market segments, from books to furniture to pet food to groceries. The large, established brick-and-mortar players were slow to respond.

Continue reading "Web 2.0: The Real Opportunity Lies Ahead of Us" »

April 07, 2008

Enterprise 2.0: The Engineering of Marketing Online

When I was talking with my friend Shreesha Ramdas (from OuterJoin) last week, he shared a perspective that really resonated with me. In a nutshell, he believes that the Marketing of online products and sites is rapidly becoming an Engineering function, both in terms of operational activities and measurement.

The more I think about it, the more I'm convinced that he's on to something. Marketing of online products and sites is inherently different from classical marketing. Unlike regular marketing channels, online campaigns allow marketers to proceed systematically step-by-step along a predetermined course. The results of each distinct campaign can be measured precisely, even when multiple campaigns are going on simultaneously. Most important, the market can be broken up into thousands of micro-segments, with targeted campaigns aimed at each one.

Continue reading "Enterprise 2.0: The Engineering of Marketing Online" »

March 15, 2008

Two Emerging Memes: Freenomics and Crowdsourcing

Silicon Valley has always been fascinated by new technology memes, radical new ideas and business models that take off and capture the popular imagination - often changing the world of technology in the process. Valley memes quickly become the jargon of leading-edge VCs. For example, the phrase "it's a Freemium model, based on UGC and Wisdom of Crowds" requires no explanation in certain circles; it's as meaningful to these players as the phrase "it's a RESTful Web Service based on a 3-tier J2EE architecture" is to a Web Applications Architect.

(For the uninitiated: Freemium is a term popularized by VC Fred Wilson to mean a free service with an upsell to paid premium subscriptions; UGC stands for User-generated content, a relatively new concept in which the audience or readership helps to craft creative or original content; and Wisdom of Crowds is a term popularized by James Surowiecki's book of the same name.)

How does a technology meme start? When does an interesting new idea or concept tip over into a meme?

Continue reading "Two Emerging Memes: Freenomics and Crowdsourcing" »

March 01, 2008

Rumors of the Death of Indian Outsourcing - Are Greatly Exaggerated

I've long been a fan of Sramana Mitra - she has a terrific blog and has her own deep definition and framework for Web 3.0. Forbes.com carries a fascinating article by her this morning: The Coming Death Of Indian Outsourcing, in which she talks about the new challenges facing Indian Outsourcing Companies.

As usual, Mitra's basic analysis is spot-on, although one can certainly take issue with her conclusions. There is no question that Indian OCs (Outsourcing Companies) now face unprecedented challenges, and cannot carry on with a "Business As Usual" approach for much longer.

This should not come as a surprise; competing on the basis of price alone is never a sustainable business strategy. In a price war between brands (or countries!), no supplier wins. And this price advantage is rapidly eroding, with the steep rise in software development labor costs and the deteriorating strength of the dollar.

Continue reading "Rumors of the Death of Indian Outsourcing - Are Greatly Exaggerated" »

February 28, 2008

Semantic Web - What is the Core Problem?

In his latest blog post, Mathew Ingram writes about Paul Miller's interview with Sir Tim Berners-Lee, inventor of the World Wide Web. Miller's interview writeup is very interesting - as Marshall Kirkpatrick notes on the ReadWriteWeb, Sir Tim feels that all the pieces for the Semantic Web are already in place to realize a large part of the dream and to allow us to create applications that leverage the power of structured data and the integration of that data.

[One big problem for the Semantic Web that I've written about recently is the lack of meaning-enabled authoring tools; however, in the interview, Sir Tim indicates that this need is less critical; the structured data we need can come from databases.]

Coming back to Ingram's post, he says that the biggest problem with the Semantic Web is that "it’s as boring as dry toast" - i.e. it's all about the technical side, with discussions about plumbing and widgets and standards, and there's nothing there that will make people sit up and take notice.

Continue reading "Semantic Web - What is the Core Problem?" »

February 24, 2008

Semantic Web: Where are the Meaning-Enabled Authoring Tools?

Jason Kolb sees it as a way to identify data objects using URIs. John Markoff, of the New York Times, calls it Web 3.0. And Nova Spivack has a long post clarifying what it is Not.

What are all these authors talking about? The Semantic Web - much has been written recently about its concepts, approaches and applications. But there's something missing, a piece that hasn't generated much interest to date.

In terms of understanding, finding and displaying content, there is no doubt that the Semantic Web is slowly becoming real (e.g. there were some great demos at a recent SDForum meet). However, a gap is emerging with Content Authoring tools, which have not yet made this paradigm shift.

On the one hand, most authors are comfortable with, and proficient in, desktop authoring tools, such as Microsoft Word, FrontPage, Adobe GoLive and others; this is especially true for professionals and other experts who create technical reference content for web applications, such as legal references, accounting manuals or engineering documents. The current crop of authoring tools produce visually high-quality articles and web pages, but their XML or RDF creation capabilities are severely limited.

On the other hand, parsing Word documents or HTML web pages to extract meaningful structure from them gives poor results; much of the semantic knowledge of the content is lost. There do not appear to be any popular tools that create Semantic content natively and yet are natural and easy for a content author to use.

Top-Down? Or Bottom-Up?

Of course, there are ways to get around this issue to some extent. Allowing authors or readers to add tags to articles or posts allows a measure of classification, but it does not capture the true semantic essence of the document. Automated Semantic Parsing (especially within a given domain) is on the way - a la Spock, twine and Powerset (see writeup) - but it is currently limited in scope and needs a lot of computing power; in addition, if we could put the proper tools in the authors' hands in the first place, extracting the semantic meaning would be so much easier.

For example, imagine that you are building an online repository of content, using paid expert authors or community collaboration, to create a large number of similar records - say, a cookbook of recipes, a stack of electrical circuit designs, or something similar. Naturally, you would want to create domain-specific semantic knowledge of your stack at the same time, so that you can classify and search for content in a variety of ways, including by using intelligent queries.

Ideally, the authors would create the content as meaningful XML text or RDF triples, so that parsing the semantics would be much easier. A side benefit is that this content can then be easily published in a variety of ways, and there would be SEO benefits as well if search engines could understand it more easily. But tools that create information in this way, and yet are natural and easy for authors to use, don't appear to be on their way; and creating a custom tool for each individual domain seems a difficult and expensive proposition.

Car Review Example

As a more concrete example: imagine that you control a web site called New-Car-Reviews.com, a hypothetical site that reviews new cars; you pay expert authors to write reviews of new car models every year for this site. Unlike other automobile characteristics, reviews cannot be easily stored into a database and queried. Conceptually, your reviews are similar to this review for the 2008 Volvo S40 2.4i sedan on the automotive site Kelley Blue Book.

In the current paradigm, a typical element of the review is usually written something like this:

    <span id="ctl00">You'll Like This Car If...</span>
        ...description_positive...
    <span id="ctl00">You May Not Like This Car If...</span>
        ...description_negative...

For the future, imagine this: when your authors are originally composing this review, what if they could instead create it with semantic markup embedded:
(In this example, I use straight XML for simplicity; the actual format of the content could be RDF-triples, or some other improved format)

    <advantages>You'll Like This Car If...
        <text>...description_positive...</text>
    </advantages>
    <disadvantages>You May Not Like This Car If...
        <text>...description_negative...</text>
    </disadvantages>

Then you can get more value out of the same content:

  (a) You can easily *re-purpose* the content in additional ways, such as for mobile devices, RSS feeds, web services APIs, mashups and so on
  (b) As search engines start to take advantage of semantic notation, you get SEO benefits
  (c) You can provide users with ways to query the content *intelligently* ("show me cars which are family-friendly AND don't roll over easily vs those that work better off-road AND seat 7"), using tools such as the recently-released SPARQL.
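
To make points (a) and (c) a little more concrete, here is a minimal Python sketch of what a publisher could do once reviews are stored with markup like the example above. It uses only the standard library. The `<review>` and `<model>` wrapper elements, the sample car and its text, and the function names are all my own assumptions for illustration, not part of any real site or tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical review record. The <review> and <model> wrappers and all of
# the sample text are made up for illustration.
REVIEW_XML = """
<review>
    <model>Example Sedan</model>
    <advantages>You'll Like This Car If...
        <text>you want a family-friendly cabin that seats 7.</text>
    </advantages>
    <disadvantages>You May Not Like This Car If...
        <text>you plan to drive off-road.</text>
    </disadvantages>
</review>
"""


def parse_review(xml_text):
    """Turn one semantically marked-up review into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())

    def section(tag):
        node = root.find(tag)
        return {
            # Text directly inside the element, before the nested <text>.
            "heading": (node.text or "").strip(),
            "body": (node.findtext("text") or "").strip(),
        }

    return {
        "model": (root.findtext("model") or "").strip(),
        "advantages": section("advantages"),
        "disadvantages": section("disadvantages"),
    }


def to_feed_summary(review):
    """Re-purpose the same content as a one-line summary, e.g. for a feed."""
    return "{}: + {} / - {}".format(
        review["model"],
        review["advantages"]["body"],
        review["disadvantages"]["body"],
    )


def mentions(review, section, keyword):
    """Crude keyword query restricted to a single semantic section."""
    return keyword.lower() in review[section]["body"].lower()


review = parse_review(REVIEW_XML)
print(to_feed_summary(review))
print(mentions(review, "advantages", "family-friendly"))
```

A keyword match inside one section is a long way short of a real SPARQL query, but it shows the point: the query can tell an advantage from a disadvantage only because the author marked them up in the first place.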

As a content publisher, you want your content to be found and used as much as possible, and making it meaning-enabled is a big step in this direction. At the same time, you cannot ask authors to use a pure XML tool such as XMLSpy or an ontology editor like Protégé; and MS Word creates unreadable XML that specifies formatting rather than semantics.

A solution for this specific example already exists: Microformats could be applied to handle the problem of annotating the advantages and disadvantages. While the Microformat solution works very well for specific types of information - such as for describing people and addresses - it is too limited to be applicable in a general way to add semantic information to web content at large.

It seems to me that the general problem must be solved if we are to see large-scale adoption of the Semantic Web. It would be a boon to expert authors everywhere, including those who create news articles for the newspaper publishing industry. But there do not seem to be any solutions on the horizon, in terms of technologies, tools or processes to promote the creation of more meaning-rich content.

Reactions: But is there a Business Case?

When I put this question to a group of prominent bloggers and industry thought-leaders in the Semantic Web space, the results were not encouraging. There does not seem to be much interest in building Semantic authoring tools. The main stumbling block is the lack of a clear business model for publishers to embrace this approach.

Jeremy Liew of Lightspeed Venture Partners has recently penned a series of articles focused on the Semantic Web: Meaning = Data + Structure, based on user-generated structure, domain knowledge and user behavior, which focus on the problem of inferring meaning from content.

He questions the business rationale for authors to take the effort to add XML markup to their content, and points to domain-specific extraction approaches as the more likely solution:

The challenge with getting most authors to markup in XML is not just one of tools, but also of motivation IMO. Unless and until a clear business case advantage justifies the additional effort required, and that advantage is greater than other projects offer, you won't see much semantic markup except from academics and others whose interests are more philosophically driven than business driven.

That is why I think the domain specific extraction approaches will likely be more prevalent - the business advantage of better search and structure accrues to the person doing the extraction, and because it is domain specific, the additional effort is lessened

He's right, of course; domain-specific extraction approaches are definitely going to be popular, and are beginning to take off already. Extraction provides significant added value for the extractor. However, it's difficult and expensive to do it well, so the business case is somewhat dubious for the early adopters.

ReadWriteWeb's Alex Iskold is another thought leader in this space. He has a series of fantastic articles about the Semantic Web, including the problem of annotating data, the different approaches used, and a primer for the structured web.

His comments echoed those of Liew:

There seems to be little incentive for publishers to annotate information.

The problem is that if you go deep enough you hit RDF. The light version is Microformats. But the issue is not the format, its the incentive.


Tim O'Reilly wrote about this issue almost a year ago: Different Approaches to the Semantic Web, in which he echoes the same sentiment:

It seems easy enough, but why hasn't this approach taken off? Because there's no immediate benefit to the user. He or she has to be committed to the goal of building hidden structure into the data. It's an extra task, undertaken for the benefit of others. And as I've written before, one of the secrets of success in Web 2.0 is to harness self-interest, not volunteerism, in a natural "architecture of participation."

Conclusion

I guess I'm a minority of one. It seems to me that if content creators could add semantic meaning while constructing the content in the first place (which is, conceptually, only marginally more difficult for the authors), then the value of the content would increase exponentially at very low cost. That seems like a defensible business case for content publishers.

The business case for publishers to annotate existing web pages and content is certainly very weak. But for new content, if you're creating it for your site anyway, why wouldn't you add semantic markup to make it more findable and usable?

What do you think? Please leave a comment below or email the author (removing the ".aa" at the end) and let us know!



February 12, 2008

Enterprise 2.0: Top 5 Corporate Challenges for 2008 and beyond

The Few, the Proud

A few days ago, in its commentary section, the Wall Street Journal reported on an interview with General James T. Conway, Commandant of the U.S. Marine Corps.

In the interview, Gen. Conway muses on the way the tactics and equipment of the Marines are changing, in response to the unique nature of the responsibilities they have in Iraq and the evolving nature of their mission.

One way the Marines are clearly changing is in the vehicles troops use to patrol in Iraq. "If you look at the table of equipment that a Marine battalion is operating with right now in Iraq," Gen. Conway explains, "it is dramatically different than the table of equipment the battalion used when it went over the berm in Kuwait in '03, and it is remarkably heavier. Heavier, particularly in terms of vehicles.  ....these type of things, make us look more like a land army than it does a fast, hard-hitting expeditionary force."

...
In short, wars have a tendency to change the culture of the militaries that fight them. For the Marines, the cultural change they fear most is losing their connection to the sea while fighting in the desert.

In the midst of all this change, Gen. Conway is worried about preserving the essential character of the Marine Corps, even as the rest of the world changes around it. As an organization, the Corps faces one of the most daunting management challenges in the world: keeping individual Marines highly motivated and getting them to excel at a difficult, dirty and dangerous job, in the face of low pay and extreme working conditions. It is critical to preserve this esprit de corps, even while gearing up for new missions for the modern battlefield.

Corporate Trends

What does this have to do with modern corporate organizations? While the specific conditions are very different - no bullets or humvees are involved - companies have also been facing a set of discontinuous shocks in the last few years, and their pace is only increasing. In many ways, corporate leaders are facing major changes with challenges similar to the ones facing Gen. Conway, to which they must respond quickly and effectively, without losing their own organizational culture and common knowledge.

What do these discontinuities represent? Given below are five significant challenges facing corporate organizations in 2008 and beyond. None of these is new, but each trend is rapidly accelerating. For a company to survive and compete effectively, it is imperative that its leaders have a strategy to handle each one.

1. Outsourcing Partnerships:
Corporate outsourcing has been growing rapidly over the last ten years. Initially it started simply as a way to find talented technical workers quickly and at low cost.

Recently, though, this trend has been evolving; outsourcing vendors are now seen as strategic partners who participate in the corporate vision of the Enterprise and share in its successes and failures. Bringing these outsourcing partners into the fold (especially if they are off-shore) is neither quick nor easy.

2. Hyper-Informed Consumers:
The average person in the United States and other developed countries already has unprecedented access to information, more so than at any other time in history; this access is now spreading across the rest of the world following the proliferation of mobile phones.

Coupled with a corresponding increase in the willingness and enthusiasm of users to consume that information - e.g. watching the quarterly interest rate changes by the Federal Reserve is now a national pastime in the U.S. - this means that consumers are now exceptionally well-informed.

Companies must act accordingly; they must either join the online conversation as equal partners (as Cluetrain suggests), or be left out.

3. True Globalization:
Increasingly, corporations find their talent, their suppliers, and most importantly, their customers, at an international level and compete for them with other multi-national conglomerates, both domestically and abroad. As Thomas L. Friedman's popular book says, the World has become Flat again.

This change brings a whole new set of challenges with it; only organizations with a truly international mindset can survive and thrive.

4. Communication and Collaboration across Distributed Teams: 
With this new diversity of cultures, geographies and perspectives within the Enterprise, it is even more critical to get teams to communicate openly and to embrace a shared vision.

The recent uptrend in the use of Enterprise 2.0 strategies and tools is a positive development in addressing this need. These tools help to bridge the gaps; they promote collaborative design and development, and enable rapid dissemination of information among international teams spread across geographical boundaries and many different time zones.

The speed and flexibility with which these distributed organizations can respond to changes in market conditions is now a major competitive differentiator.

5. The Dominance of Search:
As commerce shifts increasingly online, the use of the Internet for research, analysis and selection of vendors, and for making purchases, is increasing rapidly. In the future, it is expected to become the primary way users find information.

This means that if a company does not show up near the top of search results for the major search engines, then essentially it doesn't exist for new prospects and even for previous clients. Addressing this challenge requires a significant change in Marketing philosophy. Companies can no longer coast on the strength of their brands; they must continually invest time and energy in refining their Search Engine Optimization (SEO) strategies.

Conclusion

These five trends go together and strengthen one another; so do the challenges associated with them. This positive feedback means that the changes are only going to accelerate in the future.

How is your company addressing these challenges? Add a comment below and let us know!

(Note: This post previously appeared on Profy.com)



January 29, 2008

Zvents makes Local Search pop!

There is a class of web search engines that can prove even more useful than Google within a certain context. I'm talking, of course, about Vertical Search engines - the writer and tech strategist Sramana Mitra considers them Google's Achilles heel and Profy.com's Cyndy Aleo-Carreira seems to agree. This blog also has long held the position that vertical search represents a powerful mechanism to find information on the web, and is a key category to watch in the search wars of the future. [see: The rise of Vertical Search Engines from Aug 2006].

Another way of achieving a similar focus, in order to improve the relevance of search results, is by segmenting by location rather than by industry vertical - i.e. create a hyperlocal search engine that limits its search results to a given geographical area.

One such alternate search engine is Zvents, which is relentlessly focused on local information, of any sort. This company, which has been around since early 2005, has just introduced an advanced feature called Federated Local search - basically, its own version of Universal Search (recall that Google introduced its Universal Search feature with much fanfare last May).

Federated Local Search: Multi-Dimensional Results for Local Information

What does Universal Search mean, for a local search engine? Initially this was not very clear to me; an email discussion with Paul O'Brien, Director of Marketing at Zvents, inspired me to draw the following diagram:



The basic idea is to enable the user to run a general-purpose search within a local context. This allows the user to find local information about a given topic, across many different dimensions. For example, a sports fan living in San Jose, CA who tries a local search for the term "hockey" would get the following different types of results:

  • Upcoming games for the San Jose Sharks, the local hockey team
  • The location of Roosevelt Park Roller Hockey Rink
  • The description and link for a local "Hockey Night" event
  • Results about relevant personalities (what Zvents calls "performers")
  • And other related links ...
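
As a rough sketch of the idea (and not a description of how Zvents actually implements it), the grouping could look like this in Python. The index records, their `kind`, `city` and `tags` fields, and the function name are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical local index. Every record, field name and tag here is an
# assumption for illustration, loosely modeled on the hockey example.
INDEX = [
    {"kind": "event", "city": "San Jose",
     "title": "San Jose Sharks home game", "tags": ["hockey"]},
    {"kind": "venue", "city": "San Jose",
     "title": "Roosevelt Park Roller Hockey Rink", "tags": []},
    {"kind": "event", "city": "San Jose",
     "title": "Hockey Night", "tags": []},
    {"kind": "performer", "city": "San Jose",
     "title": "San Jose Sharks", "tags": ["hockey"]},
    {"kind": "venue", "city": "Boston",
     "title": "Placeholder Hockey Arena", "tags": []},
]


def federated_local_search(index, term, city):
    """Return matches for `term` in one city, grouped by result type."""
    needle = term.lower()
    grouped = defaultdict(list)
    for item in index:
        if item["city"] != city:
            continue  # the local constraint: drop other geographies
        in_title = needle in item["title"].lower()
        in_tags = needle in (tag.lower() for tag in item["tags"])
        if in_title or in_tags:
            grouped[item["kind"]].append(item["title"])
    return dict(grouped)


results = federated_local_search(INDEX, "hockey", "San Jose")
for kind, titles in results.items():
    print(kind, titles)
```

The one design choice worth noting is that the location filter is applied before matching, so a popular result from another city can never crowd out a local one.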

Zvents has already partially implemented this vision, although some of the lower-ranked results could provide a better match. Hopefully these will improve in the future as the search index grows and the algorithm improves. A screen shot of this local Hockey search in Zvents is given below.



Similarly, here's a search for the term "Web 2.0" for Cupertino, CA:



Outcome: Relevance

The big advantage of this type of search, over a general-purpose Google or Yahoo! search, is that the user can obtain the benefits of a broad cross-section of results, while still constraining the search to a limited geographical area.

This is not a significant issue in highly developed, urban, technologically advanced areas like Silicon Valley, Boston or New York; but it could one day make a big difference for someone living in David Letterman's "home office" of Wahoo, NE, or even more important, someone trying to find the Boston Public School located in Boston, Ontario - as we've seen before, highly popular keywords tend to swamp nearby long-tail keywords in the search results for major search engines.

From a business model perspective, hyperlocal searches tend to provide highly qualified prospects for local merchants, so I would guess that this type of search is very easily monetizable in the long run.

From a user interface point-of-view, the NLP-like implementation of time period for the search engine ("when: tonight, this weekend, ...") is a nice touch; I tried different possibilities ("next month"), and it seemed to work just fine.

On a more technical note, Zvents has been making waves with the release of its open-source Bigtable clone called Hypertable, which adds a C++ option for this project.

Going forward, it will be interesting to see how Zvents scales to additional locations, and to additional dimensions within each locality. Will it make inroads into the market share for any of the major search engines, or into that of other locally-focused web sites like topix.com and craigslist?



January 22, 2008

Disambiguation of Search Results? Yup, Google's got that

Just last week, in an email exchange with another search blogger, I wondered when Google would provide options for disambiguation of search results.

When you think about it, that's an obvious requirement for the Results page of any serious search engine. If I query for the search term "Java" - does it mean that I'm looking for results about the programming language, the coffee, or the island in Indonesia?

There's no way for the search engine to be able to tell, although personalization could provide clues. The easiest solution, as I wrote back in 2006, is for the search engine to just ask - which is why Wikipedia offers this page: Java (disambiguation). Alternatively, the results can be grouped into various categories for the user to choose from, which is another way of doing the same thing.
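
Here is a minimal Python sketch of both options, "just ask" and "group by category". It assumes each result has already been labeled with a sense, which is the hard part in practice; the result titles, the sense labels and the function names are all hypothetical.

```python
# Hypothetical results for the query "Java", each already tagged with a
# sense. How the tagging is done is outside the scope of this sketch.
RESULTS = [
    ("Getting started with Java", "programming language"),
    ("Java coffee beans", "coffee"),
    ("Travel guide to Java", "island in Indonesia"),
    ("Java language tutorial", "programming language"),
]


def group_by_sense(results):
    """Group result titles by sense, keeping first-seen order of senses."""
    groups = {}
    for title, sense in results:
        groups.setdefault(sense, []).append(title)
    return groups


def disambiguation_prompt(query, groups):
    """The 'just ask' option: list the senses and let the user pick."""
    return 'Did you mean "{}" as: {}?'.format(query, ", ".join(groups))


groups = group_by_sense(RESULTS)
print(disambiguation_prompt("Java", groups))
for sense, titles in groups.items():
    print("-" * 40)  # a ruled line between categories
    print(sense, titles)
```

Either way, the less popular senses stay visible on the results page instead of being dropped.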

Until now, Google has been mostly following a third option, which is to simply pick the most popular category regardless of the user's real preference; this can lead to some strange results, as highlighted in my earlier post on deconstructing real Google searches. But this approach doesn't really cut it, since it ignores all the unpopular search results - it's very possible that the long-tail searches can collectively make up a market share that rivals or exceeds the relatively few "popular" searches.

There has also been a limited amount of disambiguation offered by Google's "related searches" feature.

Well, no more. Google appears to be experimenting with offering disambiguation directly by grouping search results into categories. See the screen shot below, which shows Google search results for the query "freebase". Effectively, the results page seems to be asking: do you mean the free semantic web database, or the other kind, associated with drugs? Or a third alternative: FreeBase - a free Windows software program to configure the Apple AirPort Base Station.

The use of horizontal ruled lines to separate the sections is a nice touch!



Obviously this is some type of test; I certainly hope it's successful. I can't wait to see this feature become mainstream among the major search engines. It will be a big step forward in Search!


