March 20, 2008

Tim O'Reilly and Sir Tim Berners-Lee concur: Semantic Web Likely to be Top-Down

In a previous post, I asked the question: Where are the Meaning-Enabled Authoring Tools?, arguing that publishers who regularly post similar content (especially content that conforms to common formats) would gain a big advantage from using Semantic Authoring tools to create new content. With semantic tools, you not only get SEO benefits and improved findability; the content can also be re-purposed more easily for other uses, such as web applications and services.

This is essentially a bottom-up approach to the semantic web: adding semantic notation to the content itself. However, as the post went on to say, the prevailing view is definitely a top-down one, viz. that semantic meaning will have to be extracted by applications from perfectly ordinary web pages, and that the adding of semantic knowledge to the content itself is unlikely (aside from very limited contexts, such as Microformats).

Two recent podcasts with two of the leading voices in this space further confirm this view.

February 24, 2008

Semantic Web: Where are the Meaning-Enabled Authoring Tools?

Jason Kolb sees it as a way to identify data objects using URIs. John Markoff, of the New York Times, calls it Web 3.0. And Nova Spivack has a long post clarifying what it is Not.

What are all these authors talking about? The Semantic Web - much has been written recently about its concepts, approaches and applications. But there's something missing, a piece that hasn't generated much interest to date.

In terms of understanding, finding and displaying content, there is no doubt that the Semantic Web is slowly becoming real (e.g. there were some great demos at a recent SDForum meet). However, a gap is emerging with Content Authoring tools, which have not yet made this paradigm shift.

On the one hand, most authors are comfortable with, and proficient in, desktop authoring tools such as Microsoft Word, FrontPage, Adobe GoLive and others; this is especially true for professionals and other experts who create technical reference content for web applications, such as legal references, accounting manuals or engineering documents. The current crop of authoring tools produces visually high-quality articles and web pages, but their XML or RDF creation capabilities are severely limited.

On the other hand, parsing Word documents or HTML web pages to extract meaningful structure from them gives poor results; much of the semantic knowledge of the content is lost. There do not appear to be any popular tools that create Semantic content natively and yet are natural and easy for a content author to use.

Top-Down? Or Bottom-Up?

Of course, there are ways to get around this issue to some extent. Allowing authors or readers to add tags to articles or posts allows a measure of classification, but it does not capture the true semantic essence of the document. Automated Semantic Parsing (especially within a given domain) is on the way - a la Spock, Twine and Powerset (see writeup) - but it is currently limited in scope and needs a lot of computing power; in addition, if we could put the proper tools in the authors' hands in the first place, extracting the semantic meaning would be so much easier.

For example, imagine that you are building an online repository of content, using paid expert authors or community collaboration, to create a large number of similar records - say, a cookbook of recipes, a stack of electrical circuit designs, or something similar. Naturally, you would want to create domain-specific semantic knowledge of your stack at the same time, so that you can classify and search for content in a variety of ways, including by using intelligent queries.

Ideally, the authors would create the content as meaningful XML text or RDF triples, so that parsing the semantics would be much easier. A side benefit is that this content can then be easily published in a variety of ways, and there would be SEO benefits as well if search engines could understand it more easily. But tools that create information in this way, and yet are natural and easy for authors to use, don't appear to be on their way; and creating a custom tool for each individual domain seems a difficult and expensive proposition.
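To illustrate why content captured as triples is easier to classify and query, here is a minimal Python sketch for the cookbook case. It is only an illustration under my own assumptions: the recipe names, the predicates (hasIngredient, cookingTimeMinutes) and the match helper are all hypothetical, and a real system would use an RDF store rather than a Python list.

```python
# Each fact is a (subject, predicate, object) triple.
# All names and values below are hypothetical illustration data.
triples = [
    ("recipe:lasagna", "hasIngredient", "tomato"),
    ("recipe:lasagna", "cookingTimeMinutes", 90),
    ("recipe:salsa", "hasIngredient", "tomato"),
    ("recipe:salsa", "cookingTimeMinutes", 10),
    ("recipe:pesto", "hasIngredient", "basil"),
]

def match(pattern, facts):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        fact for fact in facts
        if all(want is None or want == got for want, got in zip(pattern, fact))
    ]

# "Recipes that use tomato AND take at most 30 minutes"
tomato_recipes = {s for s, _, _ in match((None, "hasIngredient", "tomato"), triples)}
quick_recipes = {
    s for s, _, minutes in match((None, "cookingTimeMinutes", None), triples)
    if minutes <= 30
}
result = sorted(tomato_recipes & quick_recipes)
print(result)  # ['recipe:salsa']
```

The point of the sketch is that the combined query needs no text parsing at all once the authors have recorded the facts explicitly.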

Car Review Example

As a more concrete example: imagine that you control a web site called New-Car-Reviews.com, a hypothetical site that reviews new cars; you pay expert authors to write reviews of new car models every year for this site. Unlike other automobile characteristics, reviews cannot easily be stored in a database and queried. Conceptually, your reviews are similar to this review for the 2008 Volvo S40 2.4i sedan on the automotive site Kelley Blue Book.

In the current paradigm, a typical element of the review is usually written something like this:

    <span id="ctl00">You'll Like This Car If...</span>
        ...description_positive...
    <span id="ctl00">You May Not Like This Car If...</span>
        ...description_negative...

For the future, imagine this: when your authors are originally composing this review, what if they could instead create it with semantic markup embedded:
(In this example, I use straight XML for simplicity; the actual format of the content could be RDF-triples, or some other improved format)

    <advantages>You'll Like This Car If...
        <text>...description_positive...</text>
    </advantages>
    <disadvantages>You May Not Like This Car If...
        <text>...description_negative...</text>
    </disadvantages>

then you can get more value out of the same content:

  (a) You can easily *re-purpose* the content in additional ways, such as for mobile devices, RSS feeds, web services APIs, mashups and so on
  (b) As search engines start to take advantage of semantic notation, you get SEO benefits
  (c) You can provide users with ways to query the content *intelligently* ("show me cars which are family-friendly AND don't roll over easily vs those that work better off-road AND seat 7"), using tools such as the recently-released SPARQL.
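As a rough illustration of benefit (a), here is a minimal Python sketch that reads a semantically marked-up review and turns it into a plain record that a feed, an API or a mobile view could reuse. The advantages, disadvantages and text element names come from the example above; the review wrapper element, its model attribute and the extract_review function are my own assumptions for the sake of a runnable example.

```python
import xml.etree.ElementTree as ET

# Hypothetical review record. The <review> wrapper and "model" attribute are
# assumptions; the placeholder text is kept exactly as in the example above.
REVIEW_XML = """
<review model="2008 Volvo S40 2.4i">
    <advantages>You'll Like This Car If...
        <text>...description_positive...</text>
    </advantages>
    <disadvantages>You May Not Like This Car If...
        <text>...description_negative...</text>
    </disadvantages>
</review>
"""

def extract_review(xml_source):
    """Return the review as a plain dict, independent of any page layout."""
    root = ET.fromstring(xml_source)
    return {
        "model": root.get("model"),
        "advantages": root.findtext("advantages/text", default="").strip(),
        "disadvantages": root.findtext("disadvantages/text", default="").strip(),
    }

review = extract_review(REVIEW_XML)
print(review["model"], "|", review["advantages"])
```

With the span-based markup, the same extraction would have to guess which text belongs to which heading; here the element names carry that information.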

As a content publisher, you want your content to be found and used as much as possible, and making it meaning-enabled is a big step in this direction. At the same time, you cannot ask authors to use a pure XML tool such as XMLSpy or an ontology editor like Protégé; and MS Word creates unreadable XML that specifies formatting rather than semantics.

A solution for this specific example already exists: Microformats could be applied to handle the problem of annotating the advantages and disadvantages. While the Microformat solution works very well for specific types of information - such as for describing people and addresses - it is too limited to be applicable in a general way to add semantic information to web content at large.

It seems to me that the general problem must be solved if we are to see large-scale adoption of the Semantic Web. It would be a boon to expert authors everywhere, including those who create news articles for the newspaper publishing industry. But there do not seem to be any solutions on the horizon, in terms of technologies, tools or processes to promote the creation of more meaning-rich content.

Reactions: But is there a Business Case?

When I put this question to a group of prominent bloggers and industry thought-leaders in the Semantic Web space, the results were not encouraging. There does not seem to be much interest in building Semantic authoring tools. The main stumbling block is the lack of a clear business model for publishers to embrace this approach.

Jeremy Liew of Lightspeed Venture Partners has recently penned a series of articles on the Semantic Web: Meaning = Data + Structure, based on user-generated structure, domain knowledge and user behavior, which focus on the problem of inferring meaning from content.

He questions the business rationale for authors to take the effort to add XML markup to their content, and points to domain-specific extraction approaches as the more likely solution:

The challenge with getting most authors to markup in XML is not just one of tools, but also of motivation IMO. Unless and until a clear business case advantage justifies the additional effort required, and that advantage is greater than other projects offer, you won't see much semantic markup except from academics and others whose interests are more philosophically driven than business driven.

That is why I think the domain specific extraction approaches will likely be more prevalent - the business advantage of better search and structure accrues to the person doing the extraction, and because it is domain specific, the additional effort is lessened

He's right, of course; domain-specific extraction approaches are definitely going to be popular, and are beginning to take off already. They provide significant added value for the extractor. However, extraction is difficult and expensive to do well, so the business case is somewhat dubious for the early adopters.

ReadWriteWeb's Alex Iskold is another thought leader in this space. He has a series of fantastic articles about the Semantic Web, including the problem of annotating data, the different approaches used, and a primer for the structured web.

His comments echoed those of Liew:

There seems to be little incentive for publishers to annotate information.

The problem is that if you go deep enough you hit RDF. The light version is Microformats. But the issue is not the format, it's the incentive.


Tim O'Reilly wrote about this issue almost a year ago: Different Approaches to the Semantic Web, in which he echoes the same sentiment:

It seems easy enough, but why hasn't this approach taken off? Because there's no immediate benefit to the user. He or she has to be committed to the goal of building hidden structure into the data. It's an extra task, undertaken for the benefit of others. And as I've written before, one of the secrets of success in Web 2.0 is to harness self-interest, not volunteerism, in a natural "architecture of participation."

Conclusion

I guess I'm a minority of one. It seems to me that if content creators could add semantic meaning while constructing the content in the first place (which is, conceptually, only marginally more difficult for the authors), then the value of the content would increase exponentially at very low cost. That seems like a defensible business case for content publishers.

The business case for publishers to annotate existing web pages and content is certainly very weak. But for new content, if you're creating it for your site anyway, why wouldn't you add semantic markup to make it more findable and usable?

What do you think? Please leave a comment below or email the author (removing the ".aa" at the end) and let us know!



December 28, 2007

How long before the walls around content come crashing down?

Scott Karp of Publishing 2.0 has posted an interesting article today: What Is The ROI Of Requiring User Registration To Access Online Content?, in which he takes a close look at the registration wall used by the New York Times online and wonders whether it is worthwhile.

The theory goes that personal data collected from registered users enables sites to better target ads and charge premium rates. But I wonder whether the lost traffic from users who choose not to jump through the registration hoop — which I bet is particularly true of NYTimes’ large volume of visitors from search engines — outweighs the gain of higher ads rates (assuming NYTimes.com is consistently able to charge higher rates).

As Karp notes, the registration requirement presents a barrier to access for users who come in through a search engine, at a time when NYTimes.com is focused on growing its readership beyond the current regular readers; and these casual users are just the type of users who are likely to have a lower tolerance for jumping through registration hoops, notwithstanding the NYTimes.com claim that registration takes "only a minute".

In one of the comments to the article, Howard Owens responds by questioning a critical assumption; Owens asserts that the registration requirement does not, in fact, cause traffic to drop.

I’ve run two registration sites, and have spoken with other newspaper.com site managers who have run their own registration-required sites, and two things I found to be true based on empirical evidence:

1) There is no drop off in traffic past the first 60 days of registration (after 60 days, traffic exceeds pre-registration numbers and continues to grow).
...


Personally I believe that the ROI of requiring user registration is questionable at best. Intuitively, it makes sense that at least some users will get discouraged and drop off when confronted with a "registration required" notice; so there's bound to be some negative impact, with all due respect to Howard Owens [perhaps the numbers he saw can be explained by other changes that happened at the same time, such as SEO enhancements that bumped up traffic, compensating for the impact of the registration?].

At the same time, there is another major trend currently under way that will increase the importance of this debate, and in my opinion, accelerate the crumbling of these registration and payment walls.

This big change is in user behavior. Individual consumers are increasingly flocking first to the major search engines when looking for information and data, rather than to individual web sites, even when they already know high-quality sites that can provide the information. It makes sense from the user's point of view: the user wants to find high-quality content in general, regardless of source, and using a favorite search engine is a quick, easy and comfortable way to do that. This overall trend is inevitable and irreversible. As Don Dodge noted in a recent article: Search engines are the Start page for the Internet.

Anecdotal evidence suggests that, even for major web sites with strong brands, the share of total traffic arriving from search engines is increasing (I do not have hard numbers to back up this claim). This forces content publishers to open up more information in order to satisfy those users, which further solidifies the position of search engines as the starting point, which in turn forces publishers to open up yet more information, and so on, in a self-reinforcing virtuous cycle.

As publishers see this changing user mix - a higher percentage of traffic consisting of new users coming from search engines - engaging those users will increase in importance, and putting barriers in their path will be less acceptable. Instead, publishers will be forced to find new and innovative models for monetization; similarly, user tracking methods will need to be improved to collect data implicitly rather than requiring explicit action from the user.

As the user starts interacting with the site - if she wants to comment, post or otherwise participate, for example - then progressive upsells into registration and payment are perfectly valid and acceptable. By that point, the site is dealing with thoroughly-engaged users, not casual visitors.

I see it as a question of time before 99% of content from major publishers (NYTimes.com included) becomes free and openly accessible on the Web.

To paraphrase Cory Doctorow (he of the free books!) and Tim O'Reilly, on content: the real danger isn't loss of revenue through sharing, it's obscurity and irrelevance.



October 12, 2007

A conversation with Avinash Kaushik, Web Analytics Guru

Avinash Kaushik is a leading expert in the new field of Web Analytics. His blog, Occam's Razor, is one of the most popular blogs on this subject. He has lots of other exciting things happening: he's the Analytics Evangelist for Google, author of the book Web Analytics: An Hour A Day published by Wiley, and most recently, is a co-founder of a startup, Market Motive, focused on spreading knowledge for Internet Marketing. He was kind enough to agree to an email interview, given below.

If you are interested or involved in Web Analytics, I guarantee that his answers will give you much to think about.


Q - How did you get into Web Analytics? What is it about this field that attracted your interest?

AK - I ended up in Web Analytics by pure chance.

My former roles were in decision support systems, both on the business and technical side of the fence. The Intuit job, my foray into web analytics, was attractive more because of the people and the company.

But I had always been fascinated by the web and the job allowed me to put my experience in decision support with the fantastic piece of art that the web is.

At some level it was lucky to get into web analytics with no baggage or hang ups or having read any books, it allowed me to bring a fresh and completely different perspective to it.

Q - In your study of web site user behavior, what are some of the most surprising results you've found?

AK - I am surprised that even in 2007 given how pervasive the web is and how it is used that we continue to obsess on conversion rates, essentially solving for a minority of site traffic as if people came to our sites for just one purpose. That is so 1997.

I am frequently humbled by the lessons customers have taught me when we listen to them using surveys or multivariate tests or site visits. Cool and sexy is not always enough. Simplicity is the key. Solving for customers and bottom-line is possible. Having clear calls to action on all pages (especially on those where there is no "add to cart button") and the importance of solving for your customer personas (just look at www.newegg.com, no one will call it the prettiest site in the world and yet it consistently outranks www.apple.com and www.amazon.com when it comes to customer satisfaction!) cannot be emphasized enough.

Q - With the benefit of your deep background in this field, what do you think of the Google Analytics product: What are its strengths? Which types of companies is it most useful for? Which areas do you think need to be improved?

AK - I have just published a comprehensive review of all web analytics vendors - link: http://www.kaushik.net/avinash/2007/08/web-analytics-vendor-tools-comparison-and-one-challenge.html . Your readers might find that video helpful in understanding the industry, its challenges and what unique strength each web analytics vendor brings to the table.

In the video I mention two key strengths of Google Analytics:

1) Data Democracy: Google Analytics is a drop dead easy tool to use and presents a lot of complex web analytics data in a very easy to understand manner. Because of this it flips the traditional web analytics model where a few people in the company had access to the data and shared it with others. With GA you can give everyone access to the tool and they can help themselves.

2) Best of breed search analytics: The reports and segmentation options you'll find in Google Analytics to analyze your site's search data are really good. Perhaps it should not be surprising that a web analytics tool from a search engine is good at that. You don't have to tag your campaigns because of auto tagging, which saves hassle and improves data quality. Your data is also imported and integrated and presented with some unique reports.

In terms of who GA is right for... Google Analytics is right for any company that will benefit from the above two features. The nice thing is that, unlike the past where you could only rule tools in and out on paper, now you don't have to take a random person's, or a "guru's", opinion on the benefits of the tool. GA is free. Throw it on your site and try it for yourself, and using your own data from your own site you can determine if it is right for you.

In terms of what needs improvement.... Currently GA provides 27 pre-built segments that you can apply to any of the 80 odd reports to get 27 times 80 sets of segmented data. But I am selfish. I would love to have even more flexibility when it comes to creating visitor segments that are most relevant to each business.

Q - Your blog, Occam's Razor is one of the most successful blogs in this field. What has blogging meant to you? Are there things you would do differently with the blog if you had to start over?

AK - My wife's opinion is that the blog is our third child. :)

When I started writing the blog a little over a year ago my hope was to have around a thousand visitors a month because that is how many people I thought were my core target audience. Yesterday the number of RSS feed subscribers was at 4,600 and there were 30,000 visitors last month. That in many ways simply astounds me.

These numbers also mean that I feel a deep sense of obligation to the people who read the blog. There is always a pressure to deliver the highest possible quality in the posts that my humble skills will allow.

The blog means the world to me because of the conversation that I can have with people from around the world (around 30% of the site traffic is international). All these wonderful people write comments and their own perspectives which I learn from and all these comments add to the conversation (on my blog visitors have written approximately as much content as I have written, word for word).

In terms of different..... I wrote a post at the end of the first month I think, I would not have written that in hindsight. But other than that I would not do anything different, the blog has managed to stay hyper focused on what my initial vision was and I think it works well.

Q - Even now, Web Analytics is seen as an afterthought in some web companies, rather than being an integral part of the business process. How do you convince these companies of its importance?

AK - I agree with you, it still exists in silos (both from organization and data perspectives).

At some level it really requires the business realizing the importance from the inside that matters. No amount of outsiders coming and pontificating can drive fundamental change.

If you are inside the company then you have an inside track to helping your company realize the value of web analytics. My advice would be to focus on two simple things in a very hard core way: 1) value the web can deliver to the bottom line and 2) value the web can deliver to your customers. The interesting thing is that the web can do both of those in an efficient and scalable manner, unlike any other channel.

And if you want to help your company do #1 and #2 you need web analytics. Start showing it in small ways (rather than trying to create an overnight revolution, those rarely succeed) and I assure you that your company will "get it". Few people can argue with profit and fewer still can argue with making customers happy.

Q - You've just launched a new company, Market Motive. Can you tell us more about it? Who are your target customers? Will you be offering any free content, or is it all behind the "paid" curtain?

AK - Market Motive's mission is to focus on helping Online Professionals be massively successful through access to the latest best practices and insights from the best people in each discipline. We hope to deliver that by providing fresh and unique content that will only be available at www.marketmotive.com.

The initial areas of attention will be: SEO, PPC/SEM, Web Analytics, Conversion, Email Marketing, Online PR, and Marketing Processes. We will provide videos and podcasts that offer a unique way to learn; these will be complemented with live phone-in sessions where subscribers will be able to ask their questions and get them answered by the dream team.

The target audience is Professionals whose job it is to deliver for their companies in any / all of the above mentioned seven areas.

The content created at Market Motive will be available on an unlimited consumption basis to only the subscribers. All the faculty have blogs on which they are very active.

Q - What advice would you give to a small company that's just starting to get deeper into Web Analytics (beyond basic Page Views and Referrer URLs)?

AK - Use your web analytics tool to answer questions and not simply measure "KPI's".

Here are the three questions to answer:

1) Where are people coming from? (Referring URL's, Search Engines, Key Words, etc) This helps you infer intent and identify valuable sources of qualified traffic (by simply measuring bounce rate).

2) What do they do when they are on my site? (Content Consumption, Top Entry Pages, Top Visited Pages, Site Overlay etc) This helps you understand what people might be looking for and is it easy to find and is it what you want them to see.

3) What were the outcomes, both for you and the visitors? (Revenue, Conversion Rate, Task Completion Rates, # of leads, Likelihood to Recommend, Customer Satisfaction etc) Your site should make a difference to their existence. Is it?
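[An aside from me, not part of Avinash's answer: the first question above, measuring bounce rate by traffic source, can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the visit records are made up, bounce_rate_by_source is a hypothetical helper, and a bounce is taken to mean a visit that viewed a single page.]

```python
from collections import defaultdict

# Hypothetical visit log: (traffic source, pages viewed during the visit).
visits = [
    ("search", 1), ("search", 4), ("search", 1),
    ("referral", 3), ("referral", 2),
    ("direct", 1),
]

def bounce_rate_by_source(visit_log):
    """Share of single-page visits (bounces) for each traffic source."""
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for source, pages in visit_log:
        totals[source] += 1
        if pages == 1:  # assumption: a one-page visit counts as a bounce
            bounces[source] += 1
    return {source: bounces[source] / totals[source] for source in totals}

rates = bounce_rate_by_source(visits)
for source, rate in sorted(rates.items()):
    print(f"{source}: {rate:.0%}")
```

[A source with a low bounce rate is, by this crude measure, sending more qualified traffic than one where most visitors leave after a single page.]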

Q - What is the biggest mistake in the use of WA? What should people watch out for?

AK - Usually the weakest link is that website owners rarely sit down and define why their site exists, and if that's the case then any metric will look like success. You should be able to answer in fifteen words or less "why does my site exist" and then be able to identify two metrics that help you measure if your website is delivering.

The other big mistake is that Marketers and Website Owners think that they represent their customers. This is mostly false. We, company employees, are too close to our companies to ever be able to think like our customers. If you want to know what your customers think of your website experience, ask them.

Q - What radical changes do you think we will see in Web Analytics in the next 3-5 years? Do you expect to see a big impact from the proliferation of Social Networks (like Facebook)? What about SEO and the increasing importance of search engine traffic?

AK - The web reinvents itself and that is what makes it fun. I think with all the web 2.0 buzz we are in the middle of one such transformative experience. Each such transformation like that requires the measurement methodologies to evolve as well. We are now trying to figure out how to measure ajax and flash and videos and podcasts and so on.

In the next couple of years I think web analytics will change radically. In the near term we will evolve to measure the aforementioned fluid experiences much more effectively. In the slightly longer term I am anticipating (and hoping) that web analytics will transform into business analytics. A way of life, a normal way of existence, just like other pieces of analytics that tend to have nothing special about them, and not an afterthought.

I have recently written about Web Analytics 2.0 (http://www.kaushik.net/avinash/2007/09/rethink-web-analytics-introducing-web-analytics-20.html) and how we already need to think differently to be more optimally competitive.

As regards social networks and SEO etc I think that these types of wonderful things will never leave us (hopefully not). From a web analytics perspective we need to come up with more efficient ways to collect data, no matter which way life on the web evolves. I am optimistic that in the next few years we'll have that figured out.

 

Previous related articles:

      Top 8 Reasons to Implement Tracking and Measurement for your Web Site

      Web Design and the Scientific Method

       A conversation with Guy Kawasaki



September 25, 2007

Can the Semantic Web bring us Trusted Search Results?

Nova Spivack, during his recent talk about the Semantic Web (covered in my previous post), made the point that adding semantic processing to the underlying index of a search engine makes the issue of trust more serious. This was an intriguing statement, and I followed up with him via email to get further clarification; he was kind enough to respond at length. The questions and answers from our exchange are given below.


1. For the Semantic Web to really take off, the information on the web at large needs to become Semantic Web-compatible; i.e. web pages need to provide semantic information in the form of RDF, OWL etc. Do you see this happening in the foreseeable future, given the huge mass of pages that already exist?
    Or is it more likely that technology will have to solve this problem for us, and we'll need to invent algorithms that can interpret currently existing web pages to extract and apply semantic knowledge on top (such as ClearForest Gnosis)?

The DBpedia.org is a good start. Also check out the emerging SPARQL and GRDDL standards at W3C -- they will bring existing data into the RDF world. There is also a growing body of RDF already out there in the Dublin Core, FOAF and other ontologies on content on the Web. More will be coming from many big companies, Adobe, Oracle, Yahoo, etc. And of course other startups like Metaweb and Radar Networks will be adding a lot of content to the mix in different ways as well.


2. In your talk, you mentioned Trust - e.g. you referred to Powerset, wondering how they could add some sense of trust to the results they find through semantic processing (NLP) of web pages, because otherwise it would not be useful.
    I'm not sure I understand.

They are mining full text of the web and automatically building a knowledge base from that. So when they see a web page that says "Microsoft is a terrorist organization" or "Microsoft is a software manufacturer" how do they know which statement is true or false? Who do they trust? How do they determine who to trust? This determines what facts or assertions get what level of weight in their knowledge base. It's the crux of the issue really. You can mine in a lot of assertions, but if you have no good way to filter out the garbage, spam, erroneous statements, or deliberate deception, you can't use it for anything real. One solution is to only mine highly trusted sources -- such as encyclopedias and major newspapers for example. That's not a bad way to start. That would generate a decent knowledge base.

But the DBpedia.org might be a better way to start than mining free text. They've already done the heavy lifting of turning the wikipedia into RDF. I'm not sure you need natural language to get good, reasonably trustworthy knowledge, just use the DBpedia.

In the case of Powerset I believe their goals are different than the DBpedia/Wikipedia -- I think they don't just want almanac content, they want specialized vertical knowledge about travel, products, etc. That will require that they either are very selective of their data sources, or they have a sensitive way to measure trust and rank information accordingly.


        a] How does the addition of semantic processing to the underlying index make the issue of trust more serious? Google's PageRank is basically an approximation of the Wisdom of Crowds, using static links to represent votes; if Powerset uses some similar mechanism to rank the information sources in the underlying index, then their results should be no worse than Google's in terms of trust, and better than Google's in terms of relevance. [Incidentally, I've written about Powerset before when I attended their preview event.]

They could perhaps use a pagerank algorithm to attribute more trust to assertions they mine from various sites. That would be one solution. Unfortunately it gets more complex though -- because if their knowledge base has many grades of truth (statements ranked to varying degrees of trust) for a given assertion, then they will have to use modal logic or some other form of fuzzy reasoning to actually do any real reasoning or inferencing. That stuff is hard and uncharted territory to say the least. I don't think they will go there and if they did I think they would not be successful at it. So the question is what can they do without going there?
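[An aside from me, not part of Nova's answer: the "links as votes" idea behind PageRank can be sketched in Python as below. This is a simplified textbook-style power iteration under my own assumptions, not Google's or Powerset's actual algorithm; the three-page link graph, the pagerank function and its damping and iterations parameters are all hypothetical.]

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: every page splits its rank among the pages it links to.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page without outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web: a and b both "vote" for c, and c votes for a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

[A score of this kind could be attached to the assertions mined from each page, which is where the difficulty Nova describes begins: reasoning over facts that each carry a different weight.]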


        b] How will the Semantic Web improve the trust situation for web search results?

First there is the issue of being able to assign a trust rank to each triple. Trust is relative so in fact there may not be a single global measure of trust that applies equally to everyone. I may trust someone that you don't trust and so I may take what they say to have more weight than you would for example. There isn't room within every triple to store that, but triples could be ratified by other triples that express "endorsement" of their content. So if I agree with something I can simply express that and now it is recorded that I (in an authenticated manner) have said I trust it. If lots of people do that with various assertions (triples), records (objects), and sites and people (sources), then we have a network of trust built in RDF. A network of trust can be reasoned on to determine weighted, socially relative trust rankings for triples. It can determine what triples I am likely to trust versus what you are likely to trust versus what everyone is likely to trust.

Second there is the issue of being able to trust reasoning performed by the system. For that the system needs to be able to explain to a human how it reached some logical conclusion and what data it used to do so. Work is being done both on computer-generated explanations and ways to record and show provenance to address these issues.
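
[Ed: The endorsement network Nova describes in his first point can be illustrated with a small sketch. This is only a toy under stated assumptions, not anything Nova or any Semantic Web system actually implements: the names, the "endorses" predicate, the decay factor and the hop limit are all hypothetical, and plain Python tuples stand in for RDF triples. Trust starts at 1.0 for the viewer and is halved at each endorsement hop, so the scores depend on who is asking.]

```python
# Hypothetical endorsement statements in triple form:
# (endorser, "endorses", target). A target can be a person (a source)
# or the identifier of another triple. All names are made up.
from collections import defaultdict

ENDORSEMENTS = [
    ("alice", "endorses", "bob"),
    ("alice", "endorses", "triple:1"),
    ("bob", "endorses", "triple:2"),
    ("carol", "endorses", "triple:3"),
]


def relative_trust(viewer, endorsements, decay=0.5, max_hops=3):
    """Score targets from the viewer's point of view.

    Trust flows along endorsement edges and is multiplied by `decay`
    at every hop; each target keeps the best score that reaches it.
    """
    graph = defaultdict(set)
    for endorser, _predicate, target in endorsements:
        graph[endorser].add(target)

    scores = {}
    frontier = {viewer: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for node, weight in frontier.items():
            for target in graph.get(node, ()):
                passed_on = weight * decay
                # Only keep (and propagate) an improvement over what
                # this target already received via another path.
                if passed_on > scores.get(target, 0.0):
                    scores[target] = passed_on
                    next_frontier[target] = passed_on
        frontier = next_frontier
    return scores


alice_view = relative_trust("alice", ENDORSEMENTS)
carol_view = relative_trust("carol", ENDORSEMENTS)
```

[Ed: In this toy data, Alice gives some weight to "triple:2" only because she endorsed Bob, who endorsed it, while "triple:3" gets no score at all from her vantage point. Carol's view is the reverse. That is the "socially relative" part of the idea.]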


Many thanks to Nova for his detailed answers. You can find his blog here: Minding the Planet.



June 19, 2007

A conversation with Guy Kawasaki of Garage Technology Ventures

There is no need to introduce Guy Kawasaki - Evangelist, VC, Blogger and long-time Silicon Valley icon. His latest venture is called Truemors (here's the Software Abstractions review) - a Digg-like social network for headlines and short posts. Recently, I had the opportunity to get his thoughts via email on a variety of topics - our (electronic) conversation is reproduced below.

 

Q - Congratulations on your new startup, Truemors, which seems to be doing very well. What attracted you to this particular idea, out of the hundreds of potential candidate ideas out there?

GK - I was inspired by three events: Listening to James Hong explain how he created HotorNot; SpinVox enabling people to post blog entries via voicemail; and the success of Twitter. I love the concept of the democratization of information, so when I put all four together, I came up with Truemors.

Q - What is your business model for Truemors?

GK - Hopefully we'll attract enough page views to sell advertising and sponsorships in significant amounts.

Q - How will Truemors differentiate itself from established social headline sites like Digg, Reddit et al?

GK - Digg, Reddit, et al are mechanisms for people to rate content. Truemors is a way to generate content. In an ideal world, people will start Digging our content and drive up our page views.

Q - What do you make of the mostly negative opinions of the blogosphere about Truemors?

GK - First, don't exaggerate. The feedback hasn't been "mostly" negative. It's been totally, unequivocally negative. :-) One of the consequences of the democratization of information is that you have to be able to take the heat. So my reaction is: we shall see. Maybe they are right. Maybe they are wrong. But you never know unless you try.

Q - You've been an active blogger for a while now. What has your blogging experience been like?

GK - Very, very gratifying. I like working without having to get approval and waiting for publication. "I blog, therefore, I am," is my new mantra. I'm #15 or 16 on Technorati right now. My stated goal when I started blogging was to be in the top ten, so I've been fortunate.

Q - What type of key differentiators do you look for in your investments through Garage.com (now Garage Technology Ventures)?

GK - We're looking for two guys/gals on the West Coast who are trying to change the world with a software, IT, or clean-tech startup. We're a small fund so we don't participate in companies that need $25 million to break even or build a factory.

Q - What are the two or three most influential business books you've read?

GK - If You Want to Write by Brenda Ueland, Influence by Robert Cialdini, and Crossing the Chasm by Geoffrey Moore.

Q - Who are your business heroes?

GK - James Hong of HotorNot and Markus Frind of PlentyofFish. I love how they turned nothing into a big something.

Q - It always amazes me that Apple users - the so-called "Mac faithful" - have displayed such fierce loyalty to Apple through all of the travails and even mis-steps of the company; now the wheel has turned full circle, and Apple is back on top.
What do you think is the "secret sauce" that inspires such passion and loyalty in Apple users?

GK - There's no secret sauce. It's purely engineering. When Apple makes stuff that people like, it does well. When it doesn't, it doesn't.

Q - Do you think the iPhone will succeed wildly or fail miserably?

GK - It's too early to tell. Basically everyone is conjecturing how good or bad it will be without having used it--unless you're Walt Mossberg and Apple is sucking up to you.

Certainly the first million or so will fly out of the stores, but so did the first 250,000 Macs. Then the hard work really begins: battery life, no keyboard, durability of the screen, ATT's sucky data network, size and weight, lack of Exchange server (I can't believe I'm saying this too!), price point (a mere 20-30x more than a Razr), cancellation fee of your current plan (unless your two-year anniversary happens to be June 29th).

Still, when all is said and done, it's very dumb to bet against Steve.

Q - What message would you give to would-be entrepreneurs out there?

GK - Asking bloggers about anything beyond tools for bloggers is dubious as a feedback mechanism. Whatever the blogosphere says, do the opposite. [Ed: Aw, gee, thanks for the compliment! Not! ;-)]

Q - In one sentence, what's your core philosophy?

GK - "Empower entrepreneurs."



January 07, 2007

A conversation with John T. Maloney of Colabria

I recently had the opportunity to discuss Prediction Markets with John T. Maloney of Colabria. The KM and Colabria Clusters® are open, federated action/research networks - you can find more about them here.

[Background about Prediction Markets: Wikipedia reference, confab.yahoo event on PMs, Chris Masse vortal]

Here are some excerpts from our conversation:

Q - Let me start with a basic question. Why should companies be interested in Prediction Markets? What's in it for them?

JTM - Better decision making, improved knowledge management, dramatic cost savings via far better forecasting.

Q - So let's say that I'm convinced that my company should set up some PMs to improve prediction accuracy and tease out common (but hidden) knowledge - how do I convince higher-level management of that?

JTM - Well, you need to make a fairly routine business case. This is the heart of the issue, and the main reason for crafting an industry consortium, the PM Cluster. So far, PMs have been interesting research tools for scholars, academics and corporate research scientists. These particular populations are not equipped to make compelling business cases to management, to create 'best practices', hence the consortium.

Q - How do you quantify the benefits of the Prediction Market?

JTM - I do not assign these sorts of qualitative measures; rather, it is just part of the toolkit. Tools are heavily dependent on context. What is accurate in one setting may collapse in another. This has been the problem since time immemorial -- the flawed focus on tools rather than context. The tricky part is finding the right 'contract' or future. The Web-based tools are simple, easy. You need to 'find the pain' like anything else.
Examples are Microsoft and their rather bogus release schedules, Intel and where/when to make a new chip foundry or HP on how much memory to buy in a month... These are billion dollar questions.

Q - Once you get a prediction market going - how do you maintain/increase participation?

JTM - Focus on intangible benefits, reputation, experience, outcomes.

Q - Could you expand on that?

JTM - Yes, people are motivated by, and support, what they create and value, which isn't always apparent or measurable. They operate in complex social value networks that are hard to see or understand with a conventional mindset... it is why value network analysis is rising so fast. Prediction markets and intangible value are closely linked but only a few have discovered it yet.

Q - What are the best applications initially, within a company, for a Prediction Market? Sales forecasting? Project planning? Supply chain predictions?

JTM - Yes, yes, yes. Competitive intelligence and feature function analysis are HOT too.

Q - Is 2007 the year PMs will hit the broader public consciousness? With confab.yahoo, blog articles and so on, on the rise ...

JTM - Note that the AEI-Brookings Joint Center is sponsoring the worldwide Prediction Markets Conference in Washington DC on January 18, 2007. This is an important, seminal development.

December 03, 2006

A conversation with Tom Eng, Founder of Healia

I recently had a chance to chat with Tom Eng, Founder and Chairman of Healia. Healia is a consumer health search engine. Here are some excerpts from our conversation:

Q - Tell us about Healia.
[Note: For basic background information, see Healia's About page and this excellent post by David Berkowitz.]

TE - Healia is a consumer search engine for medical information. We use technology and rules in our search engine to enable users to find information more easily. For example, consumers may not know the relationships between commonly used terms and related results - there is a gap in the medical informatics world between popular consumer terms and medical information; our semantic matching algorithms can match the related medication or drug. We have our own customized taxonomy. We also have our own web crawlers and parsers and indexers, and use quality scoring algorithms to provide high quality search results.

Q - Some medical professionals frown upon patients using the web to look up medical information; they question its reliability, and feel that "Health information should only come from medical professionals!". How do you address that sentiment?

TE - That is an old way of thinking. Consumers today, in any interaction, are no longer passive - they want to drive it themselves. Studies have shown that people turn to the web first for medical information. Even after consulting their primary physician, they feel better after validating the information from other sources, of which the web is a primary example. There has been a major change in attitudes in the last five years, and Healthcare is catching up rapidly in this context. So we don't see this as continuing to be an issue in the future.

Q - Google OneBox is an attempt to provide vertical search functionality from within a Google web search. How do you think this feature will affect you, and Vertical Search Engines in general?

TE - OneBox and Google Co-op are an improvement, certainly. But using Google for a medical information search - to find out about diseases, symptoms, drugs or medical terms - has its limitations.
First, [Google] results are classified based on voluntary self-tagging by the publisher, which has inherent problems: people gaming the system, built-in bias, how many people are able to do this. This approach is not scalable for less-common diseases and conditions. In contrast, Healia uses algorithmic tagging - we use our technology to automatically add tags and relationships to any collection of content. We also use quality scoring algorithms for spam prevention.
Second, we provide a series of filters that allow a user to focus on results that apply to their unique situation, by enabling post-filtering of search results: by user demographics - such as age, sex or heritage, by the type of article - basic or advanced, and so on; this is especially important in a medical context, since symptoms and drugs for a given disease can vary based on patient attributes.
Third, since we apply a semantic network to the content and understand the relationships between commonly used search terms and related results, we can guide the consumer to other related searches or appropriate sources.
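
[Ed: To make the second point concrete, here is a minimal sketch of what post-filtering a result list by user attributes and article type could look like. It is purely illustrative and not Healia's implementation: the Result fields, the post_filter function and the sample titles are all hypothetical, and a real system would derive these attributes from its own taxonomy and tagging.]

```python
# Illustrative only: every name below is a made-up assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    level: str          # "basic" or "advanced"
    min_age: int = 0    # youngest reader the article applies to
    max_age: int = 200  # oldest reader the article applies to
    sex: str = "any"    # "any", "female" or "male"


def post_filter(results, *, age=None, sex=None, level=None):
    """Keep only results matching the filters the user has switched on.

    A filter left as None is ignored, so the full list comes back
    when no filter is set.
    """
    kept = []
    for r in results:
        if age is not None and not (r.min_age <= age <= r.max_age):
            continue
        if sex is not None and r.sex not in ("any", sex):
            continue
        if level is not None and r.level != level:
            continue
        kept.append(r)
    return kept


RESULTS = [
    Result("Aspirin overview", "basic"),
    Result("Pediatric dosing guidelines", "advanced", max_age=12),
    Result("Prenatal care checklist", "basic", min_age=15, sex="female"),
]

for_adult_male = post_filter(RESULTS, age=40, sex="male")
basic_only = post_filter(RESULTS, level="basic")
```

[Ed: The filters run after the search itself, which is why a user can toggle them without re-issuing the query.]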

Q - Now that Google has released technology to build custom search engines based on their web crawlers, do you see a big threat from third-party content publishers building their own trusted-source search engines or providing canned searches?

TE - Well, Rollyo and Yahoo! Search have provided this type of technology for a while (although Google got more press), and we haven't seen a lot of that yet. A Google-based custom search engine would suffer from many of the same limitations described above. Another issue is the requirement to include AdSense advertising in the search results. Also, this gets the publisher into a field [search] in which they're not experts - they might choose to partner with someone like us instead, with semantic technology, post-search filtering, etc., to guide users and make sure that they can easily find the information they're looking for.

Q - Plus, you may be able to provide related services based on the results or other features specific to the medical domain, something not everyone can do.

TE - That's something we're looking into. Watch this space ...

Q - One word: Powerset. What do you think about the sudden interest in natural-language search?

TE - We already have natural language associations embedded into the semantic technology we use. Natural language for queries - it's been tried before a number of times, so that in itself is not new. I haven't seen the Powerset search engine yet. Let's see what it is and what happens, when it becomes available.

Q - To drop into cliches for a moment: Does Healia serve the Big Head or the Long Tail of Search queries?

TE - I want Healia to be the place for consumers to find the medical information they need. Having said that, a major part of our current business model is to work with partners to provide services. It depends on which part you tackle - we're really only in the beginning stages of our industry. Healthcare is a really important domain for search, possibly the most important kind of search in an individual's life. With high quality and personalization of search results, we're focused on using our technology to help people, so it's more than just a business.

Q - Do you provide an API?
[Note: Healia also provides a widget for web site publishers, available here.]

TE - We do have a web services platform, and an API. At this point, it's not generally available; it's licensed to partners. We have several big customers already using this API; for example, the VA (Veterans Administration) uses our WS platform.

Q - What do you see for the future of Vertical Search Engines? Any specific trends?

TE - We're really at the 1.0 stage for Internet Search - what we've seen so far is just the beginning. When Alta Vista was the leading search engine, people thought that was the endgame for search, then Google came from nowhere and changed everything. I think we will see a shift, especially now with Vista coming out, where users will simply search from within the context of wherever they are, rather than going to a specific site to perform a search. People will default to the easiest thing to do; for example, if you are on a financial news site, and if that site offered a vertical search engine that meets your needs, then you might just search right there. If you had an embedded search within your application, you could search from within the app. So the future looks bright for context-specific, personalized search - we will just enable you to search from wherever you are. It's difficult to imagine one provider who does this for everyone. In a sense, Internet search is almost too important a capability to be left to any one company. Providing the underlying search technology as a service, with a high quality of results, will become increasingly important in the future.
Healia is working with partners to provide high quality, personalized search within clients and applications - watch for some exciting announcements and new features in the next couple of quarters.
