March 20, 2008

Tim O'Reilly and Sir Tim Berners-Lee concur: Semantic Web Likely to be Top-Down

In a previous post, "Where are the Meaning-Enabled Authoring Tools?", I argued that publishers who regularly post similar content (especially content that conforms to common formats) would gain a big advantage from using semantic authoring tools to create new content. Semantic tools not only bring SEO benefits and improve findability; they also make the content easier to repurpose for other uses, such as web applications and services.

This is essentially a bottom-up approach to the semantic web: adding semantic notation to the content itself. However, as that post went on to say, the prevailing view is a top-down one: applications will have to extract semantic meaning from perfectly ordinary web pages, and adding semantic knowledge to the content itself is unlikely (aside from very limited contexts, such as Microformats).

Two recent podcasts with two of the leading voices in this space further confirm this view.

In a conversation with Paul Miller of ZDNet's Semantic Web blog, Sir Tim Berners-Lee said that the idea that the Semantic Web involves users marking up web pages with semantic information is only a minor part of it; the data will mostly come from databases, or will be scraped from HTML. (Earlier coverage.)

Recently, my friend Aaron Strout put the same question to Tim O'Reilly during one of his We Are Smarter podcasts (time index: 19:36, if you want to check it out). O'Reilly said essentially the same thing: although useful in certain contexts, semantic markup in the content is unlikely to catch on, even among publishers, and a top-down approach of extracting meaning from regular documents is likely to prevail.

So there you have it, folks! If you've been holding your breath waiting for semantic markup tools to appear, you can go home now. And if you're working on an application to extract meaning, such as AdaptiveBlue or twine, you're on the right track!



Comments

Nitin, I don't really buy into this top-down/bottom-up Semantic Web distinction; more on that some other time. Either way though, I'd argue that you've misinterpreted TimBL's comments.

Tim is absolutely right that hand-crafted markup is unlikely to be the major source of Semantic Web data, but to interpret his comments as evidence that the Semantic Web will be "top-down" misses the point.

Much of the data will come from databases, but who will make this available if not the publishers themselves, who currently publish it in plain old HTML? They're not going to mark up all this data by hand; they'll roll their own scripts or use something like D2R Server (http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/). Surely this counts as "bottom-up" Semantic Web.

I haven't listened to the Tim O'Reilly podcast, but based on how you report it, the conclusion I draw is that the two Tims actually disagree. TimBL says something akin to "semantic markup from data publishers, pulled from their databases", whilst TimO says this is unlikely.
