Jason Kolb sees it as a way to identify data objects using URIs. John Markoff of the New York Times calls it Web 3.0. And Nova Spivack has a long post clarifying what it is not.
What are all these authors talking about? The Semantic Web - much has been written recently about its concepts, approaches and applications. But there's something missing, a piece that hasn't generated much interest to date.
In terms of understanding, finding and displaying content, there is no doubt that the Semantic Web is slowly becoming real (e.g. there were some great demos at a recent SDForum meet). However, a gap is emerging with content authoring tools, which have not yet made this paradigm shift.
On the one hand, most authors are comfortable with, and proficient in, desktop authoring tools such as Microsoft Word, FrontPage, Adobe GoLive and others; this is especially true for professionals and other experts who create technical reference content for web applications, such as legal references, accounting manuals or engineering documents. The current crop of authoring tools produces visually high-quality articles and web pages, but their XML or RDF creation capabilities are severely limited.
On the other hand, parsing Word documents or HTML web pages to extract meaningful structure from them gives poor results; much of the semantic knowledge of the content is lost. There do not appear to be any popular tools that create semantic content natively and yet are natural and easy for a content author to use.
Top-Down? Or Bottom-Up?
Of course, there are ways to get around this issue to some extent. Letting authors or readers add tags to articles or posts provides a measure of classification, but it does not capture the true semantic essence of the document. Automated semantic parsing (especially within a given domain) is on the way, à la Spock, Twine and Powerset (see writeup), but it is currently limited in scope and needs a lot of computing power. In addition, if we could put the proper tools in the authors' hands in the first place, extracting the semantic meaning would be much easier.
For example, imagine that you are building an online repository of content, using paid expert authors or community collaboration, to create a large number of similar records - say, a cookbook of recipes, a stack of electrical circuit designs, or something similar. Naturally, you would want to create domain-specific semantic knowledge of your stack at the same time, so that you can classify and search for content in a variety of ways, including by using intelligent queries.
Ideally, the authors would create the content as meaningful XML text or RDF triples, so that parsing the semantics would be much easier. A side benefit is that this content could then be easily published in a variety of ways, and there would be SEO benefits as well if search engines could understand it more easily. But tools that create information in this way, and yet are natural and easy for authors to use, don't appear to be on their way; and creating a custom tool for each individual domain seems a difficult and expensive proposition.
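To make the idea of "RDF triples" a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the recipe identifiers, the predicate names (`hasIngredient`, `cuisine`) and the helper functions are all invented for this example and do not come from any real ontology or RDF library. Each fact is a (subject, predicate, object) tuple, and a query is simply a pattern with wildcards.

```python
# Hypothetical recipe facts stored as (subject, predicate, object) triples.
# Every name below is an illustrative assumption, not a real vocabulary.
TRIPLES = [
    ("recipe:pad-thai", "hasIngredient", "rice noodles"),
    ("recipe:pad-thai", "hasIngredient", "peanuts"),
    ("recipe:pad-thai", "cuisine", "Thai"),
    ("recipe:minestrone", "hasIngredient", "beans"),
    ("recipe:minestrone", "cuisine", "Italian"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def recipes_with_ingredient(triples, ingredient):
    """Return the sorted subjects that list the given ingredient."""
    hits = match(triples, predicate="hasIngredient", obj=ingredient)
    return sorted({s for (s, _, _) in hits})


thai = match(TRIPLES, predicate="cuisine", obj="Thai")
with_peanuts = recipes_with_ingredient(TRIPLES, "peanuts")
```

A real system would more likely use an RDF store and a query language, but the shape of the data is the same: once the author's content exists as facts like these, classification and search become simple pattern matching.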
Car Review Example
As a more concrete example: imagine that you control a web site called New-Car-Reviews.com, a hypothetical site that reviews new cars; you pay expert authors to write reviews of new car models every year for this site. Unlike other automobile characteristics, reviews cannot be easily stored in a database and queried. Conceptually, your reviews are similar to this review for the 2008 Volvo S40 2.4i sedan on the automotive site Kelley Blue Book.
In the current paradigm, a typical element of the review is usually written something like this:
<span id="ctl00">You'll Like This Car If...</span>
...description_positive...
<span id="ctl00">You May Not Like This Car If...</span>
...description_negative...
For the future, imagine this: when your authors are originally composing this review, what if they could instead create it with semantic markup embedded:
(In this example, I use straight XML for simplicity; the actual format of the content could be RDF triples, or some other improved format)
<advantages>You'll Like This Car If...
<text>...description_positive...</text>
</advantages>
<disadvantages>You May Not Like This Car If...
<text>...description_negative...</text>
</disadvantages>
then you can get more value out of the same content:
(a) You can easily *re-purpose* the content in additional ways, such as for mobile devices, RSS feeds, web services APIs, mashups and so on
(b) As search engines start to take advantage of semantic notation, you get SEO benefits
(c) You can provide users with ways to query the content *intelligently* ("show me cars which are family-friendly AND don't roll over easily vs those that work better off-road AND seat 7"), using tools such as the recently released SPARQL.
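Here is a rough sketch of point (a), assuming the review fragment is well-formed XML (each `<text>` element closed with `</text>`). It uses Python's standard `xml.etree.ElementTree` to read the fragment and re-emit it as JSON, which could then feed a mobile view or a web service API. The `<review>` wrapper element and the function name `review_to_dict` are my own assumptions for illustration; the placeholders are kept exactly as in the example.

```python
import json
import xml.etree.ElementTree as ET

# Review fragment from the example, wrapped in a hypothetical <review>
# root element so that it forms a single well-formed XML document.
REVIEW_XML = """
<review>
  <advantages>You'll Like This Car If...
    <text>...description_positive...</text>
  </advantages>
  <disadvantages>You May Not Like This Car If...
    <text>...description_negative...</text>
  </disadvantages>
</review>
"""


def review_to_dict(xml_source):
    """Map each section tag to its heading and body text."""
    root = ET.fromstring(xml_source)
    result = {}
    for section in root:
        # The heading is the text directly inside the section element,
        # before the nested <text> child.
        heading = (section.text or "").strip()
        body = (section.findtext("text") or "").strip()
        result[section.tag] = {"heading": heading, "text": body}
    return result


review = review_to_dict(REVIEW_XML)
print(json.dumps(review, indent=2))
```

The point is not the JSON itself but that the section names carry the meaning: the same dictionary could just as easily be rendered as an RSS item or loaded into a store that answers queries like the one in (c).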
As a content publisher, you want your content to be found and used as much as possible, and making it meaning-enabled is a big step in this direction. At the same time, you cannot ask authors to use a pure XML tool such as XMLSpy or an ontology editor like Protégé; and MS Word creates unreadable XML that specifies formatting rather than semantics.
A solution for this specific example already exists: Microformats could be applied to handle the problem of annotating the advantages and disadvantages. While the Microformat solution works very well for specific types of information - such as for describing people and addresses - it is too limited to be applicable in a general way to add semantic information to web content at large.
It seems to me that the general problem must be solved if we are to see large-scale adoption of the Semantic Web. It would be a boon to expert authors everywhere, including those who create news articles for the newspaper publishing industry. But there do not seem to be any solutions on the horizon, in terms of technologies, tools or processes to promote the creation of more meaning-rich content.
Reactions: But is there a Business Case?
When I put this question to a group of prominent bloggers and industry thought-leaders in the Semantic Web space, the results were not encouraging. There does not seem to be much interest in building Semantic authoring tools. The main stumbling block is the lack of a clear business model for publishers to embrace this approach.
Jeremy Liew of Lightspeed Venture Partners has recently penned a series of articles on the Semantic Web (Meaning = Data + Structure, based on user-generated structure, domain knowledge and user behavior), which focus on the problem of inferring meaning from content.
He questions the business rationale for authors to take the effort to add XML markup to their content, and points to domain-specific extraction approaches as the more likely solution:
The challenge with getting most authors to markup in XML is not just one of tools, but also of motivation IMO. Unless and until a clear business case advantage justifies the additional effort required, and that advantage is greater than other projects offer, you won't see much semantic markup except from academics and others whose interests are more philosophically driven than business driven.
That is why I think the domain specific extraction approaches will likely be more prevalent - the business advantage of better search and structure accrues to the person doing the extraction, and because it is domain specific, the additional effort is lessened
He's right, of course; domain-specific extraction approaches are definitely going to be popular, and are beginning to take off already. They provide significant added value for the extractor. However, extraction is difficult and expensive to do well, so the business case is somewhat dubious for the early adopters.
ReadWriteWeb's Alex Iskold is another thought leader in this space. He has a series of fantastic articles about the Semantic Web, including the problem of annotating data, the different approaches used, and a primer for the structured web.
His comments echoed those of Liew:
There seems to be little incentive for publishers to annotate information.
The problem is that if you go deep enough you hit RDF. The light version is Microformats. But the issue is not the format, its the incentive.
Tim O'Reilly wrote about this issue almost a year ago in Different Approaches to the Semantic Web, where he echoes the same sentiment:
It seems easy enough, but why hasn't this approach taken off? Because there's no immediate benefit to the user. He or she has to be committed to the goal of building hidden structure into the data. It's an extra task, undertaken for the benefit of others. And as I've written before, one of the secrets of success in Web 2.0 is to harness self-interest, not volunteerism, in a natural "architecture of participation."
Conclusion
I guess I'm a minority of one. It seems to me that if content creators could add semantic meaning while constructing the content in the first place (which is, conceptually, only marginally more difficult for the authors), then the value of the content would increase exponentially at very low cost. That seems like a defensible business case for content publishers.
The business case for publishers to annotate existing web pages and content is certainly very weak. But for new content, if you're creating it for your site anyway, why wouldn't you add semantic markup to make it more findable and usable?
What do you think? Please leave a comment below or email the author (removing the ".aa" at the end) and let us know!
I loved this post and comments on Read/WriteWeb, so I tried to make a French translation of it. It's there on Innovablog: http://innovablog.com/analyse/web-semantique-outils-creation-contenu-riche-de-sens/
Thanks for this work!
Posted by: Olivier | February 25, 2008 at 12:54 AM
Nitin,
This is a fantastic post (sorry, I just now got around to reading it) and deserves a better response than I can fit into a comment. You've resurrected quite a few thoughts I had around Live Clipboard, perhaps that idea needs to be revisited by the SemWeb crowd.
In any case, I need to go write a post about this topic :)
Posted by: Jason Kolb | February 27, 2008 at 10:09 AM