furialog

¶ Semantic Bran · 20 April 2008 tech

The core of the "semantic web" idea, at least as far as I'm concerned, is that we're trying to do for data what the first web did for pages. We're trying to make dataspaces, both individual and aggregate, that can be explored and analyzed both by people directly, and by machines on our behalf. The people half of this, at least, is not mysterious or obscure or even speculative. It looks like IMDb, or any other site where there's pretty much a page for each individual thing, and you can click your way from every thing to everything else.

The machine part is more complicated, but only by a little. Instead of regular old-web links, which just tell the computer where to go, a "semantic" link also says what it means to go there. So the old-web page for Rush Hour 3 links to Jackie Chan and Chris Tucker, but also to ads and the IMDb front page and job-listings for IMDb.com, and as far as the machines can tell, these links are all essentially equal. When IMDb gets their act from web 2.0 to 3.0, the links will be annotated so that the ones that go to Jackie and Chris and the other cast members are labeled "actor", and the other links aren't, and then you can ask a question less like "What web pages mention the words 'Jackie' and 'Chan' and 'older'?" and more like "How many people in that movie were older than him, anyway?", and the machines might have enough material to figure it out for you.
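To make the difference concrete, here is a minimal sketch in Python of what labeled links give a machine to work with. It is not IMDb's data model or any real semantic-web API: the `LINKS` list, the `actor` and `birth_year` labels, the `targets` and `older_cast` helpers, and the "Hypothetical Elder" cast member (with a made-up birth year) are all assumptions for illustration.

```python
# Each link is a (source, label, target) triple. The label is the "semantic"
# part: it says what it means to follow the link, not just where it goes.
LINKS = [
    ("Rush Hour 3", "actor", "Jackie Chan"),
    ("Rush Hour 3", "actor", "Chris Tucker"),
    ("Rush Hour 3", "actor", "Hypothetical Elder"),  # made-up cast member
    ("Rush Hour 3", "unlabeled", "IMDb front page"),
    ("Rush Hour 3", "unlabeled", "IMDb.com job listings"),
    ("Jackie Chan", "birth_year", 1954),
    ("Chris Tucker", "birth_year", 1971),
    ("Hypothetical Elder", "birth_year", 1930),  # made-up value
]


def targets(source, label):
    """Return everything `source` points to through links labeled `label`."""
    return [t for s, l, t in LINKS if s == source and l == label]


def older_cast(movie, person):
    """Cast members of `movie` born before `person`."""
    person_year = targets(person, "birth_year")[0]
    return [
        actor
        for actor in targets(movie, "actor")
        if actor != person and targets(actor, "birth_year")[0] < person_year
    ]


print(older_cast("Rush Hour 3", "Jackie Chan"))
```

With only unlabeled links, the machine can't tell the cast from the ads, so it can't even start on the question. With the labels, answering it takes two short lookups.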

And that, and not coruscating pie-charts, is how you'll start to recognize the pieces of the new web as it begins to emerge: its sites will help you get real answers to real questions without you having to get out scratch-paper and click a hundred links yourself. The more time you spend thinking about this idea, I believe, the more revolutionary you'll realize it is. In terms of how computers augment human capacities for understanding information, the jump from the regular web to the semantic web will be a bigger deal than the jump from magazines and books and newspapers to the web. Maybe bigger by a lot.

Which is why I was excited to finally get an invitation to the private beta program for Twine, despite basically not knowing what it was. My wildly hopeful guess, from the pre-release hints about "personal information", had been that Twine might be the long-awaited reincarnation of the soul of Lotus Agenda, a personal information management program in a world where a lot more people now have enough information piling up around them for "managing" it to be a generalizable problem.

Twine, it turns out, at least so far, is a social bookmarking application. Bookmarking is not exactly what I meant by information management, any more than daytimer+contacts is what I meant by it in 1992. I gather that there is semantic-web technology behind Twine, somewhere, and I think this is supposed to make the "other tags" Twine recommends for your bookmarks better than the other tags del.icio.us recommends, or the other feeds Google Reader recommends, or the microwave that Amazon tells you was purchased by other people who pre-ordered a Douglas Coupland novel. Or it's supposed to eventually make this true, anyway, some day when/if there are more bookmarks and more people in Twine, which is after all still "in beta", which means that you're supposed to imagine that it will eventually get smart about everything it's currently dumb about.

And in Twine's case, this might eventually make it a really good social bookmarking application. If so, I will happily switch from del.icio.us to Twine for my minimal and basically expendable social-bookmarking needs.

But as an ambassador for the Semantic Web, Twine is an embarrassment. Or, maybe more accurately, it's embarrassed. It buries its semantic-web-ness inside, like it's the information-technology version of oat bran, and the reaction they're going for is "Oh, these donuts taste so good you'd barely know they had any Semantic Web in them!" But oat bran doesn't keep donuts from being junk food, and RDF-storage and named-entity-extraction don't make social-bookmarking any less page-oriented.

And I probably wouldn't care if Nova hadn't set up so much semantic-web context around himself and his company and their product. But we've collectively screwed up the presentation of this simple idea about how the next web will be better, somehow, and a lot of people have become convinced that the semantic web is some kind of clanking information C-3PO from an idiot-fantasy future, complaining about etiquette and waddling like it has a Commodore 64 wedged up its ass. So for a little while, at least, anybody working on the tools for building the new web is automatically an apologist learning how to be an evangelist instead. So I want everything that says Semantic Web on it to point clearly to the way the future is really and simply better. I don't want it to look like NLP alchemy, or like temperamental magic someone is trying to use in place of levers or pulleys or Perl. And I especially don't want it to look like some old thing that most people already didn't need.

But then, this is the standard I will be held to, too, if we manage to build and ship the semantic-web application I'm working on. I want to be part of the way the world gets better, and to do something that is not embarrassed of the future it is helping to build. We'll see.