furialog

¶ This Sunken Summit · 18 November 2010 tech

At one point during Semantic Web Summit East, yesterday, Lee Feigenbaum summarized the advantage of new Semantic Web tech over old database tech as, compactly, "DB owned by IT, ontologies by domain experts, who can quickly affect the changes they need". This theme reverberated through about half of the two-day conference**. Only the Semantic Web can deliver the agility demanded by modern enterprises! The whole event was, essentially, a pep rally for this conviction. I was there myself, more or less, to cheer along.

And after two days of that, I don't think I feel like cheering any more.

Lee's summary gets at exactly the source of the problem, I think. The supposed inflexibility of old database tech isn't due to technical issues. Adding tables to a relational database is easy. Changing tables and migrating data is easy enough. Database people have been doing this for decades. IT departments aren't change-averse as a matter of arbitrary mean-spiritedness. Enterprise database ecosystems become intransigent not because the technology is clunky (although it is), but because in a real enterprise data environment, the dependencies compound. And thus the domain experts are left "affecting" change instead of effecting it.

But "ontologies" don't change this. A relational database schema is an "ontology", too. You make decisions about how your data is structured, and then you build your world around that. And if you change your mind about your underlying worldview, you'll have to remodel and rebuild. You can improve your ability to adapt by trying to anticipate this possibility from the beginning, but you could do that with old tech, too. Sometimes it works. But it's hard, and the more eventualities you have to try to design for, the longer it takes.

So here, I think, are the three real truths underlying this non-truth.

First, the new tech is undeniably, however superficially, new. Sometimes newness is an excuse in itself. If you get to rebuild an existing data flow from scratch, you can probably do a better job of it, no matter what tools you use.

Second, the new tech believes in its newness. If you get to rebuild an existing data flow from scratch, and you come armed with a belief that things can be better this time, they probably will be. I walk differently in my new shoes, too.

Third, the new tech is fabulously, pitifully, critically immature. "How do I get started?", somebody asked the conference's closing panel. "If you want to write, take out a pen", one of them said. But in the Semantic Web world, you don't get a pen, you get a spec for kits for pens. There is no MyRDF like there's a MySQL, and even more glaringly, there is nothing for linked data like there is Excel for columns of numbers.

Until we understand these things, "summit" is the wrong word for where we're standing.

But even more than that, sitting in this conference for two days I've been edging towards the conclusion that the foundation of the new pile of standards and tools isn't even meaningfully new. RDF isn't a new alternative to the relational model, it's just another way of writing it down. Where the classical way is to make tables and fill them with rows, RDF takes the rows out of the tables and lists them individually. There is no significant increase or decrease in overall expressive power, and you still have to have a model in order to do anything useful with the data. If anything, RDF makes the modeling problem slightly harder in practical terms, because in a mostly-relational database with denormalized rows the denormalization keeps the data from collapsing in certain ways. In RDF you actually have to get it right.

What RDF gains, in the course of taking rows out of tables, is a mechanism for using URLs as row IDs. But this isn't a consequence of changing the serialization, it's just a thing in itself. If you want to use URLs as row IDs in a relational database, there's nothing stopping you. If you want to serve up database rows in response to URL requests, there's nothing stopping you. If you want to use URLs as column names, there's nothing stopping you. If you want to expose an SQL endpoint so people can try their own queries on your data, you can. It's a pretty good idea.

What RDF incurs, in the course of asserting this URL point as dogma, is an astonishing syntactic bloat and an expansive mania for precision with no intrinsic connection to accuracy. Yes, now everything has URLs. Everything. Sometimes that helps us, but mostly it means we give up the powerful imprecision of words. Before, we might have wondered whether two fields with the same name were really the same. Give them URLs and we'll think we know. But we're no more likely to be right.

And, in fact, RDF doesn't even deliver what it claims. Not everything gets URLs. The obtuse distinction between "nodes" and "values", which was arguably the only thing that needed to be hammered out of the explication of the relational model, persists. So sometimes the thing we want will have a URL, and sometimes it won't, depending on decisions somebody else made without our needs necessarily in mind.

Of course, I already wasn't using RDF anyway. I say "we" out of some residual conviction that these sigils, "Semantic Web" and "Linked Data" and whatever, identify a society of the dissatisfied, and that our dissatisfaction lies with the level of abstraction at which humans are required to interact with data. I never cared about the serialization, I cared that I shouldn't have to care about it. For decades computer people have officiously taken data apart for the convenience of computers. I believe we want to use computers to put it extravagantly back together again for the benefit of humans.

And I want that to be a we. It's a big task. I want company. I'm sure I'm not the only person who believes this, and I know I'm not the only one trying to make it happen. I want this to be my community. But I also know that it takes more to join my cause than a flag of affect, flown from a sunken hill on the way to somewhere we've already been.

** The other half or so of the conference was devoted to the problem of having machines read human text. This is an interesting but essentially unrelated pursuit that happens to share the word "semantic", and combining the two makes barely more sense than holding a Things Having Risen convocation of second-coming evangelists and sourdough bakers.