There are no dead-ends in data. Everything connects to something, and if anybody tells you otherwise, you should suspect them of hoping you won't figure out the connection they've omitted.  

But we have suffered, for most of humanity's life with data, without good tools for really recognizing the connectivity of all things. We have cut down trees of wood and used them to make trees of data. Trees are full of dead-ends, of narrower and narrower branches. Books are mostly trees. Documents are usually trees. Spreadsheets tend to be trees. Speeches are trees. Trees say "We built 5 solar-power plants this year", and you either trust them, or else you break off the end of the branch and go looking for some other tree this branch could have come from. Trees are ways of telling stories that yearn constantly to end, of telling stories you can circumscribe, and saw through, and burn.  

And stories you can cut down and burn are Evil's favorite medium. Selective partial information is ignorance's fastest friend. They built 5 solar plants, they say. Is that right? And how big were they? And where? And why did they say "we built", past-tense, not "we are now running"? And how many of last year's 5 did they close this year? And how many shoddy coal plants did they bolt together elsewhere, while the PR people were shining the sun in our eyes? There are statements, and then there are facts, and then there is Truth; and Truth is always tied up in the connections.  

Not that we haven't ever tried to fix this, of course. Indices help. Footnotes help. Dictionaries and encyclopedias and catalogs help. Librarians help. Archivists and critics and contrarians and journalists help. Anything helps that lets assertions carry their context, and makes conclusions act always also as beginnings. Human diligence can weave the branches back together a little, knit the trees back into a semblance of the original web of knowledge. But it takes so much effort just to keep from losing what we already knew, effort stolen from time to learn new things, from making connections we didn't already throw away.  

The Web helps, too, by giving us in some big ways the best tools for connection that we've ever had. Now your assertions can be packaged with their context, at least loosely and sometimes, if you make the effort. Now unsupported conclusions can be, if nothing else, terms for the next Google or Wikipedia search. This is more than we had before. It is a little harder for Evil to hide now, harder to lie and get away with it, harder to control the angle from which you don't see the half-truth's frayed ends.  

But these are all still ultimately tenuous triumphs of constant human vigilance. The machines don't care what we say. The machines do not fact-check or cross-reference, of their own volition, and only barely help us when we try to do the work ourselves. The Web ought to be a web, a graph, but mostly it's just more trees. Mostly, any direction you crawl, you keep ending up on the narrowest branches, listening for the crack. All the paths of Truth may exist somewhere, but that doesn't mean you can follow them from any particular here to any specific there.  

And even if linking were thoroughly ubiquitous, and most of the Web weren't SQL dumps occasionally fogged in by tag clouds, this would still be far from enough. The links alone are nowhere near enough, and believing they are is selling out this revolution before it has deposed anything, before it has done much more than make some posters. It is not enough for individual assertions to carry their context. It is not enough for our vocabulary of connection to be reduced to "see also", even if that temporarily seems like an expansion. It is not enough to link the self-aggrandizing press-release about solar plants to the company's web site, and hope you can find their SEC filings under Investor Relations somewhere. It is not enough to link the press-release to the filings, or for your blog-post about their operations in China to make Digg for six hours, or to take down one company or expose one lie. We've built a system that fountains half-truths at an unprecedented speed, and it is nowhere near enough to complete the half-truths one at a time.  

The real revolution in information consists of two fundamental changes, neither of which has really begun yet in anything like the pervasive way it must:  

1. The standard tools and methods for representing and presenting information must understand that everything connects, that "information" is mostly, or maybe exactly, those connections. As easy as it once became to print a document, and easier than it has become to put up web-pages and query-forms and database results-lists, it must become to describe and create and share and augment sets of data in which every connection, from every point in every direction, is inherently present and plainly evident. Not better tools for making links, but tools that understand that the links are already inextricably everywhere.  

2. The standard tools for exploring and consuming and analyzing connected information must move far beyond dealing with the connections one at a time. It is not enough to look up the company that built those plants. It is not enough to look up each of their yearly financial reports, one by one, for however many years you have patience to click. It's time to let the machines actually help us. They've been sitting around mostly wasting their time ever since toasters started flying, and we can't afford that any more. We need to be able to ask "What are the breakdowns of spending by plant-type for all companies that have built solar plants?", and have the machines go do all the clicking and collating and collecting. Otherwise our fancy digital web-pages might as well be illuminated manuscripts in bibliographers' crypts for all the good they do us. Linked pages must give way to linked data even more sweepingly and transformationally than shelved documents have given way to linked pages.  

And because we can't afford to wait until machines learn to understand human languages, we will have to begin by speaking to them like machines, like we aren't just hoping they'll magically become us. We will have to shift some of our attention, at least some of us some of the time, from writing sentences to binding fields to actually modeling data, and to modeling the tools for modeling data. From Google to Wikipedia to Freebase, from search terms to query languages to exploration languages, from multimedia to interactive to semantic, from commerce to community to evolving insight. We have not freed ourselves from the tyranny of expertise, we've freed expertise from the obscurity of stacks. Escaping from trees is not escaping from structure, it is freeing structure. It is bringing alive what has been petrified.  

There are no dead-ends in knowledge. Everything we know connects, by definition. We connect it by knowing. We connect. This is what we do, and thus what we must do better, and what we must train and allow our tools to help us do, and the only way Truth ever defeats Evil. Connecting matters. Truths, tools, links, schemata, graph alignment, ontology, semantics, inference: these things matter. The internet matters. This is why the internet matters.
Site contents published by glenn mcdonald under a Creative Commons BY/NC/ND License except where otherwise noted.