19 May 2005 to 21 April 2005
It is becoming increasingly possible for separate systems to perform each of the six major functions of data applications: storage, transformation (including creation), categorization (including tagging, indexing, search and retrieval), visualization, monitoring and the administration of trust.  

Historically, of course, these functions were usually not only performed by a single unified system, but mostly limited to that system. At their most insular, old-world applications entailed embedded storage in proprietary formats, integrated authoring tools and UI, integrated (if any) notification, and integrated (if any) user-management.

In the old world of personal data applications, like spreadsheets and word-processors and whatever, standardized file systems at least separated storage and categorization from application logic (you could put your Excel file on a floppy disk, or in your "Budgets" folder). Semi-standardization of formats helped open data transformation and/or visualization a little bit (you could use Word's search-and-replace tools on an RTF file, or run Crystal Reports on your Paradox databases), but published formats are not quite the same thing as open formats. And monitoring and trust were usually expendable for personal applications, or solvable a function at a time.  

Old-world online applications changed the distribution of insularities. You could actually use several different tools to send, receive and monitor CompuServe data. Prodigy let you use any tool you wanted, as long as it was a construction-paper hammer designed by runt warthogs for use by cartoon infants. But the online service very clearly owned the physical storage, the content space and the identity space.  

The early web was a combination of progress and regress, pretty much no matter which directions you think those go. HTML offered the tantalizing prospect of separating the presentation logic from the data structures, but in practice browser convergence quickly resulted in this being true in only a somewhat obscure development-tools sense. You could produce your HTML files with different software than they would be read with, but you still had to take the client constraints heavily into account. HTML files could be moved around fairly easily, but cross-server transclusion got pretty ugly the moment you tried to move much beyond linking. And identity management was reinvented from scratch anywhere it was unavoidable.  

But we now have at least nominal, rudimentary pieces of the ability to separate all of these. XML offers an interchangeable application-neutral storage format (or at least a meta-format), XML+HTTP gives us a way to virtualize storage (as long as we don't virtualize it too far), Google has demonstrated the scalable separation of categorization and some amount of visualization, and RSS is at least a conceptual step towards the separation of tracking. LDAP separates identity management for at least certain kinds of communities. These may not be the solutions, but they are indications that the solutions are possible and closer.
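
To make those separations concrete, here is a minimal sketch (mine, with a hypothetical feed URL, not any particular aggregator's code): a monitoring client tracking a source it doesn't own, over plain HTTP, in application-neutral XML, via RSS.

```python
# Illustrative only: FEED_URL is hypothetical.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://example.com/index.xml"    # hypothetical feed address

with urllib.request.urlopen(FEED_URL) as response:
    tree = ET.parse(response)                # XML: neutral storage (meta-)format

for item in tree.iterfind(".//item"):        # RSS: tracking separated from the source
    print(item.findtext("title", "(untitled)"), item.findtext("link", ""))
```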

But the next steps, in all cases, are huge, and at least as difficult culturally as they will be technically.  

Storage  

All systems must be prepared to handle external storage transparently to the data's owner, whether this actually means live reading and writing over the network or caching and mirroring to simulate it. An indexer must be able to hand you back the indexes it makes and updates, an image organizer must allow you to store the images on your own server, etc.  
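
One hedged sketch of what that transparency could look like (my own hypothetical interface, not a prescription): the application reads and writes through a single abstraction, and whether the bytes live on the owner's disk, on the owner's server, or in a local mirror of that server is a backend detail.

```python
# A sketch, assuming nothing more than the requirement above; the Store
# interface and both backends are hypothetical names.
from abc import ABC, abstractmethod
from pathlib import Path

class Store(ABC):
    @abstractmethod
    def read(self, key: str) -> bytes: ...
    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

class LocalStore(Store):
    """The owner keeps the bytes on their own disk."""
    def __init__(self, root: Path):
        self.root = root
    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()
    def write(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

class MirroringStore(Store):
    """Caching and mirroring to simulate live network reading and writing."""
    def __init__(self, remote: Store, cache: Store):
        self.remote, self.cache = remote, cache
    def read(self, key: str) -> bytes:
        try:
            return self.cache.read(key)
        except FileNotFoundError:
            data = self.remote.read(key)
            self.cache.write(key, data)      # mirror locally for next time
            return data
    def write(self, key: str, data: bytes) -> None:
        self.cache.write(key, data)
        self.remote.write(key, data)         # and through to the owner's server
```

An indexer written against Store neither knows nor cares where its indexes end up, which is exactly the point.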

Transformation  

All data must be stored in as neutral and open a format as possible. Application-neutral information must be tagged in standard self-describing ways. Proprietary information is acceptable only when mandated by definition (for internal security functions and precious little else), and where necessary must be clearly identified and attributed. These will be practical imperatives, not just moral ones. Secrecy is fragile, and the net routes around it instinctively.  
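
As a tiny illustration of self-describing tagging (mine; the element names are invented, not a proposed standard), XML carries the description of the data along with the data itself:

```python
# A sketch of storage in a neutral, self-describing format; "entry",
# "date" and "body" are hypothetical tags.
import xml.etree.ElementTree as ET

entry = ET.Element("entry", attrib={"kind": "journal"})  # the data names its own kind
ET.SubElement(entry, "date").text = "2005-05-19"
ET.SubElement(entry, "body").text = "Sometimes it's faster to invent a wheel."
print(ET.tostring(entry, encoding="unicode"))            # readable with or without us
```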

Categorization  

Anything that exists can be categorized. In many cases, the categorization will end up being qualitatively more valuable than the original information. The only difference between data and meta-data is that meta-data is data the owner of the thing didn't anticipate or provide for. The more fluidly a system can re-integrate the meta-data it spawns, the more powerful it will be. The more afraid you are of your audience, the faster they will depart.  
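
A toy sketch of that collapse between data and meta-data (mine, with invented item ids): tags arrive from an audience the owner never planned for, and re-integration makes them just more data.

```python
# Illustrative only; item ids and labels are hypothetical.
from collections import defaultdict

index = defaultdict(set)        # label -> ids of the things tagged

def tag(item_id, label):
    index[label].add(item_id)   # meta-data the owner didn't provide for

def search(label):
    return index[label]         # re-integrated, it's indistinguishable from data

tag("song-41", "rainy-day")
tag("song-17", "rainy-day")
print(search("rainy-day"))      # {'song-41', 'song-17'}
```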

Visualization  

Similarly, the more readily a system opens itself to external visualization, the better off it will be. Whatever it is you own and control, it's never more than part of the experience. The default techno-social goal of a data application is to be the reference source for some kind of data. (The default business goal is to have some way to make money, not from that data but from that status.)  

Monitoring  

Various malformed and over-constrained attempts have been made to generalize the problems of monitoring, change tracking and notification into email, IM, RSS, Trackback, Konfabulator, Dashboard and countless proprietary and special-purpose schemes. The next generation has to supply a version that scales to the entire world, including not only its size and its bandwidth but also its heterogeneity and its self-organization. The new system has to rationalize all flows, including the malevolent ones.  

Trust  

Ultimately, though, the native currency of the new connected world will be trust. Every interaction of people and systems relies on it, usually inherently and implicitly. Existing systems have mostly survived on trust by exclusivity (firewalls, closed networks, internal identity management), obscurity (mostly self-selection) or informal accountability (feedback and self-policing). None of these scale. The new identity systems must be built not to administer specific applications but to provide universal credentials that verify a user's membership in defined communities. The new data systems must be built so that unknown individuals can be accepted on the basis of delegated authority. In the old world people were "users", users existed inside careful boundaries, and outside of those boundaries all there were, were names. In the new world, people are the signals themselves, and a name is a name only by virtue of some authority, and maybe that authority by virtue of another one. In the new data world, where the scope of the network is as big as the scope of the planet, and the size is exponentially larger, the primary component of every transaction of storage, transformation, categorization, visualization or monitoring will be the intimate initialization of the basis of trust under which any two of us say anything to each other at all.
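
The delegation can be sketched in miniature (a toy of mine, not a proposal from the text: HMACs stand in for real public-key signatures, and every key here is hypothetical): a member's name is good by virtue of a community's credential, and the community's by virtue of a root's.

```python
# A toy chain of delegated authority; do not mistake it for a real
# credential system.
import hmac, hashlib

def vouch(authority_key: bytes, subject: str) -> str:
    """An authority's credential for something it vouches for."""
    return hmac.new(authority_key, subject.encode(), hashlib.sha256).hexdigest()

ROOT_KEY = b"root-authority-secret"       # hypothetical trust anchor
COMMUNITY_KEY = b"community-secret"       # a key the root vouches for

community_cred = vouch(ROOT_KEY, "community")        # root -> community
member_cred = vouch(COMMUNITY_KEY, "member:glenn")   # community -> member

def accept(name: str, cred: str, community_claim: str) -> bool:
    """Accept an unknown individual purely on delegated authority."""
    if not hmac.compare_digest(vouch(ROOT_KEY, "community"), community_claim):
        return False              # the vouching community isn't itself vouched for
    return hmac.compare_digest(vouch(COMMUNITY_KEY, name), cred)

print(accept("member:glenn", member_cred, community_cred))   # True
```
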
I hereby open a public discussion forum called, unhelpfully, vF. I wrote the software for it myself, mostly because I was curious to see what that was like, and now I'm opening it because I'm curious to see what that is like.  

If you have something you want to talk about, and nowhere else you'd rather talk about it, you are now welcome to talk about it there.
The indulgent, underworked or technically omnicurious among you can help me with a little experiment by going to vLog, a blank anonymous public blog (if it's correct to call something a blog when it doesn't have any way of entering links...) and contributing whatever random comment occurs to you.  

I am testing a prototype dual-hash browser/email/browser round-trip verification system for public commenting, with no persistent server-side user-management or user-side server-management.
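
I won't paste the prototype here, but the shape of such a scheme is easy to sketch (this is a reconstruction of the general technique, not the vLog code itself): the server derives a token from the commenter's address and a timestamp, mails it out, and verifies the echoed token on its return, storing nothing in between.

```python
# A sketch of stateless round-trip verification; SECRET and the token
# format are hypothetical.
import hmac, hashlib, time

SECRET = b"server-secret"        # never leaves the server

def issue_token(email: str) -> str:
    """First leg: hash the address and a timestamp against the secret,
    mail the result out, and keep no record."""
    stamp = str(int(time.time()))
    mac = hmac.new(SECRET, f"{email}|{stamp}".encode(), hashlib.sha256)
    return f"{stamp}.{mac.hexdigest()}"

def verify_token(email: str, token: str, max_age: int = 3600) -> bool:
    """Second leg: when the browser comes back, recompute and compare."""
    try:
        stamp, digest = token.split(".", 1)
        age = int(time.time()) - int(stamp)
    except ValueError:
        return False
    if age > max_age:
        return False
    mac = hmac.new(SECRET, f"{email}|{stamp}".encode(), hashlib.sha256)
    return hmac.compare_digest(mac.hexdigest(), digest)
```
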
Sometimes it's faster to invent a wheel than to shop for one.

New England Mobile Book Fair
Magnum: It's Time to Come Together (2.1M mp3)
Garnet Crow: picture of world (1.9M mp3)  

This is probably an exercise in aesthetic futility. Both these bands are obdurate perfectionists, and the songs I love most fiercely are often the most subtly ingenious variations on elaborately established themes. These are rituals of context, and stripped and isolated they are inexplicable and ordinary. But "ordinary" is exactly where our desperately sensationalist culture so badly lets us down. We live for ever-shorter and more transient moments, and then wonder why nothing seems to last.
An abacus is a state machine. It executes no instructions, and maintains no history, but it does store a single state, semi-persistently and nearly-infinitely rewritably, and it stores it in a representation that facilitates operator-initiated state-changes of certain types. An electric typewriter with a single-character backspace function is approximately equivalent in computational terms. Both of these are very useful devices.  

A typewriter with a multiple-character backspace function has both state and memory. The simplest electric calculator has both state and automated instruction execution. A semi-modern calculator has state, instruction and memory, and at this point we can call it a basic computer. The subsequent history of human-computer interaction design has been a slow process of iteratively transcending decreasingly unimaginative understandings of the implications of state, instruction and memory.  
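
The taxonomy is small enough to act out in code (a toy of mine, not a claim about real hardware): state alone, then state plus instructions, then state plus instructions plus memory.

```python
# Hypothetical classes illustrating the state/instruction/memory ladder.
class Abacus:
    """State only: one semi-persistent, endlessly rewritable value."""
    def __init__(self):
        self.state = 0
    def set(self, value):              # every change is operator-initiated
        self.state = value

class Calculator(Abacus):
    """State + instructions: operations executed on the operator's behalf."""
    def apply(self, op, operand):
        self.state = op(self.state, operand)

class BasicComputer(Calculator):
    """State + instructions + memory: it can remember, and therefore undo."""
    def __init__(self):
        super().__init__()
        self.history = []
    def apply(self, op, operand):
        self.history.append(self.state)   # memory of prior states
        super().apply(op, operand)
    def undo(self):
        if self.history:
            self.state = self.history.pop()

import operator
c = BasicComputer()
c.apply(operator.add, 5)               # state: 5, history: [0]
c.undo()                               # state: 0 again -- the germ of Undo, below
```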

The conceptual breakthroughs of the earliest text-processing programs were 1) that semantically non-numeric information could be represented in numeric memory, 2) that semantically non-mathematical operations could be modeled in mathematical instructions, and 3) that quantitative increases in memory capacity could enable qualitatively different uses of that memory. Further thought about representation led to storing formatting in addition to text itself. Further thought about instructions led to the automation of layout operations, and the addition of text-processing operations like search-and-replace. This makes for a more interesting state-machine than an abacus, but still effectively a state machine in most user-apparent aspects.  

The conceptual shift from state machine to information appliance can be reduced, symbolically, to model-altering insights embodied in three perhaps seemingly incremental features. From the critical realization that the computer's representation of information could include more than the ostensible current state came the radical notion of Undo, and later its extrapolation to Undelete. From the realization that a significant body of pre-existing external human knowledge could be represented and usefully applied to user-generated information came the extraordinary new idea of machine proofreading. File transmission applied signal-bearing wires to the space between people, rather than just between devices. Combine data application and wires and you get the net as gigantic reference library and perpetual market. Combine wires and internal state and you get distributed applications and the net as communication infrastructure. Combine data application and internal state and you get data mining and machine translation. Combine all three and you get more or less everything in modern computing up to the night before IM, online dating, eBay, Mapquest, Napster, Google, SETI@Home, "people who bought this also bought", phonecams and the blogosphere.  

But it's the next morning, now, and I don't really want an information appliance. I want a virtual personal assistant. I want my writing software to think of its job not just as formatting documents, but as remembering everything I do when I write, including things I don't realize I'm doing, and things I do while writing that aren't themselves writing. I don't just want document-level Undo, I want a coherent journal view of everything I typed, including all the dead-end phrases I tried and deleted and might now want to revisit. Actually, I don't want fundamentally document-level anything, I want a dynamic evolving history of my entire interaction with my computer and the network beyond it, navigable by chronology or association. I want to jump from an email to the web page on which I found the stat I cited four replies ago in the note that started the conversation. I want to go from the song I'm playing to the birthday of the person who told me about it, to a cross-referenced list of the other music I associate with the song and the other music all my friends have mentioned in emails and IMs and forum notes and shared playlists and now-playing monitors. I want to see rhythms of correspondences and patterns of discovery and contours of neglect. I want the things I've forgotten to know when to remind me of themselves, and the things I think I know to have the humility to volunteer for their retirements.  

The primary challenges for the design of virtual personal assistants are of a different nature (naturally) than the challenges for the design of information appliances or state machines. What the state machine worries about representing, the assistant thinks about communicating and transforming and connecting. What the information appliance struggles to remember, the assistant has to decide how to share and correlate, and when if ever to forget. The state machine works to its capacity. The information appliance works to its parameters. The assistant, however, must be self-governing and evolvingly aware of its own limits, able to differentiate between automating and advising. The assistant will be evaluated not only on what it accomplishes, but on what it knows to ask and when. The state machine's applications were solipsists, however creative. The information appliance's applications were autocrats, however occasionally beneficent or enlightened. The assistant's applications are inventors and ambassadors and advocates and court jesters, and sometimes mercenaries and cannon fodder, and every once in a while oblivious innocent bystanders willing to go home without complaint when you promise them there's nothing to see here.  

And in a connected and definingly social world, the virtual personal assistant is a distributed and intimately negotiated function, and the rules that maintain the productive tension between isolation and aggregation are even more complex. What is the currency of the economy of privacy and trust? On what grounds do you delegate a privilege or retain it? What of yourself are you willing to reveal in return for what collective wisdom, from what collectives, and for that matter which and how much "wisdom" are you prepared to consume, and in what forms? When is it information we seek, and when is information-exchange merely a proxy for personal contact? When does a system become more humane by modeling its users more precisely, and when does it serve them better by leaving them to their own improvisation and compromise?  

The new world will be many things, some of which are already emerging and some of which are yet deeply hidden, but here are a few of what may be its truths:  

- Millions now stored will never be erased. In the last era, everything not saved was instantly lost. In the new era, everything not meticulously preconstructed for disintegration will be indexed and archived forever.  

- Data belongs to people, not processes. There are no silos in the new architecture. Persistence doesn't mean writing something so that it can be reconstructed by its originating code, it means writing it so that it can be reconstructed without its originating code (see the sketch after this list).

- You are in a maze of twisty passages, each explicably unique and enticingly beckoning. The new systems must not only know when to ask you questions, they must know how to categorize the properties of the possible answers. They must know how to empower your responses with nuance rather than luring you into literalist traps.  

- Everything good is relative. The old era was about identification and instantiation and encapsulation. The new era is about connection and abstraction and subcomposition and change. The old tools had files and records and pages. The new ones have links and self-description and self-direction. The old world was measured in assignments and addresses, the new one in associations and relationships. The old tools took knowledge apart, the new ones must put it back together again.  

- There are three classes of the acted-upon and the acting: objects, creatures and artists. Objects have no value except as they benefit creatures or express the work of artists, and perform no act except in response. Creatures are to be respected and defended and delighted, and acknowledged in their free will, but not burdened with responsibility or solicited for decisions. Artists are the source of all authority and the ultimate ends of all means. Humans are sometimes artists but always at least creatures. Machines and systems and programs (and policies and corporations and governments and precepts, including these) are never more than objects. The first obligation of any designed system is to be obsessively devoted to the intricate cognizance of these boundaries.  
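
That persistence rule is easy to demonstrate in miniature (a sketch of mine; the Note class and its fields are invented): opaque serialization ties the bytes to their originating code, while a self-describing format lets any future reader reconstruct them without it.

```python
# Hypothetical Note class; the contrast, not the class, is the point.
import json, pickle

class Note:
    def __init__(self, title, body):
        self.title, self.body = title, body

note = Note("groceries", "wood, beads, hands")

opaque = pickle.dumps(note)      # unreadable without the Note class itself

portable = json.dumps({          # readable by anything that speaks JSON
    "type": "note",              # self-describing: the data names its own kind
    "title": note.title,
    "body": note.body,
})

restored = json.loads(portable)  # no Note class required to get this back
assert restored["title"] == "groceries"
```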

The simplest worthy tools exist to protect or sustain something alive. The best ones express something that makes living more beautiful. What numbers do your machines safeguard that an abacus wasn't sufficient to protect? What do your machines make beautiful, that was ugly when all we had were wood and beads and hands?
There ought to be a qualitative difference between a computer and an abacus.