furia furialog · Every Noise at Once · New Particles · The War Against Silence · Aedliga (songs) · photography · other things
Two of the very bad features of the SQL-based conception of relational data-modeling, I think, are these:  

- one-to-many relationships are significantly harder to handle than one-to-one relationships (and, inversely, many-to-one relationships easiest to handle by defactoring)  

- absence is significantly harder to handle than presence  

It's easy to add an Artist column to an Album table, for example, so we get:  

AlbumArtist
Black OneSunn O)))

But if you want to model the fact that collaboration albums can have multiple artists, you need an Albums table, an Artists table, a join table, and a bunch of IDs:  

Albums
AlbumID
Black One1
Altar2

Artists
ArtistID
Sunn O)))1
Boris2

Authorship
AlbumIDArtistID
11
21
22

This is already a pain in the ass, and thus you get a world filled with cop-outs like this:  

AlbumArtist
Black OneSunn O)))
AltarSunn O))) / Boris

and this:  

AlbumArtist 1Artist 2Artist 3
Black OneSunn O)))FALSEFALSE
AltarSunn O)))BorisFALSE

both of which are tolerable enough to just look at, but awful when you go to try to get the computer to answer even perfectly sensible questions like what albums Boris has done.  

And then, when you want to add a little more detail, you really ought to do this:  

Albums
AlbumID
Black One1
Altar2

Artists
ArtistID
Sunn O)))1
Boris2

Formats
FormatID
CD1
LP2
Download3
Authorship
AlbumIDArtistID
11
21
22
Formatship
AlbumIDFormatID
11
12
13
21
22

but instead you'll probably try to get away with:  

AlbumArtistFormatCode
Black OneSunn O)))CD/LP/D
AltarSunn O))) & BorisCD/LP

or  

AlbumArtist 1Artist 2Artist 3CDLPD
Black OneSunn O)))FALSEFALSETRUETRUETRUE
AltarSunn O)))BorisFALSETRUETRUEFALSE

because if you have the latter, you can do:  

SELECT Album, Artist

FROM Albums
WHERE D=FALSE

which is easy to understand, although it won't work because you forgot you don't have a field called plain "Artist" anymore. Whereas with the five-table form you have to do something like:  

SELECT Albums.Album, Artists.Artist

FROM Albums, Artists, Authorship
WHERE Albums.ID=Authorship.AlbumID
AND Artists.ID=Authorship.ArtistID
AND NOT EXISTS (
SELECT *
FROM Formats, Formatship
WHERE Albums.ID=Formatship.AlbumID
AND Formats.ID=Formatship.FormatID
AND Formats.Format='Download'
)

and that's probably still wrong.  
 

I submit that this is all a colossal mess, and it's a wonder the database-driven software we get driven out of it isn't even more woeful than it is. A better paradigm would embrace the opposites of these two flaws:  

- the modeling of a relationship should be the same whether the relationship is one-to-one, one-to-many, many-to-one or many-to-many, and should be the same no matter how many "many"s you have  

- absence and presence should be equally easy to assess  

And I think a useful rough metric for such a new system is that it allows you to model your data without ever needing to say FALSE.  

And I think this, I think, because I think this is how we think and talk about relationships between things when nobody is getting in our way:  

Black One was recorded by Sunn O))), and was released in CD, LP and Download formats. Altar was recorded by Sunn O))) and Boris, and was released in CD and LP formats. All of these albums were released on CD and LP. Altar is the only one that wasn't released in Download format.  

Take these out of sentence form and make the data-types explicit and everything is still perfectly straightforward:  

Album: [Black One]

Artists: [Sunn O)))]
Formats: [CD] [LP] [Download]  

Album: [Altar]
Artists: [Sunn O)))] [Boris]
Formats: [CD] [LP]

or, looking at the same data from a different perspective:  

Format: [CD]

Albums: [Black One] [Altar]  

Format: [LP]
Albums: [Black One] [Altar]  

Format: [Download]
Albums: [Black One]

and then the old questions are also easier:  

FIND Formats WHOSE Albums INCLUDE 'Black One' and 'Altar'

FIND Albums WHOSE Formats DON'T INCLUDE 'Download'

Subliminating the one/many distinction allows us to talk about albums with 3 or 8 or 0 artists just as readily as we talk about albums with 1 or 2. Fixing the query language to handle absence lets us cope with the introduction of a new edition into our dataset, say, without having to add a new field for it and/or go through asserting that everything we already knew about doesn't have that format. I suspect pretty much every database field headed X and filled with TRUEs and FALSEs could at least be more usefully modeled in my new world as a relationship called "Characteristics" that either does or doesn't include X. And usually, as with formats, there's actually a meaningful label that contains information itself. The point is that things have relationships to their characteristics, or their formats. They don't, except in a very existential sense, have a relationship to falseness. FALSE is a machine thing.  
 

The relationally astute will note that the five-table version is the fully-normalized representation that would support an SQL-based implementation of my new form as a display abstraction, and presumably my pseudo-query-code could be preprocessed into the messy SQL so I'd never see it. But that's my point: SQL is too low an abstraction, which makes it too hard to do things naturally, which makes it too hard to pass along natural behavior to the victims of the system. I'm mainly arguing for a better abstraction. (Although I also expect it's still always going to be bad that there's no better underlying way to represent lists, so I'm also probably arguing for the better abstraction to not merely be layered on top of the bad one.)  
 

The discographically astute will note that Altar is available on iTunes, and that I have vastly undermodeled the rich and complicated space of Sunn O))) release formats. But if I'd done the examples in real detail, I'd still be typing out the bad versions...
Site contents published by glenn mcdonald under a Creative Commons BY/NC/ND License except where otherwise noted.