furia furialog · New Particles · The War Against Silence · Aedliga (songs) · photography · code · other things     ↑vF
What Beatles album is "Day Tripper" on?  

This is partially a trick question, of course, as "Day Tripper" was originally a non-album single, but it has been on several Beatles compilations over the years, including the red 1962-1966 best-of, and in the remastered 2009 catalog it lands on both the mono and stereo versions of Past Masters.  

Starting from scratch, it would take you, as a person, a little while to figure this answer out on the web. There's a Wikipedia page for the song, which is the top search hit for the above question in both Google and Bing, and it explains the non-album-ness and mentions several compilations, but it doesn't (at least as of this moment) clarify current availability, and none of the pages for the non-current compilations refer you explicitly (again, as of this moment) to the right place.  

There are a few paths that lead you to a voluminous page for the Beatles discography, on which "Day Tripper" is again mentioned several times, but this page doesn't itemize the track-listings of compilations, and the "Day Tripper" links just go back to the original song-page. But eventually you might blunder into the separate pages for the Mono and Stereo box sets, and from there you might wander over to Amazon.  

Here even more potential confusion awaits you. The top Google hit for "amazon past masters", at the moment, is the now-obsolete second volume of the old CD edition. Searching on "past masters" on Amazon itself gets you the Remastered edition as the top hit, but idiotically suggests buying it together with the old Volume 2, and you would have to scrutinize the track lists to realize that Past Masters [Remastered] subsumes Past Masters, Vol. 1 and Past Masters, Vol. 2.  

In fact, if you backtrack and try searching for "Day Tripper" on Amazon, the top hit is Rubber Soul, the album on which "Day Tripper" would have appeared, chronologically, but doesn't. And, for good measure, that top hit is the obsolete 1990 CD, not the new remaster.  

Bleh.  
 

But at least you're a person, and via a judicious combination of intuition and stubbornness and asking some friends who might know, you can eventually solve these information problems. If you were a computer, you'd be fucked.  

Which is OK on some existential level, because if you were a computer you probably wouldn't appreciate the song, anyway. But the point of computers is to help people do things, and a computer ought to be a particularly helpful tool when the thing you need to do is sort through some data.  

But for the computer to help you puzzle through this data, the data has to be modeled usefully by people first. There are several prominent sources of meticulously structured data about music, so this should be easy. But here, sadly, people have let us down again. And again, and again. Let's see how.  
 

All Music Guide  

A text-search for "Day Tripper" (there's no other query interface) returns a full page of cryptic results. There's an "Occurrences" column, and although it's not clear exactly what that means, it's obvious that more is supposed to be better, and the first listing has 360 where none of the rest have more than 13, so presumably that's the "right" one.  

Clicking this gets you 8 pages of results, which is annoying in itself (the splitting of them into pages, I mean). They're sorted by Artist, which sounds reasonable enough, except that the ones without artists are sorted first, and thus the first page of results is almost totally crap. There are lots of Beatles releases listed, but they get split between pages 1 and 2 of the results list, making it impossible to look at them all at once.  

But if you drill into one of them at random, and then click on "Day Tripper" in the track listing, you do finally get to a page that lists all (or several, anyway) Beatles releases on which this song appears. There are 24, though, including such things as "Five Nights in a Judo Arena", which human intuition might guess is not a normative release, but a computer would have no basis for dismissing. These releases are in date-order, at least, but this turns out to be worse than worthless for our current question, because All Music has modeled Past Masters [Remastered] not as an album, but as an alternate manifestation of the album Past Masters, Vol. 1 & 2, which means it appears way up in the middle of this list, labeled 1988, because that was the year of its earliest issue (on cassette!).  

Looking through the data, in fact, we see that although All Music has lots of individual detail on most kinds of things, it has essentially nothing that models the relationships of things to each other, or in groups. There is no modeled connection between Past Masters, Vol. 1 & 2, Past Masters, Vol. 2, The Beatles: Mono Box Set and The Beatles: Stereo Box Set, even though v1/2 subsumes v2, and both boxes subsume both.  

And there's no modeling of "in print", or any notion of representing the subset of albums that represent the current core catalog. So a computer can't use this data to answer real questions by itself. Source fail.  
 

MusicBrainz  

This is a database first, not a guide, and thus a more likely candidate for well-structured data anyway, and one where I won't pick at their explicitly-secondary browsing UI.  

The good news is that MusicBrainz has the kind of data we need. They have some relationships between tracks, like one being a mashup of some others, so presumably they could add one to express that the Mono Masters of "Day Tripper" is a different version from the Past Masters [Remastered] one, but the same underlying song. They already have a reconciliation mechanism by which they can say that the "Day Tripper" on 1962-1966 is "the same" as the one on Past Masters, although at the moment the reconciliation data looks too noisy for real use.  

They even have the notion of one release being part of a set, although I didn't find very many examples of sets, and in particular I can't tell if a release can be part of more than one set. But if they can, that might be a mechanism for expressing official catalogs, current availability, and various other kinds of groupings and subsets.  

So current source fail, but at least there's hope here.  
 

Freebase  

Freebase is easily the most sophisticated public attempt at universal data-modeling, at the moment, but this is a caveat as well as a compliment. Freebase models attempt to represent everything that could possibly exist, and thus tend to drift quickly from usable simplicity towards abstractly-correct awkwardness, usually coming to rest far into the latter.  

So if you search on "Day Tripper", you will find that there are results of that name as "Topic", "Composition", "Musical Album" and "Musical Track", with at least dozens of the latter. Freebase fails the usability test even more spectacularly than All Music, as the list of Musical Tracks is presented with no grouping or distinguishing information at all, just "Day Tripper (Musical Track)" after "Day Tripper (Musical Track)", and you have to click on each one to get any clarifying info. "Day Tripper" the composition does not link to any of the tracks, and "Day Tripper" the album turns out to be a compilation of Beatles covers which does not, at least as far as the listed information shows, even contain the song "Day Tripper".  

And if you delve into the internals of the Freebase music schema, you can quickly develop a guess about why the data has not all been filled in: there's too damn much structure. A music/track is a recording of a music/composition. The track can appear on multiple music/releases, each of which is a publication of a particular music/album. Unless you need to model who was in the band during the making of an album, in which case the album links instead to a set of music/recording_contributions, which is each a combination of albums, contributors and roles.  

Oh, except compositions can be recorded "as albums", in which case they link to music/album without going through music/track, and tracks can appear directly on releases without going through music/album. And there's no current property for saying that a given track is an alternate version of another, but from following Freebase modeling discussions I can confidently guess that they'd model that by saying that a music/track links to something like a music/track_derivation, which itself is a combination of original track, derivative track, deriving artist (or music/track_derivation_contribution) and derivation type. And Freebase's query-language doesn't provide any recursion, so if these relationships chain, good luck querying them.  
 

Music Ontology  

This isn't a database, just an attempt at a model for one. And, grimly, it's another quantum level more elaborately and unusably correct than the Freebase model. Even "MO Basics" (and the "Overview" has 22 more tables of explication beyond these "Basics", without getting into the "details") includes conceptual distinctions between Composition, Arrangement, Recording, Musical Work, Musical Item, MusicalExpression and MusicalManifestation. And then there are pages upon pages of minutely itemized trivia like beginsAtDuration, djmix_of, paid_download (and "paiddownload", which is different in some way I couldn't figure out), AnalogSignal, isFactorOf... This list is bad because it's too long, but the fact that it's in the schema means that it's also bad because no matter how long it is, it will never include every nuance you ever find yourself wanting, and thus over time it will only accumulate more debris.  

A tour-de-force into a cul-de-sac.  
 

The Rest of the Web  

Searching on any particular band or bit of music will unearth dozens or hundreds of other sites that contain bits of the information we need: stores, discographies, databases, forums, fan pages, official sites. Almost universally, these are either unqueriable flat HTML pages, or tree-structured databases with even less interlinking than the above sites. Encyclopaedia Metallum, my favorite metal site, has full track listings for a genuinely mind-boggling number of releases by an astonishing number of bands, but the tracks themselves are not data-objects and a machine can find out nothing about them. There are several lovingly hand-crafted Beatles discographies on the web, all far too detailed for our original casual query, and all essentially useless to a computer attempting to help us.  

So: Ugh. Triple ugh because a) the population of people willing to put time and energy into filling out music-related data-forms is obviously huge, b) the modeling problems are not intractably complicated in any theoretical sense, c) MusicBrainz and Freebase, at least (and the system I'm designing at work, I think), seem to be technically sufficient to represented the data correctly. If only we had a better plan.  
 

DiscO  

So here's my attempt at a better plan. I call it DiscO, for Discographic Ontology; that is, it's a scheme for structuring discographies. It is not an attempt at an abstract physics of human air-vibrating creativity, it is just an outline of a way to write down the information about bands, the music they've made, and how that music was released. It's intended to be simple enough that you can imagine people actually filling in the data, but expressive enough that the data can support interesting queries. And it's specifically intended to model nuance abstractly, so that it can accommodate some kinds of new needs without perpetually having to itself expand.  
 

There are four basic types:  

Artist - The Beatles, Big Country, Megadeth, Frank Zappa, whatever...  

Release - an individual album, single, compilation, whatever; Rubber Soul, Past Masters [Remastered], "Day Tripper"/"We Can Work It Out"...  

Track - an individual version of a song; "Day Tripper", "Day Tripper [mono]", "Day Tripper (performed live on pan flute and triangle by Zamfir and Elmo)", etc.  

Sequence - any collection of releases; Original Albums, Japanese Cassette Singles, 2009 Remasters, etc.  
 

These are related to each other like this:  

Artists mostly have Sequences. Sequences can be anything, but many artists would have some standard ones: Original Albums, Singles, Compilations, Remastered Albums, Current Catalog.  

Sequences have Releases (and Artists).  

Releases have Dates, Labels and Tracks. A Release may have an Artist directly, but more often would have one indirectly via a Sequence.  

Releases may be related to each other via Alternate Version/Original Version links. Thus Past Masters, Vol. 1 & 2 and Past Masters [Remastered] are both Releases, but Past Masters [Remastered] has an Original Version link to Past Masters, Vol. 1 & 2, and Past Masters, Vol. 1 & 2 has an Alternate Version link to Past Masters [Remastered].  

Tracks have Durations. A Track may have an Artist directly (so individual tracks on multi-artist compilations can be attributed correctly), but more often would have one indirectly via Release (which itself may have one indirectly via Sequence).  

Tracks may also be related to each other via Alternate Version/Original Version links. "Day Tripper" and "Day Tripper [mono]" are both Tracks, but "Day Tripper" has an Alternate Version link to "Day Tripper [mono]", and "Day Tripper [mono]" has an Original Version link to "Day Tripper". (We can get into geek arguments about which versions are the same and which are derivations (of which!), if we want, but whatever we decide, we can model.)  

Restated in schema-ish form, that's:  

Artist
- Sequence
- Release
- Track  

Sequence
- Artist
- Release  

Release
- Sequence
- Artist
- Date
- Label
- Track
- Original Version
- Alternate Version  

Track
- Artist
- Duration
- Original Version
- Alternate Version  

I think that's basically enough. What it gives up in expressiveness, it gains in usability. Our Beatles data can now, I think, be modeled both tractably and informatively. We can hook up all the versions of albums and versions of songs. We can create whatever sequences we need, and since the sequences themselves are just data, it's fine to have "Canadian Singles" for the Beatles and "Fanzine Flexis" for The Bedsitters without implying that either band should also have the other.  

And using Thread, the query-language I will (before long, hopefully) be attempting to spread through the universe, we can start to ask our questions in a way the computer can answer:  

Track:=Day Tripper.Release
 

This is our naive query. It gets all the releases that have any track called exactly "Day Tripper". Good for assuring us there's some data in the bucket, but not much help in answering our question.  

Track:=Day Tripper.Release:(.Artist:=The Beatles)
 

That limits our results to albums by the Beatles, but there are still too many. With our fully-interlinked data-model, though, we can now actually ask something that is much closer to what we mean:  

Artist:=The Beatles.Sequence:=Current Catalog.Release:(.Track:=Day Tripper)
 

That is, find the artist The Beatles, get their Current Catalog sequence, get that sequence's releases, and filter those releases down to the ones that contain a track called exactly "Day Tripper". This is progress.  

But "called exactly 'Day Tripper'" will exclude "Day Tripper [mono]", which isn't what we want to do. We're trying to ask a musical question about a song, not a typographical question about a title. But this, too, we have the powers to cope with:  

Track|Day Trippers=(...__,Original Version:=Day Tripper)  

Artist:=The Beatles.Sequence:=Current Catalog.Release:(.Track.Day Trippers)|
DT Versions=(.Track:Day Trippers=?),
Other Tracks=(.Track:Day Trippers=_._Count)
 

This time we first define a new inferred relationship on Track called "Day Trippers", which gets the Track, all its Original Versions, all their Original Versions (recursively), and then filters this set of tracks down to just the ones called "Day Tripper".  

Then we get the Beatles' current catalog releases again, but this time instead of checking each release for a track named "Day Tripper", we use our Day Trippers relationship to check for a track that is, or is derived from, "Day Tripper". And then, for each of the releases that have one, we infer two new relationships: "DT Versions" tells us which track(s) on this release are versions of "Day Tripper", and "Other Tracks" counts the tracks on this release that are not derivations of "Day Tripper".  

I.e.:  

# Release DT Versions Other Tracks
1Past Masters [Remastered] Day Tripper 32
2Mono Box Set Day Tripper [mono] 212
3Stereo Box Set Day Tripper 238
 

So now we know our choices. It took us so long to find out, but we found out.  
 
 
 

Tantalizing Postscript: But now that we have these three options before us, how do their contents overlap or differ, track by track?! We could bring up three different windows and squint at them. Or we could ask the computer:  

Artist:=The Beatles.Sequence:=Current Catalog.Release:(.Track.Day Trippers)
/(.Track...__,Original Version:##1)/=nodes
 

Aaah. I see now. (How nice it will be when I'm allowed to show you...)
Site contents published by glenn mcdonald under a Creative Commons BY/NC/ND License except where otherwise noted.