18 March 2010 to 4 November 2009
I have a post on my work-blog about why geeky-sounding data-modeling issues matter to even simple-looking data.  

The post uses some Oscar-award data as an example, as I just put together a Needle version of Oscar History, so if you ever wanted to be able to answer some obscure statistical question about the Oscars, now you can. (Or you can ask me, and I can...)
From Revisiting HTTP based Linked Data:
What is Linked Data?  

"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.

If there's anybody in the universe (who isn't already a Semantic Web expert) who reads that and thinks "Yes! I gotta get one of those!", I'd be very surprised.  

For that matter, if there's anybody in the data business, other than the writer, who reads that and thinks "Yes! That's the Grand Quest to which I have given my Life and Loyalty!", I'd be almost as surprised, and rather suspicious.  

Contrast this:
What is Whole Data?  

Whole Data is a method of storing and representing your data so that you can explore it as easily as you browse the web, and examine or analyze it from any perspective at any time. It gives you and the computer a common language for talking about your data, so that the computer can answer your questions by examining your data the way you would, but way faster.

I'm not saying this is on the same level as "We hold these truths to be self-evident...", or "I'm not going to pay a lot for this muffler", either. It's a technical appeal, not a Radio Free Earth broadcast. But it's closer to comprehensible, at least, isn't it?
If you'd rather see a glimpse of Needle, my work project, involving sports instead of music, I just did a quick chomp through the medal-winner data from the Vancouver Olympics. Tables, breakdowns, cross-tabulations, etc.!  

Needle - Vancouver 2010 Olympics
There's a startup called Etacts whose software can import your email (headers), and give you a view by person instead of by message. This is pretty obviously something you'd like to be able to see, if you have any significant volume of correspondence or correspondents. And it's pretty obviously a large leap of faith to give some random startup the ability to log into your email account.  

But, even more, this is a perfect example of something you ought to be able to do, at least basically, with your data without needing any extra application. Etacts provides some unique contact-management features, but the fundamental function of rotating data to some other axis is, I contend, rightfully a property of the data. You already own the view of your email by recipient, because it is immanent in your email. But your email service probably doesn't let you see it. I would like it to be the responsibility of a data-holding service to give you full logical access to that data.  
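
As a minimal sketch of what "rotating" means here, assume a hypothetical list of message headers (the addresses, field names and `by_person` function are invented for illustration, and have nothing to do with how Etacts or any real email service works). The by-person view is already implicit in the by-message data, and getting it out is a few lines:

```python
from collections import defaultdict

# Hypothetical message headers; every address and field name here is made up.
messages = [
    {"id": 1, "from": "me@example.com", "to": ["ann@example.com"], "subject": "Lunch?"},
    {"id": 2, "from": "ann@example.com", "to": ["me@example.com"], "subject": "Re: Lunch?"},
    {"id": 3, "from": "me@example.com", "to": ["bob@example.com", "ann@example.com"], "subject": "Poll data"},
]

def by_person(messages, owner):
    """Rotate a list of messages into a mapping of correspondent -> message ids."""
    view = defaultdict(list)
    for msg in messages:
        # Everybody on the message except the mailbox owner counts as a correspondent.
        people = {msg["from"], *msg["to"]} - {owner}
        for person in sorted(people):
            view[person].append(msg["id"])
    return dict(view)

people_view = by_person(messages, owner="me@example.com")
for person, ids in sorted(people_view.items()):
    print(person, ids)
```

Nothing new is computed here. The same records are just indexed by a different field, which is why I think of it as a property of the data rather than a feature of an application.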

This is a huge part of why my work project matters to me. It is my attempt to design a universal data-modeling paradigm, and associated query-language, that could be a candidate standard for what it means to provide "full logical access" to data.
Usage Notes  

"In lieu of" means "instead of", with the general implication that the object of the phrase is absent or unavailable. So "I ordered onion rings in lieu of fries" if they were out of fries and you were forced to choose something else, but "I ordered onion rings instead of fries" is better if it's just your choice. "In place of" works for both, too.  

"In light of" means "as a result of", with the sense of having changed something because you observed (thus the "light", by which you see) something else. So "In light of the overwhelming shift in demand from fries to onion rings, I recommend we reduce our potato order." Or just "Let's get more onions instead of so many potatoes. Everybody hates potatoes now, apparently."  

"In favor of" means "in place of", like "instead of" but in the opposite order, with a sense of some kind of trade-off. "We reduced our potato order in favor of more onions to satisfy rampant onion-ring demand." Or "We bought more onions instead of potatoes, because everybody is ordering onion rings for some weird reason now. Did somebody on Chowhound post something good about our onion rings? Or bad about our fries?!"  

In light of frequent misuse, in lieu of specific needs I recommend eschewing all three in favor of saying what you know you mean.
Now that Needle, my work project, is finally no longer secret, I'm starting my slow seditious campaign to subvert the entire Semantic Web establishment. "Entire" is maybe a big word for a small world that not very many people care about, but part of the reason I care is that I think more people would care about the problems this community is addressing if the nature of the problems weren't so obscured by the prevailing ostensible solutions.  

I'll be taking this argument on the road, albeit only a couple blocks down it, for a short talk at the Cambridge Semantic Web Meetup Group, next Tuesday night (9 Feb, 6pm) at MIT.
DERI semantic-web researcher Alexandre Passant just announced a semantic-web-based music-recommendation engine called dbrec. It runs on dbpedia data, and computes "Linked Data Semantic Distance" between bands to find likely suggestions. This is an intriguing premise, and probably a worthy experiment.  

The site isn't labeled "intriguing experiment", though; it's labeled "intelligent music recommendations". Here are its top "intelligent" recommendations for, just to pick a random example near the beginning of the alphabet, Annihilator:  

Jeff Waters
Primal Fear
Randy Black
Bif Naked  

"Intelligent" is not really the word for this.  

On the one hand, the quality of the recommendations is not mostly Passant's fault. The underlying data isn't that great, and you can see how not-great it is in Passant's generated explanations, like this one for how Jeff Waters and Annihilator relate:  

Annihilator (band) is 'associated musical artist' of Jeff Waters (7 artists sharing it)
Annihilator (band) is 'associated acts' of Jeff Waters (7 artists sharing it)
Annihilator (band) is 'associated band' of Jeff Waters (7 artists sharing it)
Jeff Waters is 'current members' of Annihilator (band) (2 artists sharing it)
Annihilator (band) and Jeff Waters share the same value for 'genre'
- Thrash metal (529 artists sharing it)
- Groove metal (101 artists sharing it)
- Speed metal (170 artists sharing it)
- Heavy Metal Music (1534 artists sharing it)
Annihilator (band) and Jeff Waters share the same value for 'reference' (1 artists sharing it)

This is an incredibly obtuse way of saying, as the human-readable Wikipedia article about Waters puts it in the first sentence: "Jeff Waters is the guitarist and mastermind of the thrash metal band Annihilator". Passant's data doesn't quite record this fact, so he's left to try to make sense of the difference between "associated musical artist", "associated acts", "associated band" and "reference".  

Unsurprisingly, not much interesting sense results. Everything in this example is connected primarily through personnel overlap. Drummer Randy Black has played in both Primal Fear and Annihilator, and on one Bif Naked album. D.O.A. founder Randall Archibald sang on two Annihilator albums. Extreme are on the list because drummer Mike Mangini played briefly in both bands. Bif and Annihilator share the hometown "Canada".  
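
To make the personnel-overlap point concrete, here is a toy sketch that links acts purely by shared people. The data is hand-typed from the connections described above, not pulled from dbpedia, and the `shared_personnel` function is my own illustration, not dbrec's actual distance computation:

```python
from itertools import combinations

# Who has played with whom, hand-typed from the examples in this post.
# This is a toy data set, not dbpedia data.
played_with = {
    "Jeff Waters": {"Annihilator"},
    "Randy Black": {"Annihilator", "Primal Fear", "Bif Naked"},
    "Mike Mangini": {"Annihilator", "Extreme"},
}

def shared_personnel(played_with):
    """Map each pair of acts to the set of people who connect them."""
    links = {}
    for person, acts in played_with.items():
        for a, b in combinations(sorted(acts), 2):
            links.setdefault((a, b), set()).add(person)
    return links

links = shared_personnel(played_with)

# Every act reachable from Annihilator through at least one shared person.
connected_to_annihilator = sorted(
    other
    for pair in links
    if "Annihilator" in pair
    for other in pair
    if other != "Annihilator"
)
print(connected_to_annihilator)
```

Nothing in this structure knows what anybody sounds like, so any "recommendation" read off it is really just a statement about who has shared a stage or a studio.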

These connections aren't irrelevant, exactly. If you were trying to get the phone number of Annihilator's booking agent, they might be worth scanning through in case you spot somebody you went to high school with.  

As musical recommendations, though, they suck. As "intelligent" musical recommendations they're idiotic. Annihilator is a thrash-metal band, Extreme were metal-derived MOR pop, Bif Naked is a punk singer. Compare the list, with no claim of "intelligence", for Annihilator on empath:  

0.228 Anthrax
0.224 Dio
0.206 Sodom
0.182 Saxon
0.176 Black Label Society
0.175 Onslaught (Gbr)
0.170 Grave Digger

A person could probably do better than this, too, especially if they're allowed extra adjectives and a lower granularity than artists ("like Kill 'Em All-era Metallica", or "started off like early Slayer, but with more emphasis on technique"; and I don't even know Annihilator very well), but this list is at least not inane.  

And yet, Passant's work is almost certainly more technically sophisticated than mine. I used one genre, one data-source and one connection-metric, and produced a deliberately simple web-site with almost no ancillary information. Passant had to confront the sprawl of the Linked Open Data "cloud", figure out non-obvious weightings for a bunch of different connection paths, and display a lot more information than I deal with, in both breadth and depth.  

And yet, and yet, and yet: The recommendations are bad. Or, more accurately, the connections are what they are, but calling them recommendations is bad. Calling them "intelligent" is worse, and presenting the combination of "intelligent", "recommendation" and "linked data" to the general public is deadly. If "Linked Data" and "Semantic Web" mean ways for machines to tell me that if I like Annihilator I should listen to Extreme, then nobody needs them. If Linked Data, the movement, can't tell the difference between intelligent and idiotic, it's not to be trusted on anything.  

[2013 Postscript]

The short version:  

Metal  

1. Madder Mortem: Eight Ways
2. Secrets of the Moon: Privilegivm
3. Thy Catafalque: Róka Hasa Rádió
4. Antigua y Barbuda: Try Future
5. Funeral Mist: Maranatha
6. Absu: Absu
7. Wardruna: Runaljod - Gap Var Ginnunga
8. Lifelover: Dekadens
9. Amorphis: Skyforger
10. Cantata Sangui: On Rituals and Correspondence in Constructed Realities  

Not Metal  

1. Manic Street Preachers: Journal for Plague Lovers demos + album + remixes
2. Tori Amos: Midwinter Graces and Abnormally Attracted to Sin
3. Idlewild: Post Electric Blues
4. It Bites: The Tall Ships
5. Bat for Lashes: Two Suns
6. Wheat: White Ink Black Ink
7. Stars of Track and Field: A Time for Lions
8. Tegan & Sara: Sainthood
9. Metric: Fantasies
10. Maxïmo Park: Quicken the Heart  

The long version:  

TWAS 511: Lay Down (With) Your Armor  

The listen-for-yourself versions:  

- furia2009metal.zip (21 songs, 2 hours, 110MB)
- furia2009nonmetal.zip (31 songs, 2 hours, 111MB)
After deconstructing the annual Village Voice music poll for many years, this year I actually helped construct it, doing the data-correction and tabulation myself in Needle, the new database system I work on at ITA Software. In conjunction with this, the system itself is finally making its public debut! We are still in private beta for people who want to build data-sets using our tools, but the Voice has allowed us to share the underlying data from the poll as a sample of what data-sets look like in Needle.  

I will have, probably, far too much to say about this over time, now that I can actually show people a partial glimpse of what I've been doing for a living for 3+ years. For now, here are the links:  

Pazz & Jop 2009 (official results)
Needle - Pazz & Jop 2009 (system intro and data explorer)
All-Idols 2009 (centricity, similarity, kvltosis and other assorted stats)
Fear is one thing, but voting for hate when love would have cost you nothing is pretty much the quintessence of cowardice. Love will win, and after that, telling the story of standing up for divisiveness and intolerance will be like being proud of having hand-lettered "No Coloreds" signs for water-fountains.  

And although the individual people will be forgiven, the institutions should be forfeit. In particular, any church that campaigned against same-sex marriage has violated the social contract that permits religion and rational society to coexist, and should be seized by eminent domain and fumigated for ignorance.
Site contents published by glenn mcdonald under a Creative Commons BY/NC/ND License except where otherwise noted.