furia furialog · New Particles · The War Against Silence · Aedliga (songs) · photography · code · other things     ↑vF
If you're going to waste your time doing obsessive analysis of data on which nobody's life or ecology depends, you ought to at least do it diligently and efficiently.  

About a month ago The Deciblog published Justin Foley's attempt to answer the timeless question "How likely is a metal band to start their name with a particular letter of the alphabet?". For his sample set, Foley took the combined rosters of several metal labels (he doesn't reveal either list), which gave him 814 names, for which he then calculated the first-letter distributions, reaching the startling conclusion that the most likely letter is S.  

Foley put his results in a bar-chart, which I assume means he used a spreadsheet, so hopefully he didn't spend a whole lot of time hand-counting. But he should have spent even less. The Encyclopaedia Metallum is not only just sitting there with a collaboratively-amassed and collectively-moderated database of 50,000+ metal bands, but they've even already split it up by first-letter and there are band-counts right at the top of each letter-page. Add, divide, and you're done.  

Here, then, is the much better-informed version of this still-pointless breakdown:

? % *
# 0.3%
A 9.1% *********
B 5.9% *****
C 6.3% ******
D 8.9% ********
E 4.9% ****
F 3.6% ***
G 3.0% ***
H 3.9% ***
I 3.7% ***
J 0.6%
K 2.2% **
L 3.1% ***
M 7.4% *******
N 4.2% ****
O 2.3% **
P 4.0% ***
Q 0.2%
R 3.3% ***
S 10.8% **********
T 4.5% ****
U 1.3% *
V 2.7% **
W 2.7% **
X 0.3%
Y 0.2%
Z 0.7%


Most of Foley's numbers aren't that far off. His small sample-size leads him to overestimate J and Y, and underestimate Q and V, but the absolute numbers for these letters are small anyway. He also seems to underestimate A, P and R, for reasons which are not apparent in his opaque reporting, but might have to do with language tendences, as EM's list is probably more global than his.  

But the biggest discrepancy in Foley's numbers, by far, is T, which he credits with 9.3% of the band-names, where EM data indicates less than half that. Here I have a wearyingly mundane but highly plausible theory: Foley has accidentally counted all the bands whose names begin with "The " as T, despite specifically saying that he didn't. This is obviously both philosophically and methodologically repugnant, and although I regret the maelstrom of blogospheric outrage that will undoubtably accompany my public exposure of this error, I think we owe ourselves (the) Truth.  



Of course, the thousands of metal fans around the world who have put time and effort into building the Encyclopaedia Metallum did it because they care about the music, not the alphabet. The site is the definitive central reference source for most metal-related matters, and certainly the final arbiter of obscurity for the vast unknown majority of the bands it lists.  

It also has, in addition to its factual content, tens of thousands of percentage-scored, user-attributed, peer-moderated reviews of metal recordings. What it does not have is any sort of similarity analysis to make use of the huge data-graph represented by the connections between bands, users and ratings. There is a wearyingly mundane reason for this, too: trying to do similarity analysis with SQL queries will make you want to eat your own neck.  

So here is an actual contribution to the world's knowledge on this admittedly peripheral subject: empath, the missing similarity analysis of EM user/review/band data, accurate as of yesterday. Pick a band, see the other bands that people who like the first band also like. Data wants to form shapes. In a better world, this would be just as easy for EM to do themselves, updating live, as it is for them to serve their raw data into web pages.  

Easier, actually. I bet it took me less time to do this analysis than it took Foley to make a bar-chart of first letters. But I have better tools. I have better tools because at the moment I'm paid to design better tools. If I do my job well enough, eventually you'll have my better tools, too. I'm not designing them to tabulate heavy metal, I'm designing them to answer questions. Not all answers turn out to be shaped like Truths, of course. But if you can't answer them, you can't be sure which are which.
Site contents published by glenn mcdonald under a Creative Commons BY/NC/ND License except where otherwise noted.