The Deciblog just published Justin Foley's reply to my implication that he botched his analysis of first letters of heavy-metal band names. [Read those if you want the rest of
*this* to make any sense, not that I'm saying you need to want that...]

Foley cc'ed a bunch of other people in the actual email, and in an ensuing thread that got well underway before I noticed it in my spam filter (which wouldn't have happened if I'd had the good sense to put all Southern Lord label personnel in my Address Book proactively), someone beat me to taking statistical issue with Foley's idea that my 50,000+ EM-derived sample size was "too large", but agreed that in the abstract some sort of weighting scheme could account for the idea that Metallica earns M more points than some unknown band called The Austerity Program earns for A (or, in Foley's original analysis, T).

To all of which I said:

Weighting is easy. Let's say that a band only counts if somebody has actually bothered to write a review of one of their releases, and we'll weight them by the number of releases that have reviews. This method counts 6778 of EM's artists, who have 14057 releases between them.

Here are the percentages from the whole sample, the smaller sample unweighted, and the smaller sample weighted:

| First letter | All (%) | Smaller, unweighted (%) | Smaller, weighted (%) |
|---|---|---|---|
| # | 0.3 | 0.4 | 0.3 |
| A | 9.1 | 9.8 | 9.8 |
| B | 5.9 | 6.2 | 6.2 |
| C | 6.3 | 6.4 | 6.0 |
| D | 8.9 | 8.1 | 8.3 |
| E | 4.9 | 4.6 | 4.3 |
| F | 3.6 | 3.7 | 3.1 |
| G | 3.0 | 3.5 | 3.4 |
| H | 3.9 | 3.9 | 3.7 |
| I | 3.7 | 3.5 | 3.7 |
| J | 0.6 | 0.6 | 0.8 |
| K | 2.2 | 2.3 | 2.6 |
| L | 3.1 | 3.0 | 2.8 |
| M | 7.4 | 6.8 | 8.2 |
| N | 4.2 | 4.0 | 4.0 |
| O | 2.3 | 2.4 | 2.6 |
| P | 4.0 | 3.7 | 3.6 |
| Q | 0.2 | 0.2 | 0.3 |
| R | 3.3 | 2.9 | 3.2 |
| S | 10.8 | 10.6 | 10.5 |
| T | 4.5 | 4.6 | 4.4 |
| U | 1.3 | 1.3 | 1.2 |
| V | 2.7 | 2.8 | 2.8 |
| W | 2.7 | 3.3 | 2.8 |
| X | 0.3 | 0.4 | 0.4 |
| Y | 0.2 | 0.3 | 0.3 |
| Z | 0.7 | 0.7 | 0.5 |

As you see, both restricting the sample and weighting do make small differences in the percentages, but S still wins, and D is still only in third.
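For anyone who wants to try the weighting scheme themselves, here is a minimal Python sketch of it. This is not the code that produced the numbers above, just one way the idea could be written down. The function names, the `(name, reviewed_release_count)` input shape, and the toy band list are all made up for illustration, and dropping a leading "The" is an assumption based on The Austerity Program counting for A rather than T.

```python
from collections import Counter


def first_letter(name):
    """Bucket a band name by first letter, ignoring a leading 'The ' (assumed)."""
    cleaned = name.strip()
    if cleaned.lower().startswith("the "):
        cleaned = cleaned[4:].lstrip()
    first = cleaned[:1].upper()
    # Anything that is not A-Z (digits, symbols, empty names) goes under '#'.
    return first if "A" <= first <= "Z" else "#"


def letter_percentages(bands, weighted=True):
    """bands: iterable of (name, reviewed_release_count) pairs.

    A band only counts if it has at least one reviewed release. When
    `weighted` is true, each band counts once per reviewed release.
    """
    totals = Counter()
    for name, reviewed in bands:
        if reviewed < 1:
            continue  # nobody bothered to review them, so they don't count
        totals[first_letter(name)] += reviewed if weighted else 1
    grand_total = sum(totals.values())
    return {letter: round(100.0 * n / grand_total, 1)
            for letter, n in sorted(totals.items())}


# Made-up toy data, purely for illustration.
toy = [("Sacrificia", 3), ("Saran Orer", 1),
       ("The Daree Eeee", 1), ("Demo Only", 0)]
print(letter_percentages(toy, weighted=False))
print(letter_percentages(toy, weighted=True))
```

With this toy list, "Demo Only" drops out of both runs because it has no reviewed releases, and weighting pushes S further ahead of D because Sacrificia's three reviewed releases each count.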

It's also easy to *rigorously* calculate the most metal of all names, in essentially exactly the way [FH] suggests. Using only the smaller sample, we can build up the name by taking, at each position, the most common letter (again weighting each band name by the number of reviewed releases) among the names that match what we have so far, working towards a goal length obtained in the same weighted-average fashion. This produces the following incremental search result:

goal length: 10

searching: [] 6778 partial matches

searching: [s] 718 partial matches

searching: [sa] 122 partial matches

searching: [sac] 23 partial matches

searching: [sacr] 23 partial matches

searching: [sacri] 9 partial matches

searching: [sacrif] 5 partial matches

searching: [sacrifi] 5 partial matches

searching: [sacrific] 5 partial matches

searching: [sacrifici] 4 partial matches

searching: [sacrificia] 3 partial matches

I submit that when Daree Eeee and the mighty Sacrificia tour together, Daree Eeee will be going on first, and carrying their own mangy amps off the stage when they're done with their 3 crappy songs...

glenn

PS: I most definitely did not type in any numbers by hand.

PPS: Excel is a fine tool for lots of things. Not *these* things, though.
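For a sense of what the not-Excel version of the rigorous search might look like, here is a rough Python sketch of the prefix-by-prefix procedure described above. It is an illustration under assumptions, not the original program: the function name, the `(name, reviewed_release_count)` input, the alphabetical tie-break, and the toy band list are all invented here.

```python
from collections import Counter


def most_metal_name(bands):
    """Greedy prefix search over (name, reviewed_release_count) pairs."""
    names = [(name.lower(), w) for name, w in bands if w > 0]
    total_weight = sum(w for _, w in names)
    if total_weight == 0:
        return ""
    # Goal length: weighted average name length, rounded to a whole number.
    goal = round(sum(len(n) * w for n, w in names) / total_weight)

    prefix = ""
    while len(prefix) < goal:
        # Weighted tally of the next character among names that match
        # everything chosen so far and still have a character left.
        tally = Counter()
        for n, w in names:
            if n.startswith(prefix) and len(n) > len(prefix):
                tally[n[len(prefix)]] += w
        if not tally:
            break  # no name extends the current prefix
        # Ties are broken alphabetically (an arbitrary choice).
        prefix += max(sorted(tally), key=tally.get)
    return prefix


# Made-up toy data, purely for illustration.
toy = [("Sacrifix", 3), ("Sacrum", 2), ("Sodden", 4), ("Dirge", 5)]
print(most_metal_name(toy))
```

Because every step only looks at names that match the whole prefix so far, the result is always the beginning of at least one real name in the sample, which is why the pool of partial matches shrinks at each step.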

The aforementioned FH then clarified the less-rigorous most-metal algorithm he had in mind, which was also easy to produce:

It's more or less just as easy to do it that way, considering only the weighted likelihood of a given letter at a given position with a given preceding character.

searching: [] 6778 candidates

searching: [s] 718 candidates

searching: [sa] 1109 candidates

searching: [sar] 870 candidates

searching: [sara] 533 candidates

searching: [saran] 450 candidates

searching: [saran ] 521 candidates

searching: [saran o] 419 candidates

searching: [saran or] 271 candidates

searching: [saran ore] 270 candidates

searching: [saran orer] 182 candidates

I think Saran Orer get a guitar tech and some sandwiches, and go on after Daree Eeee, but they're still playing for people who are there to hail Sacrificia.
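The less-rigorous variant can be sketched the same way. The version below is my reading of the description, not the original code: at each position it looks only at names whose previous character matches the one just chosen, ignoring everything earlier in the name. The function name, the explicit `goal_length` parameter, the alphabetical tie-break, and the toy data are assumptions for illustration.

```python
from collections import Counter


def chained_name(bands, goal_length):
    """Pick each letter from the weighted counts at that position,
    conditioned only on the character chosen for the previous position.

    bands: iterable of (name, reviewed_release_count) pairs.
    """
    names = [(name.lower(), w) for name, w in bands if w > 0]
    result = ""
    for pos in range(goal_length):
        tally = Counter()
        for n, w in names:
            if len(n) <= pos:
                continue  # name is too short to have a character here
            # Position 0 has no predecessor; later positions must follow
            # the character we chose for the previous position.
            if pos == 0 or n[pos - 1] == result[-1]:
                tally[n[pos]] += w
        if not tally:
            break
        # Ties are broken alphabetically (an arbitrary choice).
        result += max(sorted(tally), key=tally.get)
    return result


# Made-up toy data, purely for illustration.
toy = [("Sacrum", 3), ("Baron", 4), ("Sodden", 2)]
print(chained_name(toy, 5))
```

On this toy list the chain starts down the path of "Sacrum" and then hops onto "Baron" at the third letter, so the output is not the start of any name in the sample. That is the same effect that lets the candidate counts go up as well as down from one step to the next.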

I hope everything is clear now, as I'm way overdue to get back to posting pictures of my daughter...

[Discussion, if you can bear the thought, here on vF.]