furialog

15 January 2013 to 26 August 2011

¶ 2012 Pazz & Jop · 15 January 2013

The 2012 Village Voice Pazz & Jop music-critics poll is now out. As has been the case for the last few years, I did the data-correction and tabulation for it. The results of this work are here:

· The Voice's P&J home
· My piece about statistics and analyses
· My massive statistical hyperindex to the last 5 years of the poll
· My own ballot

The short version: Frank Ocean, Carly Rae Jepsen. If the former of these surprises you, you probably aren't paying attention to music criticism. If the latter, you probably aren't paying attention to pop. Neither of these is a life flaw, mind you. But now you have a chance to catch up.

PS: This poll used to be tabulated by people. This is not what people are good at. Eventually computers were employed to help count, but cleaning up the data so it could be counted by computers still took several person-weeks of unpleasant human effort. Then some coworkers and I spent in the neighborhood of 25 person-years building a data-correction and -analysis system with which the whole thing could be done in about 3 person-days. Then Google bought the company for which we made that thing, and shut it down. So I spent about 4 person-days writing a new correction/analysis system from scratch myself, with which this year's poll took about 4 hours to correct and tabulate. Including fixing a few stray errors that the previous system missed.

There is a moral in there somewhere.

¶ The Rules That Govern Hearts · 2 December 2012 listen

I work at a music company. Our holiday party involves people who work there performing music. I occasionally make music, but I almost never actually perform it. Very, very close to never.

But I wanted to this year. So I wrote a new song to sing. It's called "The Rules That Govern Hearts". I have a couple weeks to practice singing it, but you can start practicing listening to it right now.

The Rules That Govern Hearts (3:37)

¶ What Robots Listen to While They Talk About Love · 25 September 2012 listen/tech

Somewhere around 1992 or so, I had to write my "personal goals" at work for the first time. I put down some stuff about design and usability, which I have long since forgotten, but I distinctly remember that the last item was "Exploit the power of information technology to improve my music collection."

My job, at the time, had nothing to do with music. It did provide me with an internet connection for downloading Gary Numan discographies from Usenet, and it did once send me to a UI conference in Amsterdam, from which I came back with approximately 130 pounds of European CDs. But the connection from what I was doing to what I was listening to remained a little abstract.

But that job got me the one after that, which got me the one before this, which got me to now. The scope of "music collection" has expanded quite a bit over this time, obviously, but in fact I am very literally now paid to exploit the power of information technology to improve the world's experience of music.

The vast majority of things I work on, to this end, involve identification or disambiguation or extrapolation or interpolation. They are corrective or exploratory or contextual. They try to carry you from somewhere to somewhere else, or to rescue you from something squelchy along the way. They try to get computers to understand enough about human cues and contexts and constraints to fill humanless spaces with rough facsimiles of what humans would have suggested if they'd had the time.

But I just did one that isn't so much like this. One of the computed song metrics I've been working on is Discovery, which attempts to quantify the idea of songs emerging. This is a detection metric, not an aesthetic one. It isn't trying to tell you songs you should listen to, it's trying to find the songs that people are in fact starting to listen to more than the established prominence of the artists can explain. Discovery songs tend to be new, but a fair number of songs get their unexpected breaks after they've already been out for a year or two. And once enough people discover a Discovery, obviously, it stops qualifying.

There are a lot of fairly arbitrary thresholds and weights involved in this calculation. How much prominence constitutes critical mass, and how much constitutes overexposure? How old can something be and still be sorta new? And the notion is inherently subjective, of course: anything you've heard a dozen times yourself is no longer a discovery to you, whereas a song 1.3 billion people have been dancing to across Asia every morning for the last 6 months still could be. And there is no way to stipulate correct answers against which this score can be quantitatively tested.

But provable correctness is overrated, or at least sometimes irrelevant. Music discovery is working if you can use it to discover music. And now you can test this particular method yourself. Starting today I'm going to be maintaining an official Echo Nest playlist on Spotify with the top 100 songs of the moment according to this discovery score. It's here:

The Echo Nest Discovery

If you subscribe to it, each week's new songs will get flagged for you automatically.

The songs the robots find cheerfully comply with no particular style or pattern. The set is not coherent in any human sense. It's in rank order, but the ranking logic will not be evident or audible. There will be interminable dubstep remixes next to country laments next to Christian hardcore next to Azorean folk traditionalism. You should expect to hate half of these, and find half of the rest inscrutable. This is definitively impersonal. Somebody, somewhere, liked each of these songs, but we've stripped them of any idea of whom or why.

And yet, the fact that you don't know those people doesn't mean you won't agree with them. I endorse this discovery experiment on the purely empirical grounds that I have personally discovered things this way myself. So, for the record, I'm also going to maintain a personal playlist of what I liked from the robot one:

Treasures the Robots Brought Me

My suspicion, which you're welcome to evaluate for yourself, is that my human picks will be essentially no less random for you than the overall robot list. But if you find something, either way, we both win.

[I'm not sure how the robots win. Probably they win no matter what. If I have a non-musical goal to add at this job, it's probably to keep trying to make sure the human/robot thing is never zero-sum...]

¶ The Entrepreneur's Dilemma · 2 August 2012

Just as night begins to fall, you come across some people. They're cold, and they're scared, but they've made a pile of wood and they're trying to start a fire.

The Salesperson's Dilemma is: How much can you charge them for matches?

The Businessperson's Dilemma is: If you give them some matches for free tonight, can you sell them new axes in the morning?

Then they get the fire started.

And now the Entrepreneur's Dilemma is: Can you persuade them to try the bizarre experiment of rearranging the pile of wood into a big hollow box, called a "house", faster than they can burn the rest of the wood?

¶ The Echo Nest Is Always Listening · 20 July 2012 tech

I have a post up on the company blog at work today. It's about music or math, depending on your perspective.

¶ Needless · 30 May 2012 essay/tech

We will look back on these days, I think, as some weird interlude after the invention of computers but before we actually grasped what they meant for us. The Age we are stumbling towards, I am very sure, is the Age of Data. And when we get there, we will be there because we have sublimated the state-machine mechanics of computers beneath the logical structural abstractions of information and relation, and begun to inhabit this new higher world without reference to its substrate.

I spent 5 years of my life trying to help bring this future about. That is, in a sense I've spent my whole adult life trying to help bring this future about, but for those 5 years I got to work on it very directly. I designed, and our team built, an attempt at a prototype of what a new data exploration system could be like, and at the core of this was my attempt at a draft of a language for discussing data the way algebra is a language for discussing math. These are the elements out of which this new age's alchemies will be constituted. And there were moments, as the system began to come into its own, when I felt the twitches of power awakening. You could conjure shapes out of data with this thing. It made information malleable, made it flow.

The computer programmers on the team sometimes referred to the project as a system for "non-programmers", and I've come to think of that as both its potential and its downfall. Programmers never say "non-programmers" as a compliment. At best it's merely condescending, at worst it's a euphemism for "idiot" or a semi-aware admission of incomprehension. For programmers, programming is by definition an end, not a means, and therefore the motivations of non-programmers are inherently mysterious and alien. But what we built was for non-programmers in the same way that a bridge is for non-engineers. That is, the whole point of it was to represent a different interaction model between people and information than the ones offered by, at one end, programming languages, and at the other spreadsheets and traditional database programs. As I said over and over throughout those 5 years, I was trying to get us to do for hyper-connected datasets what VisiCalc once did for columns of numbers. I wasn't trying to simplify; if anything, I was making some things harder, or at least less familiar. This new age is not a subset of a previous age. It is not for lesser people, and its challenges are not of a simpler character.

And as Google now shuts that system down, literally unceremoniously, and 5 years of my work and dreams and visions are at least nominally obliterated, I feel a little sadness but mostly relief. I'm still very convinced that our tools -- humanity's tools -- for interacting with data are hopelessly primitive. I'm still convinced that it won't make a whole lot of difference what those tools are if kids don't grow up learning how to think about data in the first place. I'm still convinced that I have a blurry, fractured vision of what it might take to change these things.

But I also realize two more things.

First, the system we built was only a beginning, and it had hardened into a premature finality long before its official corporate fate was settled. The query language I invented was cool, but the successor to it, which I'm sketching in my head whether I want to or not, is a different sort of thing yet again. And I was never going to reach it incrementally, arguing over every syntax decision on the way. Sometimes you have to just start over. The next one will not aspire to be the VisiCalc of anything. It's not better business tools we need. The problem is not that we are alienated from our inner accountants. The thing we need first is not even an algebra of data, probably, but an arithmetic of data. We need an inversion of "normalization" in which you don't write data wrong and then endure six Herculean labors to make it obscurely more pleasing to capricious gods, but rather a way of writing it in the first place with an inherent expressive gravity towards truth because more true is always more powerful. This is a task in applied philosophy, not programming and not engineering and not even science. We need to imagine what Plato would have done when his record collection got too big for his cave.

Second, I still believe that we all deserve better tools, tools more suited for our actual tasks and needs as people whose lives and choices and options are increasingly functions in, not merely of, information. But in the process of exploring what I mean by that I've become a non-non-programmer myself. At my new job I am an engineer. And sometimes, when you think you know what the better world looks like, you can bring pieces of it up out of your dreams. You can walk where the new paths will be. With enough belief, you can walk where the bridges will be. I will come back to these paths, one way or another, but you never do great things by imagining what people you don't understand might want for purposes you don't grasp or embrace. You should trust your own judgment only where you love beyond reason. Anybody could do nearly anything with Needle, and the business cases for it all involved hypothetical big companies doing hypothetical big things with hypothetical big data that repeatedly never actually materialized (and might have been hypoethical if they had). But left to my own invented devices, I always ended up using it for music data.

So I have followed my own love, and my own obsessions, deeper into that data. At my new job, I am trying to make sense of the largest music database in the world, which is a lot more fun than what I was doing before, and harder, and of rather more direct and demonstrable relevance to anything. On my own, I will continue the music projects I started in Needle. The Discordance evolved out of empath, and so I've evolved it back in, with less marginalia but maybe more coherence. For the Pazz & Jop I've built a stats site far more specific than I could ever have done in the generalized environment of Needle. These will grow as I play with them, and probably there will be other things. I spent 5 years trying to build fancy tools, but it's pretty amazing what you can do with just a hammer. I was Needle's most dedicated user, but in the end, both sadly and happily, I don't actually need it any more. Nobody will miss it more than I will, but maybe nobody will really miss it very much. The moral, I think, and maybe even the ethic, is that these systems do not matter. This isn't the first system I worked on only to see it shut down, and it won't be the last. Software is the epitome of ephemera, necessary in aggregate but needless in every mundane specific.

But the things we learn from these systems stay learned. Even the ways of learning remain ways after their original demonstrations disintegrate. This is another phrasing of the point about this Age, in fact: the flow from Data to Information to Knowledge to Wisdom is not a function of syntax or platforms or prevalence or virtualization. It is something we do, to which the technology is merely witness. We must teach our children how to think about data because the data survives where the systems fail. We must teach ourselves to be children again in this new Age, because its most transformative truths still await discovery, and are anything but mundane or needless, and we will never recognize them unless we can recall what it felt like in our hearts when everything was amazing and new and ahead of us, and the act of waking was an invitation to wonder to show us a way.

¶ We Are Made of Water · 27 May 2012 photo

¶ 2011 Music · 19 January 2012 listen

· My version (the long form)
· Other people's version

Metal (ZIP download, 207MB)

1. Blood Stain Child: "Stargazer" (4:21)
2. Elizium: "Nemesis" (5:29)
3. Unleash the Archers: "Realm of Tomorrow" (5:15)
4. Pantheist: "Be Here" (10:47)
5. Subway to Sally: "Nichts ist für immer" (3:37)
6. Dornenreich: "Fährte Der Nacht" (4:49)
7. Jesu: "Sedatives" (5:10)
8. Terra Tenebrosa: "Probing The Abyss" (6:03)
9. Thy Catafalque: "Fekete mezők" (9:20)
10. Lifelover: "Expandera" (3:49)
11. Wolfchant: "Black Fire" (4:20)
12. Avven: "Ros" (3:13)
13. Heretoir: "Fatigue" (7:17)
14. Alcest: "Elévation (Rerecorded)" (13:26)
15. Agrypnie: "Augenblick" (7:28)
16. Frijgard: "Frijgard" (5:39)
17. Dalriada: "Mennyei Harang" (6:16)
18. Arven: "Dark Red Desire" (4:12)
19. Oak Pantheon: "Architect of the Void Pt II" (6:45)
20. Kampfar: "Bergtatt (In D Major - Bonus Track)" (5:28)
21. In Flames: "Sounds Of A Playground Fading" (4:43)
22. Andraste: "Black Birds Of Carrion" (5:16)
23. Nemesea: "Afterlife" (3:12)
24. Leviathan: "Blood Red And True" (6:57)

Non-Metal (ZIP download, 187MB)

1. Airborne Toxic Event: "All At Once" (5:16)
2. Gazelle Twin: "Changelings" (3:23)
3. Joy Formidable: "The Greatest Light Is The Greatest Shade" (5:19)
4. Harris, Emmylou: "Hard Bargain" (3:22)
5. Hatfield, Juliana: "Failure" (3:35)
6. Keene, Tommy: "Already Made Up Your Mind" (3:29)
7. M83: "OK Pal" (3:58)
8. Low: "Nothing But Heart" (8:10)
9. Sounds: "Won't Let Them Tear Us Apart" (4:07)
10. Blondie: "Mother" (3:09)
11. Sloan: "Green Gardens, Cold Montreal" (2:01)
12. The Decemberists: "Calamity Song" (3:50)
13. Mountain Goats: "Damn These Vampires" (3:24)
14. Buckner, Richard: "Escape" (3:32)
15. Roxette: "She's Got Nothing On (But the Radio)" (3:33)
16. Roxette: "Speak to Me" (3:41)
17. Hatfield, Juliana: "Don't Wanna Dance (Brad Walsh Remix)" (3:09)
18. Clarkson, Kelly: "I Forgive You" (3:04)
19. Reik: "No Te Quiero Olvidar" (3:38)
20. Magnum: "Wild Angels" (5:41)
21. Mogwai: "George Square Thatcher Death Party" (3:59)
22. Kerretta: "A Ways to Uprise" (5:29)
23. Emily Bezar: "May In Mesolimbia" (18:21)
24. Ulver: "Stone Angels" (14:53)
25. M83: "Raconte-moi une histoire" (4:07)

¶ Quantifying School · 9 September 2011 essay/tech

[May 2012 note: Needle, the database system I used to collect, analyze and show this information, was acquired and shut down by Google. Thus many of the links below go to non-functional snapshots of Needle pages I took before the shutdown. The points should survive.]

Boston Magazine recently published their annual Best Schools ranking. They've been doing this for years, and are known for various other Boston rankings as well (places to live, places to eat...), so by now you'd expect them to be pretty good at it.

Here's what "pretty good at it" amounts to, in 2011: two lists of 135 school districts, one with some configuration information (enrollment, student/teacher ratio, per-pupil spending, graduation rate, number of sports teams, what kind of pre-k they offer, how many AP classes), the second with test scores, and exactly this much methodological transparency: "we crunched the data and came up with this".

Some obvious things that you can't do with this information:

- sort it by any criteria other than the magazine's rank
- see the stuff in the first table alongside the stuff in the second
- understand which figures are actually part of the ranking, in what weights
- fact-check it
- compare it in bulk to any other information about these schools
- compare it to any other information about the towns served by these districts
- figure out why certain towns were included or excluded
- find out what towns are even meant by non-town district names if you don't already happen to know
- evaluate the significance of any individual factor, or the correlations of any set of them

This is not a proud state of the art. And the quality of secondary journalism around it emphasizes the point further: this article about Salem's low ranking basically just turns a table-row into prose sentences, with no context or analysis, and fails to even realize that the 135 districts in the ranking represent just the immediate vicinity of Boston, not the whole state. This Melrose article claims Melrose "climbed" from 97th last year to 94th, but then has to add a note that last year's ranking was of high schools, not whole districts, and thus not even the same thing. Swampscott exults in making the top 50. Malden fights back at being ranked 119th. But nobody actually knows what the rankings mean or signify, because Boston Magazine doesn't say.

In an attempt to improve this situation a little, I imported these two tables of information into Needle:

· Needle - Boston Public Schools 2011

This in itself was sufficient to unify the two tables and render them malleable, which seems to me like the most basic start. Now at least you can re-sort them yourself, and choose what to look at next to what.

And a little sorting, in fact, quickly reveals some statistical oddities. North Attleborough was listed with an SAT Reading score of 823, which since SAT scores only go up to 800, is very obviously wrong. Some trivial research verifies that this was a typo for 523, and while typos happen in journalism all the time, a typo in data journalism is a dangerous indication that some human has been retyping things by hand, which is never good. (This datum has now been fixed in the magazine's table.)
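A range check like this is easy to automate. What follows is a minimal Python sketch, not anything Needle or the magazine actually ran. The function name, the dictionary layout and the limits are assumptions for illustration: SAT section scores run from 200 to 800, and I'm assuming the MCAS figures are percentages.

```python
# Plausible ranges per metric family (assumed for illustration).
VALID_RANGES = {
    "SAT": (200, 800),   # SAT section scores run from 200 to 800
    "MCAS": (0, 100),    # assumes the MCAS figures are percentages
}

def out_of_range(rows):
    """Return (district, metric, value) for every value outside its range."""
    problems = []
    for district, metrics in rows.items():
        for metric, value in metrics.items():
            family = metric.split()[0]  # "SAT Reading" -> "SAT"
            low, high = VALID_RANGES[family]
            if not low <= value <= high:
                problems.append((district, metric, value))
    return problems

rows = {
    "North Attleborough": {"SAT Reading": 823},  # the typo described above
    "Georgetown": {"SAT Reading": 570, "MCAS 8 Science": 36},
}
print(out_of_range(rows))
```

This only catches impossible values, of course. A typo that lands inside the valid range needs a cross-check against the original source.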

More interestingly, when you start scrutinizing each district's 5th/8th/10th-grade MCAS scores, you find some surprising skews. Here are the MCAS and SAT scores for Georgetown:

MCAS 5 English: 74
MCAS 5 Science: 54
MCAS 5 Math: 42

MCAS 8 English: 81
MCAS 8 Science: 36
MCAS 8 Math: 51

MCAS 10 English: 92
MCAS 10 Science: 90
MCAS 10 Math: 88

SAT Reading: 570
SAT Writing: 566
SAT Math: 584

Boston Magazine says they "looked within those districts to determine how schools were improving (or not) over time". But that's not what these scores are measuring. These aren't time-slices for a single cohort, these are different tests being given to different kids. If you're interested in history, the Department of Education profile of Georgetown includes annual MCAS results for 2006-2009, and all you have to do is scan down the page to spot the weird anomaly that is 8th grade Science. Every other test has healthy dark-blue bars for "Advanced" scores; but in 8th grade Science virtually no kids managed Advanced scores in any year. This pattern repeats in Wellesley in an even more dramatic fashion. An article from Wellesley Patch explains that their 8th grade science curriculum doesn't cover "space", while the MCAS does. It's an interesting ideological question whether curricula should be matched to the standardized tests, but whatever your opinion on that, it seems clearly misleading to interpret this policy issue as a quality issue.

A little more sorting repeatedly raised another question: why is Cambridge ranked 25th? In virtually every test-score-based sort it falls close to the bottom of the table. In the magazine's ranking, Cambridge comes in ahead of Westford, at #26. But observe these scores for the two:

MCAS 5 English: 59 - 88
MCAS 5 Math: 53 - 86
MCAS 5 Science: 45 - 85
MCAS 8 English: 75 - 95
MCAS 8 Math: 45 - 86
MCAS 8 Science: 34 - 78
MCAS 10 English: 70 - 97
MCAS 10 Math: 77 - 95
MCAS 10 Science: 59 - 94
SAT Reading: 498 - 587
SAT Writing: 493 - 582
SAT Math: 503 - 602
Graduation Rate: 85.2 - 94.6

This doesn't even look close. But then notice these:

Students per Teacher: 10.5 - 14.6
Per-Pupil Spending: $25,737 - $10,697

Cambridge's spending per student is remarkable. It's about 36% higher than the next highest, which is Waltham at $18,960. The 10.5 students per teacher is also the best ratio of the 135 districts listed, with 115th-ranked Salem in second place with 11. These factors seem like they should matter, and clearly they must be part of the magazine's ranking calculation, but if they're so uniformly not translating to better test scores or graduation rates in Cambridge, does this really make any sense?

At least we ought to be able to say that these, along with the other non-test characteristics in the magazine like the number of sports teams and the number of AP classes, are different sorts of statistics than test scores. This seems increasingly true as you start looking at them in detail. Plymouth is listed as having 94 sports teams, for example. Can you name 94 different sports? I can't, and the Plymouth High School web site only claims they participate in 19. Newton is listed as having 39 AP classes, and Boston as having 155. But there are only 34 AP subjects, so it seems like a pretty safe guess that in these two multi-high-school districts the magazine is adding the totals for each school. It's hard for me to see what that accomplishes.

So for my own interest, at least, I created my own Quant Score, which is calculated like this:

- take all 9 of the listed MCAS scores, drop the lowest one, and sum the other 8
- divide each of the three SAT scores by 3, to put them into a range where they're each worth around twice as much as an individual MCAS score, and add those in
- multiply the graduation rate by 2, to put it into a similar range to the SAT scores, and add that in, as well

These factors are admittedly arbitrary, so you're welcome to try your own variations, but at least I'm telling you exactly what goes into mine, so you know what you're varying against. I deliberately left out all the other descriptive metrics from this calculation, including student/teacher ratio and spending. I then reranked the schools according to these Quant Scores. See the comparison of the magazine's ranking and mine here.
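For anyone who wants to vary the formula, here is a minimal Python sketch of the three steps as described. The function name and argument layout are my assumptions for illustration, and this is not how Needle expressed it; the inputs are the Cambridge and Westford figures listed earlier.

```python
def quant_score(mcas, sat, graduation_rate):
    """Quant Score as described: 8 best MCAS + SATs/3 + 2 * graduation rate."""
    if len(mcas) != 9 or len(sat) != 3:
        raise ValueError("expected 9 MCAS scores and 3 SAT scores")
    mcas_part = sum(mcas) - min(mcas)           # drop the lowest, sum the other 8
    sat_part = sum(score / 3 for score in sat)  # each SAT score divided by 3
    grad_part = graduation_rate * 2             # graduation rate doubled
    return mcas_part + sat_part + grad_part

# MCAS order: grade 5 English/Math/Science, then grade 8, then grade 10.
cambridge = quant_score([59, 53, 45, 75, 45, 34, 70, 77, 59],
                        [498, 493, 503], 85.2)
westford = quant_score([88, 86, 85, 95, 86, 78, 97, 95, 94],
                       [587, 582, 602], 94.6)
print(round(cambridge, 1), round(westford, 1))
```

Under this sketch Cambridge comes out at roughly 1151 and Westford at roughly 1506. Changing any of the three weights is a one-line edit.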

The differences are pretty dramatic. Three schools from outside the magazine's top 20 move into my top 10 (and 2 more from outside the magazine's top 10). The magazine's #s 6 and 8 drop to 28 and 30 in my list. Watertown and Waltham drop from 53 and 54 in the magazine to 100 and 114 in my list. Swampscott will be displeased to see that my re-ranking them sends them back out of the top 50. Malden will probably not be much appeased that I've bumped them up from 119 to 118. Acton and Winchester will be thinking about staging parades for me. And Cambridge (where I live, and where my pre-K daughter will go to school unless we take some drastic action) plunges from 25th to 107th.

But these are not answers, these are more questions. Most obviously: Why? I'm not claiming my Quant Score is definitive in any way, but it measures something, and I'm willing to claim that what it measures is something more coherent than what the magazine's rank measures. So this sets me off on the quest for better explanations, for which we obviously need more data.

Needle is good at integrating data, so I have integrated a bunch of it: per-capita incomes, town populations, unemployment rates, district demographic breakdowns, lunch subsidy percentages and 2010 election results. Some of these apply to towns, not districts, and several districts serve multiple towns, but Needle loves one-to-one and one-to-many relationships the same, so I've done properly weighted multi-town averages. (Don't try that in a spreadsheet.)
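To make "properly weighted" concrete, here is a minimal Python sketch of a multi-town average. It assumes the weights are town populations, which is my guess at the natural choice rather than a record of what Needle did, and the two towns are made-up numbers, not real data.

```python
def weighted_average(values, weights):
    """Average of values, each counted in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical two-town district: (per-capita income, population).
towns = [(30000, 1000), (60000, 3000)]
incomes = [income for income, _ in towns]
populations = [population for _, population in towns]

district_income = weighted_average(incomes, populations)
print(district_income)  # 52500.0
```

A plain average of the two incomes would give 45000, which understates the district because the larger town is also the richer one.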

And then I started comparing things. Per-pupil spending seems like it ought to matter, but it shows very little statistical correlation to quant scores. Student/teacher ratios, sports-team counts and AP classes also seem like they ought to matter, but the numbers don't support this.
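If you want to run this kind of comparison on your own columns, a minimal Python sketch follows. It assumes the Pearson correlation coefficient, which is one common choice and not necessarily the measure I used in Needle; the function name is hypothetical and the sample numbers are made up to show the two extremes.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (2 or more)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up columns: a perfectly rising and a perfectly falling relationship.
rising = pearson([1, 2, 3, 4], [10, 20, 30, 40])
falling = pearson([1, 2, 3, 4], [40, 30, 20, 10])
print(rising, falling)
```

Values near 1 or -1 mean the two columns move together; values near 0 are what "very little statistical correlation" looks like.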

Per-capita income, on the other hand, matters. The percentage of students receiving lunch subsidies matters even more. In fact, this last factor (the precise calculation I used was adding the percentage of students receiving free lunch and half of the percentage of students receiving partially subsidized lunch) is the single best predictor of quant score that I've found so far. This is depressingly unsurprising: poverty at home is hard to overcome: hard enough for individuals, and even harder in aggregate.

With this in mind, then, I ran a quick linear regression of quant score as a strict function of lunch-subsidy percentage, and used that to calculate predicted quant scores for each district. The depressing headline is how small those variations are. In a quant-score range from 1531 to 727, only 10 districts did more than 100 quant points better than predicted, and only 10 districts did more than 100 points worse. If I use the square roots of the lunch-subsidy percentages, instead, only 6 districts beat their predictions by 100, and only 8 miss by 100.

If I toss in town unemployment rates, Democratic vote percentages in the 2010 Senate election, and town per-capita income, I can get my predictions so close that only 1 school did more than 100 points better than expected, and only two did more than 100 points worse. This is daunting precision.

But OK, even if the variations are small, they're there. So surely this is where those aspirational metrics like spending must come into play. Throwing money at students in school may not be able to counteract poverty at home, but doesn't it at least help?

No.

Students per Teacher? No.
AP classes? No.
Percentage of minority students? No.

I'm by no means saying that there isn't an explanation, or more of an explanation, or other factors. But if there are, I haven't found them yet.

But at least I'm trying. And I give you the data so you can try, too. I submit that this is what data journalism should be trying to do. We are trying to find knowledge in data. Secrecy and opaqueness and non-interactivity are counter-productive. It's more than hard enough to find truth even with all the data arrayed in front of us. If there's an equivalent of the Hippocratic Oath for data journalists, it should be that we will endeavor to never make the truth more obscure.

[Space for discussion here.]

[Postscript, 10 September: The more I thought about that 823/523 error, the more I worried that there might be other errors that weren't as obvious, so I used Needle to cross-check all the test-scores against the official DOE figures. Two more were wrong. Manchester Essex's SAT Reading score was 559, not 599, which I'm guessing would lower their #6 magazine rank, perhaps considerably. In my rankings it dropped them from 28 to 31. Ashland's SAT Reading score was also wrong, 531 not 537, but this didn't change their rank in my method. Both corrections moved those schools' scores closer to my predictions.]
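A cross-check of this kind amounts to comparing two lookups key by key. Here is a minimal Python sketch; the function name and the dictionary layout are assumptions for illustration, not the Needle query. The two mismatched pairs are the ones described above, and the Georgetown row is there as a matching example.

```python
def mismatches(published, official):
    """Return (key, published value, official value) where the two disagree."""
    diffs = []
    for key, published_value in published.items():
        official_value = official.get(key)
        # Skip keys with no official figure to compare against.
        if official_value is not None and official_value != published_value:
            diffs.append((key, published_value, official_value))
    return diffs

published = {
    ("Manchester Essex", "SAT Reading"): 599,
    ("Ashland", "SAT Reading"): 537,
    ("Georgetown", "SAT Reading"): 570,
}
official = {
    ("Manchester Essex", "SAT Reading"): 559,
    ("Ashland", "SAT Reading"): 531,
    ("Georgetown", "SAT Reading"): 570,
}
for key, was, should_be in mismatches(published, official):
    print(key, was, "->", should_be)
```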

[Postscript, 12 September: But charter schools do better relative to expectations, right? Nope.]

¶ Best Scrabble Racks · 26 August 2011

People stop me on the street, with wild looks in their eyes, and ask "What are the best Scrabble racks?!"

Here.