2 March 2009 to 8 November 2008
An adequate computer language allows humans to communicate with machines about machine concerns.  

A good computer language also facilitates communication between humans about machine concerns.  

A great language allows machines to participate in conversations between humans about human concerns.  
 

There are not very many of this last sort. As I've mentioned before, I'm trying to write one. I've been calling it a query language, but I've started to think I shouldn't. It's a language for talking about data-relationships, where most other things called "query languages" are for excerpting data, and the two are qualitatively different goals even when the individual tasks end up being logistically similar. I'm trying to do for data-relationships what symbolic algebraic notation did for numbers. Not what algebra did for numbers, thankfully, just what we accomplished by making up a written syntax for expressing algebra compactly and precisely.  
 

So here's just one real-world example from yesterday. We were talking, elsewhere, about how you calculate overall ratings for bands in a large reviews database. The simplest thing is just to average all their ratings. In Thread, my data-relationship language, this is:  

Artist|(.Album.Rating._Average)
 

I.e.: For each artist, get their albums, then get those albums' ratings, then average all the ratings. But this is maybe not the best statistic, as it weights albums proportionally to the number of reviews. Maybe we want to average the ratings for each album, and then average the album-averages to get the artist average. That's a hard sentence for a person to read, and the computer can't read it at all. But in Thread it's just:  

Artist|(.Album.(.Rating._Average)._Average)
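To make the difference between these two queries concrete, here is a minimal Python sketch. It assumes a hypothetical flat table of `(artist, album, user, score)` rows; Thread's actual data model and evaluation aren't described here, so this only mirrors the arithmetic the two queries describe, not the language itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flat review table: (artist, album, user, score).
ratings = [
    ("A", "A1", "u1", 90), ("A", "A1", "u2", 80), ("A", "A1", "u3", 70),
    ("A", "A2", "u1", 40),
    ("B", "B1", "u4", 60), ("B", "B2", "u5", 70),
]

def artist_flat_average(rows):
    """Pool every rating for each artist and average them, ignoring albums."""
    by_artist = defaultdict(list)
    for artist, _album, _user, score in rows:
        by_artist[artist].append(score)
    return {artist: mean(scores) for artist, scores in by_artist.items()}

def artist_album_average(rows):
    """Average each album's ratings, then average the album-averages per artist."""
    by_album = defaultdict(list)  # (artist, album) -> scores
    for artist, album, _user, score in rows:
        by_album[(artist, album)].append(score)
    album_avgs = defaultdict(list)  # artist -> list of album averages
    for (artist, _album), scores in by_album.items():
        album_avgs[artist].append(mean(scores))
    return {artist: mean(avgs) for artist, avgs in album_avgs.items()}

print(artist_flat_average(ratings))
print(artist_album_average(ratings))
```

Pooled, artist A averages 70, because album A1's three reviews outweigh A2's single one. Averaging per album first gives 60, since A1 (80) and A2 (40) now count equally. B comes out at 65 either way.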
 

Run this, though, and you see that the top of the list is dominated by bands with very small numbers of very high ratings. Not really what we're trying to find out. So let's include only bands with at least 25 ratings:  

Artist:(.Album.Rating::#25)|(.Album.(.Rating._Average)._Average)
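The `::#25` filter drops artists whose total rating count falls short before anything is scored. Here is a Python sketch of just that step, on a hypothetical flat `(artist, album, user, score)` table and with a toy threshold of 3 standing in for 25:

```python
from collections import Counter

# Hypothetical flat review table: (artist, album, user, score).
ratings = [
    ("A", "A1", "u1", 90), ("A", "A1", "u2", 80), ("A", "A1", "u3", 70),
    ("A", "A2", "u1", 40),
    ("B", "B1", "u4", 100), ("B", "B2", "u5", 100),
]

def artists_with_min_ratings(rows, min_count=25):
    """Keep only artists with at least min_count ratings across all their albums."""
    counts = Counter(artist for artist, _album, _user, _score in rows)
    return {artist for artist, n in counts.items() if n >= min_count}

print(artists_with_min_ratings(ratings, min_count=3))
```

B's two perfect scores would put it at the top of an unfiltered list. With the threshold applied, only A survives to be scored.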
 

This is better, but maybe not as much better as you'd think. It turns out that there are a number of bands for which a small number of people have written a large number of reviews. Maybe what we really want is to average the ratings for each user, not for each album. That way one person giving the same high rating to 8 different albums counts as 1, not 8. And we'll only consider artists with ratings from 25 different users, not just 25 ratings total. This is:  

Artist|(.Album.Rating/User::#25.(.group._Average)._Average)
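Here is a Python sketch of the per-reviewer version. It again assumes hypothetical `(artist, album, user, score)` rows and uses a toy threshold of 3 distinct reviewers instead of 25. The `/User` grouping becomes a per-artist dictionary keyed by user, and an artist with too few groups is dropped:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: one fan rates 8 of C's albums at 100, two others rate 40.
ratings = [("C", f"C{i}", "fan", 100) for i in range(8)]
ratings += [("C", "C0", "x", 40), ("C", "C1", "y", 40)]
ratings += [("D", "D1", "p", 99), ("D", "D1", "q", 97)]

def artist_reviewer_average(rows, min_users=25):
    """Average each reviewer's ratings of an artist, then average those
    per-reviewer averages. Artists with fewer than min_users distinct
    reviewers are left out."""
    per_user = defaultdict(lambda: defaultdict(list))  # artist -> user -> scores
    for artist, _album, user, score in rows:
        per_user[artist][user].append(score)
    result = {}
    for artist, users in per_user.items():
        if len(users) < min_users:
            continue  # not enough distinct reviewers
        result[artist] = mean(mean(scores) for scores in users.values())
    return result

print(artist_reviewer_average(ratings, min_users=3))
```

Pooled, C would average 88. Counted per reviewer, the fan's eight 100s collapse into a single 100, so C scores 60, and D, with only two reviewers, drops out.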
 

Better, but it's still pretty easy to game this by creating new accounts and filing one very high rating from each of them. We can mitigate that, though, by trusting only ratings from users who have rated, say, at least 5 different albums, from at least 3 different artists. That's:  

Album|Trusted Rating=(.Rating:(.User:(.Rating.Album::#5.Artist::#3)))  

Artist|(.Album.Trusted Rating/User::#25.(.group._Average)._Average)
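The trust rule can be sketched the same way. This assumes hypothetical `(artist, album, user, score)` rows and counts distinct albums and distinct artists per user. The post doesn't spell out how Thread deduplicates inside `::#5`, so the distinct counting here is an assumption:

```python
from collections import defaultdict

# Hypothetical rows: (artist, album, user, score).
ratings = [
    # "veteran": 5 albums across 3 artists -> trusted
    ("A", "A1", "veteran", 70), ("A", "A2", "veteran", 75),
    ("B", "B1", "veteran", 60), ("B", "B2", "veteran", 65),
    ("C", "C1", "veteran", 80),
    # "narrow": 5 albums but only 2 artists -> not trusted
    ("A", "A1", "narrow", 90), ("A", "A2", "narrow", 90), ("A", "A3", "narrow", 90),
    ("B", "B1", "narrow", 90), ("B", "B2", "narrow", 90),
    # "sock": a single throwaway rating -> not trusted
    ("A", "A1", "sock", 100),
]

def trusted_users(rows, min_albums=5, min_artists=3):
    """Users who rated at least min_albums distinct albums by at least min_artists artists."""
    albums, artists = defaultdict(set), defaultdict(set)
    for artist, album, user, _score in rows:
        albums[user].add((artist, album))
        artists[user].add(artist)
    return {u for u in albums
            if len(albums[u]) >= min_albums and len(artists[u]) >= min_artists}

def trusted_ratings(rows, min_albums=5, min_artists=3):
    """Keep only the rows filed by trusted users."""
    trusted = trusted_users(rows, min_albums, min_artists)
    return [row for row in rows if row[2] in trusted]

print(trusted_users(ratings))
```

The trusted rows can then go through the same per-reviewer averaging as before.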
 

Better again. But there are still a few pretty obscure things at the top of the list. This doesn't prove that the results are flawed, of course, but scrutinizing them, and thinking about the sample-size effects of rating variation at this scale, reveals that the highest and lowest ratings are having pretty dramatic effects. Perhaps it would be smart to toss out the top and bottom 10% of the per-reviewer averages, averaging only the middle 80%. This keeps one perspective-challenged fan or one vengeful ex-bassist from single-handedly jumping the ratings up or down. Thus:  

Album|Trusted Rating=(.Rating:(.User:(.Rating.Album::#5.Artist::#3)))  

Artist|(.Album.Trusted Rating/User::#25.(.group._Average)#._Trim 10%._Average)
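The sort-then-trim step is a trimmed mean over the per-reviewer averages. Here is a minimal Python sketch. The rounding rule (drop `len * percent // 100` values from each end) is an assumption, because the post doesn't say how `_Trim` handles lists whose length isn't a multiple of 10:

```python
from statistics import mean

def trimmed_mean(values, percent=10):
    """Sort, drop len * percent // 100 values from each end, average the rest."""
    ordered = sorted(values)              # the "#" step: sort before trimming
    k = len(ordered) * percent // 100     # how many to drop from each end
    return mean(ordered[k:len(ordered) - k])

# Hypothetical per-reviewer averages, with one superfan and one grudge.
reviewer_averages = [55, 100, 52, 60, 1, 58, 54, 62, 56, 50]
print(trimmed_mean(reviewer_averages))
```

With ten reviewers, one value comes off each end (the 100 and the 1), and the remaining eight average to 55.875.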
 

The result of this, in fact, is this leaderboard. By these rules Immolation is currently the top-ranked band in the Encyclopaedia Metallum.  
 

The English version of this final formulation is "bands with 25+ reviewers of their full-length albums, counting only reviewers who have filed at least 5 reviews and covered at least 3 bands; scored by averaging the ratings from each reviewer, dropping the top and bottom 10% of these reviewer-averages, and then averaging the remainder". This is a long sentence for people, and a useless sentence for machines, and as long as this is our canonical format, we will be at considerable risk of error every time we retranslate into a computer language. Put this in SPARQL or SQL or MQL, though, and it would be essentially inaccessible to people. So you choose between knowing what you want and not necessarily getting it, or knowing what you're getting but not whether it's what you want.  

I think we have to do better. The human stakes for data-comprehension are approaching critical levels, and our tools have not kept up. Worse, the shiny new tools in the big labs are not ready yet and not even that great.  

So Thread is my own personal attempt at doing better. Could it be the language we could actually share, humans and computers, to talk about data? I can't prove it is yet, and the project in which it's embedded is still working towards its public debut, so you can't make up your own mind yet, either. But for the past couple years I've been using it to talk to computers, and to myself, and even to a few coworkers, and the experience at least gives me hope. I know it's powerful, and I know it's compact.  

As with any language, of course, we'd have to learn it. I make no claim that it's "intuitive", whatever meaning that term might have for a symbolic-reasoning language, nor that it's trivially implemented at scale. It's cryptic in its own particular way, and poses its own technical challenges. But I'm not trying to minimize anybody's absolute difficulty; I'm trying to maximize the ratio of power to difficulty. If, reading those examples above, without a formal tutorial or even an actual diagram of the data model in question, you have at least a sense of what might be going on, then it's possible I'm getting somewhere.  
 

[Note from a few days later: in re-reading these queries I actually noticed a methodological error! The first time I did this, I neglected to sort the ratings before trimming the first and last few. That is, I did this:  

Album|Trusted Rating=(.Rating:(.User:(.Rating.Album::#5.Artist::#3)))  

Artist|(.Album.Trusted Rating/User::#25.(.group._Average)._Trim 10%._Average)
 

where I should have done this:  

Album|Trusted Rating=(.Rating:(.User:(.Rating.Album::#5.Artist::#3)))  

Artist|(.Album.Trusted Rating/User::#25.(.group._Average)#._Trim 10%._Average)
 

The operative difference is the "#" for sorting right before "._Trim 10%" in the second query, which is what makes the trim function take off the highest and lowest ratings, rather than just the first and last.  
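The bug is easy to reproduce outside Thread. In this Python sketch (hypothetical data, same floor-based trim count assumed), trimming an unsorted list removes whichever values happen to sit at the ends, so the outliers survive:

```python
def trim(values, percent=10):
    """Drop len * percent // 100 items from each end, in the order given."""
    k = len(values) * percent // 100
    return values[k:len(values) - k]

# Hypothetical per-reviewer averages, in arbitrary (unsorted) order.
reviewer_averages = [55, 100, 52, 60, 1, 58, 54, 62, 56, 50]

unsorted_kept = trim(reviewer_averages)          # like "._Trim 10%" without "#"
sorted_kept = trim(sorted(reviewer_averages))    # like "#._Trim 10%"

print(unsorted_kept)  # still contains 100 and 1
print(sorted_kept)    # 100 and 1 removed
```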

But even this error is kind of my point. The language is a tool for me to talk to myself over time.]
1. The tendency to forget completely about anything for which someone else is vaguely expected to take the next step.  

2. A reluctance to accept that quantifying one's nostalgia does not mitigate its mortality.  

3. A vigilant willingness to challenge abstrusely tangential orthodoxies.  

4. A failure, when not concentrating, to properly aspirate the letter H in the words "humor" and "human".  

5. A fear of widths.  

6. The maintenance of short but meticulous lists of inconclusive evidence for undeniable truths.  

7. Always, or almost, allowing the silent moments at the ends of experiences to complete without crossfade.
Lyra was supervising while I cooked dinner tonight, and I gave her a few chickpeas as I was putting them in a salad, mostly because it's so irresistibly cute to hear her call them "bickies".  

"Moh?", she said, with a cartoonish upward lilt as if she read in a guidebook that that's how you ask for something in grownup. This means "more", in this case "more chickpeas?". Never mind how I know this. Parenting skills.  

"I'm making yummy dinner", I pointed out, reasonably. I'm a fairly reasonable person, which I think she appreciates. Or will, by the time she's 36 or so.  

She considered this for a moment, pressing a tiny finger into the dot of bickish water the last chickpea left behind on her tray, then looked up again, a tiny easy-bake-oven light-bulb clicking on above her head.  

"One?"  

"One? You want one chickpea?" I said. I'm assuming that my habit of asking her for clarification will become decreasingly inane as time goes on. She nodded enthusiastically. You might think, from the time-ordering, that she was answering my question, but I've conducted tests, and it turns out that she nods no matter what you say. The nodding is her answer to the implied question "Do you still want whatever you wanted before?" Which is, to be fair, what most of our questions to her amount to.  

"Well, I can hardly deny you one single chickpea." It's OK to indulge children as long as they understand the careful logic behind your actions. I plucked a chickpea off the top of the salad and centered it precisely on the tray in front of her. "One", I explained, pointing to it for helpful pedagogical emphasis.  

She nodded three or seven times, then picked up the chickpea, crammed her whole fist into her mouth, somehow extracted the chickpea from her grip while her hand was inside her mouth, and then pulled her hand out with that great sweeping flourish she's been working on in case she ends up needing a career in rodeo. I turned back to the stove, wondering whether you can say that you've learned to count "to" one. It's kind of "from" one, really.  

Behind me I heard a small finger tap a plastic tray once, moistly.  

"Five?", she said.
I've been calculating voter-centricity in polls for several years now, so I can't believe I only just thought of the way to re-apply voter-centricity to the things being voted on: Retabulate the album (or whatever) ranking, inverse-weighting each vote by the voter's centricity. I.e., the closer the voter was to the consensus, the less their vote is worth. Then take the ratio of weighted scores to vote-counts, and you get a measure not of popularity, but of cultishness. You probably want to get rid of the albums that got very few votes, but in the 30-voter ILM Metal poll I only had to eliminate albums that got fewer than 3 votes before the results started looking interesting. In the 577-voter Pazz & Jop poll I cut off the albums with 5 votes or fewer, but even the 6- and 7-vote albums are distributed across the score-range pretty well.  
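Here is a minimal Python sketch of the kvltosis tabulation as described above. This post doesn't explain how voter-centricity itself is computed, so here it is simply an input: a hypothetical mapping from voter to a positive number, where higher means closer to consensus. Each vote is assumed to be a `(voter, album, points)` tuple:

```python
from collections import defaultdict

def kvltosis(votes, centricity, min_votes=3):
    """Inverse-weight each vote by its voter's centricity, then divide each
    album's weighted total by its raw vote count. Albums with fewer than
    min_votes votes are left out."""
    weighted = defaultdict(float)
    counts = defaultdict(int)
    for voter, album, points in votes:
        weighted[album] += points / centricity[voter]  # consensus voters count less
        counts[album] += 1
    return {album: weighted[album] / counts[album]
            for album in counts if counts[album] >= min_votes}

# Hypothetical poll: three consensus-minded voters, three outliers.
centricity = {"m1": 0.5, "m2": 0.5, "m3": 0.5, "o1": 0.25, "o2": 0.25, "o3": 0.25}
votes = [
    ("m1", "Hit", 10), ("m2", "Hit", 10), ("m3", "Hit", 10),
    ("o1", "Cult", 10), ("o2", "Cult", 10), ("o3", "Cult", 10),
    ("o1", "Rare", 10),
]
scores = kvltosis(votes, centricity)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Both albums received the same raw points, but the one backed by outlying voters scores twice as high (40 vs. 20), and the single-vote album is cut by the minimum-votes filter.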

The only real metric of idiot statistics tricks like this is whether you find out anything new by looking at them. In this case, you can make up your own mind. I have named this new stat "kvltosis" (in a combined metal/statistics joke for which possibly I am the entire target-audience), and added it to my Pazz & Jop analysis. If the poll's consensus bores you, perhaps this can be another antidote. (If the poll's consensus thrills you, on the other hand, just mentally invert this list and you have consensus squared...)
If you haven't had enough music-poll stats after this, I also helped tabulate the ILM Metal Poll, and posted even more geekery on the ILM thread discussing the Pazz & Jop results.
If you want to make the case for "improved" searching via the wonders of semantic-web technology, as this blog post and this demo attempt to, you need to make your demo demonstrate something compelling.  

In the blog post announcing that demo, Kingsley suggests "Microsoft" as the query example. As of this moment, doing that query on that demo produces a page of unintelligibly elided URLs, misrendered characters, and random blobs of text that contain the word "Microsoft" in them. The UI opens with this stirring invocation:  

Displaying values and text summaries associated with pattern: (NULL)1
(NULL)1 contains "microsoft" in any property value.

And the first search result begins, and I feel like I have to clarify that I am not making this up, with the words "Mac OS X Leopard" (and then some gibberish that I'm guessing used to be Italian).  

If you do the search "Microsoft" on Google right now, you get some news items about Microsoft, followed by the Microsoft site itself.  
 

But maybe that was just an unfortunate example. So I tried looking for Cyndi Lauper. Google's results for this begin with Cyndi's official site, then the Wikipedia page about her, then her MySpace page. OpenLink's begin with "The Parking Lot 03.09.2007 at SmartLemming.com", again in a page-layout that isn't even funny as a parody of good information design.  

If you want to amuse yourself by trying more examples, I've put up an easy form for running a search on both sites side-by-side:  

cyndi lauper
microsoft
(try your own)  

Be patient with the OpenLink side...  
 

To state the obvious caveat: OpenLink is not claiming that this demo delivers better search-term relevance, so the ranking of search results is not the main criterion on which it is intended to be assessed.  

On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?  

And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can.  

Now, did your opinion of the potential of the "semantic web" go up or down during your experience?  
 

[Update: Kingsley responds here, and suggests that "glenn mcdonald" would actually be a better example query. So here you go: glenn mcdonald. Did your opinion change?  

Just to be clear, I think Kingsley is exactly right that we need a universal data browser, and quite possibly right that Virtuoso's underlying technology is capable of being an engine for such a thing. But this thing he's showing isn't a data browser, it's a data-representation browser. It's as if the first web-browser only did View Source. We will no more sell the new web by showing people URIs than we sold the old web by showing them hrefs. Exactly the opposite: we sold the old web by not showing people ULs and OLs and TD/TD/TD/TD and CELLPADDING=0. And we'll sell this new web by not showing them meta-schema and triples and reification and inverse-link entailment.]
Described: Excuses for Our Natures to Change (The War Against Silence #510)  

Zipped: 1-43, 44-75  

Playlisted:  

1. In This Moment: Endless Days And Nights (Forever) (4:21)
2. In This Moment: The Underworld (Her Kiss) (4:30)
3. Enslaved: To The Coast (6:25)
4. Trinacria: Part III: Make No Mistake (6:20)
5. Everon: North (5:03)
6. Everon: South of London (4:04)
7. Eluveitie: Inis Mona (4:09)
8. Leviathan: VI-XI-VI (7:09)
9. Septicflesh: Anubis (4:17)
10. Dir en grey: Dozing Green (4:06)
11. Cynic: Evolutionary Sleeper (3:35)
12. Gyöngyvér: Halhatatlan ámok (3:30)
13. In Flames: The Mirror's Truth (2:58)
14. Frightened Rabbit: Head Rolls Off (3:44)
15. Frightened Rabbit: The Twist (3:30)
16. Puressence: Drop Down to Earth (3:11)
17. Sigur Rós: Inní Mér Syngur Vitleysingur (4:05)
18. M83: Graveyard Girl (4:51)
19. Katy Perry: Waking Up In Vegas (3:19)
20. Delays: Love Made Visible (3:58)
21. Ida: The Killers, 1964 (5:18)
22. Bob Mould: The Silence Between Us (3:34)
23. Shearwater: The Snow Leopard (5:08)
24. Asian Kung-Fu Generation: Night Diving (3:01)
25. Deathspell Omega: Chaining the Katechon (22:12)
26. Nightwish: The Escapist (4:59)
27. Frightened Rabbit: It’s Christmas So We’ll Stop (5:27)
28. Mountain Goats: Marduk T-Shirt Men's Room Incident (3:21)
29. Wetnurse: Life At Stake (7:13)
30. Pink: It's All Your Fault (3:52)
31. OLIVIA: Rain (4:27)
32. Uh Huh Her: Wait Another Day (4:01)
33. Mia: Mausen (4:53)
34. Retribution Gospel Choir: What She Turned Into (2:22)
35. Zapruder Point: An Arm & a Leg (1:56)
36. L'Arc~en~Ciel: NEXUS 4 (3:51)
37. Ihsahn: Emancipation (5:27)
38. DragonForce: A Flame for Freedom (5:20)
39. Bob Catley: We Are Immortal (5:44)
40. I Nine: Seven Days of Lonely (3:36)
41. Jewel: Two Become One (3:44)
42. Týr: Gatu Rima (5:38)
43. Grand Magus: Like The Oar Strikes The Water (3:13)
44. Dark Tranquillity: Below the Radiance (3:25)
45. Equilibrium: Blut Im Auge (4:44)
46. Jesu: Blind And Faithless (3:33)
47. Boris: My Neighbor Satan (5:17)
48. Airborne Toxic Event: Sometime Around Midnight (5:03)
49. Gaslight Anthem: The '59 Sound (3:09)
50. Alanis Morissette: Underneath (4:07)
51. Rick Springfield: Saint Sahara (3:58)
52. Cradle of Filth: Stay (4:55)
53. Zapruder Point: Artificial Light (2:47)
54. Puressence: April In July (3:57)
55. Puressence: 3rd Degree (3:21)
56. Belle & Sebastian: (My Girl's Got) Miraculous Technique (4:28)
57. Lucksmiths: Anyone's Guess (2:19)
58. Hundred Reasons: No Way Back (3:35)
59. Trembling Blue Stars: This Once Was An Island (4:10)
60. Hypocrisy: Hatred (4:46)
61. Moonspell: Dreamless (Lucifer and Lilith) (5:16)
62. Candlemass: Lucifer Rising (4:06)
63. Poisonblack: Left Behind (4:45)
64. Monolith Deathcult: Master of the Bryansk Forests (7:13)
65. Metsatöll: Iivakivi (4:19)
66. Dalriada: A Szikla legendája (4:21)
67. Okkervil River: Pop Lie (3:12)
68. Manic Street Preachers: Umbrella (3:34)
69. Wilderness: Silver Gene (4:12)
70. Parts & Labor: Nowheres Nigh (4:36)
71. Magnetic Fields: Drive On, Driver (2:51)
72. Killers: Human (4:05)
73. Niyaz: Feraghi-Song of Exile (5:45)
74. Soweto Gospel Choir: Pride (In the Name of Love) (2:36)
75. Garry Schyman: Praan (4:29)
Here's a quick, simple test for your "news" source: Is their presentation of an abject historical humanitarian crisis with mounting casualties any different from their treatment of a guy hanging upside down from a ski-lift by his pants?  

(Note: The answer should be "yes".)
My daughter has just discovered indefinite articles. For the past couple days, every "Mommy" and "Daddy" and "Kitty" has turned into "A mommy!", "A daddy!", "A kitty!" It's probably overanalyzing to think that she has just grasped either existential quantification or classification, but clearly she means something. She has been meaning things for a while, too, of course, but it's possible that this is her first communicated abstract idea. And even if this isn't, yet, it makes me realize that something soon will be.  

It's easy enough to see that a baby is a small person, taxonomically, because that's what they basically look like. But it's quite another leap to comprehend that an individual baby is actually inexorably becoming an individual person.
Site contents published by glenn mcdonald under a Creative Commons BY/NC/ND License except where otherwise noted.