Monday, September 3, 2018

The poor mathematics of wine-quality scores

Language is all about communication. If you say something and I don't understand it, then we are both wasting our time.1 For example, a Swede and I were looking at a book the other day, in which one of the female characters referred to another one as "duck". My colleague interpreted this as insulting, because it would be so if you did the same thing in Swedish (using the word "anka"). However, it seemed to me that "duck" and "ducky" were precisely the sorts of expressions that buxom barmaids used to use in those 1960s British television shows that I saw as a child, in which it was a friendly term. It turned out that the book was set in London, and published in 1958, so that was the correct interpretation.

Celia Fremlin — The Hours Before Dawn

This sort of variation in interpretation is one reason why word descriptions of wines are frequently disparaged, because it is often rather difficult to work out what all of this flowery language is supposed to mean (see my post on Wine writing, and wine books). In turn, this dissatisfaction is one reason why wine-quality scores are popular, because mathematics is supposed to make communication pedantically precise.

For example, in the early 1900s Jacques Futrelle created the character Professor van Dusen (also known as The Thinking Machine), who solved mysteries by the remorseless application of logic. His mantra was: "Two and two equal four, not just some of the time but all of the time".2 This emphasizes the ultimate goal of mathematics as a language, that we cannot go wrong: a mathematical proof of a proposition is as close as we can get to certainty.

However, to make use of this advantage we must be rigorous, and be pedantic about our intention as well. This is often problematic, because most people manage to forget all of the mathematics they were taught at school within minutes of leaving that school for the last time.3 For example, you should all recognize that van Dusen is actually wrong, because the assumption that the sum uses base10 is not warranted, and in base2: 1 + 1 = 10. 4
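The point about bases can be checked with a few lines of Python. This is only a minimal sketch: the helper name to_base is hypothetical, and it simply writes the same quantity in whichever base is requested.

```python
def to_base(n, base):
    """Return the digits of a non-negative integer n written in the given base (2 to 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))  # least-significant digit first
        n //= base
    return "".join(reversed(digits))


# The quantity is the same each time; only the written form changes with the base.
print(to_base(1 + 1, 2))   # 10
print(to_base(2 + 2, 2))   # 100
print(to_base(2 + 2, 3))   # 11
print(to_base(2 + 2, 10))  # 4
```

So "two and two" is written as 4 only if we all agree on base10 beforehand; the unstated convention is doing part of the communicating.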

So, wine-quality scores will only be at their best as a means of communication if they follow the logic of mathematics. Sadly, they rarely do.

Augustus S.F.X. van Dusen

Best-case scenario

The first widely applied wine-scoring system was the 20-point scale developed in the 1950s by Maynard Amerine and his colleagues at the University of California, Davis. In this scheme, each organoleptic characteristic of the wine is assigned a number of points based on its perceived quality, and these points are summed to produce the final score. The wine characteristics include: appearance, color, aroma and bouquet, total acidity, sweetness, body, flavor, bitterness, and astringency.

In theory, everyone who uses the UC Davis scale should be "speaking the same language"; and therefore any differences in wine scores should represent differences in perceived wine quality, not differences in the use of language. However, in practice, this will be true only if summing the sub-scores makes mathematical sense. If it does not, then the sum is not a repeatable mathematical quantity.

For the sum to make any mathematical sense, each possible quality point has to mean exactly the same thing as every other possible quality point. That is, a point for color has to mean exactly the same thing as a point for astringency; if not, then adding the points has no precise mathematical meaning. Furthermore, every user has to mean exactly the same thing when they assign each quality point. It is like counting apples — we all need to agree on what an apple is, and then we need to be able to recognize each apple when we see one; if I try to add 3 apples and 3 blackberries, the sum of 6 may not make much sense.
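A small Python sketch may help to show what is at stake. The category names follow the list of characteristics given above, but every number here is invented for illustration; these are not real tasting data, nor the official maximum points of the UC Davis scale.

```python
# Hypothetical sub-scores for two wines (all values are made up for illustration).
wine_a = {"appearance": 2, "color": 2, "aroma and bouquet": 4, "total acidity": 1,
          "sweetness": 1, "body": 1, "flavor": 2, "bitterness": 1, "astringency": 1}
wine_b = {"appearance": 1, "color": 1, "aroma and bouquet": 2, "total acidity": 2,
          "sweetness": 1, "body": 2, "flavor": 3, "bitterness": 2, "astringency": 1}


def total(scores):
    """Sum the sub-scores, treating every point as the same unit."""
    return sum(scores.values())


# Categories in which the two wines were scored differently.
differing = [name for name in wine_a if wine_a[name] != wine_b[name]]

print(total(wine_a), total(wine_b))  # 15 15
print(len(differing))                # 7
```

The two totals are identical even though the wines differ in seven of the nine categories. Reading the shared total as "equal quality" is justified only if a point for color really is interchangeable with a point for flavor or for bitterness, which is exactly the assumption in question.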

Is this required uniformity of the points likely to be true in the case of wine-quality assessment? I doubt it, although I would love to have someone demonstrate that I am wrong. This means that, even in this situation, where the input to the quality score is pre-specified as the sum of a set of parts, the mathematics is not helping us communicate as much as we would like. And this is the best-case scenario!

Maynard Amerine (left) and Edward Roessler (right)

Usual scenario

On the other hand, most wine commentators do not use any such scoring scheme. Their wine-quality scores are personal to themselves. That is, the best we can expect from each commentator is that their wine scores can be compared among themselves, so that we can work out which wines they liked and which ones they didn't. However, the scores cannot be compared between commentators at all.

I have shown this unfortunate situation in two main blog posts, where I directly compared the scoring systems of several professional commentators for the same wines.
No-one's scores were the same as anyone else's, irrespective of what set of points they used. So, differences between scores could mean differences in wine quality, but they could just as easily reflect differences in the interpretation of the numbers. In this case, the numbers are no better than are words as a means of communication. It is like watching someone from Scotland trying to talk with someone from Texas — they may ostensibly be using the same language (English), but they may also find that their communication is an uphill battle.

Jancis Robinson, when she was a bit closer to her maths days

Conclusion

Obviously, we shouldn't conclude from this that points are pointless. But we might conclude that the sometimes-heard argument that numbers are more precise than words does not really apply in the case of wine-quality assessments. Even in the best-case scenario, where sets of points are added together to produce the score, the sum might make little mathematical sense.

I believe that this is the main reason why the best-known mathematically trained wine commentator has repeatedly said that she is wary of assigning quality points to the wines that she tastes. This is Jancis Robinson, who has a degree in mathematics (and philosophy) from the University of Oxford. As far as I know, she has never put it this way, but she could do so with perfect assurance: wine assessment is no better using numbers than words, because the numbers violate many of the mathematical requirements for a precise language.

Postscript

Tom Wark has posted a very interesting response to my comments over at his Fermentation blog: A wine rating is an adjective, not a calculation. In one sense he does not disagree with my conclusion, but instead disputes the premise that numbers must be treated as calculations. My reply (as posted on his blog) is to question "why, given that the numbers are not mathematics, we are using mathematical language in our attempt to communicate. To me, this is like using English words without creating English sentences! So, if a wine rating is an adjective, then don’t use a number, because this is a very poor substitute for an adjective."



1 Unless I happen to like the sound of your voice! I have long thought that one reason the English have traditionally disliked the French and the Scots is that, for spoken English, both groups have accents that are much more melodious than any of the numerous English ones.

2 This idea goes back a long way. For example, in Johann Wigand's De Neutralibus et Mediis Libellus (1562) we find: "That twice two are four, a man may not lawfully make a doubt of it, because that manner of knowledge is grauen [graven] into mannes [man's] nature."

3 See Why do Americans stink at math?

4 We expect numbers to be in base10 because we have 10 fingers, but any base is actually possible, and to be pedantic we should always specify which one we are using. Computers, for example, are binary, and thus use base2, while computer programming often uses octal, which is base8, or hexadecimal, which is base16. Given the number of devices in the modern world that have a computer processor in them, base2 is almost as common as base10 these days.

4 comments:

  1. Excerpt from Slate
    (Posted June 15, 2007):

    “Cherries, Berries, Asphalt, and Jam.
    Why wine writers talk that way.”

    URL: http://www.slate.com/articles/life/drink/2007/06/cherries_berries_asphalt_and_jam.html

    By Mike Steinberger
    “Drink: Wine, beer, and other potent potables” Column

    . . .

    In his book “The Taste of Wine,” legendary French oenologist Emile Peynaud elegantly explained the conundrum. "We tasters feel to some extent betrayed by language," he wrote. "It is impossible to describe a wine without simplifying and distorting its image." This linguistic failure is surely one reason that numerical scores for wines have proven so popular; points are simplistic and distorting, too, but they at least give you something to hold onto -- more so than, say, "spice box," "melted asphalt," or "liquefied minerals."

    So, how did such phrases become standard-issue wine nomenclature? We can trace it back to a revolution in winespeak that took place three decades ago. In 1976, two University of California, Davis professors, Maynard Amerine and Edward Roessler, published a book titled “Wines -- Their Sensory Evaluation.” A dense, bone-dry monograph stuffed with mathematical equations, the book touched on many subjects, but it was the chapter devoted to the vocabulary of wine that ultimately wielded the most influence. At the time, wines were generally evaluated anthropomorphically and tended to be described as masculine or feminine, coarse or refined, noble or common, ingratiating or overbearing.

    Amerine and Roessler proposed that oenophiles abandon this vague terminology, rooted in the British class system, in favor of a more rigorous lexicon that treated wines not as living creatures with personalities but as agricultural products with precise flavors and aromas. Other researchers, notably fellow UC Davis professor Ann Noble (creator of the famous Wine Aroma Wheel), refined this new diction. Raiding the garden and the kitchen pantry, they prescribed a new, food-based nomenclature, in which wines were to be described as evoking specific fruits, vegetables, nuts, flowers, and the like.

    Although anthropomorphic language all but disappeared from the academic literature, mainstream wine writers continued to make abundant use of gender- and class-based metaphors. But many wine critics also started to employ a very specific, largely pastoral vocabulary. In 1978, Robert Parker began publishing The Wine Advocate, and although Parker has never shied away from slippery adjectives (he often uses words like hedonistic, sexy, and intellectual), his tasting notes have always stood out for their no-nonsense, just-the-flavors-ma'am approach. Here's Parker, for instance, on the 1996 Chateau d'Yquem (the great sweet wine of Bordeaux): "[l]ight gold with a tight but promising nose of roasted hazelnuts intermixed with crème brûlée, vanilla beans, honey, orange marmalade, and peach … "

    Over the last two decades or so, this type of tasting note has become the industry standard; most critics nowadays make a point of listing the exact aromas, flavors, and tactile sensations they perceive in a wine. These grab bags of specific and often obscure tastes and scents breed a certain awe and deference among many wine enthusiasts (Gee, he really must be gifted if he can smell all those things -- I should heed his recommendations), which is undoubtedly part of their appeal. Wine writers perhaps also feel pressured to use the "right" lingo for fear of losing street cred in the eyes of their peers and other industry insiders. But while the cherry-and-berry imagery may be good for establishing critical authority, its value to the layman is open to debate.

    . . .

  2. Excerpt from Slate
    (Posted June 15, 2007):

    “Cherries, Berries, Asphalt, and Jam.
    Why wine writers talk that way.”

    URL: http://www.slate.com/articles/life/drink/2007/06/cherries_berries_asphalt_and_jam.html

    By Mike Steinberger
    “Drink: Wine, beer, and other potent potables” Column

    . . .

    One of the more famous assaults on the new language of wine came from novelist and children's writer Roald Dahl, a renowned oenophile himself. In 1988, he wrote a letter to Britain's Decanter magazine in which he lambasted as "tommyrot" the "extravagant, meaningless similes" that were suddenly being used to describe wines. "Wine … tastes primarily of wine -- grape-juice, tannin, and so on," Dahl wrote. "If I am wrong about this, and the great wine-writers are right, then there is only one conclusion. The chateaux in Bordeaux have begun to lace their grape-juice with all manner of other exotic fruit juices, as well as slinging in a bale or two of straw and a few packets of ginger biscuits for extra flavouring. Someone had better look into this." He went on, "I wonder, by the way, if these distinguished persons know that their language has become a source of ridicule in many sensible wine-drinking households. We sit around reading them aloud and shrieking with laughter."

    . . .

  3. Excerpt from Jancis Robinson, MW Website
    (circa 2002):

    “How to Score Wine”

    URL: http://www.jancisrobinson.com/articles/how-to-score-wine?layout=pdf

    Imagine going to an art gallery and being asked to fill in a form assigning scores to each work. It does sound pretty difficult and of questionable use, does it not?

    Yet the process of scoring wine, one which many of us engage in frequently, is not that far removed from assigning points to a Picasso or a De Kooning.

    I would be much happier in my professional life if I were never required to assign a score to a wine. I know so well how subjective the whole business of wine appreciation is and, perhaps more importantly, how much the same wine can change from bottle to bottle and week to week, if not day to day. I frequently find myself re-tasting a wine at the same stage in its life. So far I have rarely marked more than 0.5 points out of 20 differently on the two occasions, but it wouldn't surprise me at all if I did.
    And as for tasting the same wine at different stages in its life, this is even less likely to yield identical scores. Quite apart from bottle variation there are differences in tasters' moods and vast differences in how wines mature in bottle.

    Even I have to admit, however, that scores have their uses. The most obvious is to help the reader-in-a-hurry . . .

    I find myself using all sorts of different scoring systems depending on the circumstances. . . .

    In most of my tasting and writing I don't really need scores. What's important when I taste a range of mixed wines is to mark those I think good enough -- which often translates into sufficiently good value (for most of us price is important) -- to recommend. A mere tick suffices. An exclamation mark draws my attention to something notable such as an absurdly hyperbolic claim on the back label or some strange new phenomenon. . . . 'GV' distinguishes the seriously good value bottles while real stinkers get a cross to bear.

    I like the five-star system used by Michael Broadbent and “Decanter” magazine. Wines that taste wonderful now get five stars. Those that will be great may be given three stars with two in brackets for their potential. But Brits being as polite, or just plain cowardly, as we are, almost all the wines get between three and five stars in Decanter so it's not an especially nuanced scoring system - although I have been known to use it for wines likely to be very close together in quality such as de luxe Champagnes or mature vintage Ports.

    When even I have to admit that I really need a numerical scoring system is when tasting a wide range of wines of the same sort when readers, or subscribers to jancisrobinson.com, need a shorthand reference to my favourite wines. . . .

    I know that Americans are used to points out of 100 from their school system so that now they, and an increasing number of wine drinkers around the world, use points out of 100 to assess wines. Like many Brits, I find this system difficult to cope with, having no cultural reference for it.

    So, I limp along with points and half-points out of 20, which means that the great majority of wines (though by no means all) are scored somewhere between 15 and 18.5, which admittedly gives me only eight possible scores for non-exceptional wines -- an improvement on the five star system but not much of one. (I try when tasting young wines to give a likely period when the wine will be drinking best, so I do cover the aspect of its potential for development.)

    But, perhaps strangely for someone who studied mathematics at Oxford, I'm not a great fan of the conjunction of numbers and wine. Once numbers are involved, it is all too easy to reduce wine to a financial commodity rather than keep its precious status as a uniquely stimulating source of sensual pleasure and conviviality.

  4. Some 100 point wine scoring scales don't even address the issue of assessing a wine's technical "characteristics" a.k.a. components.

    Example: Wine Spectator magazine.

    Quoting from the magazine's March 15, 1994 issue (“Letters” section, page 90):

    Grading Procedure

    In Wine Spectator, wines are always rated on a scale of 100. I assume you assign values to certain properties [a.k.a. components] of the wines (aftertaste, tannins for reds, acidity for whites, etc), and combined they form a total score of 100. An article in Wine Spectator describing your tasting and scoring procedure would be helpful to all of us.

    (Signed)

    Thierry Marc Carriou
    Morgantown, N.Y.


    Editor’s note: In brief, our editors do not assign specific values to certain properties [a.k.a. components] of a wine when we score it. We grade it for overall quality as a professor grades an essay test. We look, smell and taste for many different attributes and flaws, then we assign a score based on how much we like the wine overall.
