Monday, December 31, 2018

Is there truth in (wine) numbers?

Everyone knows the expression in vino veritas (in wine there is truth), which (in one form or another) seems to date all the way back to the 6th century BCE. However, in this blog, I spend a lot of time looking at numbers. This immediately raises the oft-asked question of whether "truth" also lies in numbers. In this post I will look at four informative examples where there is truth in some wine numbers, but in each case all is not quite as it seems.

Introduction

Many people are wary of numbers. The issue is that truth lies not in the numbers themselves but in our ability to interpret them. Numbers cannot speak for themselves, and thus they can tell us nothing directly. We have to look at them and work out for ourselves what truth lies therein.

The same applies to words, of course. The same combination of letters can mean quite different things, in different contexts or in different places. Even in English the words "lead" (pronounced leed) and "lead" (pronounced led) are spelled identically but have different meanings. (Did you know that this is why the band Led Zeppelin spelled their name that way? That was how they wanted it to always be pronounced, as the name comes from the expression "going down like a lead zeppelin".)

We have to be aware of this sort of thing, if we are to make much sense of the world around us; and we all get it wrong more often than we would like. Having so many different languages only makes it much worse, of course.

It is the same with numbers, even though there is only one mathematical language. This is why Mark Twain famously referred to "Lies, damned lies, and statistics". The first two emphasize the problems with words, and the third one the problem with numbers. It is easy to fool ourselves when interpreting numbers, and to thereby intentionally or unintentionally mislead others.

I mention this because I recently encountered four different examples of misinterpreting numbers in the wine industry, in a way that led to wrong conclusions, even though the numbers were (almost all) truthful. The first example comes from a book, the second from a blog post, the third from a press release, and the fourth comes from a research paper. Numbers are everywhere!

Example 1

An easy one to start with. This table is from a book about Madeira wine. It discusses the wine production from each of the main grape varieties. Back when I taught experimental design to university students, I used examples just like this one to drill into those students the importance of presenting numbers correctly in tables. Can you spot the error?

Any time you see a table where the numbers are supposed to add up to a given total, check whether they do; you might be surprised how often they don't (e.g. see the Postscript). In this case, the Production figure for Other European Varieties cannot possibly be right, although the Percentage of total harvest is apparently correct. Working backwards from the Total given, the true Production should be 38,936.05 hL, not 39.04 hL.

Note that this is similar to a typographical error, but of a somewhat complex type, and with important consequences.

Example 2

Now let's look at a slightly more tricky instance. Some years ago, a retailer blog post from Australia contained this comment:

"I could not help but be struck by how many wines the tasters rated at 90 points or more on a scale where the maximum is 100. An analysis of the list of 365 wines (excluding the French champagnes - all of which were rated above 90) showed that the average score given to this range was 91.36 and the lowest score given to a wine was 83. Of the 211 reds listed, 83 were rated as 93 points or higher. Now under the 20 point system used in Australian wine shows, 18.5 points is gold medal standard. Multiply by five to get 92.5 and it seems that almost 40 per cent of all the reds (83 out of 211) are gold medal standard, and every red and white wine on their list is well above the minimum 15.5 out of 20 (or 77.5 out of 100) needed to gain a bronze medal."

The author's conclusion does, indeed, follow if his arithmetic is right; but it isn't right. The issue here is converting from one scale (20 points) to another (100 points). The arithmetic assumption that the author makes is that both scales start at 0, whereas the 100-point scale actually starts at 50. These different equivalences are compared in this table:

20-point scale    0 = 0     0 = 50    Show medals
 0                  0        50
 0.5                2.5      51.25
 1                  5        52.5
 1.5                7.5      53.75
 2                 10        55
 2.5               12.5      56.25
 3                 15        57.5
 3.5               17.5      58.75
 4                 20        60
 4.5               22.5      61.25
 5                 25        62.5
 5.5               27.5      63.75
 6                 30        65
 6.5               32.5      66.25
 7                 35        67.5
 7.5               37.5      68.75
 8                 40        70
 8.5               42.5      71.25
 9                 45        72.5
 9.5               47.5      73.75
10                 50        75
10.5               52.5      76.25
11                 55        77.5
11.5               57.5      78.75
12                 60        80
12.5               62.5      81.25
13                 65        82.5
13.5               67.5      83.75
14                 70        85
14.5               72.5      86.25
15                 75        87.5
15.5               77.5      88.75     Bronze
16                 80        90
16.5               82.5      91.25
17                 85        92.5      Silver
17.5               87.5      93.75
18                 90        95
18.5               92.5      96.25     Gold
19                 95        97.5
19.5               97.5      98.75
20                100       100

Allowing for the fact that 0 on the 20-point scale equals 50 on the 100-point scale does away with the author's concern about over-inflation of scores, because a Gold medal requires about 96 points (96.25), not 93 points, and not all of the wines would get a Bronze medal (which requires about 89 points (88.75), not 77.5).

However, even this simple correction does not necessarily produce the "correct" conversion from 20 points to 100 points. For example, Australia's Winestate magazine has used a conversion where 15.5 points on the 20-point scale is equivalent to 90 points on the 100-point scale, not 89 points (as shown in the table).
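
For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the two linear conversions compared above. The function name to_100_point and its floor parameter are my own labels for illustration; neither conversion is claimed to be the "official" one (as the Winestate example shows, there may not be one).

```python
def to_100_point(score_20, floor=0.0):
    """Linearly map a 20-point score onto a 100-point scale.

    `floor` is the 100-point score assumed to correspond to 0 on the
    20-point scale (an illustrative parameter, not an official standard).
    """
    return floor + score_20 * (100.0 - floor) / 20.0


# Medal thresholds quoted in the text for Australian wine shows.
for medal, score in [("Bronze", 15.5), ("Gold", 18.5)]:
    naive = to_100_point(score)                # assumes 0 = 0 (multiply by five)
    rebased = to_100_point(score, floor=50.0)  # assumes 0 = 50
    print(f"{medal}: {score} -> {naive} (0 = 0) or {rebased} (0 = 50)")
```

With a floor of 0 the Gold threshold comes out at 92.5, as in the retailer's post; with a floor of 50 it comes out at 96.25, which is the figure the corrected comparison relies on.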

Example 3

This example takes a lot of work to identify the source of the error.

In 2015, the climats and terroirs of the wine region of Burgundy were added to the UNESCO World Heritage List. To quote UNESCO: "The climats are precisely delimited vineyard parcels on the slopes of the Côte de Nuits and the Côte de Beaune south of the city of Dijon. They differ from one another due to specific natural conditions (geology and exposure) as well as vine types and have been shaped by human cultivation. Over time they came to be recognized by the wine they produce ... The site is an outstanding example of grape cultivation and wine production developed since the High Middle Ages."

The UNESCO documentation suggests that there are 1,247 climats in this World Heritage site. However, Paul Messerschmidt (an amateur wine researcher from the UK) (paulmess[at] gmail.com) noted a discrepancy between this number and the count of those actually listed in the UNESCO documentation.

After a lot of (tedious) work, he realized that, while there are 1,247 climat names in Burgundy, there are actually "1,628 separate, distinct, and precisely delimited vineyard parcels in the Côte d'Or". The difference appears to come from searching the database for "climat names" rather than "named climats". For example, "there are vineyards called Les Cras in Chambolle-Musigny, Vougeot, Aloxe-Corton, Pommard, and Meursault (i.e. five 'named climats'), but they share only one 'climat name', as listed in UNESCO's count of 1,247". The difference of 381 vineyards is hardly trivial, especially if you happen to own part of one of them.
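
The distinction between the two counts can be sketched in a few lines of Python. This uses only the Les Cras example quoted above, not the UNESCO database, and the variable names are mine. Counting distinct names gives one answer, while counting distinct (name, village) pairs gives another.

```python
# Only the "Les Cras" example from the text; this is not the UNESCO data.
parcels = [
    ("Les Cras", "Chambolle-Musigny"),
    ("Les Cras", "Vougeot"),
    ("Les Cras", "Aloxe-Corton"),
    ("Les Cras", "Pommard"),
    ("Les Cras", "Meursault"),
]

# "Climat names": each distinct name is counted once, wherever it occurs.
climat_names = {name for name, _village in parcels}

# "Named climats": each distinct (name, village) parcel is counted.
named_climats = set(parcels)

print(len(climat_names))   # 1
print(len(named_climats))  # 5
```

Both counts are "true"; they simply answer different questions, which is presumably how 1,247 and 1,628 can both come out of the same list.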

Paul is apparently now compiling the discrepancies between the UNESCO list and those of The Wines of Burgundy, by Sylvain Pitiot & Jean-Charles Servant, and Inside Burgundy, by Jasper Morris, if anyone wants to help him with his work.

Example 4

Let's return to the subject of scoring wines at a wine show, and awarding medals. This will illustrate a situation where we can easily be misled when dealing with statistical summaries.

There are a number of research papers where judge scores have been compiled, and I will illustrate my point with a paper in the Journal of Wine Research (1996, 7:83-90). In this case, the judges evaluated 174 wines, and this graph shows the scores for three of the judges (each vertical bar represents the number of wines that received each of the scores shown on the horizontal axis):

One standard way to summarize data like this, and thus to compare the judges, is to calculate the mean score for each judge. In this case, the mean for Judge 3 is 11.8 and for Judge 5 it is 11.3. These two means are almost identical, suggesting that the judges are rather similar, and yet their scores, as shown in their graphs, are quite different. Indeed, Judge 3 seems to have two main groups of scores that are favored, with a score of 12 not commonly being used — and yet this is actually the mean score (11.8)! In this case, the mean does not help us understand the scoring behavior of this judge.

This is even more obvious when we look at the data for Judge 1. Once again, there are scores that the judge rarely uses, such as 9 and 10, and yet the mean score is 9.8. When the data have two distinct patterns, we call it "bi-modal", and in such a case the calculation of any sort of average score is going to mislead us badly. The data need to be clustered around the mean, if the mean is going to tell us anything useful.
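
To see how this happens, here is a small Python sketch. Please note that the scores below are invented purely for illustration; they are not the data from the paper, and they are not meant to reproduce any of the three judges. The point is only that the mean of a two-peaked set of scores can land on a score that is hardly ever given.

```python
from collections import Counter
from statistics import mean

# Invented scores for illustration only (NOT the data from the paper):
# one cluster around 7-8, another around 11-12, and little in between.
scores = [7] * 4 + [8] * 6 + [9] * 1 + [10] * 1 + [11] * 5 + [12] * 3

counts = Counter(scores)
average = mean(scores)
nearest = round(average)  # the whole-number score closest to the mean

print(f"mean = {average:.1f}")
print(f"wines given a score of {nearest}: {counts[nearest]}")
print(f"two most common scores: {counts.most_common(2)}")
```

In this made-up case the mean is 9.3, yet only one wine out of twenty received a 9; the scores actually favored are 8 and 11.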

Conclusion

So, remember that truth lies not in the words or numbers, but in our ability to interpret them. Compare this with the situation of a medical doctor diagnosing a disease based on the patient's symptoms. The symptoms really do indicate the disease, and hopefully the doctor extracts the truth most of the time. However, sometimes the doctor is unfamiliar with the disease, and sometimes the doctor misinterprets the symptoms, and sometimes the doctor fails to connect the symptoms with the disease. This is not good for the patient, or the doctor for that matter; but they both need to deal with it.

As a final word example, Swedes have an expression for couples living together, which is "samma boende", which they shorten to "sambo" (the first letters of each word). Americans do not introduce their partner as their sambo, but Swedes quite happily do so. This confuses Americans, but not Swedes.

Postscript

How many of you have ever noticed the error in a widely distributed description of the original UC Davis 20-point wine score card (as pointed out to me by Bob Henry)? The description is: "Appearance (2), Color (2), Aroma & Bouquet (4), Volatile Acidity (2), Total Acidity (2), Sugar (1), Body (1), Flavor (1), Astringency (1), and General Quality (2)". The true numbers should be: Flavor = 2, Astringency = 2, so that scores then correctly sum to 20. [I warned you to always check totals!] The written description also refers to a "fairy wine", which would be a very interesting thing, if it existed.
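
Checking a total like this takes only a few lines of Python. The category points below are exactly those quoted above; the variable names are mine.

```python
# Points per category as given in the widely distributed description.
published = {
    "Appearance": 2,
    "Color": 2,
    "Aroma & Bouquet": 4,
    "Volatile Acidity": 2,
    "Total Acidity": 2,
    "Sugar": 1,
    "Body": 1,
    "Flavor": 1,
    "Astringency": 1,
    "General Quality": 2,
}

# The same card with Flavor and Astringency corrected to 2 points each.
corrected = {**published, "Flavor": 2, "Astringency": 2}

print(sum(published.values()))  # 18
print(sum(corrected.values()))  # 20
```

The published version sums to 18, not 20, which is the clue that something is wrong.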

1. "The author's conclusion does, indeed, follow if his arithmetic is right; but it isn't right. The issue here is converting from one scale (20 points) to another (100 points). The arithmetic assumption that the author makes is that both scales start at 0, whereas the 100-point scale actually starts at 50. ..."

Thank you. A long overdue addressing of the fact that wine scoring scales for some reviewers do not start at zero.

As Robert Parker observed in his 1989 interview with Wine Times magazine (later to become Wine Enthusiast magazine):

"WINE TIMES: How is your scoring system different from The Wine Spectator's?

"PARKER: Theirs is really a different animal than mine, though if someone just looks at both of them, they are, quote, two 100-point systems. Theirs, in fact, is advertised as a 100-point system; mine from the very beginning is a 50-point system. IF YOU START AT 50 and go to 100, it is clear it's a 50-point system, and it has always been clear. Mine is basically two 20-point systems with a 10-point cushion on top for wines that have the ability to age. ...

". . . I thought one of the jokes of the [UC Davis and British] 20-point systems is that everyone uses half points, so it's really a 40-point system -- which no one will acknowledge -- and mine is a 50-point system, and in most cases a 40-point system."

2. Readers of this blog should go back and re-read this entry:

"What happened to Decanter when it changed its points scoring scheme"

URL: http://winegourd.blogspot.com/2017/06/what-happened-to-decanter-when-it.html

4. Adding to the discussion . . .

"The Odds Are You're Innumerate"
New York Times online - Jan 1, 1989
URL: https://www.nytimes.com/1989/01/01/books/the-odds-are-you-re-innumerate.html

-- and --

"Why Do Americans Stink at Math?"
New York Times Magazine online - July 26, 2014
URL: https://www.nytimes.com/2014/07/27/magazine/why-do-americans-stink-at-math.html

Let's introduce some levity into the discussion . . .

"New Math"
Tom Lehrer in concert - 1965