Monday, October 22, 2018

Repeatability of wine-quality scores

I have written a couple of times about the ability of experienced wine tasters to reproduce their quality scores when tasting the same wine two or more times.

In the first of these posts I presented six datasets in which someone tasted the same wine / vintage combination on two separate occasions. The first table here shows the percentage of the variation among the quality scores that was reproducible. As you can see, at best about half of the variation in the scores was reproducible; the rest was what we politely refer to as "random variation".

| Taster | Wine | Scores | Reproducible variation |
|---|---|---|---|
| Jeremy Oliver | Penfolds Grange Bin 95 | 38 | 54% |
| | Penfolds Grange Bin 95 | 45 | 51% |
| James Halliday | Cullen Cabernet Sauvignon Merlot | 19 | 5% |
| Wine Spectator | Château Lafite-Rothschild | 34 | 11% |
| James Laube | 1986 vintage California Cabernets | 45 | 30% |
| Jancis Robinson | Henschke Hill of Grace Shiraz | 23 | 37% |
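
To make "percentage of the variation that was reproducible" concrete, here is a minimal sketch of one common way to compute such a figure: the squared Pearson correlation between the scores from the first and second tastings. This is an assumption for illustration, since the post does not say exactly how its percentages were calculated. The function name `reproducible_share` and the scores are hypothetical and are not taken from any of the datasets.

```python
from math import sqrt


def reproducible_share(first, second):
    """Squared Pearson correlation between two tastings of the same wines.

    Assumption: "reproducible variation" is taken to mean the share of
    variance in one set of scores that is linearly predictable from the other.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sums of cross-products and squared deviations around each mean
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary within each tasting")
    r = cov / sqrt(var_a * var_b)
    return r * r


# Hypothetical scores for six wines, invented purely for illustration
first_tasting = [90, 92, 88, 95, 91, 89]
second_tasting = [91, 90, 89, 93, 94, 88]

share = reproducible_share(first_tasting, second_tasting)
print(f"{share:.0%} of the variation is shared between the two tastings")
```

A value of 100% would mean the second set of scores is perfectly predictable from the first; a value near 0% would mean that knowing the first score tells you almost nothing about the second.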

This does not seem like a particularly good situation. Indeed, it leads me to ask whether other people have found this same result when they have looked at the issue of reproducibility.

It turns out that Robert Ashton asked the same question back in 2012 (see the literature references at the end of the post). He found six published studies that had collected data to investigate reproducibility, five of which I have summarized in the second table.

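| Study | Group | People | Lowest | Average | Highest |
|---|---|---|---|---|---|
| Brien et al. (1987) | Study 2 | 6 / 8 | 15% | 53% | 83% |
| | Study 3 | 6 / 8 | 12% | 20% | 35% |
| | Study 4 | 6 / 8 | 3% | 29% | 58% |
| | Study 5 | 6 / 8 | 31% | 55% | 100% |
| Lawless et al. (1997) | Panel CB | 6 | 1% | 28% | 58% |
| | Panel G | 8 | 10% | 37% | 72% |
| | Panel PB | 7 | 1% | 18% | 64% |
| | Panel C | 7 | 0% | 10% | 48% |
| Gawel et al. (2002) | | ? | 24% | 21% | 96% |
| Gawel & Godden (2008) | Reds | 571 | 15% | 20% | 94% |
| | Whites | 571 | 17% | 12% | 94% |
| Hodgson (2008) | Panel Q | 4 | 0% | 12% | 59% |
| | Panel x | 4 | 14% | 8% | 80% |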

Each of the studies involved several tasters, often grouped into subsets for the study, as indicated in the table. Indeed, one of them was a large-scale study carried out over several years, with several hundred people. So, in the table I have listed the percentage reproducibility for the most successful person in the group ("highest") and the least successful ("lowest"), along with the average reproducibility across all of the participants.
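
The lowest / average / highest summary can be sketched as follows, assuming a reproducibility value has already been computed for each taster in a panel. The function name `summarize_panel`, the taster labels, and the values are hypothetical, invented for illustration rather than drawn from any of the studies.

```python
from statistics import mean


def summarize_panel(shares):
    """Return (lowest, average, highest) reproducibility for one panel."""
    if not shares:
        raise ValueError("panel has no tasters")
    return min(shares), mean(shares), max(shares)


# Hypothetical per-taster reproducibility (as fractions), invented for illustration
panel = {"taster_1": 0.10, "taster_2": 0.35, "taster_3": 0.72, "taster_4": 0.31}

lowest, average, highest = summarize_panel(list(panel.values()))
print(f"lowest {lowest:.0%}, average {average:.0%}, highest {highest:.0%}")
```

Reporting all three numbers matters because the average alone hides how far apart the best and worst tasters in a panel can be.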

As you can see, these results are in accord with my own: on average, no more than about half of the variation in wine-quality scores is reproducible, and usually much less, even for experienced tasters. This is true irrespective of whether the wines are tasted on the same day, the same week, or the same year.

This compares very poorly with most other fields in which people make quality judgments. In his paper, Ashton lists 41 studies from six other fields with which he was familiar (meteorology, business, auditing, personnel management, medicine, and clinical psychology), and cites average reproducibility rates of 49-83% (compared to the 8-55% shown above for wine).

Wine tasting is obviously not an exact activity, at least compared to other fields of human endeavor.

References

Robert H. Ashton (2012) Reliability and consensus of experienced wine judges: expertise within and between? Journal of Wine Economics 7:70-87.

Chris J. Brien, P. May, Oliver Mayo (1987) Analysis of judge performance in wine-quality evaluations. Journal of Food Science 52:1273-1279.

Richard Gawel, Peter W. Godden (2008) Evaluation of the consistency of wine quality assessments from expert wine tasters. Australian Journal of Grape and Wine Research 14:1-8.

Richard Gawel, Tony Royal, Peter Leske (2002) The effect of different oak types on the sensory properties of chardonnay. Australian and New Zealand Wine Industry Journal 17:14-20.

Robert T. Hodgson (2008) An examination of judge reliability at a major U.S. wine competition. Journal of Wine Economics 3:105-113.

Robert T. Hodgson (2009) How expert are "expert" wine judges? Journal of Wine Economics 4:233-241.

Harry Lawless, Yen-Fei Liu, Craig Goldwyn (1997) Evaluation of wine quality using a small-panel hedonic scaling method. Journal of Sensory Studies 12:317-332.

2 comments:

  1. The question of reproducibility of wine scores was addressed by Caltech lecturer (on randomness) Leonard Mlodinow. [*]

    From The Wall Street Journal “Weekend” Section
    (November 20, 2009, Page W6):

    “A Hint of Hype, A Taste of Illusion;
    They pour, sip and, with passion and snobbery, glorify or doom wines.
    But studies say the wine-rating system is badly flawed.
    How the experts fare against a coin toss.”

    URL: http://online.wsj.com/article/SB10001424052748703683804574533840282653628.html

    Essay by Leonard Mlodinow

    [Bob's aside: I hope your blog readers can find a way to bypass the "pay wall" at The Journal to read his insightful piece.

    *https://en.wikipedia.org/wiki/Leonard_Mlodinow ]

  2. I guess it depends on the kind of wine; the scores of some old wines may not be comparable, as they are affected by storage conditions.
