- Are the quality scores from repeat tastings correlated?
- Wine-quality scores for premium wines are not consistent through time
In the first of these posts I presented six datasets in which someone tasted the same wine / vintage combination on two separate occasions. The first table here shows the percentage of the variation among the quality scores that was reproducible. As you can see, no more than about half of the variation in the scores was reproducible, and in most cases far less; the rest was what we politely refer to as "random variation".
Taster: Jeremy Oliver, James Halliday, Wine Spectator, James Laube, Jancis Robinson

| Wine | Scores | Reproducible variation |
|---|---|---|
| Penfolds Grange Bin 95 | 38 | 54% |
| Penfolds Grange Bin 95 | 45 | 51% |
| Cullen Cabernet Sauvignon Merlot | 19 | 5% |
| Château Lafite-Rothschild | 34 | 11% |
| 1986 vintage California Cabernets | 45 | 30% |
| Henschke Hill of Grace Shiraz | 23 | 37% |
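For readers who want to try this on their own tasting notes, here is a minimal sketch of how a percentage like those in the table could be computed. It assumes that "reproducible variation" means the squared Pearson correlation between the first and second scores for the same wines; that is my assumption for illustration, not necessarily the exact calculation behind the table. The function name and the scores are hypothetical.

```python
from math import sqrt


def reproducible_fraction(first, second):
    """Squared Pearson correlation between two lists of paired scores.

    Assumption: this is used as a stand-in for "the fraction of the
    variation that is reproducible" between two tastings.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sums of cross-products and squares around the means
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    sxx = sum((a - mean_a) ** 2 for a in first)
    syy = sum((b - mean_b) ** 2 for b in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary within each tasting")
    r = sxy / sqrt(sxx * syy)
    return r * r


# Hypothetical scores for five vintages, tasted on two occasions
first_tasting = [90, 92, 88, 95, 91]
second_tasting = [91, 90, 89, 93, 94]
print(f"{reproducible_fraction(first_tasting, second_tasting):.0%} reproducible")
```

A value near 100% would mean the second tasting ranks and spaces the wines almost exactly as the first did; a value near 0% would mean the two sets of scores are essentially unrelated.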
This does not seem like a particularly good situation. Indeed, it leads me to ask whether other people have found this same result when they have looked at the issue of reproducibility.
It turns out that Robert Ashton asked the same question back in 2012 (see the literature references at the end of the post). He found six published studies that had collected data to investigate reproducibility, five of which I have summarized in the second table.
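| Study | Group | People | Lowest | Average | Highest |
|---|---|---|---|---|---|
| Brien et al. (1987) | Study 2 | 6 / 8 | 15% | 53% | 83% |
| Brien et al. (1987) | Study 3 | 6 / 8 | 12% | 20% | 35% |
| Brien et al. (1987) | Study 4 | 6 / 8 | 3% | 29% | 58% |
| Brien et al. (1987) | Study 5 | 6 / 8 | 31% | 55% | 100% |
| Lawless et al. (1997) | Panel CB | 6 | 1% | 28% | 58% |
| Lawless et al. (1997) | Panel G | 8 | 10% | 37% | 72% |
| Lawless et al. (1997) | Panel PB | 7 | 1% | 18% | 64% |
| Lawless et al. (1997) | Panel C | 7 | 0% | 10% | 48% |
| Gawel et al. (2002) | | ? | 24% | 21% | 96% |
| Gawel & Godden (2008) | Reds | 571 | 15% | 20% | 94% |
| Gawel & Godden (2008) | Whites | 571 | 17% | 12% | 94% |
| Hodgson (2008) | Panel Q | 4 | 0% | 12% | 59% |
| Hodgson (2008) | Panel x | 4 | 14% | 8% | 80% |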
Each of the studies involved several tasters, often grouped into subsets for the study, as indicated in the table. Indeed, one of them was a large-scale study carried out over several years, with several hundred people. So, in the table I have listed the percentage reproducibility for the most successful person in the group ("highest") and the least successful ("lowest"), along with the average reproducibility across all of the participants.
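The lowest / average / highest summary described here is simple to reproduce. The sketch below assumes you already have one reproducibility value per taster, expressed as a fraction between 0 and 1; the taster labels, the values, and the function name are all hypothetical, and the published studies may have averaged their results differently.

```python
from statistics import mean


def summarize_panel(fractions):
    """Return the lowest, average, and highest reproducibility in a panel.

    `fractions` maps a taster label to that taster's reproducibility (0 to 1).
    """
    values = list(fractions.values())
    if not values:
        raise ValueError("panel must contain at least one taster")
    return {
        "lowest": min(values),
        "average": mean(values),
        "highest": max(values),
    }


# Hypothetical panel of three tasters
panel = {"taster_a": 0.15, "taster_b": 0.53, "taster_c": 0.83}
for label, value in summarize_panel(panel).items():
    print(f"{label}: {value:.0%}")
```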
As you can see, these results are perfectly in accord with my own: on average, no more than about half of the variation in wine-quality scores is reproducible, and usually much less, even for experienced tasters. This is true irrespective of whether the wines are tasted on the same day, the same week, or the same year.
This compares very poorly with most other fields in which people make quality judgments. In his paper, Ashton lists 41 studies across six other fields with which he is familiar (meteorology, business, auditing, personnel management, medicine, and clinical psychology), and cites average reproducibility rates of 49-83% (compared to the 8-55% shown above for wine).
Wine tasting is obviously not an exact activity, at least compared to other fields of human endeavor.
References
Robert H. Ashton (2012) Reliability and consensus of experienced wine judges: expertise within and between? Journal of Wine Economics 7:70-87.
Chris J. Brien, P. May, Oliver Mayo (1987) Analysis of judge performance in wine-quality evaluations. Journal of Food Science 52:1273-1279.
Richard Gawel, Peter W. Godden (2008) Evaluation of the consistency of wine quality assessments from expert wine tasters. Australian Journal of Grape and Wine Research 14:1-8.
Richard Gawel, Tony Royal, Peter Leske (2002) The effect of different oak types on the sensory properties of chardonnay. Australian and New Zealand Wine Industry Journal 17:14-20.
Robert T. Hodgson (2008) An examination of judge reliability at a major U.S. wine competition. Journal of Wine Economics 3:105-113.
Robert T. Hodgson (2009) How expert are "expert" wine judges? Journal of Wine Economics 4:233-241.
Harry Lawless, Yen-Fei Liu, Craig Goldwyn (1997) Evaluation of wine quality using a small-panel hedonic scaling method. Journal of Sensory Studies 12:317-332.
The question of reproducibility of wine scores was addressed by Caltech lecturer (on randomness) Leonard Mlodinow. [*]
From The Wall Street Journal “Weekend” Section
(November 20, 2009, Page W6):
“A Hint of Hype, A Taste of Illusion;
They pour, sip and, with passion and snobbery, glorify or doom wines.
But studies say the wine-rating system is badly flawed.
How the experts fare against a coin toss.”
URL: http://online.wsj.com/article/SB10001424052748703683804574533840282653628.html
Essay by Leonard Mlodinow
[Bob's aside: I hope your blog readers can find a way to bypass the "pay wall" at The Journal to read his insightful piece.
*https://en.wikipedia.org/wiki/Leonard_Mlodinow ]
I guess it depends on the kind of wine; the scores for some old wines may not be comparable, as they are affected by storage conditions.