One interesting question, then, is how should this consensus ordering be achieved; and do different methods consistently produce different results?
At the bottom of this post I have listed a small selection of the professional literature on the subject of ranking wines. In the post itself, I will look at some data on the subject, ranking the wines in two different ways.
The data I will look at come from the Vintners Club. This club was formed in San Francisco in 1971, to organize weekly blind tastings (usually 12 wines). Remarkably, the club is still extant, although the tastings are now monthly, instead of weekly. The early tastings are reported in the book Vintners Club: Fourteen Years of Wine Tastings 1973-1987 (edited by Mary-Ellen McNeil-Draper. 1988).
The Vintners Club data consist of three pertinent pieces of information for each wine at each tasting:
- the total score, obtained by summing each taster's ranking of the wines from 1 (most preferred) to 12 (least preferred)
- the average of the UC Davis points (out of 20) assigned by each taster — the Vintners Club has "always kept to the Davis point system" for its tastings and, therefore, averaging the scores is mathematically valid
- the number of tasters who voted the wine into 1st place (and also 2nd and 12th).
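As a toy illustration of how these three pieces of information arise from the individual score sheets (the tasters, wines, and numbers below are invented, not actual Vintners Club data):

```python
# Hypothetical toy data: each taster ranks the wines from 1 (most preferred)
# upwards, and also assigns UC Davis points out of 20.
ranks = {                      # taster -> ranks for wines A, B, C
    "taster1": [1, 2, 3],
    "taster2": [2, 1, 3],
    "taster3": [1, 3, 2],
}
points = {                     # taster -> Davis points for wines A, B, C
    "taster1": [18, 16, 13],
    "taster2": [15, 17, 14],
    "taster3": [19, 12, 16],
}

results = {}
for i, wine in enumerate(["A", "B", "C"]):
    total_score = sum(r[i] for r in ranks.values())         # lower is better
    avg_points = sum(p[i] for p in points.values()) / len(points)
    first_places = sum(1 for r in ranks.values() if r[i] == 1)
    results[wine] = (total_score, round(avg_points, 1), first_places)

print(results)  # here wine A wins on both total score and average points
```

In this made-up example the two schemes agree; the question for the real data is how often they do.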
For my dataset, I chose the results of the 45 "Taste-offs" of California wine. These tastings were the play-offs / grand finals (depending on your sporting idiom), consisting of the first- and second-place wines from a series of previous tastings of the same grape varieties. The Vintners Club apparently began its annual Taste-off program in 1973, and has pursued the concept ever since.
In my dataset, there are 14 Taste-offs for cabernet sauvignon, 12 for chardonnay, 9 for zinfandel, 4 for pinot noir, 3 for riesling, and one each for sauvignon blanc, gamay, and petite sirah. There were 17-103 people attending each of the 45 Taste-offs (median 56 people per tasting), of whom 43-96% submitted scores and ranks (median 70%).
For each tasting, I calculated the Spearman correlation between the rank-order of the wines as provided by the total scores and the rank-order as provided by the average Davis points. Expressed as a percentage, this correlation measures how closely the two rankings (total scores versus average points) agree, and is thus a measure of the agreement between the two schemes for each tasting.
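For rankings without ties, the per-tasting calculation can be sketched with the standard Spearman formula; the two orderings of 12 wines below are invented for illustration, not actual Vintners Club results:

```python
def spearman(rank_x, rank_y):
    """Spearman rank correlation for two tie-free rankings:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Rank-order of 12 wines by total score (sum of ranks) ...
by_total_score = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
# ... and by average Davis points, for the same 12 wines.
by_avg_points  = [2, 1, 3, 5, 4, 6, 8, 7, 9, 10, 12, 11]

rho = spearman(by_total_score, by_avg_points)
print(f"Agreement: {rho * 100:.0f}%")  # -> Agreement: 97%
```

Two identical rankings give 100%; the more the pairs of ranks diverge, the lower the percentage.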
Total scores and average points
The graph shows the results of the 45 tastings, with each point representing one of the Taste-offs. The horizontal axis represents the number of people providing scores for that tasting, while the vertical axis is the Spearman correlation for that tasting.
As you can see, in most cases the correlation lies between 50% and 100%. However, in only 1 of every 5 tastings is the correlation above 90%, the level that would indicate almost identical rankings under the two schemes. So, we may conclude that, in general, the total score and the average points do not usually provide the same rank-order of the wines at each tasting.
Indeed, in two cases the two schemes provide very different rank-orders for the wines, with correlations of only 41% and 23%. This is actually rather surprising. These two tastings both involved chardonnay wines, for some reason.
It is a moot point whether to sum the ranks or average the scores. That is, we cannot easily claim that one approach is better than the other — they produce different results, not better or worse results. However, for both approaches there are technical issues that need to be addressed.
For averaging, we need to ensure that everyone is using the same scale, otherwise the average is mathematically meaningless (see How many wine-quality scales are there? and How many 100-point wine-quality scales are there?). Similarly, when trying to combine ranks, there is no generally agreed method for doing so — in fact, different ways of doing it can produce quite inconsistent outcomes (see the literature references below).
Number of first places
For those wines ranked first overall at each tasting, only 4-60% of the scorers had actually chosen them as their personal top-ranked wine of the evening, with an average of 22%. That is, on average, less than one-quarter of the scorers ranked the overall "winning" wine at the top of their own personal list. This indicates that there was rarely a clear winner.
Indeed, whichever scheme is used, the "winning" wine was the one that got the largest number of first places only about half of the time. For the wines ranked first overall by the sum of ranks, in only 24 of the 45 tastings was that wine the one that received the greatest number of 1st-place votes during the evening; and for the wines with the highest average score, this was true in only 25 of the 45 tastings.
We may safely conclude that neither being ranked 1st by many of the tasters, nor getting a high average score from them, will necessarily make a wine the top-ranked wine of the evening. As I have noted in a previous blog post, often the winning wine is simply the least-worst wine.
Confusingly, for each tasting, the Vintners Club rank data very rarely add up to the expected total for the number of people providing results. That is, the sum of the ranks should equal 78 × the number of people providing scores, since each scorer's ranks 1-12 sum to 78. A total a few points below the expected number likely represents a few tied votes by some of the scorers. However, there are also many tastings where the total scores add up to much more than is possible for the number of people present at the tasting. I have no explanation for this. (And yes, I have considered the effect of alcohol on the human ability to add up numbers!)
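The expected total follows from a bit of arithmetic: each scorer's ranks sum to 1 + 2 + ... + 12 = 78. A minimal consistency check (the scorer count here is illustrative, not from a specific tasting):

```python
def expected_rank_total(n_scorers, n_wines=12):
    # Each scorer's ranks sum to 1 + 2 + ... + n_wines = n_wines * (n_wines + 1) / 2,
    # which is 78 for a 12-wine tasting.
    return n_scorers * n_wines * (n_wines + 1) // 2

# For, say, 39 scorers, the published ranks should total:
print(expected_rank_total(39))  # -> 3042, i.e. 78 * 39
```

A published total below this figure is explicable via ties; a total above it is not, which is the puzzle noted above.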
Michel Balinski, Rida Laraki (2013) How best to rank wines: majority judgment. In: E. Giraud-Héraud and M.-C. Pichery (editors) Wine Economics: Quantitative Studies and Empirical Applications, pp. 149-172. Palgrave Macmillan.
Jeffrey C. Bodington (2015) Testing a mixture of rank preference models on judges’ scores in Paris and Princeton. Journal of Wine Economics 10:173-189.
Victor Ginsburg, Israël Zang (2012) Shapley ranking of wines. Journal of Wine Economics 7:169-180.
Neal D. Hulkower (2009) The Judgment of Paris according to Borda. Journal of Wine Research 20:171-182.
Neal D. Hulkower (2012) A mathematician meddles with medals. American Association of Wine Economists Working Paper No. 97.