Monday, August 7, 2017

The blind leading the blind?

Recently, The Economist magazine tried to champion the cause of blind tastings by using the results of the 2017 wine-tasting contest between Oxford and Cambridge universities (Think wine connoisseurship is nonsense? Blind-tasting data suggest otherwise). The conclusion was that the tasters "performed far better than random chance would indicate." However, very little data analysis was performed, and so a look at their data is in order.


The Economist notes:
The main results of the 2017 Varsity blind-tasting match, held on February 15th, are depicted above. Two teams of seven tasters each (including one reserve per side) were presented with 12 wines, six whites and six reds. The judges granted each taster between zero and 20 points per wine, depending on how close (in their estimation) the drinkers’ guesses were to the correct answers, and how convincingly they explained their reasoning. However, we prefer a simpler scoring system: one point for getting the country of origin right, another point for getting the grape variety right and a judicious half-point of partial credit only in a handful of specific cases.
The group’s overall accuracy was far superior to what could be expected from random chance. Given the thousands of potential country-variety pairs, a monkey throwing darts would have virtually no hope of getting a single one right. But 47% of the Oxbridge tasters' guesses on grape variety were correct, as were 37% on country of origin.
The Economist does point out the rather obvious variation in success, among both the tasters and the wines — some tasters did much better than others, and some wines were identified much more commonly than others. However, a variance-components analysis of the data indicates that it is the variation among the wines that dominates the dataset — for the successful identification of grape variety, 90% of the variability is due to the variation among the wines and only 5% is due to the variation among the tasters; and for the identification of country of origin, it is 65% and 25%, respectively.

So, any general comments about the success of blind wine-tasting must be tempered by the fact that some wines are apparently much easier to identify (by grape or country) than are others.

Statistical evaluation

The Economist's assessment of the probability of success is based on a mathematically naïve set of assumptions. As an example of their "dart-throwing" calculation: there are c. 100 common red-grape varieties, and so there is a 1% chance of me getting one right at a blind tasting by simply guessing. I would then have a 6% chance of getting at least one wine right if I simply guess the same red grape each time, for the six wines. This makes the 47% success rate of the tasters look pretty good.
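The dart-throwing arithmetic above can be sketched in a few lines. This is only an illustration, assuming that each of the six wines is independently one of roughly 100 equally likely red grapes; the function name is my own.

```python
def prob_at_least_one(p_single: float, n_wines: int) -> float:
    """Chance that a fixed guess matches at least one of n_wines wines,
    assuming each wine independently matches with probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_wines


# About 100 common red grapes gives a 1% chance per wine; six red wines in the flight.
dart_chance = prob_at_least_one(1 / 100, 6)
print(f"{dart_chance:.1%}")  # a little under 6%
```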

However, this calculation is mathematically naïve because human beings are not monkeys, with or without darts. Some grape varieties occur in wines much more commonly than do others, and those grapes are more likely to be represented in the tasting contest; and human beings know this, even if the monkeys do not. Similarly, some countries are more likely to be represented in a wine tasting than are others, especially given the presence of certain grape varieties. For example, how many Gamay wines are made outside of France? If I simply assume "Beaujolais" for a Gamay wine then I have a 95% chance of being right!

We therefore cannot assume that an educated wine taster is the same as a monkey throwing darts. The wine taster is not guessing, any more than a motor mechanic is guessing when diagnosing a fault in your car. They both have prior knowledge, which even at worst produces an educated guess (and at best is professional expertise). That is, an "educated guess" should be the basis of our statistical comparison, not a "random guess", as done by The Economist.

So, in order to work out the actual probabilities of success for each grape (and country) I need to know the probability of one of the wines in the contest being, say, Chardonnay. That is, I would need to know the probability of the competition organizers choosing each of the grape varieties and countries for the tasting. Sadly, I do not have this information.

As a realistic substitute, I will use how common the different varieties/countries are in liquor stores. That is, I will assume that the bottles have been chosen from the selection available in the shops.

For this, I will use the wine database of the Systembolaget liquor chain, in Sweden. I have used this database before (e.g. How many wine prices are there?) because, being the third largest liquor chain in the world, its selection of wines is extensive. Furthermore, being a European chain, it is likely to match the British organizers' probabilities of choice better than would many other sources. Indeed, for both the red and the white varieties, the organizers chose 4 of the 5 most common grapes in the Systembolaget database (out of the 6 chosen). So, my probabilities may be pretty good, at least from the point of view of the participants working out which wines they are likely to encounter in the tasting.

As an example, 25% of the white wines in Systembolaget's database have Chardonnay listed as a principal grape variety. If the six white wines were drawn independently from that selection, we would expect an 82% chance (1 - 0.75^6 ≈ 0.82) of at least one of them being Chardonnay. The participants actually had an 86% success rate at identifying the Chardonnay. So, my analysis suggests that in this one case they have not actually done any better than they could have done by taking an educated guess based simply on how common the wines are in the shops. The question they are answering in the tasting is not "is this a Chardonnay?" but "which one is the Chardonnay?"!
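For anyone wanting to check the Chardonnay numbers, here is a minimal sketch. It assumes the six whites are independent draws from the shop selection; the variable names are mine, and the two input figures are the ones quoted in the text.

```python
chardonnay_share = 0.25  # share of white wines listing Chardonnay (from the text)
n_white_wines = 6        # white wines in the flight

# Chance that at least one of six independently chosen whites is Chardonnay.
expected = 1 - (1 - chardonnay_share) ** n_white_wines

observed = 0.86          # tasters' success rate on the Chardonnay (from the text)

print(f"expected: {expected:.0%}, observed: {observed:.0%}")
```

The gap between the two figures is only a few percentage points, which is why the educated-guess baseline matters so much for this wine.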

Statistical results

So, my basis for estimating the prior probabilities of expected success for the participants is to work out the probability of at least one of the wines being of that variety or region (based on its frequency in the Systembolaget database). We can then compare this to the tasting results for each grape variety and each country, to see if the participants actually did better than an educated guess.

For each of the graphs presented below, the interpretation is as follows. Each variety or country is represented by a horizontal line, as indicated by the legend. The central point on each of the lines represents the percentage of the tasters who succeeded at the task for that wine. The two end points on each line are the boundaries of the estimated 95% confidence interval (formally: the Score binomial 95% confidence interval). This interval gets smaller as the sample size (the number of tasters) gets larger, as it represents our statistical "confidence" in the results. The asterisk represents the expected results if the tasters are performing in accordance with the estimated prior probabilities. So, if the asterisk is within the 95% confidence interval for a particular wine, then the tasters have done no better than an educated guess for that wine, whereas if the asterisk lies outside the 95% confidence interval then the tasters have done better (or worse) than expected.
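The score interval mentioned above is commonly known as the Wilson score interval. The sketch below shows how such a comparison might be made for the Chardonnay. It assumes 14 tasters, 12 of whom were correct; that is my inference from two teams of seven and the 86% figure, not a count given in the data. It is not necessarily the exact calculation behind the graphs.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Assumption: 14 tasters, 12 correct (12/14 is about 86%).
low, high = wilson_interval(12, 14)

# Educated-guess baseline: at least one Chardonnay among six whites at a 25% share.
expected = 1 - (1 - 0.25) ** 6

print(f"interval: {low:.2f} to {high:.2f}, expected: {expected:.2f}")
if low <= expected <= high:
    print("no better than an educated guess")
else:
    print("differs from an educated guess")
```

With so few tasters the interval is wide, so the expected value has to be quite far from the observed rate before it falls outside.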

Expected versus actual correctness for grape varieties

Expected versus actual correctness for countries

The analyses indicate that in only 2 out of 12 cases did the participants identify the grape variety with any more success than would be expected based on the commonness of the wines: the Pinot Noir and the Gamay. Otherwise, they did as well as we would expect using an educated guess — except in the case of the Riesling wine, where they did rather poorly. In this case, Riesling is apparently a more common wine grape than the participants realize!

The analyses also indicate that the tasters did both better and worse than expected with the identification of country of origin. In three cases they did better than expected (France and New Zealand for the red wines, and Australia for the white wines), and in three cases they did worse than expected (Spain for the red wines, and France and Italy for the white wines). That is, French white wine is apparently a more common type than the participants realize, as also are Italian white wine and Spanish red wine.

Conclusion

I have indicated before that blind tastings are notoriously hard (see Can non-experts distinguish anything about wine?). The results and analyses presented here confirm that conclusion: for some wines the participants did very well, but in most cases they could have done just as well by guessing based on how commonly the wines are encountered. The Economist's optimism in this case is misplaced, due to a naïve assessment of the prior probabilities of success.

2 comments:

  1. "Some grape varieties occur in wines much more commonly than do others, and those grapes are more likely to be represented in the tasting contest; and human beings know this, even if the monkeys do not. Similarly, some countries are more likely to be represented in a wine tasting than are others, especially given the presence of certain grape varieties. ...

    "We therefore cannot assume that an educated wine taster is the same as a monkey throwing darts. The wine taster is not guessing, any more than a motor mechanic is guessing when diagnosing a fault in your car. They both have prior knowledge, which even at worst produces an educated guess (and at best is professional expertise). ..."

    I guess we Californians should feel snubbed by the organizers of the Oxford and Cambridge competition. Not a single submission white or red from our fair state.

    A crafty student participant could "game the system" by reviewing Jennifer Segal's book titled “Reds, Whites and Varsity Blues” to learn the universe of grape varieties submitted in the past. Then through repeated exposure commit to organoleptic memory the aromas, bouquets, flavors, body weight and comparative tannin levels of the "usual suspect" wines.

    (Unless the history of the event chronicled in Jennifer Segal's book disproves it, I suspect -- and so should have the student participants -- that no one grape variety is ever repeated in the white or red flights. So make six different grape variety guesses.)

    1. There is no need to feel snubbed. If the organizers chose their bottles at random in Europe, the chance that even one of them would be from anywhere in the USA is quite small. For example, selecting from the Systembolaget database yields only 34% for the reds and 24% for the whites.
