Monday, January 16, 2017

What's all this fuss about red versus white wine quality scores?

A few months ago, Suneal Chaudhary and Jeff Siegel released a report entitled Expert Scores and Red Wine Bias: a Visual Exploration of a Large Dataset. This was first announced on Siegel's Wine Curmudgeon site, and subsequently received a fair amount of internet comment. Chaudhary and Siegel's bottom line is that "Red wines, in our large data set, are more frequently scored higher than whites", when referring to wine-quality scores published by professional commentators.

Chaudhary and Siegel's frequency histogram of their data

However, little of the web comment has been related to the actual report. As a scientist, I think that the work done is at least as interesting as the conclusions reached; and that is what I will comment on here. Indeed, I will not disagree with the conclusions, but I will disagree with some of the work.

Let's start by addressing the initial research question: "Do experts rate red wines more highly than white wines, regardless of price, vintage, and region?" A problem here is that we already know the answer to this question. As pointed out by the indefatigable Bob Henry, Robert M. Parker Jr. was interviewed in 1989 for the Wine Times magazine (later renamed the Wine Enthusiast), and described his now infamous 50-point wine-rating system:
Parker: It’s a fairly methodical system. The wine gets up to 5 points on color, up to 15 on bouquet and aroma, and up to 20 points on flavor, harmony and length. And that gets you 40 points right there. And then the [balance of] 10 points are ... simply awarded to wines that have the ability to improve in the bottle.
Times: Do you have a bias toward red wines? Why aren’t white wines getting as many scores in the upper 90s? Is it you or is it the wine?
Parker: Because of that 10-point cushion. Points are assigned to the overall quality but also to the potential period of time that wine can provide pleasure. And white Burgundies today have a lifespan of, at most, a decade with rare exceptions. Most top red wines can last 15 years and most top Bordeaux can last 20, 25 years. It’s a sign of the system that a great 1985 Morgon [cru Beaujolais] is not going to get 100 points because it’s not fair to the reader to equate a Beaujolais with a 1982 Mouton-Rothschild. You only have three or four years to drink the Beaujolais.
Fred Swan provides a detailed elaboration of this point, noting that it is a general feature of many scoring systems.

So, Chaudhary and Siegel's published work is an unfortunate example of what philosophers call "affirming the consequent": demonstrating what we already know to be true (see Wikipedia). We don't need a data analysis of quality scores for 61,809 wines; we can take Parker's word for it. Many red wines age longer than most whites, and so there will be a definite measurable bias in scores, which will apply irrespective of price, vintage, and region. Chaudhary and Siegel have simply demonstrated that the critics are being as good as their word.

This makes it clear what question we should actually be asking: "Do wines that are expected to live longer get higher ratings by experts than do other wines, irrespective of wine type?" This is a somewhat different question to the above, because it asks whether long-lived red wines get better scores than shorter-lived red wines, and the same for white wines. Chaudhary and Siegel sub-divided their dataset in several different ways, but they did not sub-divide it based on longevity, and so they do not answer this question. However, I presume that they could do so, with a bit more data analysis.

Answering the question is straightforward in principle, although it may require a bit of work, and possibly some argument about the longevity of each wine type, because we will need to sub-divide the data by wine type and region. Many wine-making regions have both long-lived wines and drink-now wines, of both white and red types.
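As a rough sketch of what that sub-division could look like, the snippet below groups scores by color and by a longevity class, and then averages each group. The longevity labels ("long" and "short") and the records themselves are made-up placeholders of mine, not values from the Chaudhary and Siegel dataset; deciding which wines go in which class is exactly the part that would need argument.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(records):
    """Average score for each (color, longevity) combination.

    `records` is an iterable of (color, longevity, score) tuples.
    """
    groups = defaultdict(list)
    for color, longevity, score in records:
        groups[(color, longevity)].append(score)
    return {key: mean(scores) for key, scores in groups.items()}


# Made-up placeholder records, only to show the shape of the input.
toy_records = [
    ("red", "long", 93), ("red", "long", 91),
    ("red", "short", 86), ("red", "short", 88),
    ("white", "long", 92), ("white", "long", 90),
    ("white", "short", 87), ("white", "short", 85),
]

group_means = mean_score_by_group(toy_records)
for key in sorted(group_means):
    print(key, group_means[key])
```

If longevity is what drives the scores, the gap between "long" and "short" should appear within each color, and the gap between colors should shrink once longevity is held fixed.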

There are certainly many long-lived white wines, especially among the sweet wines (as also suggested by John Joseph), such as trockenbeerenauslese (especially), beerenauslese, auslese, eiswein / icewine, and even spätlese wines, plus Sauternes / Barsac, sélection de grains nobles, vendange tardive and Tokaji. We should also include chenin blanc wines from the Loire valley, and Hermitage wines from the Rhône. And that is just western Europe.

More to the point, there's an incredible amount of variation in longevity amongst red wines. At one end we have the fine wines of western Europe, such as Bordeaux, Barolo / Barbaresco, Brunello di Montalcino, Rioja, Ribera del Duero, Priorato, Hermitage, Châteauneuf-du-Pape, Côte-Rôtie, and so on. And there are plenty from elsewhere in the world, as well, all of which should get high quality scores, according to Parker. At the other extreme, we have classics like non-cru Beaujolais (mentioned above), and those from Blaye and Tavel, as well as the dolcetto wines from Piemonte, all of which should score much lower. And in between we have more wines than you could care to name, which should get intermediate scores, on average.

To illustrate what I mean, here is an analysis of a dataset I happen to have at hand. It comes from Paulo Cortez, António Cerdeira, Fernando Almeida, Telmo Matos and José Reis (2009. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems 47: 547-553), and pertains to 4,898 whites and 1,599 reds from the Minho region of north-western Portugal. These vinho verde wines are not usually considered to be long-lived, and so they should be directly comparable under our experimental question (same region, same longevity, different colors). According to the authors, the wines were quality scored as follows:
Each sample was evaluated by a minimum of three sensory assessors (using blind tastes), [who] graded the wine on a scale that ranges from 0 (very bad) to 10 (excellent). The final sensory score is given by the median of these evaluations.

The resulting frequency histograms do not show any bias in score between reds and whites, exactly as we would expect — if anything, the whites do slightly better than the reds. [Note the log scale on the vertical axis.] So, what we need to do now is the same sort of thing, with all the rest of the wines in the Chaudhary and Siegel dataset.
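For anyone wanting to repeat this sort of comparison, here is a minimal sketch of how the two histograms can be put on the same footing. Because there are about three times as many whites as reds, it compares the proportion of wines at each score rather than the raw counts. The score lists below are tiny made-up placeholders, not the vinho verde data.

```python
from collections import Counter


def score_frequencies(scores):
    """Return {score: proportion of wines} for a list of integer scores."""
    counts = Counter(scores)
    total = len(scores)
    return {score: counts[score] / total for score in sorted(counts)}


def mean_score(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


# Made-up placeholder scores on the 0-10 scale, for illustration only.
toy_reds = [5, 5, 6, 6, 7, 5, 6, 6]
toy_whites = [5, 6, 6, 6, 7, 6, 7, 5]

for label, scores in (("red", toy_reds), ("white", toy_whites)):
    print(label, score_frequencies(scores), round(mean_score(scores), 2))
```

With real data, the two dictionaries can be plotted side by side as bars, using a log scale on the vertical axis if the extreme scores are rare.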

Moving on to another issue, Chaudhary and Siegel consider several types of potential bias in their data. For example, their data are based on published scores, and the wine media are notorious for not bothering to publish low quality ratings. This creates what scientists call "publication bias", which means that the reds will have more scores published than will the whites, because their scores are often higher. The size of the dataset cannot help deal with this, as Chaudhary and Siegel claim, because dataset size deals only with stochastic (random) variation, not variation due to bias. This is a classic failing of many datasets in science, as it also is here.
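The point about dataset size can be illustrated with a small simulation. This is only a sketch under assumptions I have invented for the purpose: scores are drawn from a normal distribution with a mean of 86 and a standard deviation of 4, and only scores of 85 or more get "published". None of these numbers come from the report.

```python
import random
from statistics import mean


def simulate_published_mean(n, true_mean=86.0, sd=4.0, cutoff=85.0, seed=1):
    """Draw n scores and 'publish' only those at or above the cutoff.

    Returns (mean of all scores, mean of published scores).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(true_mean, sd) for _ in range(n)]
    published = [s for s in scores if s >= cutoff]
    return mean(scores), mean(published)


for n in (1_000, 100_000):
    all_mean, published_mean = simulate_published_mean(n)
    print(n, round(all_mean, 2), round(published_mean, 2))
```

As the sample grows, the mean of all the scores settles ever closer to 86, but the mean of the published scores settles on a value well above that. More data shrinks the random scatter; it does nothing about the gap created by the selection.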

This means that the quality scores might not actually represent what people drink, and therefore what is in the wine shops. For example, in the dataset of Chaudhary and Siegel 24% of the scores are for white wines and 76% are for reds, a ratio of 3:1 in favor of the reds. A quick search of the 9,411 wines in my local liquor chain, Systembolaget (reputed to be the third biggest alcohol chain in the world), reveals that they stock 37% white wines and 63% reds, a ratio of less than 2:1 for the reds. So, the Chaudhary and Siegel dataset probably cannot claim to be representative of the wine industry as a whole, only that part where scores actually get published. This is a pity.
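The arithmetic behind those two ratios is simple enough to check directly. The percentages are the ones quoted above; the function name is just a label I have chosen.

```python
def red_to_white_ratio(pct_red, pct_white):
    """Number of red wines per white wine, given the two percentages."""
    return pct_red / pct_white


dataset_ratio = red_to_white_ratio(76, 24)  # Chaudhary and Siegel dataset
shop_ratio = red_to_white_ratio(63, 37)     # Systembolaget listing

print(round(dataset_ratio, 2), round(shop_ratio, 2))
```

The first ratio is a little above 3 and the second is below 2, so reds are over-represented in the scored dataset by a factor of nearly two relative to the shop shelves.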

Finally, it is worth pointing out the massive bias that exists in the wine scores, as shown in the graph at the top of this post. For both white and red wines, a score of 90 points is massively over-represented compared to a score of 89 points. This is an embarrassment for the profession of wine commentary, as it gives the lie to any pretense that wine quality scores are objective. This is discussed in more detail in my post on Biases in wine quality scores.
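One simple way to put a number on such a spike is to compare the count at a given score with the average count of its two neighbors. This is my own rough measure, not one used in the report, and the counts below are made-up placeholders rather than values read from the graph.

```python
def spike_ratio(counts, score):
    """Count at `score` divided by the mean count of its two neighbors."""
    neighbor_mean = (counts[score - 1] + counts[score + 1]) / 2
    return counts[score] / neighbor_mean


# Made-up placeholder counts, for illustration only.
toy_counts = {88: 900, 89: 700, 90: 1400, 91: 800}

print(round(spike_ratio(toy_counts, 90), 2))
```

In a smooth distribution this ratio should be close to 1 for every score; a value well above 1 at 90 points, next to a value below 1 at 89, is the pattern described above.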

If you are interested, Fred Swan also has a much longer list of queries about Chaudhary and Siegel's report.
