Monday, January 14, 2019

The fundamental problem with wine scores

I have written several blog posts about wine-quality scores, pointing out that even though they are expressed as numbers they do not have many useful mathematical properties; and, to me, using a score with no mathematical meaning is like trying to construct a Swedish sentence by knowing the words but not the grammar. However, what I have not done, until now, is point out the fundamental issue that leads to this situation in the first place. That is, I have previously pointed out effects, but not causes.

Before proceeding to discuss the cause, however, I will point out that many wine commentators seem to treat wine scores as nothing more than a convenient way to express their own personal preferences (i.e., increasing score indicates increasing preference). Under these circumstances the scores have nothing to do with mathematics, at all. Preferences could just as easily be expressed with words; and in this case they probably should be. They certainly used to be, before the 1990s, and for some commentators they still are.

The basic issue

Put formally, wine scores represent multidimensional properties that have been summarized as a single point in one dimension.

Sounds good, doesn't it? Let's put it another way: the single wine-quality number is trying to do too many things all at once.

Whenever a critic tells us how they construct their scoring scheme, they usually list a series of characteristics of wines that purportedly contribute to quality (mainly based on color, aroma, palate and body). Formally, each of these characteristics is a "dimension" of any given wine's quality.

Here is an example, taken from Steve Charters and Simone Pettigrew (2007. The dimensions of wine quality. Food Quality and Preference 18: 997-1007).

[Figure: The dimensions of wine quality]

In terms of quality, most commentators are interested solely in the intrinsic dimensions. However, in order to describe a wine mathematically, we would need a number for each of these intrinsic dimensions. Given this collection of numbers, we would then have a complete description of any given wine's quality.

The situation

As a prime example, take the original UCDavis wine scoring system, which covers the score range 0-20.** The characteristics of quality and their associated numbers are:
Aroma & bouquet
Volatile acidity
Total acidity
General quality

There are 10 dimensions here, and we need all 10 numbers to completely describe any given wine's quality. That is, wine quality is multi-dimensional, and we need to "see" all of those dimensions in order to evaluate the wine.

However, rather than doing this, the UCDavis system summarizes the wine down to a single number — in this case, we add the numbers for each dimension, to get a score out of 20. That is, we reduce the multi-dimensional idea of quality down to a single point in one dimension — that dimension simply goes from 0 to 20, and the point on that dimension is the quality score.

The ensuing problem

The problem that arises from this situation actually applies any time we reduce a multi-dimensional concept down to a single dimension. I encountered this issue many times in my professional life as an environmental and evolutionary biologist,* so there is nothing unique about the situation as it arises in wine commentary.

The problem is this: many quite-different wines could end up with the same final score. Summarizing a set of numbers down to a single number must, by definition, lose most of the numerical information (the multiple dimensions become one dimension only). If a wine gets a score of 0, then we know the score for each dimension (it must be 0 in each case), and we have lost no information. The same applies for a wine that scores 20, as this must mean that the wine got the maximum score for each dimension. But for all other scores the situation is ambiguous.
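To get a rough feel for how ambiguous the intermediate scores are, we can count how many different component profiles add up to each total. This is only a sketch under stated assumptions: the maximum points per component (2, 2, 4, 2, 2, 1, 1, 2, 2, 2) are what I take to be the classic Davis card, not figures given in the text above, and I assume whole-number scoring. The names `MAX_POINTS` and `profiles_per_total` are invented for illustration.

```python
# ASSUMPTION: maximum points per component on the classic UC Davis 20-point
# card. These maxima are not taken from the text above.
MAX_POINTS = [2, 2, 4, 2, 2, 1, 1, 2, 2, 2]


def profiles_per_total(max_points):
    """Return counts, where counts[t] is the number of distinct
    whole-number score profiles whose components sum to t."""
    counts = [1]  # with no components, only the total 0 is reachable, one way
    for m in max_points:
        new_counts = [0] * (len(counts) + m)
        for total, ways in enumerate(counts):
            # this component can contribute 0..m points
            for points in range(m + 1):
                new_counts[total + points] += ways
        counts = new_counts
    return counts


counts = profiles_per_total(MAX_POINTS)
for total in (0, 10, 15, 20):
    print(f"total {total:2d}: {counts[total]} possible profiles")
```

Under these assumptions, only the totals 0 and 20 correspond to exactly one profile; every other total can be reached in more than one way.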

Consider these two wines, which I have described using the 10 UCDavis dimensions listed above:
2 + 2 + 2 + 2 + 2 + 0 + 1 + 1 + 2 + 1 = 15
2 + 2 + 4 + 1 + 1 + 1 + 1 + 2 + 0 + 1 = 15
These would be two very different wines; but I would never know it from the final quality score.
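The same comparison can be sketched in a few lines of code, using the two sets of component scores exactly as listed above. The function names `total_score` and `differing_positions` are invented for illustration.

```python
# Component scores for the two wines, in the order listed above.
wine_1 = [2, 2, 2, 2, 2, 0, 1, 1, 2, 1]
wine_2 = [2, 2, 4, 1, 1, 1, 1, 2, 0, 1]


def total_score(components):
    """Collapse a multi-dimensional profile into a single number by summing."""
    return sum(components)


def differing_positions(a, b):
    """Return the (0-based) positions at which two profiles disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(total_score(wine_1), total_score(wine_2))  # 15 15
print(differing_positions(wine_1, wine_2))       # [2, 3, 4, 5, 7, 8]
```

The totals are identical, yet the two profiles disagree on six of their components, and none of that disagreement survives in the final score.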

So, you should now see why wine quality scores have a fundamental problem, if we try to treat them as mathematical concepts: how do we interpret the quality score? We have no way of knowing what the score represents in terms of the multi-dimensional concept of wine quality. Two identical scores could easily represent two very different wines.

A problem for all ratings systems

The problem discussed here is general. All ratings systems are one-dimensional, while the data on which they are based are multi-dimensional. A linear rating system makes no sense when you are combining different characteristics — we cannot combine multiple features into a single number in any way that makes much sense. That is, when we look at the final rating score we cannot tell which characteristics were important in producing it.

Take this simple situation, where value for money has two dimensions, quality and price:
A (high quality) a (expensive)
A (high quality) b (inexpensive)
B (low quality)  a (expensive)
B (low quality)  b (inexpensive)
How could I sensibly put these four groups in a single order based on value for money? We know which group is likely to be the best value for money (Ab), and we might put this at the top; and we know which is the worst value for money (Ba), and we might put this at the bottom; but what do we do with Aa and Bb in terms of value for money? If we did put them in some order, we would be doing so solely for the sake of doing so, not because it would be informative.
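One way to make this concrete is to ask, for each pair of groups, whether one is at least as good as the other on both dimensions. The sketch below assumes a simple numeric coding that is not in the text above (1 for high quality, 1 for inexpensive, 0 otherwise), and the names `dominates` and `incomparable_pairs` are invented for illustration.

```python
from itertools import combinations

# ASSUMED coding: (quality, cheapness), where higher is better on both axes.
groups = {
    "Aa": (1, 0),  # high quality, expensive
    "Ab": (1, 1),  # high quality, inexpensive
    "Ba": (0, 0),  # low quality, expensive
    "Bb": (0, 1),  # low quality, inexpensive
}


def dominates(x, y):
    """True if x is at least as good as y on every axis and better on one."""
    at_least_as_good = all(a >= b for a, b in zip(x, y))
    strictly_better = any(a > b for a, b in zip(x, y))
    return at_least_as_good and strictly_better


def incomparable_pairs(items):
    """Return the pairs for which neither member dominates the other."""
    pairs = []
    for (n1, v1), (n2, v2) in combinations(items.items(), 2):
        if not dominates(v1, v2) and not dominates(v2, v1):
            pairs.append((n1, n2))
    return pairs


print(incomparable_pairs(groups))  # [('Aa', 'Bb')]
```

Ab beats everything and Ba loses to everything, but Aa and Bb cannot be ranked against each other without deciding, arbitrarily, how much quality is worth in money.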

We have two totally different criteria, and combining them vitiates any attempt at a single order. The only system that would make sense would be multi-dimensional. That is, we should keep the ratings as Aa, Ab, Ba and Bb; the categories would thus have meaning even though their order does not.

This is very similar to America's Got Talent, where the judges are trying to compare a magician with a pole dancer, and deciding which is "better". Better at what? Both of them are very good with their hands, but in very different ways! No wonder most of these shows worldwide end up being won by singers.

Wine shows

So, the issue for wine-quality ratings should now be clear. The ratings are based on trying to combine a series of different characteristics, some of which are very different from each other.

This explains why a wine can win a gold medal at one show and nothing at all at the next. The judges were combining the different quality dimensions in different ways, and thereby deciding which is best — that is all that the wine shows tell us.
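A small sketch shows how this can happen. Everything in it is hypothetical: the two wines, their three component scores (say aroma, palate and body), and the idea that each judge combines the components as a weighted sum are assumptions made purely for illustration, not a description of how any real show is judged.

```python
# HYPOTHETICAL component scores (aroma, palate, body) for two wines.
wines = {
    "Wine X": (9, 5, 6),
    "Wine Y": (6, 8, 7),
}

# HYPOTHETICAL judges, who differ only in how heavily they weight each component.
judges = {
    "Judge 1": (3, 1, 1),  # cares most about aroma
    "Judge 2": (1, 3, 1),  # cares most about palate
}


def weighted_score(profile, weights):
    """Combine component scores into one number using a judge's weights."""
    return sum(p * w for p, w in zip(profile, weights))


def winner(wines, weights):
    """Return the name of the wine with the highest weighted score."""
    return max(wines, key=lambda name: weighted_score(wines[name], weights))


for judge, weights in judges.items():
    print(judge, "prefers", winner(wines, weights))
```

The wines have not changed between the two judges; only the way the dimensions are combined has, and that alone is enough to reverse the result.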

The wine shows try to alleviate the problem a bit, by having a lot of different categories, based on all sorts of features (grape variety, wine style, vintage age, etc). This certainly helps, but it brings us back to the same problem of comparing two bottles of wine based on a series of vinous characteristics that are very hard to combine into a single number. And this approach certainly does not help at all with "best wine in show" awards.

A solution?

I have discussed multi-dimensional data previously in this blog. I pointed out at the time that, if we are going to take the numbers seriously, then we actually need to draw graphs of them, not reduce them to a single number:
Summarizing multi-dimensional wine data as graphs, Part 1: ordinations
Summarizing multi-dimensional wine data as graphs, Part 2: networks
It is difficult to see the wine-buying public going for this solution, but I might discuss it in a future post.

An alternative solution?

It has sometimes been claimed that a wine score is not a number, but is more like an adjective. Well, it sure looks like a number to me, so this simply exacerbates the problem. If it is an adjective then it should be a word, not a number. I will discuss this in my next post, but as a preview: it still takes multiple words to describe all aspects of a wine's quality, and summarizing this in a word or two does not change anything — we are still summarizing multiple dimensions (expressed as words, this time) into one dimension (a small set of words).

* For example, in ecology Species Diversity is measured as a combination of two dimensions: (1) a count of the number of species, and (2) the abundance of each species. These two concepts are combined into a single number.
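As a sketch of how this plays out, here is one common diversity measure, the inverse Simpson index. The footnote does not name a particular index, so this choice is an assumption, and the two communities are invented for illustration.

```python
from fractions import Fraction


def inverse_simpson(abundances):
    """Inverse Simpson index: 1 / sum(p_i ** 2), where p_i is the
    proportion of individuals belonging to species i."""
    total = sum(abundances)
    return 1 / sum(Fraction(n, total) ** 2 for n in abundances)


# HYPOTHETICAL communities: counts of individuals per species.
community_1 = [3, 3]     # two species, equally abundant
community_2 = [4, 1, 1]  # three species, one of them dominant

print(inverse_simpson(community_1), inverse_simpson(community_2))  # 2 2
```

Two communities with different numbers of species and different abundance patterns end up with the same index value, which is the same loss of information as for the wine scores.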

** Here is a more detailed overview of the UCDavis scoring scheme, taken from George Vierra (A better wine scorecard?).