Monday, 5 December 2016

Are there biases in community wine-quality scores?

In an earlier blog post I considered "Biases in wine quality scores". This issue had previously been pointed out by Alex Hunt ("What's in a number? Part the second"), who presented graphs of the data from various well-known wine commentators showing that the scores they give wines have certain biases. In particular, some scores are consistently over-represented, such as 90 compared to 89 (using the 100-point scale) and 100 compared to 99. In my post, I performed a detailed analysis of a particular dataset (from the Wine Spectator), showing how we could mathematically estimate the extent of this bias.

This immediately raises the question of whether these biases also occur in community-based wine scores, or whether they are restricted to scores given by individual commentators. Community scores come from the users of sites such as Cellar Tracker, which simply pool all of the users' individual scores for each wine. It is possible that, even if each individual scorer shows biases towards certain scores, these biases might average out across all of the scorers, and thus be undetectable. Alternatively, the biases might be widespread, and thus still be evident even in the pooled data.

To find out, I downloaded the publicly available scores from Cellar Tracker for eight wines (for these wines, only 55-75% of the scores were available as community scores; the rest had not been shared by the users). These eight wines included red wines from several different regions, a sweet white, a still white, a sparkling wine, and a fortified wine. In each case I searched for a wine with at least 300 community scores, but I did not succeed for the still white wine, where the best I could find had only 189 scores.

Below, I have plotted the frequency distribution of the Cellar Tracker scores for each of the eight wines. As in my previous post on this topic, the height of each vertical bar represents a proportion; here, with one graph per wine, it is the proportion of that wine's scores that equal the value indicated on the horizontal axis.
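The proportions plotted in a graph of this kind can be computed directly from a list of scores. The following is a minimal sketch in Python; the function name `score_proportions` and the example scores are made up for illustration, and are not the Cellar Tracker data.

```python
from collections import Counter


def score_proportions(scores):
    """Return {score: proportion of all scores}, ordered by score."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    total = len(scores)
    return {s: counts[s] / total for s in sorted(counts)}


# Made-up scores for a single hypothetical wine (illustration only).
example_scores = [88, 90, 90, 89, 90, 85, 88, 92, 90, 88]
proportions = score_proportions(example_scores)

# Crude text version of the bar chart: one row per score.
for score, p in proportions.items():
    print(f"{score}: {'#' * round(p * 50)} {p:.2f}")
```

Using proportions rather than raw counts makes wines with different numbers of scores (say 189 versus 300 or more) directly comparable.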

As you can see, these graphs do show distinct biases, although some graphs show much less bias than others.

The principal bias, when it occurs, is most commonly an over-representation of scores that are multiples of five: 70, 75, 80, 85, 90, and 95. In particular, the first five graphs show this pattern to one extent or another. So, it seems that a lot of the users are actually scoring their wines on a 20-point scale, and then simply multiplying by 5 to get to the 100-point scale required by Cellar Tracker.
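One rough way to put a number on this pattern is to ask what share of a wine's scores are exact multiples of five. The sketch below is my own illustration, not a calculation performed for the graphs; the function name and the data are hypothetical. As a loose yardstick only, if scores were spread evenly across final digits, about one score in five (0.2) would be a multiple of five, so a share well above that hints at rounding or at conversion from a 20-point scale.

```python
def share_of_multiples_of_five(scores):
    """Fraction of the scores that are exact multiples of five."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(1 for s in scores if s % 5 == 0) / len(scores)


# A 20-point score multiplied by 5 always lands on a multiple of five.
twenty_point_scores = [16, 17, 18]
converted = [s * 5 for s in twenty_point_scores]  # 80, 85, 90
share_converted = share_of_multiples_of_five(converted)

# Made-up scores mixing both habits (illustration only).
mixed_scores = [88, 90, 90, 89, 90, 85, 88, 92, 90, 88]
share_mixed = share_of_multiples_of_five(mixed_scores)

print(share_converted, share_mixed)
```

The 0.2 baseline is an assumption that real score distributions only approximate, because they are peaked rather than flat, so this share is best compared between wines rather than read as a formal test.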

The final three graphs show an over-representation of the score 88, compared to scores of 87 and 89 (and the first graph also has this pattern). This seems to be a manifestation of the same bias shown by the professional commentators, in which a score of 90 occurs more commonly than 89. That is, a score of 88 is used to indicate a wine that is well-liked, while a score of 90 represents a wine that is great, thus leaving a score of 89 in limbo.

Finally, the last graph shows an under-representation of the score 96, compared to scores of 95 and 97. There seems to be no obvious reason for this.
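A simple way to check a single score such as 88 or 96 against its immediate neighbours is to divide its count by the average count of the scores one point below and one point above. This is only a sketch under my own assumptions (the function name `neighbour_ratio` and the data are hypothetical), and it is a much cruder stand-in for the kind of model-based estimate used in the earlier post.

```python
from collections import Counter


def neighbour_ratio(scores, target):
    """Count of `target` divided by the mean count of target-1 and target+1.

    Values well above 1 suggest over-representation of `target`;
    values well below 1 suggest under-representation.
    Returns None when neither neighbour occurs at all.
    """
    counts = Counter(scores)
    neighbour_mean = (counts[target - 1] + counts[target + 1]) / 2
    if neighbour_mean == 0:
        return None
    return counts[target] / neighbour_mean


# Made-up counts for one hypothetical wine (illustration only).
made_up = [87] * 10 + [88] * 30 + [89] * 10 + [90] * 25

ratio_88 = neighbour_ratio(made_up, 88)  # 88 against 87 and 89
ratio_89 = neighbour_ratio(made_up, 89)  # 89 against 88 and 90
print(ratio_88, ratio_89)
```

Near the peak of a distribution the neighbours are naturally lower than the peak itself, so a ratio somewhat above 1 there is not, on its own, evidence of bias.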

[Graphs 1 to 8: frequency distributions of the Cellar Tracker scores, one graph per wine.]