Monday, 26 June 2017

What happened to Decanter when it changed its points scoring scheme

In a previous post (How many wine-quality scales are there?), I noted that at the end of June 2012 Decanter magazine changed from using a 20-point ratings scale to a 100-point scale for its wine reviews (see New Decanter panel tasting system). In order to do this, they had to convert their old scores to the new scores (see How to convert Decanter wine scores and ratings to and from the 100 point scale).

It turns out that this change had some unexpected consequences, which means that it was not as simple as it might seem. I think that this issue has not been appreciated by the wine public, and probably not even by the people at Decanter; and so I will point out some of the consequences here.


We might expect that a 20-point scale and a 100-point scale should be interchangeable in some simple way, when assessing wine quality. However, there is actually no intrinsic reason why this should be so. Indeed, Wendy Parr, James Green and Geoffrey White (Revue Européenne de Psychologie Appliquée 56:231-238. 2006) tested this idea, by asking wine assessors to use both a 20-point scale and a 100-point scale to evaluate the same set of wines. Fortunately, they found no large differences between the use of the two schemes, for the wines they tested.

This makes it quite interesting that when Decanter swapped between its two scoring systems it did seem to change the way it evaluated wines. This fact was discovered by Jean-Marie Cardebat and Emmanuel Paroissien (American Association of Wine Economists Working Paper No. 180), in 2015, when they looked at the scores for the red wines of Bordeaux.

Cardebat & Paroissien looked at how similar the quality scores were for a wide range of critics, comparing the critics pairwise using correlation analysis. If the scores from a given pair of critics were perfectly related then their correlation value would be 1, and if they were unrelated then the value would be 0; otherwise, the values vary somewhere in between these two extremes. Cardebat & Paroissien provide their results in Table 3 of their publication.
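To make the idea of a pairwise correlation concrete, here is a minimal sketch in Python. It assumes the ordinary Pearson correlation coefficient, which may or may not be the exact measure used by Cardebat & Paroissien, and the scores and critic names are invented purely for illustration. In principle the coefficient can also be negative, when one critic tends to score high where the other scores low.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one critic's scores never vary")
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five wines (not real data):
# one critic on a 100-point scale, the other on a 20-point scale.
critic_a = [90, 92, 88, 95, 91]
critic_b = [17.0, 17.5, 16.5, 18.5, 17.0]

r = pearson(critic_a, critic_b)
print(f"correlation = {r:.2f}")

# Rescaling one critic's scores linearly (here, multiplying by 5)
# leaves the correlation unchanged.
r_rescaled = pearson(critic_a, [5 * s for s in critic_b])
```

Note that a purely linear conversion between a 20-point and a 100-point scale cannot, by itself, alter a correlation; any change in the correlations must come from a change in how the wines are ranked or spaced relative to each other.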

Of interest to us here, Cardebat & Paroissien treated the Decanter scores in two groups, one for the scores before June 2012, which used the old 20-point system, and one for the scores after that date, which used the new 100-point system. We can thus directly compare the Decanter scores to those of the other critics both before and after the change.

I have plotted the correlation values in the graph below. Each point represents the correlation between Decanter and a particular critic (four of the critics have their point labeled in the graph). The correlation before June 2012 is plotted horizontally, and the correlation after June 2012 is plotted vertically. If there was no change in the correlations at that date, then the points would all lie along the pink line.

Change in relationship to other critics when the scoring system was revised

For two of the critics (Jeff Leve and Jean-Marc Quarin), there was indeed no change at all, exactly as we would expect if the 20-point system and 100-point system are directly interchangeable. For seven other critics the points are near the line rather than on it (Tim Atkin, Bettane & Desseauve, Jacques Dupont, René Gabriel, Neal Martin, La Revue du Vin de France, Wine Spectator), and this small difference we might expect by random chance (depending, for example, on which wines were included in the dataset).

For the next two critics (Robert Parker, James Suckling), the points seem to be getting a bit too far from the line. At this juncture, it is interesting to note that the majority of the points lie to the right of the line. This indicates that the correlations between Decanter and the other critics were greater before June 2012 than afterwards. That is, Decanter started disagreeing with the other critics to a greater extent after it adopted 100 points than before; and it started disagreeing with Parker and Suckling even more than with the others.
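The way the graph is read can be sketched in a few lines of Python. The function name, the tolerance of 0.02 for "near the line", and the three example pairs are all assumptions made up for illustration; they are not values from Cardebat & Paroissien.

```python
def side_of_line(before, after, tol=0.02):
    """Locate a point relative to the no-change line (after == before).

    `before` is the horizontal coordinate and `after` the vertical one.
    `tol` is an arbitrary, illustrative threshold for "near the line".
    """
    diff = after - before
    if abs(diff) <= tol:
        return "on or near the line (no real change)"
    if diff < 0:
        return "right of the line (agreement fell)"
    return "left of the line (agreement rose)"

# Hypothetical (before, after) correlations with Decanter, not real data.
examples = {
    "critic_x": (0.70, 0.70),
    "critic_y": (0.72, 0.60),
    "critic_z": (0.40, 0.55),
}

for name, (before, after) in examples.items():
    print(name, "->", side_of_line(before, after))
```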

However, what happens with the remaining two critics is quite unbelievable. In the case of Jancis Robinson, before June 2012 Decanter agreed quite well with her wine-quality evaluations (correlation = 0.63), although slightly less than for the other critics (range 0.63-0.75). But afterwards, the agreement between Robinson and Decanter plummeted (correlation = 0.36). The situation for Antonio Galloni is the reverse of this — the correlation value went up, instead (from 0.32 to 0.56). In the latter case, this may be an artifact of the data, because only 13 of Galloni's wine evaluations before June 2012 could be compared to those of Decanter (and so the estimate of 0.32 may be subject to great variation).
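To give a rough idea of how uncertain a correlation based on only 13 wines can be, here is a sketch of an approximate 95% confidence interval using the standard Fisher z-transformation. This is my own illustration, not part of the Cardebat & Paroissien analysis, and it assumes that 0.32 is a Pearson correlation and that the scores are roughly bivariate normal; the function name is hypothetical.

```python
from math import atanh, tanh, sqrt

def fisher_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation; z_crit = 1.96 gives roughly 95%.
    Assumes bivariate normal data, which may not hold for wine scores.
    """
    if n <= 3:
        raise ValueError("need more than 3 paired observations")
    z = atanh(r)                # transform r to the z scale
    se = 1.0 / sqrt(n - 3)      # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

# Values taken from the text: correlation 0.32 from 13 paired evaluations.
low, high = fisher_interval(0.32, 13)
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
# prints "approximate 95% interval: -0.28 to 0.74"
```

Under these assumptions, the interval is wide enough to include both zero and the later value of 0.56, which is consistent with the suggestion that the apparent increase for Galloni may be an artifact of the small sample.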

What has happened here? Barring errors in the data or analyses provided by Cardebat & Paroissien, it seems quite difficult to explain. Mind you, I have shown repeatedly that the wine-quality scores provided by Jancis Robinson are usually at variance with those of most other critics (see Poor correlation among critics' quality scores; and How large is between-critic variation in quality scores?), but this particular example does seem to be extreme.

For the Cardebat & Paroissien analyses, both Jancis Robinson and Antonio Galloni have the lowest average correlations with all of the other critics, with 0.46 and 0.45, respectively, compared to a range of 0.58-0.68 for the others. So, in this dataset there is a general disagreement between these two people and the other critics, and also a strong disagreement with each other (correlation = 0.17). It is thus not something that is unique to Decanter, but it is interesting that the situation changed so dramatically when Decanter swapped scoring schemes.

References

Jean-Marie Cardebat, Emmanuel Paroissien (2015) Reducing quality uncertainty for Bordeaux en primeur wines: a uniform wine score. American Association of Wine Economists Working Paper No. 180.

Wendy V. Parr, James A. Green, K. Geoffrey White (2006) Wine judging, context and New Zealand sauvignon blanc. Revue Européenne de Psychologie Appliquée 56:231-238.