Monday, February 13, 2017

Poor correlation among critics' quality scores

One reason for reading the wine literature is supposed to be that we get advice from experts about the relative qualities of different wines. From this advice, we might be able to make an informed decision about which wines to fork out our hard-earned cash for.

However, we are often advised to find an expert whose wine tastes match our own before we start reading this advice. The reason for picking a single expert becomes obvious when we compare the quality scores from different critics: they frequently seem to have little in common with each other.

To examine this, we could make a direct comparison of the quality scores from well-known sources of advice, such as the Wine Spectator, the Wine Advocate, the Wine Enthusiast, Wine & Spirits Magazine, and Jancis Robinson, along with some who may be less familiar to you.


To illustrate the point, we need an example wine. Most critics rate only a few vintages of any given wine, so that most comparisons would be uninformative. This means that any comparison will be restricted to some wine that is popular among the commentators.

The one I have chosen is Penfolds Grange Bin 95, known as "Grange Hermitage" when I was young. This is possibly Australia's best known red wine among connoisseurs, famous for its longevity. The 1952 vintage is usually regarded as the first commercial release, and so we have a nice long series of vintages for which to compare the quality scores of the various professional commentators. The current release is from the 2012 vintage, making a total of 61 years.

There are very few wines that have such a long set of vintages for which quite a number of commentators have provided quality scores (most of the rest come from Bordeaux). In this case, it is because Penfolds occasionally organizes thorough retrospectives of this wine, to which the critics are invited. There are some notes on Grange at the end of this post, for those of you who are not familiar with it.

Data comparison

If we take the vintages from 1952 to 2011 inclusive, then there are five commentators whose quality scores we can directly compare across these 60 vintages: Jeremy Oliver, Huon Hooke, and the Wine Front, all from Australia, and the Wine Spectator magazine and the Wine Advocate newsletter, both from the USA. Almost all of the critics discussed in this blog post use a 100-point quality scale.

This first graph illustrates these five sets of scores.


If this looks like a mess to you, then it is because this is a mess. There is clearly very little consensus among these scores, regarding which vintages are the better ones and which are not.

We can quantify the relationships among the scores using correlation analysis. This reveals that the following percentages are held in common between these five critics pairwise:

                  Wine Front   Huon Hooke   Wine Spectator   Wine Advocate
Jeremy Oliver        42%          24%            19%              9%
Wine Front                        40%            23%             19%
Huon Hooke                                       38%             26%
Wine Spectator                                                   35%

These values are very low. Indeed, no pair of critics agrees on even 50% of the variation in their scores. That is, the critics disagree with each other more than they agree. This is hopeless!
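For readers who want to reproduce this kind of comparison, the "percentage of variation held in common" between two sets of scores is presumably the squared Pearson correlation (r²), expressed as a percentage. Here is a minimal sketch of the calculation, using made-up placeholder scores rather than the real Grange data:

```python
import pandas as pd

# Made-up scores (100-point scale) for five hypothetical vintages; the real
# analysis uses the published scores for the 1952-2011 Grange vintages.
scores = pd.DataFrame({
    "Oliver":    [96, 90, 93, 87, 95],
    "Hooke":     [94, 88, 90, 86, 97],
    "Spectator": [99, 91, 95, 89, 93],
}, index=[1990, 1996, 2000, 2004, 2008])

# Squared Pearson correlation = proportion of variation held in common,
# shown here as a percentage for each pair of critics.
shared = scores.corr(method="pearson") ** 2 * 100
print(shared.round(0))
```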

If we restrict the dataset to the period 1990 to 2011 inclusive, then we can add James Halliday, Australia's best known wine commentator, as another source of quality scores. The second graph illustrates the six sets of scores for these 22 vintages.


The correlation analysis then reveals the following percentages held in common between these six critics pairwise:

                  Wine Front   Huon Hooke   Wine Spectator   Wine Advocate   James Halliday
Jeremy Oliver        29%          20%            20%              12%             18%
Wine Front                        44%            36%              38%             34%
Huon Hooke                                       26%              33%             24%
Wine Spectator                                                    23%             37%
Wine Advocate                                                                     42%

Halliday's scores vary hardly at all, so nothing much changes. The largest amount of agreement is still only 44%; adding another critic does not produce agreement with any of the previous ones!

Next, if we restrict the data to the 1995-2010 vintages, then we can add Wine & Spirits Magazine, the Wine Enthusiast and Stephen Tanzer, all from the USA. I haven't shown the graph of the scores for these 16 vintages; but the correlation analysis reveals the following percentages held in common between these nine critics pairwise:

              Front   Hooke   Spectator   Advocate   Halliday   WineSpirits   Enthusiast   Tanzer
Oliver         29%     26%       19%         3%         2%           1%          32%         24%
Front                  76%       36%        40%        44%           3%          52%         41%
Hooke                            57%        47%        46%           4%          66%         30%
Spectator                                   33%        40%           9%          61%         47%
Advocate                                               51%           5%          42%         48%
Halliday                                                             8%          44%         23%
WineSpirits                                                                       8%          2%
Enthusiast                                                                                   46%

As you can see, for this restricted data set of 16 vintages we do finally get more than 50% concordance. Indeed, Huon Hooke, the Wine Front, the Wine Spectator and the Wine Enthusiast are in reasonable agreement with each other for these vintages, with Huon Hooke and the Wine Front actually having 76% agreement over these few vintages. However, the average agreement is still only 32% among the nine critics; and Jeremy Oliver has only 1% concordance with Wine & Spirits Magazine!
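As a rough sketch of how that average figure can be obtained: with nine critics there are 36 pairs, so the mean is taken over the 36 values above the diagonal of the 9 x 9 matrix of squared correlations. The random scores below are placeholders standing in for the real data.

```python
import numpy as np
import pandas as pd

# Placeholder 16-vintage x 9-critic score table (random numbers standing in
# for the real published scores for the 1995-2010 vintages).
rng = np.random.default_rng(0)
critics = ["Oliver", "Front", "Hooke", "Spectator", "Advocate",
           "Halliday", "WineSpirits", "Enthusiast", "Tanzer"]
scores = pd.DataFrame(rng.integers(86, 101, size=(16, 9)), columns=critics)

# Percentage of variation held in common for each pair, then the average of
# the 36 values above the diagonal of the 9 x 9 matrix.
r2 = (scores.corr() ** 2 * 100).to_numpy()
pairwise = r2[np.triu_indices_from(r2, k=1)]
print(f"average pairwise agreement: {pairwise.mean():.0f}%")
```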

The discrepancies among the critics become particularly obvious when we consider the details, such as the controversial vintage of 2000. The scores for this Grange wine are:
Huon Hooke                 86
Jeremy Oliver              87
Wine Front                 88
Wine Spectator             89
Stephen Tanzer             89
Wine Enthusiast            90
Wine Advocate              93
Wine & Spirits Magazine    93
Falstaff Magazin           94
James Halliday             96

James Halliday and the Wine Advocate actually rated the vintage as better than the 1999, while Wine & Spirits Magazine and Falstaff Magazin (from Austria) rated them equal; the other commentators all rated the 2000 as significantly worse than the 1999.

Finally, you may have been wondering what happened to the quality scores from Jancis Robinson, of the UK. There are two issues to be addressed here: she uses a 20-point scale instead of 100; and her scores are scattered across the vintages rather than being concentrated in a single set of consecutive vintages. Nevertheless, there are scores for 37 vintages, and we can compare them to the first five critics discussed above.
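Neither issue is a real obstacle. Pearson correlation is unaffected by a linear change of scale, so 20-point scores can be correlated directly with 100-point scores; and the scattered coverage is handled by pairing only the vintages that both critics actually scored. A minimal sketch, again with made-up numbers:

```python
import pandas as pd

# Made-up scores: Robinson on her 20-point scale, another critic on the
# 100-point scale, each covering a slightly different set of vintages.
robinson  = pd.Series({1990: 18.5, 1996: 17.0, 2004: 18.0, 2008: 16.5, 2010: 17.5})
spectator = pd.Series({1990: 99,   1996: 95,   2004: 93,   2006: 92,   2008: 89})

# Keep only the vintages both critics scored; no conversion between the two
# scales is needed, because Pearson correlation is invariant to linear
# rescaling of either set of scores.
both = pd.concat([robinson, spectator], axis=1, keys=["Robinson", "Spectator"]).dropna()
shared = both["Robinson"].corr(both["Spectator"]) ** 2 * 100
print(f"variation held in common: {shared:.0f}%")
```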

I haven't graphed the scores for these vintages, either; but the correlation analysis reveals the following percentages held in common with the other critics:
Jeremy Oliver   Wine Front   Huon Hooke   Wine Spectator   Wine Advocate
      2%            0%           0%             7%               1%

Robinson uses only six distinct score values for the Grange vintages (16.5-19 on her 20-point scale), which affects the estimates of the correlations. However, you can see that her scores are in a world of their own: there is less than 10% agreement with any of the other five commentators, and only the Wine Spectator has a correlation that is any better than random with respect to Robinson's scores.

Conclusion

The idea that wine commentators have some sort of consensus opinion with regard to wine quality is completely untenable in this example. The pairwise agreement mostly varies from 0% to 50%, so the critics disagree more than they agree. Certainly, in this case, you have to pick your advisor carefully before deciding which are the high-quality vintages. The wine itself is the least important component of wine quality for Penfolds Grange.

Grange notes

The ascent of Grange to its status as Australia's premier wine was slow and steady. The 1951 vintage was an experimental wine, with the 1952 vintage usually regarded as the first commercial release. The 1952 was released in 1956; these days the wine is generally not released until it is 6 years old. The wine is principally shiraz (syrah), always blended from various sources.

The 1955 vintage was entered in the Royal Agricultural Society show in Sydney in 1962, where it won the wine's first gold medal. Internationally, the 1971 vintage then topped the Gault-Millau Wine Olympiad in Paris in 1979, beating some of the best Rhône wines.

The 1976 was the first Australian wine to pass $20 per bottle (released 1981). Hefty price increases occurred for the 1982 to 1989 vintages; and in 1987 the 1982 was released for more than $50. The 1990 vintage was released with a further big price increase; and this is when the “Hermitage” name was dropped.

In 1995, Wine Spectator magazine named the 1990 Penfolds Grange as its wine of the year, for the first time choosing a wine produced outside California or France. In the same year, Robert Parker (Wine Advocate #100, August 1995) proclaimed Grange as “the leading candidate for the richest, most concentrated dry red table wine on planet earth.” International market perceptions immediately changed, and export markets began to take allocations.

The wine is now regularly traded at auctions around the world, and its prices are followed in the same way as Bordeaux’s first growth chateaux and Burgundy’s grand crus. The release price of Grange has a huge effect on the value of other ultra-fine Australian wines. Like all such wines, mature bottles of older vintages can always be found for less money than the (not yet drinkable) current release.
