Monday, April 15, 2019

How different are regional vintage quality-scores from different sources?

I have written quite a few blog posts about the quality scores given by commentators to individual wines for each vintage. However, there is another type of wine score: the annual scores given to whole wine-making regions.

Indeed, there are plenty of wine sites that are happy to score the different vintages for quite a variety of regions, using all sorts of scoring schemes (5 points, 10 points, 20 points, 100 points, etc). I have no idea how worthwhile such scores might be, although I assume that a single score for a whole country or continent (eg. Australia) is rather useless.

What I have always wondered is whether these scores are all telling me the same thing. That is, do the different sources of the scores agree on vintage quality? If so, then why are there so many different scores? If not, then who agrees with whom? This blog post is designed to provide some answers to these questions.

Vintage scores are erratically available, both as to which wine regions and which vintages are covered. Europe has very well-defined vineyard regions, and so most of the available scores apply to one or more of those regions. So, I have collated data from every scoring scheme I could find, for every region in Europe.

However, for the simple purposes here, I will focus only on the most well-studied wine types of France:
  • Red Burgundy
  • White Burgundy
  • Red Bordeaux
  • Sauternes
  • Red Rhone
Most of the other French regions (eg. Loire, Alsace) have data from only a few of the scoring schemes.

I will also focus on those 9 of the schemes that have the most complete data:
  • Wine Enthusiast
  • Wine Spectator
  • Vinous
  • Wine Advocate
  • Cavus Vinifera
  • Hugh Johnson
  • Wine Society
  • Berry Brothers
  • IG Wines
Finally, I have included the data from the years 1990–2015 inclusive (ie. 26 vintages). The data are less complete among the schemes for other years.


Since there are 9 scoring schemes, 5 regions and 26 years, there is no simple way to construct a picture of the dataset, or even three separate pictures. So I have used the form of multivariate data summary described in my post Summarizing multi-dimensional wine data as graphs, Part 2: networks. The details of the analysis are listed at the bottom of this post. The main thing is to understand how to interpret the network pictures.

This first network shows the 9 scoring schemes. Each scheme is represented by a dot in the network. Schemes that are closely connected in the network are similar to each other based on their vintage scores across all of the 5 regions and 26 vintages, and those that are further apart are progressively more different from each other.

Comparison of 9 scoring schemes for wine vintage quality in France

So, the first thing we can notice is that none of the schemes are in very close agreement with each other, as each has its own long terminal edge. However, the network shows that the Wine Enthusiast, Wine Spectator and Vinous schemes are all rather similar to each other, followed by the Wine Advocate. This is reassuring — the vintage quality scores are apparently not arbitrary! In a more practical sense, it probably does not matter much which of these schemes we choose to consult.

However, the other 5 scoring schemes become progressively more different as we move down towards the bottom of the network. Only pairs of schemes seem to share some features — Hugh Johnson + Wine Society, Wine Society + Berry Brothers, Berry Brothers + IG Wines. The Cavus Vinifera scheme has complex relationships to the 4 schemes at the top and to the 4 schemes at the bottom.

Next, we can look at the 5 wine types. As above, each vineyard region is represented by a dot in the network; and regions that are closely connected in the network are similar to each other based on their vintage scores across the 9 schemes and 26 vintages, while those that are further apart are progressively more different from each other.

Comparison of vintage quality for 5 French wine regions

We can, once again, note that the 5 wine types are not especially similar to each other in terms of which years are "good" and which are not. However, we can see that the 2 white wines (White Burgundy and Sauternes) are loosely grouped at the left of the network and the 3 red wines at the right; and the 2 Burgundy wines are loosely grouped at the bottom. So, there are actually some consistent patterns across the 26 vintages. I am not sure whether the grouping of the Rhone wines with the Bordeaux wines is of any practical significance.

Finally, we can look at what the original scorers might think is most important: the vintages themselves. As above, each vintage is represented by a dot in the network; and vintages that are closely connected in the network are similar to each other based on their vintage scores across the 9 schemes and 5 wine types, while those that are further apart are progressively more different from each other. So, we are looking at a general summary of French vintages, rather than any particular vineyard region.

Comparison of quality for 26 French wine vintages

Those 5 vintages that all of the schemes rated most highly are grouped at the top of the network, and the 3 worst ones are at the bottom. Note that the 1990 vintage, which was a good one across all of western Europe, not just France, was immediately followed by the 4 worst vintages.

There is some disagreement about the other vintages, as 6 of them are arranged at the left of the network, and the remainder at the right, although they still generally run from top to bottom as better to worse. This variation is due to which scoring schemes preferred which vintages. For example, the 2002 vintage actually received scores that ranged from 5 to 9, even for the same region, which did not happen for the vintages at the top of the network, which all got 8–10.


There is a lot of similarity among the scoring schemes for regional vintage quality, but not as much as we might like. Presumably, personal preferences are playing a large part in the scores assigned. Indeed, the scoring differences between schemes are often larger than are the differences between regions.


The various scoring schemes were first standardized to the same 10-point scale:
   10-point scale: left unchanged
   20-point scale: standardized score = raw score / 2
   100-point scale: standardized score = (raw score - 50) / 5
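As a concrete sketch, the three conversions above can be written as a small helper function (the function name is mine; the post itself gives only the formulas):

```python
def standardize_to_10(score, scale):
    """Rescale a raw vintage score onto a common 10-point scale.

    The mappings follow the conversions given in the text:
    10-point scores are left unchanged, 20-point scores are halved,
    and 100-point scores are shifted by 50 and divided by 5
    (so 50 maps to 0 and 100 maps to 10).
    """
    if scale == 10:
        return float(score)
    if scale == 20:
        return score / 2
    if scale == 100:
        return (score - 50) / 5
    raise ValueError(f"unsupported scale: {scale}")
```

For example, a score of 16/20 and a score of 90/100 both become 8 on the common scale.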
Of the 1,170 observations, 15 were missing (1.3%). These were interpolated using the minimum-variance method, based on row and column averages.
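The post does not spell out the interpolation formula. One common row-and-column interpolation (estimate = row mean + column mean - grand mean) can be sketched as follows; this is an assumption about the method, not a reproduction of it:

```python
import numpy as np

def fill_missing(matrix):
    """Fill NaN cells of a 2-D score matrix from row and column averages.

    Each missing cell is estimated as:
        row mean + column mean - grand mean
    computed over the observed values only. This is a sketch of one
    common row/column interpolation; the exact 'minimum-variance'
    formula used in the post may differ.
    """
    m = np.asarray(matrix, dtype=float)
    row_means = np.nanmean(m, axis=1, keepdims=True)
    col_means = np.nanmean(m, axis=0, keepdims=True)
    grand_mean = np.nanmean(m)
    estimates = row_means + col_means - grand_mean
    # Keep observed values; substitute estimates only where data are missing.
    return np.where(np.isnan(m), estimates, m)
```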

Three datasets were then constructed, one with the Schemes as the objects, one with the Regions as the objects, and one with the Years as the objects.
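Assuming the standardized scores sit in a 3-D array (schemes x regions x years), the three datasets can be built by flattening the other two axes, so that each object becomes one row vector of scores. The array here is a hypothetical placeholder for the real data:

```python
import numpy as np

# Hypothetical 3-D array of standardized scores:
# axis 0 = 9 schemes, axis 1 = 5 regions, axis 2 = 26 years.
scores = np.zeros((9, 5, 26))

# One row per object; the other two axes are flattened into features.
by_scheme = scores.reshape(9, 5 * 26)                      # 9 x 130
by_region = scores.transpose(1, 0, 2).reshape(5, 9 * 26)   # 5 x 234
by_year   = scores.transpose(2, 0, 1).reshape(26, 9 * 5)   # 26 x 45
```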

For each dataset, the pairwise distances between the objects were calculated using the Manhattan metric. A Neighbor-Net was then calculated based on these distances, using the SplitsTree program.
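The Manhattan (city-block) distance step can be sketched as below; the Neighbor-Net itself is then computed in the external SplitsTree program, which is not reproduced here. The function name and input layout are my own assumptions:

```python
import numpy as np

def manhattan_distance_matrix(vectors):
    """Pairwise Manhattan (city-block) distances between row vectors.

    vectors: a 2-D array where each row is one object (e.g. one scheme's
    full set of standardized scores across all regions and years).
    Returns a square, symmetric matrix of |x_i - x_j| sums.
    """
    v = np.asarray(vectors, dtype=float)
    # Broadcast every pair (i, j) and sum absolute differences over features.
    return np.abs(v[:, None, :] - v[None, :, :]).sum(axis=-1)
```

The resulting distance matrix is what gets exported to SplitsTree for the Neighbor-Net analysis.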
