Monday, 27 February 2017

Three centuries of Rheingau vintages — Schloss Johannisberg

The central German vineyard region of the Rheingau has a long history of active interest in vintage quality, as I have already discussed in the blog post The grand-daddy of all vintage charts. Individual vineyards within this region also have long records regarding their vintages, notably the three centuries of recording for Schloss Johannisberg. Finding this information online is not easy, and so I will be covering it in this blog post.

This post follows my previous ones on individual producers, including century-long records from Piemonte, in northern Italy, for Fontanafredda and Marchesi di Barolo.

Schloss Johannisberg is formally known as Fürst von Metternich-Winneburg'sche Domäne Schloss Johannisberg. You can read all about the estate and winery at the Johannisberg web site (in both English and German).


The vintage data discussed here are taken from this book:
Josef Staab, Hans Reinhard Seeliger, Wolfgang Schleicher (2001) Schloss Johannisberg: Nine Centuries of Wine and Culture on the Rhine. Woschek-Verlag, Mainz.
It is written in both German and English [German title: Schloss Johannisberg: Neun Jahrhunderte Weinkultur am Rhein]. The data (pp. 119-128) cover the vintages from 1700 to 2000 inclusive, and were compiled by Dr. h.c. Josef Staab.

For almost every vintage, the data consist of wine quantity, in hectoliters, and a brief verbal description of quality. Unfortunately, the verbal descriptions vary greatly across the three centuries, and so they are not directly comparable. I have therefore standardized them into a semi-quantitative score as follows:

Score 0:  Acetic, frost [vintage entirely lost]
Score 1:  Not drinkable, very poor, very sour, extremely poor
Score 2:  Lesser wine, lesser year, poor, low quality and poor, unenjoyable, drinkable, sour
Score 3:  Mediocre, average, lesser to average, lesser to mediocre, modest
Score 4:  Good, good wine, good to very good, quite good, average to good, good average wine
Score 5:  Very good, extra good, particularly good, especially good, very good top wine, excellent
Score 6:  Top wine, trophy wine, first rate top wine, excellent top wine
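
The standardization above amounts to a lookup table. Here is a minimal Python sketch of it, assuming each description is matched exactly after trimming and lower-casing. The names SCORE_PHRASES, PHRASE_TO_SCORE and score_description are hypothetical, and only the phrases listed above are included.

```python
# Lookup table transcribed from the score list above.
# Each verbal description (lower-cased) maps to its semi-quantitative score.
SCORE_PHRASES = {
    0: ["acetic", "frost"],
    1: ["not drinkable", "very poor", "very sour", "extremely poor"],
    2: ["lesser wine", "lesser year", "poor", "low quality and poor",
        "unenjoyable", "drinkable", "sour"],
    3: ["mediocre", "average", "lesser to average", "lesser to mediocre",
        "modest"],
    4: ["good", "good wine", "good to very good", "quite good",
        "average to good", "good average wine"],
    5: ["very good", "extra good", "particularly good", "especially good",
        "very good top wine", "excellent"],
    6: ["top wine", "trophy wine", "first rate top wine",
        "excellent top wine"],
}

# Invert the table so that a phrase can be looked up directly.
PHRASE_TO_SCORE = {
    phrase: score
    for score, phrases in SCORE_PHRASES.items()
    for phrase in phrases
}


def score_description(description):
    """Return the 0-6 score for a verbal description, or None if unlisted."""
    return PHRASE_TO_SCORE.get(description.strip().lower())


print(score_description("Very good"))     # listed under Score 5
print(score_description("Trophy wine"))   # listed under Score 6
```

Exact matching keeps "very poor" (Score 1) separate from "poor" (Score 2). Any description not in the list returns None, so it can be reviewed by hand rather than guessed.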

Here is a summary of the harvest-quality data presented as a frequency histogram of increasing quality. For random data this would follow what is known as a binomial probability distribution. The graph approximately does so, but it is slightly over-dispersed for a perfect fit (i.e. not enough scores of 3, and too many of 1, 5 and 6).

Frequency distribution of quality scores from Schloss Johannisberg
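
For readers who want to see what the binomial expectation looks like, here is a small Python sketch. It assumes six trials (giving scores 0 to 6) with a success probability of 0.5, which gives an expected average of 3. It also assumes that all 301 vintages received a score. The post states neither of these, so treat the numbers as illustrative only.

```python
from math import comb


def binomial_expected_counts(n_vintages, n_trials=6, p=0.5):
    """Expected number of vintages at each score 0..n_trials.

    Assumes the scores follow a Binomial(n_trials, p) distribution.
    """
    return [
        n_vintages * comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_trials + 1)
    ]


# 301 is the number of years from 1700 to 2000 inclusive (an assumed total).
expected = binomial_expected_counts(301)
for score, count in enumerate(expected):
    print(f"Score {score}: {count:.1f} vintages expected")
```

Comparing these expected counts with the observed histogram is what reveals over-dispersion: fewer observed vintages than expected in the middle, and more in the tails.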

In the next graph I have shown the harvest-quality data as a time series. Each data point represents one vintage, and the pink line is a running average (it shows the average value across groups of 9 consecutive years, thus smoothing out the year-to-year variation to reveal the long-term trends). [Technical note: the data are of ordinal type but not necessarily interval type, and so calculating an average may not actually be valid. I have simply assumed that it is appropriate, given the relatively close fit to the binomial probability distribution.]

Time series of quality scores from Schloss Johannisberg
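
A running average over 9 consecutive years can be sketched in a few lines of Python. This version assumes a centered window (four years either side of the target year), an odd window length, and no missing scores. The post does not say how the window was aligned or how gaps were treated, so those choices are mine.

```python
def running_average(values, window=9):
    """Centered running mean of a list of numbers.

    Positions that do not have a full window around them are set to None.
    The window length is assumed to be odd.
    """
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        block = values[i - half:i + half + 1]
        smoothed[i] = sum(block) / window
    return smoothed


# Invented quality scores, for illustration only.
scores = [3, 4, 2, 5, 3, 3, 4, 6, 1, 3, 4]
print(running_average(scores))
```

The first and last four years have no full window, which is why a line of this kind stops short of both ends of the time series.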

Using the scale 0-6, the average vintage score is 3.2, whereas it would be c.3 for random data, so that the average harvest across the 301 years was slightly above expectation. There is no general long-term trend in vintage quality across these three centuries, as was also true for the Rheingau region in general (see The grand-daddy of all vintage charts). Nevertheless, Scores 1 and 2 do decrease in frequency from the 1940s onward — Score 1 occurs only in 1941, 1956 and 1965; and Score 2 occurs only in 1954, 1955, 1964 and 1984.

The next graph shows the frequency of the various starting dates for the grape harvest across the three centuries. It is worth pointing out that at Schloss Johannisberg harvest occurs several weeks after the rest of the Rheingau — this has been a deliberate strategy for a very long time, to get the grapes extra ripe.

Frequency distribution of harvest starting dates from Schloss Johannisberg

There are actually some quite regular peaks and troughs in this graph. However, the most obvious point is the lack of harvests starting on November 1 at any time during the 301 years, which is compensated by an over-abundance of starts on November 2. Of course, All Saints' Day (or All Hallows' Day, Allerheiligen) falls on 1 November. This is an optional holiday that is officially observed only in parts of Germany. Indeed, this day is a public holiday in the states of Baden-Württemberg, Bayern, Rheinland-Pfalz, Nordrhein-Westfalen and Saarland. However, the Rheingau is in the state of Hessen, where November 1 is not an official holiday. So, everywhere else in the vineyard area along the Rhine and Mosel rivers All Saints' Day is an official holiday, but not here! That doesn't seem to have ever stopped the locals from taking a day off, though, does it?

The next graph shows the time course of the vintage start dates, with the dates simply numbered from 1 as the earliest observed date. Once again, the pink line is a running average. [Note that for some years an exact start date was not specified.]

Time series of harvest starting dates from Schloss Johannisberg
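
Numbering the start dates from 1 can be sketched as follows in Python. The function name number_start_dates is hypothetical, the example dates are invented, and the fixed non-leap reference year is my own simplification, so that the same calendar date always gets the same number regardless of the vintage year.

```python
from datetime import date


def number_start_dates(starts, reference_year=2001):
    """Number (month, day) harvest starts so the earliest observed date is 1.

    Entries of None (no exact start date recorded) are left as None.
    At least one entry must be an actual date.
    """
    ordinals = [
        None if start is None
        else date(reference_year, start[0], start[1]).toordinal()
        for start in starts
    ]
    earliest = min(o for o in ordinals if o is not None)
    return [None if o is None else o - earliest + 1 for o in ordinals]


# Invented start dates: 27 September, 2 November, unknown, 15 October.
print(number_start_dates([(9, 27), (11, 2), None, (10, 15)]))
# [1, 37, None, 19]
```

Years without an exact date stay as gaps, so they drop out of the plot without shifting the numbering of the other years.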

These dates are spread across more than seven weeks, from earliest to latest. The latest dates occurred at the end of the 1800s, which was the end of the global cold period known as the Little Ice Age (1300-1850 CE). More importantly for modern global warming, the harvests have generally started earlier from the 1960s onwards. The last November harvest start was in 1955, and since 1965 there have been only two years when the harvest was started in the last week of October. The first September harvest start for three centuries occurred in 1976.

The final graph shows the time course of the vintage harvest quantity.

Time series of harvest quantity from Schloss Johannisberg

Obviously there is a sudden and inexorable increase in grape yield at the beginning of the 1930s. This does not appear to coincide with the purchase of extra land or any other increase in vineyard area. Indeed, I can find no mention of this change at all in the book from which the data come. However, Karl Storchmann (American Association of Wine Economists Working Paper No. 214. 2017) shows that the same trend applies to all of Germany; and he suggests "changes in production technologies or climatic conditions as potential drivers."

Finally, we could compare the harvest quality scores from this single vineyard with the quality scores for the Rheingau as a whole, as listed in the previous blog post (The grand-daddy of all vintage charts). Oddly, correlation analysis indicates that the relationship between these scores is extremely poor: only 12% of the variation in scores is shared between the two datasets. That is, good years in the Rheingau as a whole are not necessarily good years for Schloss Johannisberg, and vice versa.
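
The "percentage of variation shared" in a correlation analysis is usually the squared correlation coefficient. Here is a minimal Python sketch, assuming an ordinary Pearson correlation was used (the post does not name the method) and using invented scores rather than the real data.

```python
def shared_variation(x, y):
    """Squared Pearson correlation (r squared) of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Invented scores for one estate and its region, for illustration only.
estate = [1, 2, 3, 4]
region = [1, 3, 2, 4]
print(f"{shared_variation(estate, region):.0%} of the variation is shared")
```

On this reading, a shared variation of 12% corresponds to a correlation coefficient of only about 0.35 in absolute value.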

Monday, 20 February 2017

Trends in Twitter wine-related tweets

I recently published a blog post on Long-term trends in Google wine-related searches, which used Google Trends results to analyze wine-related web activity. Another way to quantify this sort of activity is to look at Twitter tweets.

I have no direct means to do this myself, but The great American word mapper has mapped where (in the USA) the top 100,000 words are used the most in Twitter data.


The data for the maps were drawn from billions of tweets collected by geographer Diansheng Guo in 2014. Jack Grieve, a forensic linguist at Aston University in the United Kingdom, along with Andrea Nini of the University of Manchester, identified the top 100,000 words used in these tweets, and how often they are used in every county in the continental United States, based on location data from Twitter.

For example, this first map shows you where the word "Alcohol" has appeared most often in the tweets. In each map, the darkness of the shading is proportional to the frequency of the word use in that location.


There is obviously great variation throughout the USA, which probably tells us something about the sociology of Americans. You might, for example, ask yourself why this particular word is not used in the south-eastern states. The answer may lie in the map for the word "Liquor", shown next.


What I have done for the rest of this post is take a few wine-related words (defined loosely), and used the web page to produce the maps for you. I encourage you to choose any words you like and produce your own maps, using the link above. You could try different wine-growing regions or wine styles, for example. However, you can only search for single words.

Note that for most of the wine words shown below, usage is concentrated in cities, which is not true for the other alcoholic beverages. Furthermore, "Wine" and "Beer" are almost mutually exclusive words; as, indeed, are "Syrah" and "Shiraz"!














Last updated: 21 Feb 2017.

Monday, 13 February 2017

Poor correlation among critics' quality scores

One reason for reading the wine literature is supposed to be that we get advice from experts about the relative qualities of different wines. From this advice we might be able to make an informed decision about which wines we might fork out our hard-earned cash to purchase.

However, we are often recommended to find an expert whose wine tastes match our own, before we start reading this advice. The reason for picking a single expert becomes obvious when we compare the quality scores from different critics — they frequently seem to have little in common with each other.

To examine this, we could make a direct comparison of the quality scores from well-known sources of advice, such as the Wine Spectator, the Wine Advocate, the Wine Enthusiast, Wine & Spirits Magazine, and Jancis Robinson, along with some who may be less familiar to you.


To illustrate the point, we need an example wine. Most critics rate only a few vintages of any given wine, so that most comparisons would be uninformative. This means that any comparison will be restricted to some wine that is popular among the commentators.

The one I have chosen is Penfolds Grange Bin 95, known as "Grange Hermitage" when I was young. This is possibly Australia's best known red wine among connoisseurs, famous for its longevity. The 1952 vintage is usually regarded as the first commercial release, and so we have a nice long series of vintages for which to compare the quality scores of the various professional commentators. The current release is from the 2012 vintage, making a total of 61 years.

There are very few wines that have such a long set of vintages for which quite a number of commentators have provided quality scores (most of the rest come from Bordeaux). In this case, it is because Penfolds occasionally organizes thorough retrospectives of this wine, to which the critics are invited. There are some notes on Grange at the end of this post, for those of you who are not familiar with it.

Data comparison

If we take the vintages from 1952 to 2011 inclusive, then there are five commentators whose quality scores we can directly compare across these 60 vintages: Jeremy Oliver, Huon Hooke and the Wine Front, all from Australia; and the Wine Spectator magazine and the Wine Advocate newsletter, both from the USA. Almost all of the critics discussed in this blog post use a 100-point quality scale.

This first graph illustrates these five sets of scores.


If this looks like a mess to you, then it is because this is a mess. There is clearly very little consensus among these scores, regarding which vintages are the better ones and which are not.

We can quantify the relationships among the scores using correlation analysis. This reveals that the following percentages are held in common between these five critics pairwise:

            Wine Front   Huon Hooke   Wine Spectator   Wine Advocate
Oliver         42%          24%            19%               9%
Front           -           40%            23%              19%
Hooke           -            -             38%              26%
Spectator       -            -              -               35%

These values are very low. Indeed, no pair of critics agree on even 50% of the variation in their scores. That is, the critics disagree with each other more than they agree. This is hopeless!
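
A table like this can be built by computing the squared correlation for every pair of critics, using only the vintages that both members of the pair have scored. The Python sketch below assumes Pearson correlation and uses invented scores for three hypothetical critics. It is not the calculation behind the table above, just one plausible way to do it.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def pairwise_agreement(scores_by_critic):
    """Map each pair of critics to r squared over their common vintages."""
    result = {}
    for a, b in combinations(scores_by_critic, 2):
        common = sorted(set(scores_by_critic[a]) & set(scores_by_critic[b]))
        xs = [scores_by_critic[a][v] for v in common]
        ys = [scores_by_critic[b][v] for v in common]
        result[(a, b)] = r_squared(xs, ys)
    return result


# Invented scores (vintage: score) for three hypothetical critics.
demo = {
    "Critic A": {1990: 95, 1991: 93, 1992: 90, 1993: 92},
    "Critic B": {1990: 96, 1991: 94, 1992: 91, 1993: 93},
    "Critic C": {1990: 90, 1991: 95, 1992: 93},
}
agreement = pairwise_agreement(demo)
for pair, value in agreement.items():
    print(pair, f"{value:.0%}")
```

Note that squaring hides the sign of the correlation: in this invented example critics A and C share 25% of their variation, but only because they tend to disagree in a consistent way. The function also fails with a division by zero if a critic gives every vintage the same score.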

If we restrict the dataset to the period 1990 to 2011 inclusive, then we can add James Halliday, Australia's best known wine commentator, as another source of quality scores. The second graph illustrates the six sets of scores for these 22 vintages.


The correlation analysis then reveals the following percentages held in common between these six critics pairwise:

            Wine Front   Huon Hooke   Wine Spectator   Wine Advocate   James Halliday
Oliver         29%          20%            20%              12%              18%
Front           -           44%            36%              38%              34%
Hooke           -            -             26%              33%              24%
Spectator       -            -              -               23%              37%
Advocate        -            -              -                -               42%

Halliday's scores vary hardly at all, so nothing much changes. The largest amount of agreement is still only 44% — when we add a critic they still don't agree with any of the previous ones!

Next, if we restrict the data to the 1995-2010 vintages, then we can add Wine & Spirits Magazine, the Wine Enthusiast and Stephen Tanzer, all from the USA. I haven't graphed the scores for these 16 vintages; but the correlation analysis reveals the following percentages held in common between these nine critics pairwise:

             Front  Hooke  Spectator  Advocate  Halliday  WineSpirits  Enthusiast  Tanzer
Oliver        29%    26%      19%        3%        2%         1%          32%       24%
Front          -     76%      36%       40%       44%         3%          52%       41%
Hooke          -      -       57%       47%       46%         4%          66%       30%
Spectator      -      -        -        33%       40%         9%          61%       47%
Advocate       -      -        -         -        51%         5%          42%       48%
Halliday       -      -        -         -         -          8%          44%       23%
WineSpirits    -      -        -         -         -          -            8%        2%
Enthusiast     -      -        -         -         -          -            -        46%

As you can see, for this restricted data set of 16 vintages we do finally get more than 50% concordance. Indeed, Huon Hooke, the Wine Front, the Wine Spectator and the Wine Enthusiast are in reasonable agreement with each other for these vintages, with Huon Hooke and the Wine Front actually having 76% agreement over these few vintages. However, the average agreement is still only 32% among the nine critics; and Jeremy Oliver has only 1% concordance with Wine & Spirits Magazine!
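
The average quoted above can be checked directly from the nine-critic table. This short Python sketch simply transcribes the 36 pairwise percentages from that table and averages them. The variable names are my own.

```python
# Pairwise percentages from the nine-critic table above, row by row.
rows = {
    "Oliver":      [29, 26, 19, 3, 2, 1, 32, 24],
    "Front":       [76, 36, 40, 44, 3, 52, 41],
    "Hooke":       [57, 47, 46, 4, 66, 30],
    "Spectator":   [33, 40, 9, 61, 47],
    "Advocate":    [51, 5, 42, 48],
    "Halliday":    [8, 44, 23],
    "WineSpirits": [8, 2],
    "Enthusiast":  [46],
}

values = [v for row in rows.values() for v in row]
n_pairs = len(values)                       # 9 critics give 9 * 8 / 2 pairs
mean_agreement = sum(values) / n_pairs
above_half = sum(v > 50 for v in values)    # pairs with more than 50% agreement

print(f"{n_pairs} pairs, mean agreement {mean_agreement:.1f}%")
print(f"{above_half} pairs exceed 50%")
```

The mean comes out at just under 32%, and only 6 of the 36 pairs exceed 50% agreement.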

The discrepancies among the critics become particularly obvious when we consider the details, such as the controversial vintage of 2000. The scores for this Grange wine are:
Huon Hooke                 86
Jeremy Oliver              87
Wine Front                 88
Wine Spectator             89
Stephen Tanzer             89
Wine Enthusiast            90
Wine Advocate              93
Wine & Spirits Magazine    93
Falstaff Magazin           94
James Halliday             96
James Halliday and the Wine Advocate actually rated the vintage as better than the 1999, while Wine & Spirits Magazine and Falstaff Magazin (from Austria) rated them equal; the others all rated the 2000 as significantly worse than the 1999.

Finally, you may have been wondering what happened to the quality scores from Jancis Robinson, of the UK. There are two issues to be addressed here: she uses a 20-point scale instead of 100; and her scores are scattered across vintages rather than being concentrated in a single set of consecutive vintages. Nevertheless, there are scores for 37 vintages, and we can compare them to the first five critics discussed above.

I haven't graphed the scores for these vintages; but the correlation analysis reveals the following percentages held in common with the other critics:
Jeremy Oliver     2%
Wine Front        0%
Huon Hooke        0%
Wine Spectator    7%
Wine Advocate     1%
Robinson uses only six scores for Grange vintages (16.5-19), which affects the estimates of the correlations. However, you can see that her scores are literally in a world of their own: there is less than 10% agreement with any of the other five commentators; and only the Wine Spectator has a correlation that is any better than random with respect to Robinson's scores.
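
Of the two issues mentioned above, the 20-point scale by itself should not matter if an ordinary Pearson correlation is used, because that coefficient is unchanged when one set of scores is multiplied by a constant. The Python sketch below illustrates this with invented scores. The half-point steps from 16.5 to 19 are my assumption about the six values, and none of the numbers come from any critic.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented scores for illustration only.
twenty_point = [16.5, 17.0, 17.5, 18.0, 18.5, 19.0, 18.0, 17.0]
hundred_point = [90, 94, 91, 97, 93, 98, 95, 92]

# Naive conversion of the 20-point scores to a 100-point scale.
rescaled = [5 * s for s in twenty_point]

r_original = pearson_r(twenty_point, hundred_point)
r_rescaled = pearson_r(rescaled, hundred_point)
print(abs(r_original - r_rescaled) < 1e-9)   # True: rescaling changes nothing
```

What can distort the estimate is the narrow set of distinct scores, since many vintages are then tied on the same value.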

Conclusion

The idea that wine commentators have some sort of consensus opinion with regard to wine quality is completely untenable in this example. In general, the agreement varies from 0-50%, so that the critics disagree more than they agree. Certainly, in this case, you have to carefully pick your advisor first, before deciding on which are the high-quality vintages. The wine itself is the least important component of wine quality for Penfolds Grange.

Grange notes

The ascent of Grange to its status as Australia's premier wine was slow and steady. The 1951 vintage was an experimental wine, with the 1952 vintage usually regarded as the first commercial release. This was released in 1956 — these days the wine is generally not released until it is 6 years old. The wine is principally shiraz (syrah), always blended from various sources.

The 1955 vintage was entered in the Royal Agricultural Society show in Sydney in 1962, where it won the wine's first gold medal. Internationally, the 1971 vintage then topped the Gault-Millau Wine Olympiad in Paris in 1979, beating some of the best Rhône wines.

The 1976 was the first Australian wine to pass $20 per bottle (released 1981). Hefty price increases occurred for the 1982 to 1989 vintages; and in 1987 the 1982 was released for more than $50. The 1990 vintage was released with a further big price increase; and this is when the “Hermitage” name was dropped.

In 1995, Wine Spectator magazine named the 1990 Penfolds Grange as its wine of the year, for the first time choosing a wine produced outside California or France. In the same year, Robert Parker (Wine Advocate #100, August 1995) proclaimed Grange as “the leading candidate for the richest, most concentrated dry red table wine on planet earth.” International market perceptions immediately changed, and export markets began to take allocations.

The wine is now regularly traded at auctions around the world, and its prices are followed in the same way as Bordeaux’s first growth chateaux and Burgundy’s grand crus. The release price of Grange has a huge effect on the value of other ultra-fine Australian wines. Like all such wines, mature bottles of older vintages can always be found for less money than the (not yet drinkable) current release.

Monday, 6 February 2017

Misinterpreting statistical averages — we all do it

Looking at a mass of data is often confusing at best and daunting at worst. So, our natural tendency is to summarize the data in some way, thus reducing the mass down to something more manageable. This is usually considered to be A Good Thing, but it does run the risk of being misleading.

A summary must lose information, by definition (that's what "reduction" means). What if the lost information causes us to misinterpret the summary? A summary cannot be perfect, and so our interpretation cannot be perfect, either. We need to summarize the important part of the data, and not all summaries are created equal.

Perhaps the biggest potential problem occurs when we calculate an average, as our data summary. People then seem to focus very much on the average itself, and not on the variation of the original data around that average. This can lead to misinterpretations of the original data.

Here, I will use a few examples from the world of wine data to illustrate this point. In fact, I will show that apparent patterns in data can arise from changes in either the average or the variation, or both. We need to be aware of this in practice.

Introduction

Consider this first graph, which shows the differences in the average quality ratings of the wines from the different types of Bordeaux chateaux (first to fifth growths). We immediately see a pattern of decreasing average score as we proceed from the first to the third growths, which then do not differ much from the fourth and fifth growths.

Wine quality of Bordeaux chateaux

Each point in the graph represents an average of the quality scores from a number of chateaux, but we cannot see the variation of these scores (we have only the averages). So, is the apparent decreasing pattern among the points caused by differences in average alone or differences in variation?

There are three ways that averages can show a decreasing pattern:
  • all of the data points decrease
  • the larger values become fewer
  • the data become less variable
Obviously, the inverse of each of these must be true for increasing patterns. In the next three sections I will show an example of each of the three possibilities.

Increase in average because all of the data points increase

This next graph shows the pattern through time of the monetary worth of alcohol exports from Australia, from 1988 to 2017. The original data are the small black points, connected by a black line.

Australian alcohol exports through time

It is clear that there is variation within and between years, and indeed this variation increases through time. It is common to summarize this type of time pattern with a running average, as shown by the thick red line — and this helps "smooth out" the pattern by averaging adjacent groups of data points. This summary is simple to interpret in this case, because all of the data points follow roughly the same pattern — they increase through time.

In this case, the summary is not misleading. I could delete the black points and line, and the red line would still be a good representation of the original data. So, presenting only the data summary would be a good way to simplify the data pattern, in this case.

Increase in average because the smaller values become fewer

Now let's look at a potentially misleading case.

This next graph shows the pattern through time of the vintage quality scores of Bordeaux red wine, from 1934 to 2010. Quality is measured on a 20-point scale. The original data are the blue squares, while the red line is a running average.

Quality of Bordeaux vintages through time

The data summary (running average) shows a general upward trend, which we might interpret as a general increase through time of the quality of Bordeaux red wines, particularly since the early 1970s.

But we would be wrong — that's not what the data show. The pattern in the data is that the lowest data points "disappear" as we move from left to right across the graph. This is emphasized by the added box in this next version of the graph — there are no data points within this box, but data points do occur immediately to the left of the box.

Quality of Bordeaux vintages through time

So, the original data show that there were no vintages with a score of less than 10 out of 20 from the 1970s onward, but there were quite a few such vintages before that. The higher average wine quality thus arises because the poor vintages no longer occur, not because the quality of the other vintages increases. Top quality vintages occurred before and after 1970, but poor quality vintages died out.

In this case, the summary is misleading. That is, the red line on its own would not be a good summary of the data. I cannot usefully delete the blue points and black line.
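
One simple check for this situation is to report the lowest and highest values alongside the average for each period. The Python sketch below uses invented scores on a 20-point scale, chosen so that the top end is the same in both periods but the later one has no scores below 10.

```python
from statistics import mean


def period_summary(scores):
    """Lowest, highest and average score for one period."""
    return {
        "lowest": min(scores),
        "highest": max(scores),
        "average": mean(scores),
    }


# Invented vintage scores, for illustration only.
early = [6, 9, 12, 15, 18, 19]
late = [12, 14, 15, 17, 18, 19]

for label, scores in (("early", early), ("late", late)):
    print(label, period_summary(scores))
```

The average rises between the two periods even though the best score does not change at all. The whole difference comes from the disappearance of the low scores.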

Increase in average because the data become more variable

Averages can mislead in another way, as well.

The next graph also concerns Bordeaux red wines. Each point represents a vintage (from 1940 to 1995), with the vintage quality score shown horizontally (scale 1-7) and the vintage volume (in hectoliters) shown vertically.

Quality and quantity of Bordeaux vintages

A data summary would show a general upward trend across the graph from left to right, which we might interpret as a general increase in production as wine quality increases. This particular interpretation has certainly appeared in the literature on wine production.

But we would be wrong — that's not what the data show. The pattern in the data is that for vintages with low quality there is little wine production, but for high-quality vintages the production volume can vary dramatically. This is emphasized by the added line in this next version of the graph — there are few data points above the line, which would represent poor-quality vintages with a big production.

Quality and quantity of Bordeaux vintages

So, we rarely get big production from poor-quality vintages. This means that the apparent pattern (that there is higher average wine production as vintage quality increases) occurs because high-quality vintages can be associated with big production but poor-quality vintages cannot. Vintage production does not increase with vintage quality. Instead, variation in production increases with vintage quality — production may be big or small when the quality is high, but it is usually small when quality is low.

In this case, the summary would be misleading. It is thus a good thing that no summary line is shown on the graph.
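
To see whether a rising average reflects a shift in all of the data or just a wider spread, it helps to look at the minimum and maximum within each group as well as the mean. Here is a minimal Python sketch with invented (quality, volume) pairs. The function name production_by_quality and the numbers are hypothetical.

```python
from statistics import mean


def production_by_quality(vintages):
    """Group (quality, volume) pairs by quality score.

    Returns {quality: (min volume, mean volume, max volume)}.
    """
    groups = {}
    for quality, volume in vintages:
        groups.setdefault(quality, []).append(volume)
    return {q: (min(v), mean(v), max(v)) for q, v in sorted(groups.items())}


# Invented vintages: (quality score, production volume in arbitrary units).
vintages = [(2, 1.0), (2, 1.5), (2, 2.0), (6, 1.0), (6, 3.5), (6, 6.0)]

for quality, (low, avg, high) in production_by_quality(vintages).items():
    print(f"quality {quality}: min {low}, mean {avg}, max {high}")
```

In this invented example the smallest production is the same for both quality levels. Only the maximum, and therefore the mean, is higher for the high-quality vintages.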

Conclusion

Any time we are looking at a data summary, we need to bear in mind that apparent patterns in that summary can be caused by any one of three underlying patterns in the data, as illustrated above. These different causes lead to different interpretations of the summary. Be wary of data summaries when you see them, unless you can also see the original data.

Data sources

First graph:
Gary M. Thompson, Stephen A. Mutkoski, Youngran Bae, Liliana Ielacqua, Se B. Oh (2008) An analysis of Bordeaux wine ratings, 1970-2005: implications for the existing classification of the Médoc and Graves. Cornell Hospitality Report 8(11): 6-17.

Second graph:
Wikimedia commons

Third pair of graphs is modified from:
Pablo Almaraz (2015) Bordeaux wine quality and climate fluctuations during the last century: changing temperatures and changing industry. Climate Research 64: 187-199.

Fourth pair of graphs is modified from:
Gregory V. Jones, Robert E. Davis (2000) Climate influences on grapevine phenology, grape composition, and wine production and quality for Bordeaux, France. American Journal of Enology and Viticulture 51: 249-261.