Monday, 25 September 2017

Did California wine-tasters agree with the results of the Judgment of Paris?

The short answer is: "No".

In May 1976, Steven Spurrier and Patricia Gallagher organized a wine tasting that has become known as the Judgment of Paris. Here, wines from France were tasted alongside some wines from California, and the latter acquitted themselves very well in the opinions of the tasters.

These sorts of comparative tastings had been conducted before, but mostly within the USA, whereas the Judgment took place in France itself with French judges; and, more importantly, it occurred in conjunction with the US Bicentennial celebrations. It therefore attracted much more media attention than any of the previous tastings. Indeed, it may well be the third most important event in the social and economic history of wine in the USA, after the imposition and then repeal of Prohibition.

However, the results of the Judgment were very variable among the tasters. Hardly any of them agreed closely with each other about the quality scoring of the wines, and especially about which wines were the best among the 10 reds (Bordeaux varieties) and the 10 whites (chardonnays). This raises the question as to what other people thought about the relative quality of those same wines, at that same time.


This question is answerable to some extent by looking at the tastings of the Vintners Club, based in San Francisco. The club was formed in 1971 to organize weekly wine tastings (usually of 12 wines), and it is still extant, although the tastings are now monthly instead of weekly. For our purposes here, the early tastings are reported in the book Vintners Club: Fourteen Years of Wine Tastings 1973-1987 (edited by Mary-Ellen McNeil-Draper, 1988).

One of the Club's tastings was an attempt to evaluate the results of the Judgment tasting nearly 2 years afterwards (January 1978), as reported in a previous blog post (Was the Judgment of Paris repeatable?). However, the Club also tasted the individual wines before May 1976, usually in comparisons including other California wines (ie. those not chosen by Spurrier for the Judgment). Indeed, the success of the wines at these tastings seems to have played some part in establishing their respective reputations, leading to Spurrier choosing them to take part in the Judgment.

So, we can compare the Judgment wines quite independently of the Judgment itself, but in the same time period. This is an interesting exercise; and it emphasizes the point made at the time by Frank J. Prial (New York Times June 16 1976, p. 39), about the variability of wine assessments: "One would be foolish to take Mr Spurrier's little tasting as definitive."

Comparisons

Here, I will focus on the six California cabernets and six California chardonnays, each of which was tasted at the Vintners Club at least once before and once after the Judgment. All four of the French Bordeaux reds were also tasted at least twice at the Club, but not always both before and after the Judgment; and only one of the four French Burgundy whites was ever tasted at the Club.

Immediately preceding the Judgment, four of the six California chardonnays were tasted at the Chardonnay Taste-Off (February 1976), and four of the six California cabernets were tasted at the Cabernet Sauvignon Taste-Off (March 1976). These comparative Taste-Offs at the Vintners Club are explained in an earlier post (Wine tastings: should we assess wines by quality points or rank order of preference?).

The results of the various tastings are shown in the two graphs. The scores for each wine are the average from those tasters present on each occasion, based on the standard UC Davis scoring system, as used by the Vintners Club. The dates of the tastings are shown relative to the Judgment of Paris (May 24 1976).

In order to facilitate comparisons, the wines are listed in the graph legends in the order of their results at the Judgment itself (ie. highest score to lowest).

Vintners Club tastings of the Judgment cabernets

Among the cabernets, the Stag's Leap, Ridge, and Heitz wines were pretty much equal at every tasting. These wines were consistently rated as superior to the other three red wines, but we should not see any one of these three as being better than the other two. Interestingly, the only occasions on which the Stag's Leap wine was judged to be "best" were at the Judgment itself and at its re-enactment. Also, note that the results for the Mayacamas wine were rather erratic, especially given its third-place result at the Judgment.

Vintners Club tastings of the Judgment chardonnays

Among the chardonnays, the Chateau Montelena wine did not do well. Indeed, the Chalone wine consistently scored better than the Montelena, except at the Judgment of Paris itself. In fact, most of the white wines scored better than the Montelena, except at the Judgment and its re-enactment. Also, for the whites it was the David Bruce wine that received particularly erratic results, on at least one occasion performing very well.

Finally, it is worth noting that 9 of the 12 wines received much higher scores at the 1978 re-enactment of the Judgment than they had before the Judgment tasting. It is hard not to see a subjective post-hoc bias in this result.

Conclusions

Clearly, the unique accolades heaped on the Judgment's two "winning" wines were not justified by the Vintners Club comparisons of the 12 wines. The Montelena white, in particular, was usually bested by the Chalone wine; and the Stag's Leap red was never better than those from Ridge or Heitz. This emphasizes the unreliability of single tastings for assessing wines — the outcomes depend too strongly on the circumstances, particularly the tasters present. Furthermore, some wines obviously received very variable assessments, sometimes being rated much more highly than on other occasions — either these wines were showing bottle variation or they were in an unusual style (as has been noted for the Bruce wine).

Monday, 18 September 2017

Getting the question right

I have written quite a few posts in which I analyzed a dataset using some particular mathematical model. Obviously, the model chosen is of some importance here — different models might give different outcomes (although, hopefully not). However, the choice of model is actually determined by the original question being asked of the data — we need to match the question and the appropriate model.

This raises the important issue of getting the question right. This is especially true if we are trying to relate causes and effects. For example: is the causal factor the presence of some something, or the absence of something else? Sherlock Holmes is famous for drawing Inspector Lestrade's attention to "the curious incident of the dog in the night-time." It turned out that the important thing was that the dog did nothing, under circumstances when a guard dog should clearly have done something. Holmes solved the crime by asking a question about an absence, not a presence.

As an example from the wine world, consider the following graph. It shows the recent time-course of the percentage each of five countries has had of the global wine export market. The data are taken from Kym Anderson & Nanda R. Aryal (2015) Growth and Cycles in Australia’s Wine Industry: a Statistical Compendium, 1843 to 2013, with additions listed by the AAWE.

Global export percentages for the top five countries

We could ask any number of questions about these data. For example, we could ask about the general increase across the five countries since 1990, and whether it can be sustained. However, the most obvious question is likely to be about the time-course pattern for Australia, which seems to be dramatically different to the other four countries. But should that question be about the sudden increase that occurs from 2000 onwards, or the sudden decrease that occurs after 2005? Which pattern do we try to explain?

The second question (which seems to be the one that the Australian wine industry has been asking) would ask about why the "good times" suddenly crashed in 2005, and what the industry might do about it. On the other hand, the first question might ask about why the increase occurred in the first place, assuming that the subsequent decrease is simply a "return to normal" after a short-term aberration.

Let's look at how we might analyze Question 1. This next graph shows the Australian data compared to the average time-course of the other four wine-exporting countries (ie. excluding Australia).


The red line shows a very straightforward increase in export percentage through time. We might treat this line as a possible model of the "expected" pattern of growth, and then try to explain why the pattern for Australia does not fit in with it. This would be one way of answering Question 1. We would do this by fitting a mathematical model to the red line, and then seeing how well that model matches the Australian data.

The next graph shows the fit of a simple polynomial model to the average data, as indicated by the red dashed line. This model fits the data extremely well, accounting for 98% of the variation in the Average data.
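For readers who want to try this themselves, here is a minimal sketch of such a fit in Python, using numpy and made-up numbers in place of the real averaged export percentages; a quadratic is assumed purely for illustration, since the post does not state the degree of polynomial actually used.

```python
import numpy as np

# Hypothetical "average of the other four countries" series, standing in
# for the real data; the shape is invented just to make the example run.
years = np.arange(1990, 2014)
t = years - 1990
rng = np.random.default_rng(1)
avg_share = 1.0 + 0.18 * t - 0.0028 * t**2 + rng.normal(0, 0.05, t.size)

coeffs = np.polyfit(years, avg_share, deg=2)   # fit a simple quadratic
model = np.poly1d(coeffs)

# Share of the variation accounted for by the fitted curve
resid = avg_share - model(years)
r_squared = 1 - resid.var() / avg_share.var()
print(f"R^2 = {r_squared:.2f}")

# A downward-opening quadratic peaks where its derivative is zero
peak_year = -coeffs[1] / (2 * coeffs[0])
print(f"forecast peak of {model(peak_year):.1f}% in about {peak_year:.0f}")
```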


We can, of course, now use this model to explore possible forecasts of future export growth. For example, the model forecasts that the Average export percentage will peak at 4.2%, in c. 2024. Capturing 4-5% of the market might thus be a reasonable goal for an exporting country, which could consider itself to have done well if it exceeds that level.

More to the point, we can compare this model to the Australia data, as shown in the next graph. The blue dashed line is simply the red dashed line raised by 1.2 percentage points (which is the best fit to the Australia data). This reveals that from 2013 onwards the Australian exports were exactly where we would forecast them to be, based on the 1990-1995 data.
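The "raised by 1.2 percentage points" step is just a constant vertical shift. For any two series the least-squares shift is simply the mean of their differences, as this minimal sketch (with placeholder numbers, not the real export data) shows.

```python
import numpy as np

# Placeholder values only, to illustrate the calculation.
fitted_average = np.array([2.0, 2.2, 2.4, 2.6, 2.8])
australia = np.array([3.1, 3.3, 3.7, 3.9, 4.0])

# The constant c minimizing sum((australia - (fitted_average + c))^2)
shift = np.mean(australia - fitted_average)
print(f"best-fit vertical shift: {shift:.1f} percentage points")
```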


So, answering Question 1 would be quite a reasonable way to tackle these data — the data do support the idea that the decrease in Australian export percentage may well be simply a return to "normal" after a short-term aberration. The downwards trend can be seen, not as a crisis, but merely as a correction. These are two quite different interpretations.

Getting the question right is crucial. Data analysis often suffers from what is called confirmation bias, in which we simply try to confirm the assumed answer to our initial question. That is, we look for what the dog did in the night-time, instead of looking for what it did not do — and we often find something that the dog did, no matter how irrelevant it may be!

Monday, 11 September 2017

Why lionize winemakers but not viticulturists?

It is widely noted that viticulturists can have as much influence on the quality of the final wine as do winemakers, and yet it is still the winemakers whose names are most widely known, because they are the ones who most commonly appear in the wine press. So, the people in the winery get the media attention more than those in the vineyard, even though the location of that vineyard is acknowledged to be of prime importance.

To counteract this trend, in this post I discuss one example, from Australia, where the viticulturist often gets almost as much press as the winemaker.


Wynns Coonawarra Estate is by far the biggest winery in the Coonawarra region of Australia, a region that has an international reputation for the quality of its cabernet sauvignon wines (although the shiraz wines are not too shabby, either). Wynns consistently project three people as being their "team", as listed in the first photo below. [Note: Ben Harris, the Vineyard Manager, tends to go missing from most of the press; see the photo at the bottom of the post.]

What is more important for our purposes here, the media actively go along with Wynns' attitude. I have listed a few press reports at the end of this post, as a small sample of what the wine media have to say. The two people titled "winemaker" do get more press than the viticulturist, although much of their personal press does tend to emphasize them as females in a male-dominated profession. Indeed, Sue Hodder and Sarah Pidgeon were jointly named the 'Winemaker of the Year' at the 2016 Australian Society of Viticulture and Oenology (ASVO) Awards for Excellence.

However, back in 2010, when Sue Hodder was named Australian Gourmet Traveller WINE’s 'Winemaker of the Year', a new award was introduced for Allen Jenkins: 'Viticulturist of the Year'. Part of the reason for acknowledging the importance of the viticulturist at Wynns has been his role in rejuvenating the vineyards over the past 15 years, and the clear effect that this has had on the quality of the wines.

L to R:  Sarah Pidgeon (Winemaker)
Sue Hodder (Chief Winemaker)
Allen Jenkins (Regional Vineyard Manager)

The rejuvenation program

Sue Hodder joined Wynns just prior to the 1993 vintage; and she was then appointed Chief Winemaker in 1998, at which point Sarah Pidgeon became Winemaker. Allen Jenkins arrived as the viticulturist in 2001-2, at least partly because Hodder and Pidgeon had realized that the vineyards needed extensive treatment, if the wines were to be improved.

For example, during the 1990s it was noted that the vines were building up too much dead wood, as a result of 20 years of (minimal) mechanical pruning. Indeed, the vines were reported to be so low yielding that they were hard to pick. The rejuvenation started in 2000, and was accelerated in 2002. It was expected to take eight years to complete; and the change in the wines was reported widely in the media starting from 2010.

The process involved large-scale vine regeneration by heavy chainsaw pruning of very old vines (shiraz up to 120 years old, cabernet sauvignon up to 60 years old), removing the dense clusters of dead wood, and thus bringing the vines back to a new physiological balance. Tired or diseased vines were grubbed out, along with the removal of lesser varieties. These were all replaced by new clones and rootstocks of cabernet and shiraz, for which the winery developed a heritage nursery, based on cuttings from time-proven vines. There was re-trellising, along with changed canopy management and new pruning techniques. The vineyards were also converted from sprinkler to drip irrigation.

Along with all of this, the winery was also modified to focus more on small-batch vinification, from 2008 onwards. This allows the grapes to be picked at perfect physiological ripeness, as even a large vineyard block can now be processed in many small batches instead of a few large ones. This takes advantage of the increased grape quality in the vineyard. The oak maturation of the wines has also been re-visited, resulting in a lighter handling, which now produces softer, more elegant wines. Indeed, the latter approach is a return to the style from the 1960s, rather than the heavier style favored in the 1980s and 1990s.

The flagship Wynns wines are the John Riddoch Cabernet Sauvignon and the Michael Shiraz, which are made only in years when grapes of very high quality are available. Production was stopped on both of these wines during the 2000-2002 part of the rejuvenation period. So, to see the effects of the rejuvenation on the quality of the Wynns wines, we need to look at a different product from the winery.


Black Label Cabernet Sauvignon

Within the Wynns range, the Black Label Cabernet Sauvignon holds a special place, even though it is marketed as the "basic" wine from the winery, with an average annual production of roughly 40,000 cases. The wine is currently blended from about 20 different small parcels of grapes, out of up to 80 that are contenders each year. The vines were planted mainly in the 1960s, 70s and 80s.

The flagship John Riddoch cabernet is always denser, more powerful and oaky than the cheaper Black Label, but the latter is always better value for money, selling for less than one third of the top wine's price (and often being aggressively discounted by retailers). Indeed, it has been repeatedly shown that the Black Label can age for decades, making it "possibly the most important cellaring wine in Australia", and forming "the backbone of many Australian cellars for over 50 years". This makes it "one of the most important wines in Australia’s wine history". Myself, I think that it is the best value-for-money cabernet wine that Australia produces.

There have been a number of retrospective tastings of this wine organized by Wynns, which go all the way back to the first vintage, in 1954. For example, there was an important vertical tasting covering the 50 years from 1954-2004, which Hodder has described as the catalyst for the winemakers changing the style away from the heavier style of the 1990s. There was also a 60-year vertical tasting earlier this year.

Average Wine-Searcher scores for Wynns Black Label Cabernet

However, the published reports from these tastings are somewhat sporadic. So, for an evaluation of the effects of the vineyard rejuvenation it will be simpler to cover a shorter period. The graph above shows the weighted average scores from the Wine-Searcher database, covering the vintages from 1990 (ten years before the rejuvenation started) to 2014, inclusive.

Note that for almost every year since 2004 the wine has been scored 91 or higher, whereas before that 91 was the rare top score. There is no doubt that the wine is in the best form it’s been in for years. And the viticultural team can take most of the credit.

Buy yourself a bottle. Put it away for ten years. Then drink it. You will see what I mean about value for money.


Bibliography

Who says New World wines don't develop? — Michael Apstein

Gourmet Traveller | Viticulturist of the year — Susanne Bell

Who dares Wynns — James Halliday

Wynns Coonawarra: a revolution many years in the making — James Halliday

Wynns wine legend turns 60 — Huon Hooke

A 17-year winemaking partnership — Cathy Howard

Interview with Sue Hodder — Jeannie Cho Lee

Wynns unleashes Coonawarra’s diversity — Chris Shanahan

Wynns Coonawarra — great winemaking but the marketing sucks — Chris Shanahan

How Sue Hodder’s history lesson improved Wynns’ Coonawarra reds — Chris Shanahan

Profile: Sue Hodder — Tyson Stelzer

Monday, 4 September 2017

Increase in US wine consumption over 10 years

Recently, the American Association of Wine Economists published on their Facebook page some data (here and here) showing the annual wine consumption per capita in the USA, both in 2005 and in 2014. Recalculating these data shows that there was, on average, a 13% increase in wine consumption per person between these two years, for the country as a whole.

However, this is only part of the picture, as we all know that wine consumption will not have increased equally among all of the states. So, I have plotted the state-by-state data in the graph below. Here, each of the points represents a particular state, with its location on the axes representing consumption (liters per person) in 2005 (horizontally) and 2014 (vertically). If the consumption per capita was the same in both years then the points would lie along the pink line; and if they are above the line then the consumption was greater in 2014 than in 2005.
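For anyone wanting to reproduce this kind of comparison plot, here is a sketch using matplotlib, with a handful of invented state values (litres per person) rather than the actual AAWE figures.

```python
import matplotlib.pyplot as plt

# Invented values (litres per person): state -> (2005, 2014)
consumption = {
    "DC": (25.0, 27.0),
    "NH": (18.0, 20.0),
    "VT": (14.0, 17.0),
    "ID": (7.0, 8.5),
    "UT": (2.0, 2.5),
}

x = [v[0] for v in consumption.values()]
y = [v[1] for v in consumption.values()]

plt.scatter(x, y)
lims = [0, 30]
plt.plot(lims, lims, color="pink")                   # equal consumption in both years
plt.plot(lims, [v + 1 for v in lims], "k:", lw=0.8)  # +1 litre per person
plt.plot(lims, [v - 1 for v in lims], "k:", lw=0.8)  # -1 litre per person
plt.xlabel("2005 consumption (litres per capita)")
plt.ylabel("2014 consumption (litres per capita)")
plt.show()
```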

US wine consumption in 2005 and 2014

The graph shows that the biggest boozers are in the District of Columbia. Indeed, the per person consumption in DC is more than double that of fully 37 of the states. This may explain some of the decisions that come out of Washington.

DC is followed a long way back by New Hampshire, followed even further back by Vermont and Massachusetts. For comparison, a couple sharing a bottle of wine per week would consume 20 liters per adult per year, which is equivalent to 15 liters per capita (given that 25% of the population is below drinking age). The graph shows that only DC, NH, VT and MA exceed this annual level (ie. the top four points in the graph).

For 20 of the states the annual wine consumption increased between 2005 and 2014 by more than 1 liter per person — the dotted lines on the graph indicate plus/minus 1 liter per person. The biggest increases were in Vermont, followed by Massachusetts, New Jersey and New Hampshire. Of these, only Vermont was a long way above the average increase.

None of the states had a decrease in annual consumption of more than 1 liter per person. However, three of the states were a long way below the average increase (ie. much less than +13%): Delaware, Colorado and Nevada.

These results seem to be quite good for the wine industry. As Charles Olken says: "Phew. Thank goodness" (Americans are turning away from wine ~~ No, they are not. Yes, they are). Nevertheless, almost all Americans drink much less than a bottle of wine per week; so there is much room for improvement.

Monday, 28 August 2017

Why do people get hung up about sample size?

We cannot collect all of the data that we might want, in order to find out whatever it is that we want to know. A botanist cannot collect data from every rose plant on the planet, an ornithologist cannot collect data on every humming bird on the planet, and a wine researcher cannot collect data on every wine on the planet. So, instead, we collect data on a sub-sample, and we then generalize from that sample to the population that we are actually interested in.

Many people seem to think that the size of the sample is important, and they are right. However, size is not the most important thing, not by a long chalk. The most important thing is that the sample must not be biased. Even a small unbiased sample is much much better than a large biased sample.

Bias refers to whether the sample accurately represents the population we are taking the sample from. If the sample does represent the population then it is unbiased, and if it does not represent the population then it is biased. Bias is bad. In fact, it is often fatal to the work, because we will end up making claims about the population that are probably untrue.


Let's take the first example that I worked out for myself, when I was first learning about science. In 1976, Shere Hite published The Hite Report on Female Sexuality in the U.S.A. She had distributed questionnaires in many different ways, including direct mailouts and enclosures in magazines. She described the final sample of females as follows: "All in all, one hundred thousand questionnaires were distributed, and slightly over three thousand returned (more or less the standard rate of return for this type of questionnaire distribution).” She also emphasized that her sample size was much larger than had ever before been used for studies of human sexual behavior (eg. by Kinsey, or Masters and Johnson).

Here, the intended population from which the sample was taken is not the same as the actual sampled population — the questionnaires may well have been distributed to a group of females who were representative of women in the U.S.A., but there is no reason to expect that the respondents were. The respondents chose to respond, while other women chose not to.

It should be obvious that there are only two reasonable conclusions about females in the U.S.A. that can be drawn from this study: (1) it seems that c. 3% of the females will discuss their sex lives, and (2) it is likely that 97% of the females do not voluntarily discuss their sex lives. There is no necessary reason to expect that the sexual activities of these two groups will be the same, at least in the 1970s. Indeed, our general knowledge of people probably leads us to expect just the opposite. Hite’s report is thus solely about the smaller of these two groups (ie. those who will reveal their sex lives), and no justifiable conclusions can be reached about the larger group.

Note that the problem here is not the sample size of 3,000 — it is solely the non-representativeness of this sample that is at issue, since a sample of this size could easily be representative even of a population as large as that of the U.S.A. At one extreme, if I want to work out the ratio of males:females on this planet, then I will actually get the right answer even with a sample of two people, provided one is male and the other is female!

It is important to note that all samples are an unbiased representation of some population, whether large or small. The trick is that we need to work out what that population is. If it is not the same as the population that we intended, then we are in trouble, if we try to generalize our conclusions beyond the actual population. This was Shere Hite's problem, because she drew general conclusions about women in the U.S.A. (her intended population) rather than just those women who will discuss their sex lives (her sampled population).

It is for this reason that government censuses try to sample all (or almost all) of the relevant people. This is the best way to avoid biases — if you can get data from nearly everyone, then there cannot be much bias in your sample!


Professional survey organizations (e.g. Nielsen, Gallup, etc) usually try to address this issue by defining specific sub-samples of their intended population, and then pooling those sub-samples to get their final sample (this is called stratified sampling). For example, they will explicitly sample people from different ages, and different professions, and different ethnic backgrounds, etc — defining sub-groups using any criteria that they feel might be relevant to the question at hand. This greatly increases their chances of getting an unbiased sample of the general populace.

But even this approach does not guarantee that they will succeed. The example that I used to give my students involved predictions for nine consecutive Australian federal elections (1972-1990) from seven different survey organizations. These polling groups mostly forecast the winning political party correctly, although the winning percentages were sometimes quite inaccurately estimated. However, there was one year (1980) when they all got it wrong; that is, they all predicted that the Labor party would win, by margins of 2-9%, whereas the Liberal/NCP coalition actually won by 1% of the vote. In this case their stratified sampling failed to account for the geographical distribution of voters in the various electoral regions.

Note, also, that these types of survey organizations do not focus as much on sample size as they do on bias, as I emphasized above. For example, in 2014, the Nielsen survey organization announced an addition of 6,200 metered homes to its sample used for assessing television markets in the USA, in terms of which channels/shows are being watched (see Nielsen announces significant expansion to sample sizes in local television markets) — this represented "an almost 50% increase in sample size across the set meter market." That is, even after the increase, c. 20,000 homes are currently being used to sample an estimated population of nearly 120,000,000 US homes with TVs (see Nielsen estimates 118.4 million TV homes in the U.S. for the 2016-17 TV season).


The points that I have made here also apply to the modern phenomenon of collecting and analyzing what is called "Big Data". This has become a buzz expression in the modern world, appearing, for example, in biology with the study of genomics and the business world with the study of social media. Apparently, the idea is that the sheer size of the samples will cure all data analysis ills.

However, data are data, and an enormous biased dataset is of no more use than is a small biased one. In fact, mathematically, all Big Data may do is make you much more confident of the wrong answer. To put it technically, large sample sizes will address errors due to stochastic variation (ie. random variability), but they cannot address errors due to bias.
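A toy simulation makes the point concrete: below, a small unbiased sample lands close to the true population mean, while a very much larger sample drawn from a biased subset remains confidently wrong. The numbers are invented and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=10.0, scale=2.0, size=1_000_000)  # true mean = 10

# A small but unbiased sample
unbiased_small = rng.choice(population, size=100)

# A biased sampling scheme: only units above the population median can respond
responders = population[population > np.median(population)]
biased_large = rng.choice(responders, size=100_000)

print(f"true mean:            {population.mean():.2f}")
print(f"unbiased, n = 100:    {unbiased_small.mean():.2f}")   # imprecise but about right
print(f"biased, n = 100,000:  {biased_large.mean():.2f}")     # precise but wrong
```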

So, Big Data can lead to big mistakes, unless we think about possible biases before we reach our conclusions.

Monday, 21 August 2017

Wine tastings: should we assess wines by quality points or rank order of preference?

At formal wine tastings, the participants often finish by putting the wines in some sort of consensus quality order, from the wine most-preferred by the tasting group to the least-preferred. This is especially true of wine competitions, of course, but trade and home tastings are often organized this way, as well.

One interesting question, then, is how should this consensus ordering be achieved; and do different methods consistently produce different results?


At the bottom of this post I have listed a small selection of the professional literature on the subject of ranking wines. In the post itself, I will look at some data on the subject, ranking the wines in two different ways.

Dataset

The data I will look at come from the Vintners Club. This club was formed in San Francisco in 1971, to organize weekly blind tastings (usually 12 wines). Remarkably, the club is still extant, although the tastings are now monthly, instead of weekly. The early tastings are reported in the book Vintners Club: Fourteen Years of Wine Tastings 1973-1987 (edited by Mary-Ellen McNeil-Draper, 1988).

The Vintners Club data consist of three pertinent pieces of information for each wine at each tasting:
  • the total score, determined by summing each taster's ranking (1-12) of the wines in descending order of preference (1 is most preferred, 12 is least preferred)
  • the average of the UC Davis points (out of 20) assigned by each taster — the Vintners Club has "always kept to the Davis point system" for its tastings and, therefore, averaging the scores is mathematically valid
  • the number of tasters voting for the wine as 1st place (and also 2nd and 12th).
The Vintners Club uses the total score as their preferred ranking of the wines for each tasting. That is, in the book the wines are ranked in ascending order of their total score, with the minimum score representing the "winning" wine.

For my dataset, I chose the results of the 45 "Taste-offs" of California wine. These tastings were the play-offs / grand finals (depending on your sporting idiom), consisting of the first- and second-place wines from a series of previous tastings of the same grape varieties. The Vintners Club apparently began its annual Taste-off program in 1973, and has pursued the concept ever since.

In my dataset, there are 14 Taste-offs for cabernet sauvignon, 12 for chardonnay, 9 for zinfandel, 4 for pinot noir, 3 for riesling, and one each for sauvignon blanc, gamay, and petite sirah. There were 17-103 people attending each of the 45 Taste-offs (median 56 people per tasting), of whom 43-96% submitted scores and ranks (median 70%).

For each tasting, I calculated the Spearman correlation between the rank-order of the wines as provided by the total scores and the rank-order of the wines as provided by the average Davis points for each wine. This correlation provides a measure (scale: 0-100%) of how much of the variation in ranks is shared by the two sets of data (total scores versus average points). The percentage is thus a measure of agreement between the two rankings for each tasting.
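As a sketch of the calculation (the ranks below are invented, and expressing the agreement as a percentage, here rho squared times 100, is just one way of reading "shared variation"):

```python
from scipy.stats import spearmanr

# Invented rank-orders of 12 wines under the two schemes
rank_by_total_score = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
rank_by_avg_points = [2, 1, 3, 5, 4, 7, 6, 8, 10, 9, 12, 11]

rho, _ = spearmanr(rank_by_total_score, rank_by_avg_points)
print(f"Spearman rho = {rho:.2f}; shared variation ~ {100 * rho**2:.0f}%")
```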

Total scores and average points

The graph shows the results of the 45 tastings, with each point representing one of the Taste-offs. The horizontal axis represents the number of people providing scores for that tasting, while the vertical axis is the Spearman correlation for that tasting.

Correlation between two methods for ranking wines

As you can see, in most cases the correlation varies from 50-100%. However, only 1 in every 5 times is the correlation above 90%, which is the level that would indicate almost the same ranking for the two schemes. So, we may conclude that, in general, the total score and the average points do not usually provide the same rank-order of the wines at each tasting.

Indeed, in two cases the two schemes provide very different rank-orders for the wines, with correlations of only 41% and 23%. This is actually rather surprising. These two tastings both involved chardonnay wines, for some reason.

It is a moot point whether to sum the ranks or average the scores. That is, we cannot easily claim that one approach is better than the other — they produce different results, not better or worse results. However, for both approaches there are technical issues that need to be addressed.

For averaging, we need to ensure that everyone is using the same scale, otherwise the average is mathematically meaningless (see How many wine-quality scales are there? and How many 100-point wine-quality scales are there?). Similarly, when trying to combine ranks together, there is no generally agreed method for doing so — in fact, different ways of doing it can produce quite inconsistent outcomes (see the literature references below).

Number of first places

For those wines ranked first overall at each tasting, only 4-60% of the scorers had actually chosen them as their personal top-ranked wines of the evening, with an average of 22%. That is, on average, less than one-quarter of the scorers ranked the overall "winning" wine as being at the top of their own personal list. This indicates that rarely was there a clear winner.

Indeed, for only about half of the tastings was the "winning" wine also the one that got the largest number of first places, based on either the sum of ranks or the average points. For the wines ranked first overall at each tasting, in only 24 of the 45 tastings was that wine the one that received the greatest number of 1st-place votes during the evening; similarly, for the wines with the highest average score at each tasting, it was only 25 of the 45 tastings.

We may safely conclude that neither being ranked 1st by a lot of people, nor getting a high average score from those people, will actually make a wine the top-ranked wine of the evening. As I have noted in a previous blog post, often the winning wine is the least-worst wine.

Footnote

Confusingly, for each tasting, the Vintners Club rank data very rarely add up to the expected total for the number of people providing results. That is, the sum of the ranks should equal 78 times the number of people providing scores (since 1 + 2 + ... + 12 = 78). A total a few points below the expected number probably reflects a few tied votes by some of the scorers. However, there are also many tastings where the total scores add up to much more than is possible for the number of people present at the tasting. I have no explanation for this. (And yes, I have considered the effect of alcohol on the human ability to add up numbers!)



Research Literature

Michel Balinski, Rida Laraki (2013) How best to rank wines: majority judgment. In: E. Giraud-Héraud and M.-C. Pichery (editors) Wine Economics: Quantitative Studies and Empirical Applications, pp. 149-172. Palgrave Macmillan.

Jeffrey C. Bodington (2015) Testing a mixture of rank preference models on judges’ scores in Paris and Princeton. Journal of Wine Economics 10:173-189.

Victor Ginsburg, Israël Zang (2012) Shapley ranking of wines. Journal of Wine Economics 7:169-180.

Neal D. Hulkower (2009) The Judgment of Paris according to Borda. Journal of Wine Research 20:171-182.

Neal D. Hulkower (2012) A mathematician meddles with medals. American Association of Wine Economists Working Paper No. 97.

Monday, 14 August 2017

European wine taxes — and what to do about them

Almost all countries have some sort of tax on wine. Some places have a uniform tax throughout the country; and some countries have taxes that vary between their states (eg. the USA — see State and Local Alcohol Tax Revenue, 2014). The countries of the European Union (EU) may have many common economic policies, but uniform taxes is certainly not one of them. Therefore, the tax on wine varies considerably between the countries of Europe.

There are three possible taxes that might apply to wine:
  • Import Duty, if the wine comes from outside Europe
  • Value-Added Tax, which applies to all goods and services, and which goes by many names throughout the world (eg. VAT, TVA, IVA, GST, MwSt, Moms)
  • Excise Tax, which applies to specific products, such as alcohol or tobacco (and, in the dim past, also things like salt and sugar).
Here, we will concern ourselves only with the latter two taxes, on the assumption that the wine has originated in Europe or has already been imported there.

Recently, the Facebook page of the American Association of Wine Economists has considered this issue in one of their posts: Excise Taxes and VAT on Still Wine in the EU 2017. I have used these data as my basis; but I have modified them to account for the fact that in Europe the Value-Added Tax is charged on the total bottle price, including the Excise Tax (and also the Import Duty, for that matter). I have also added the data for Norway and Switzerland, which are currently not in the EU.
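The adjustment is simple arithmetic: the VAT is levied on the excise-inclusive price, so the combined tax share of the final bottle price can be calculated as in this sketch (all numbers invented).

```python
# Hypothetical figures for one bottle
net_price = 5.00   # price before any taxes
excise = 2.50      # excise duty per bottle
vat_rate = 0.25    # 25% VAT, charged on the excise-inclusive price

final_price = (net_price + excise) * (1 + vat_rate)
total_tax = final_price - net_price
print(f"taxes are {100 * total_tax / final_price:.0f}% of the {final_price:.2f} bottle price")
```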

This graph shows the full data for 29 European countries, with the combined taxes on the wine expressed as a percentage of the final bottle price paid by the consumer.

Percent taxes on a bottle of wine in various European countries

Note that the EU countries are neatly bracketed by the two non-EU countries, with Swiss alcohol being dramatically cheaper than the Norwegian stuff. Also, note that the six "top" countries stand out from the rest — there is not a great difference between Estonia (23%) and Germany (16%), but the denizens of Denmark, Sweden, the UK, Finland, Ireland and Norway pay notably more tax for their wine than do other Europeans.

What to do?

Needless to say, the residents of these six countries do not take this situation lying down. Something must be done!

And the thing to do is to take advantage of the fact that there is free trade throughout the EU. That is, if you are a resident of a country with a sales tax that you don't like, then you can simply purchase your goods in another country, with a tax more to your liking. [*]

So, the British go to France, this being their closest country with lower alcohol taxes. These days there are large shopping complexes at the appropriate entry points into France (or exit points if you are going the other way). These are often owned and run by British companies. You simply order your goods online before you leave Britain, travel to France for the day (or a few days, if you want a proper holiday), and pick up your pre-packed goods on the way home. It is as simple as that.

I have no idea what the wine drinkers plan to do if Brexit is implemented.

Similarly, the Swedes and Danes go to Germany. Once again, there are large shopping complexes just where the main roads cross the Danish border, or near the boats to / from Sweden. These are usually run by Germans, and you don't order your goods ahead of time; but Scandinavians do commonly stop at them on their drive home at the end of any trip down south.

The Finns go to Sweden, on boats. There are a couple of large cruise liners that go back and forth between Stockholm and Helsinki every evening. Each trip runs overnight, and you spend the following day in the destination city, returning home that evening (ie. the entire trip, there and back, takes 40 hours). These boats are nominally run to transport trucked goods to Finland, which would otherwise need to be accessed via a long land detour through Russia — the boat is officially part of European road number E18. But, in reality, these are duty-free party boats, with a casino, nightclub, movie theater, duty-free shops, etc. Both Swedes and Finns are observed to depart from these boats with shopping carts loaded with duty-free alcohol.

Finally, the Norwegians go to Sweden for their alcohol. This might seem odd, since Norway is not in the EU and Sweden is, so that the goods are technically being imported into Norway (and should therefore attract Import Duty). However, there is unrestricted land movement of people between these two countries (see Nordic Passport Union), and only large trucks are stopped at the customs points. It has always been like this, even though the two countries parted company over a century ago.


So, if you drive south from, say, Oslo (the largest city in Norway), then just after you cross the Swedish border (about 90 minutes from Oslo) you will encounter two large shopping complexes, labeled Nordby and Svinesund in the satellite photo above. These are in the middle of a forest, with no nearby Swedish town — the Swedes have constructed them solely for the Norwegians. If you visit these shops, you will see lots of these people packing their station wagons with goods that attract much lower taxes in Sweden than they do in Norway. Indeed, Nordby actually has the liquor store with the biggest annual turnover in Sweden!

Such is life in modern Europe.



[*] In this sense, the European Union is more united than is the United States, where free trade in alcohol is still not widespread (see Consumers short-changed again on shipping), in spite of the fact that Prohibition was repealed eight decades ago.

Monday, 7 August 2017

The blind leading the blind?

Recently, The Economist magazine tried to champion the cause of blind tastings by using the results of the 2017 wine-tasting contest between Oxford and Cambridge universities (Think wine connoisseurship is nonsense? Blind-tasting data suggest otherwise). The conclusion was that the tasters "performed far better than random chance would indicate." However, very little data analysis was performed, and so a look at their data is in order.


The Economist notes:
The main results of the 2017 Varsity blind-tasting match, held on February 15th, are depicted above. Two teams of seven tasters each (including one reserve per side) were presented with 12 wines, six whites and six reds. The judges granted each taster between zero and 20 points per wine, depending on how close (in their estimation) the drinkers’ guesses were to the correct answers, and how convincingly they explained their reasoning. However, we prefer a simpler scoring system: one point for getting the country of origin right, another point for getting the grape variety right and a judicious half-point of partial credit only in a handful of specific cases.
The group’s overall accuracy was far superior to what could be expected from random chance. Given the thousands of potential country-variety pairs, a monkey throwing darts would have virtually no hope of getting a single one right. But 47% of the Oxbridge tasters' guesses on grape variety were correct, as were 37% on country of origin.
The Economist does point out the rather obvious variation in success, among both the tasters and the wines — some tasters did much better than others, and some wines were identified much more commonly than others. However, a variance-components analysis of the data indicates that it is the variation among the wines that dominates the dataset — for the successful identification of grape variety, 90% of the variability is due to the variation among the wines and only 5% is due to the variation among the tasters; and for the identification of country of origin, it is 65% and 25%, respectively.
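For readers who want to try this kind of calculation, here is a minimal sketch of one way to do it: a classical method-of-moments variance-components decomposition for a two-way (tasters by wines) table of right/wrong guesses, without replication. The matrix is randomly generated, not the actual Varsity results, and this is only one of several ways such an analysis could be set up.

```python
import numpy as np

# Hypothetical tasters x wines matrix of correct (1) / incorrect (0) guesses
rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=(14, 12)).astype(float)

n_t, n_w = correct.shape
grand = correct.mean()
ss_taster = n_w * ((correct.mean(axis=1) - grand) ** 2).sum()
ss_wine = n_t * ((correct.mean(axis=0) - grand) ** 2).sum()
ss_error = ((correct - grand) ** 2).sum() - ss_taster - ss_wine

ms_taster = ss_taster / (n_t - 1)
ms_wine = ss_wine / (n_w - 1)
ms_error = ss_error / ((n_t - 1) * (n_w - 1))

# Method-of-moments estimates of the variance components
var_taster = max((ms_taster - ms_error) / n_w, 0.0)
var_wine = max((ms_wine - ms_error) / n_t, 0.0)
total = var_taster + var_wine + ms_error

print(f"wines:    {100 * var_wine / total:.0f}%")
print(f"tasters:  {100 * var_taster / total:.0f}%")
print(f"residual: {100 * ms_error / total:.0f}%")
```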

So, any general comments about the success of blind wine-tasting must be tempered by the fact that some wines are apparently much easier to identify (by grape or country) than are others.

Statistical evaluation

The Economist's assessment of the probability of success is based on a mathematically naïve set of assumptions. As an example of their "dart-throwing" calculation: there are c. 100 common red-grape varieties, and so there is a 1% chance of me getting one right at a blind tasting by simply guessing. I would then have a 6% chance of getting at least one wine right if I simply guess the same red grape each time, for the six wines. This makes the 47% success rate of the tasters look pretty good.

However, this calculation is mathematically naïve because human beings are not monkeys, with or without darts. Some grape varieties occur in wines much more commonly than do others, and those grapes are more likely to be represented in the tasting contest; and human beings know this, even if the monkeys do not. Similarly, some countries are more likely to be represented in a wine tasting than are others, especially given the presence of certain grape varieties. For example, how many Gamay wines are made outside of France? If I simply assume "Beaujolais" for a Gamay wine then I have a 95% chance of being right!

We therefore cannot assume that an educated wine taster is the same as a monkey throwing darts. The wine taster is not guessing, any more than a motor mechanic is guessing when diagnosing a fault in your car. They both have prior knowledge, which even at worst produces an educated guess (and at best is professional expertise). That is, an "educated guess" should be the basis of our statistical comparison, not a "random guess", as done by The Economist.

So, in order to work out the actual probabilities of success for each grape (and country) I need to know the probability of one of the wines in the contest being, say, Chardonnay. That is, I would need to know the probability of the competition organizers choosing each of the grape varieties and countries for the tasting. Sadly, I do not have this information.

As a realistic substitute, I will use how common the different varieties/countries are in liquor stores. That is, I will assume that the bottles have been chosen from the selection available in the shops.

For this, I will use the wine database of the Systembolaget liquor chain, in Sweden. I have used this database before (eg. How many wine prices are there?) because, being the third largest liquor chain in the world, its selection of wines is extensive. Furthermore, being a European chain, it is likely to match the British organizers' probabilities of choice better than would many other sources. Indeed, for both the red and the white varieties, the organizers chose 4 of the 5 most common grapes in the Systembolaget database (out of the 6 chosen). So, my probabilities may be pretty good, at least from the point of view of the participants working out which wines they are likely to encounter in the tasting.

As an example, 25% of the white wines in Systembolaget's database have Chardonnay listed as a principal grape variety. This means that we would expect an 82% chance of at least one of the 6 white wines being Chardonnay. The participants actually had an 86% success rate at identifying the Chardonnay. So, my analysis suggests that in this one case they have not actually done any better than they could have done by taking an educated guess based simply on how common the wines are in the shops. The question they are answering in the tasting is not "is this a Chardonnay?" but "which one is the Chardonnay?"!
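The "educated guess" prior here is just the at-least-one probability: if a variety makes up a proportion p of the relevant wines, the chance that at least one of n wines in the line-up is of that variety is 1 - (1 - p)^n. A quick check of the two figures used above:

```python
def p_at_least_one(p, n=6):
    """Probability that at least one of n wines is of a variety with market share p."""
    return 1 - (1 - p) ** n

print(f"Chardonnay (p = 0.25): {p_at_least_one(0.25):.0%}")        # ~82%, as above
print(f"random red grape (p = 0.01): {p_at_least_one(0.01):.0%}")  # the 'dart-throwing' ~6%
```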

Statistical results

So, my basis for estimating the prior probabilities of expected success for the participants is to work out the probability of at least one of the wines being of that variety or region (based on its frequency in the Systembolaget database). We can then compare this to the tasting results for each grape variety and each country, to see if the participants actually did better than an educated guess.

For each of the graphs presented below, the interpretation is as follows. Each variety or country is represented by a horizontal line, as indicated by the legend. The central point on each of the lines represents the percentage of the tasters who succeeded at the task for that wine. The two end points on each line are the boundaries of the estimated 95% confidence interval (formally: the Score binomial 95% confidence interval). This interval gets smaller as the sample size (the number of tasters) gets larger, as it represents our statistical "confidence" in the results. The asterisk represents the expected results if the tasters are performing in accordance with the estimated prior probabilities. So, if the asterisk is within the 95% confidence interval for a particular wine, then the tasters have done no better than an educated guess for that wine, whereas if the asterisk lies outside the 95% confidence interval then the tasters have done better (or worse) than expected.
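As a sketch of how each interval and comparison could be computed (statsmodels provides the Wilson "score" interval; the counts and the expected probability below are invented):

```python
from statsmodels.stats.proportion import proportion_confint

successes, n_tasters = 12, 14   # hypothetical: 12 of 14 tasters correct
low, high = proportion_confint(successes, n_tasters, alpha=0.05, method="wilson")

expected = 0.82                 # prior probability from the educated guess
print(f"observed {successes / n_tasters:.0%}, 95% CI [{low:.0%}, {high:.0%}]")
if expected < low:
    print("better than an educated guess")
elif expected > high:
    print("worse than an educated guess")
else:
    print("no better than an educated guess")
```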

Expected versus actual correctness for grape varieties

Expected versus actual correctness for countries

The analyses indicate that in only 2 out of 12 cases did the participants identify the grape variety with any more success than would be expected based on the commonness of the wines: the Pinot Noir and the Gamay. Otherwise, they did as well as we would expect using an educated guess — except in the case of the Riesling wine, where they did rather poorly. In this case, Riesling is apparently a more common wine grape than the participants realize!

The analyses also indicate that the tasters did both better and worse than expected with the identification of country of origin. In three cases they did better than expected (France and New Zealand for the red wines, and Australia for the white wines), and in three cases they did worse than expected (Spain for the red wines, and France and Italy for the white wines). That is, French white wine is apparently a more common type than the participants realize, as also are Italian white wine and Spanish red wine.

Conclusion

I have indicated before that blind tastings are notoriously hard (see Can non-experts distinguish anything about wine?). The results and analyses presented here confirm that conclusion — for some wines the participants did very well, but in most cases they could have done just as well by guessing based on how commonly the wines are encountered. The Economist's optimism in this case is misplaced, due to a naive assessment of the prior probabilities of success.

Monday, 31 July 2017

Do winery quality scores improve through time?

The Michael David Winery is perhaps the best-known wine producer in Lodi, California. It has recently been in the news, as it was named the 2015 "Winery of the Year" by the Unified Wine & Grape Symposium; and its Seven Deadly Zins old-vine zinfandel topped Shanken’s Impact newsletter's “Hot Brand” award list in both 2015 and 2016. Not bad for a Central Valley winery producing 500,000 cases per year, with more than two dozen different wine labels.


The Wine Industry Advisor notes for the Seven Deadly Zins (a blend of grapes from seven Lodi growers) that: "As sales grew so did the scores, as it achieved 90+ ratings from both Robert Parker and Wine Enthusiast Magazine over the past three vintages." Indeed, the first graph shows the Wine Enthusiast scores for eight vintages of seven different wines from the Michael David Winery. With two exceptions (in 2008), the scores do seem to increase through the vintages.

Wine Enthusiast scores for seven Michael David wines

Is this pattern real? Obviously, it depends on what we mean by "real", but in this case it might best be interpreted as: do other wine assessors agree that there has been an increase in quality? There are not many of these to choose from, as most wine critics ignore Lodi wines, preferring instead to look further west when tasting California wines. However, Cellar Tracker is more reliable in this regard, since its scores come from wine drinkers rather than professional wine tasters.

So, this next graph shows the Cellar Tracker scores for exactly the same wines and vintages. I do not see any general time trend here, although a couple of the wines do increase for their most recent vintage. That is, the scores are very consistent except for one vintage of Seven Deadly Zins and one of Earthquake Petite Sirah.

Cellar Tracker scores for seven Michael David wines

So, what is going on here? Why the discrepancy between the Wine Enthusiast and Cellar Tracker? The answer appears when we look at who actually provided the Wine Enthusiast scores.

To show this, I have simply averaged both the Wine Enthusiast and Cellar Tracker scores across the available wines for each vintage, thus producing a consensus score for the winery through time. These averages are shown in the next graph, for both sources of wine scores. Note that the Cellar Tracker scores show the Michael David Winery to be a consistent 88-point winery, with only a slight increase in score through time.

Average scores for Michael David wines

However, the Wine Enthusiast scores are another matter entirely. At least three different reviewers have provided the scores, as shown at the bottom of the graph; and it is clear that these people have very different scoring systems. This is explained in more detail in the post on How many 100-point wine-quality scales are there?

So, in this case the apparent increase in Wine Enthusiast scores through time is illusory. It just so happens that, through time, the three reviewers have increasing tendencies to give high wine scores, while the people using Cellar Tracker do not. The quality of the Michael David wines is not in doubt, but what scores their wines should be given is clearly not yet settled.

This is a general issue that we should be wary of when studying time trends. We first need to ensure that the data are comparable between the different years, before we start comparing those years.

Thanks to Bob Henry for first bringing the Michael David wines to my attention. Incidentally, my favorite of these wines is the Petite Petit, a ripe-fruited blend of petite sirah and petit verdot, which is good value for money.

Monday, 24 July 2017

Wine tastings: the winning wine is often the least-worst wine

At organized wine tastings, the participants often finish by putting the wines in some sort of consensus quality order, from the wine most-preferred by the tasting group to the least-preferred. This is especially true of wine competitions, of course, but trade and home tastings are often organized this way, as well.

The question is: how do we go about deciding upon a winning wine? Perhaps the simplest way is for each person to rank the wines, and then to find a consensus ranking for the group. This is not necessarily as straightforward as it might seem.


To illustrate this idea, I will look at some data involving two separate blind tastings, in late 1995, of California cabernets (including blends) from the 1992 vintage. The first tasting had 18 wines and 17 tasters, and the second had 16 wines and 16 tasters. In both cases the tasters were asked, at the end of the tasting, to put the wines in their order of preference (ie. a rank order, ties allowed).

The first tasting produced results with a clear "winner", no matter how this is defined. The first graph shows how many of the 17 tasters ranked each wine in first place (vertically) compared to how often that wine was ranked in the top three places (horizontally). Each point represents one of the 18 wines.

Results of first tasting

Clearly, 15 of the 18 wines appeared in the top 3 ranks at least once, so that only 3 of the wines did not particularly impress anybody. Moreover, 6 of the wines got ranked in first place by at least one of the tasters — that is, one-third of the wines stood out to at least someone. However, by consensus, one of the wines (from Screaming Eagle, as it turns out) stood out head and shoulders above the others, and can be declared the "winner".

However, this situation might be quite rare. Indeed, the second tasting seems to be more typical. The next graph shows how many of the 16 tasters ranked each wine in first place (vertically) compared to how often that wine was ranked in the top five places (horizontally). Each point represents one of the 16 wines.

Results of second tasting

In this case, the tasters' preferences are more evenly spread among the wines. For example, every wine was ranked in the top 3 at least once, and in the top 4 at least twice, so that each of the wines was deemed worthy of recognition by at least one person. Furthermore, 10 of the 16 wines got ranked in first place by at least one of the tasters — that is, nearly two-thirds of the wines stood out to at least someone.

One of these wines, the Silver Oak (Napa Valley) cabernet, looks like it could be the winner, since it was ranked first 3 times and in the top five 7 times. However, the Flora Springs (Rutherford Reserve) wine appeared in the top five 10 times, even though it was ranked first only 2 times; so it is also a contender. Indeed, if we take all of the 16 ranks into account (not just the top few) then the latter wine is actually the "winner", and is shown in pink in the graph. Its worst ranking was tenth, so that no-one disliked it, whereas the Silver Oak wine was ranked last by 2 of the tasters.

We can conclude from this that being ranked first by a lot of people will not necessarily make a wine the top-ranked wine of the evening. "Winning" the tasting seems to be more about being the least-worst wine! That is, winning is as much about not being last for any taster as it is about being first.

This situation is not necessarily unusual. For example, on my other blog I have discussed the 10-yearly movie polls conducted by Sight & Sound magazine. In the 2012 poll, Alfred Hitchcock's film Vertigo was ranked top, displacing Citizen Kane from the position it had held for the previous 50 years; and yet, 77% of the critics polled did not even list Vertigo in their personal top 10. Nevertheless, more critics (23%) put Vertigo on their top-10 list than did so for any other film, and so Vertigo gets the top spot overall. From these data, we cannot conclude that Vertigo is "the best movie of all time", but merely that it is chosen more often than the other films (albeit by less than one-quarter of the people). Preferences at wine tastings seem to follow the same principle.

Finally, we can compare the seven wines that were common to the two tastings discussed above. Did these wines appear in the same rank order at both tastings?

In this case, we can calculate a consensus score for each tasting by awarding points based on each participant's ranking (3 points for a first-place rank, 2 points for second, and 1 point for third) and summing these points across the participants. The result of this calculation is shown in the third graph, where each point represents one of the seven wines, and the axes indicate the ranking for the two tastings.
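For concreteness, here is a small sketch of that calculation (the rankings below are invented, not the actual tasting results):

# Sketch of the points calculation described above (invented rankings, not the real data):
# each participant's top three ranks earn 3, 2 and 1 points, and the points are summed per wine.

POINTS = {1: 3, 2: 2, 3: 1}        # rank -> points; ranks below third earn nothing

def consensus_points(rankings):
    """rankings: one {wine: rank} dict per participant."""
    totals = {}
    for ranks in rankings:
        for wine, rank in ranks.items():
            totals[wine] = totals.get(wine, 0) + POINTS.get(rank, 0)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

example = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"B": 1, "D": 2, "A": 3, "C": 4},
    {"C": 1, "B": 2, "D": 3, "A": 4},
]
print(consensus_points(example))   # {'B': 7, 'A': 4, 'C': 4, 'D': 3}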

Comparison of the two tastings

The two groups of tasters agree on the bottom three wines in their rankings. However, they do not agree on the "winning" wine among these seven. More notably, they disagree quite strongly about the Silver Oak cabernet. In the second tasting this wine received 3 firsts and 2 thirds (from the 16 tasters), while in the first tasting it received 1 third ranking only (out of 17 people). The consensus ranking of this wine thus differs quite markedly between the tastings. This may reflect differences in the type of participants at the tastings, there being a broader range of wine expertise in the second tasting.

Monday, 17 July 2017

Yellow Tail and Casella Wines

Some weeks ago I posted a discussion of whether wine imports into the USA fit the proverbial "power law". I concluded that US wine imports in 2012, in terms of number of bottles sold, did, indeed, fit a Power Law. This included the best-selling imported wine, Yellow Tail, from Casella Wines, of Australia.

However, bottle sales are not the complete picture, since ultimately it is the dollar sales value that determines profitability. Statista reports (Sales of the leading table wine brands in the United States in 2016) that Yellow Tail US sales were worth $281 million in 2016, which ranks it at no. 5 overall, behind the domestic brands Barefoot ($662 million), Sutter Home ($358 million), Woodbridge ($333 million) and Franzia ($330 million). Moreover, in July 2016, The Drinks Business placed Yellow Tail at no. 6 in its list of the Top 10 biggest-selling wine brands in the world, based on sales in 2015.


It is interesting to evaluate just how profitable Yellow Tail has been for Casella Wines. This is a family-owned company founded in 1969 (see Casella Family Brands restructures to ensure family ownership), currently ranked fourth in Australia by total revenue but second by total wine production (see Winetitles Media). This makes the Casella family members seriously rich, and even in a "bad" year they are each paid millions of dollars by the company.

Since Casella Wines Pty Ltd is a registered company (ABN 96 060 745 315), its accounts must be lodged with the Australian Securities and Investments Commission (the corporate regulator) at the end of each financial year (June 30). This next graph shows (in Australian $) the reported profit/loss for each financial year since the first US shipment of Yellow Tail in June 2001. (Note: the 2015-2016 financial report has apparently not yet been submitted.)

Casella Wines profit since launching the Yellow Tail wines

The economics of Yellow Tail rely almost entirely on the exchange rate between the Australian $ and the US $. The company is reported as being "comfortable" with the A$ trading at up to US85¢, and "happy" with anything below US90¢, because the cost of making the wine (in Australia, in A$) is then more than covered by the sales revenue (earned in US$, in the USA). When the brand was first launched, the Australian dollar was trading at around US57¢, and the wine thus made a tidy profit for the winery; and also for the distributor, Deutsch Family Wine and Spirits (see The Yellow Tail story: how two families turned Australia into America’s biggest wine brand).
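To make the mechanism explicit, here is a back-of-the-envelope sketch. The per-case dollar figures are invented purely for illustration; only the exchange rates come from the discussion above.

# Back-of-the-envelope illustration of the exchange-rate squeeze (the per-case figures are
# invented for illustration; only the exchange rates are taken from the discussion above).

def profit_per_case_aud(us_price, aud_cost, usd_per_aud):
    """Profit in A$ on one case sold in the USA at a fixed US$ price."""
    revenue_aud = us_price / usd_per_aud     # convert the US$ revenue back into A$
    return revenue_aud - aud_cost

US_PRICE = 40.0      # hypothetical US$ received per case
AUD_COST = 45.0      # hypothetical A$ cost to make and ship a case

for rate in (0.57, 0.85, 1.00):              # US$ per A$1
    profit = profit_per_case_aud(US_PRICE, AUD_COST, rate)
    print(f"A$1 = US${rate:.2f}: profit of A${profit:.2f} per case")

# A$1 = US$0.57: profit of A$25.18 per case
# A$1 = US$0.85: profit of A$2.06 per case
# A$1 = US$1.00: profit of A$-5.00 per case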

However, Casella then suffered badly when the A$ began to improve in value over the next few years. The A$ reached parity with the US$ in July 2010; and this is the reason for the unprofitable years shown in the graph. The increased profit in 2010-2011 was apparently due to some successful currency hedging, rather than currency improvements.

Casella refused to change the bottle price of the Yellow Tail wines during the "bad times", stating that they did not want to risk losing their sales momentum by imposing a price hike. Instead, the company used its accumulated profits and, most importantly, re-negotiated its loans, in order to wait for a better exchange rate. They reported that every 1¢ movement in the currency equated to around A$2 million in higher sales revenue.

However, realizing the economic risks of relying on currency exchange-rates for profits, Casella embarked on a premiumization strategy in 2014. The idea is that "to be sustainable over the long term" requires a full portfolio of wines (see John Casella – newsmaker and visionary). The company has since bought a number of vineyards in premium Australian wine-making regions, mainly in South Australia, as well as acquiring some top-notch wine companies, including Peter Lehmann Wines, Brands Laira, and Morris Wines. This strategy is continuing to this day (see Bloomberg).

Finally, for those of you who might be concerned about these things, while the winery does have some vegan wines, the three Casella brothers are reported to all be keen shooters, one of them has actually owned an ammunition factory, and the winery is the largest corporate sponsor of the Sporting Shooters Association. Moreover, Marcello Casella has made a number of court appearances concerning his ammunition factory (Bronze Wing Ammunition factory to remain closed after WorkCover court win) and alleged involvement in drug running (see NSW South Coast drug kingpin Luigi Fato jailed for 18 years).

The recent embarrassment at the Super Bowl is best left undiscussed!

Monday, 10 July 2017

Napa cabernet grapes are greatly over-priced, even for Napa

There have been a number of recent comments on the Web about the increasing cost of cabernet sauvignon grapes from the Napa viticultural district (eg. Napa Cabernet prices at worryingly high levels). These comments are based on the outrageously high prices of those grapes compared to similar grapes from elsewhere in California. On the other hand, some people seem to accept these prices, based on the idea that Napa is the premier cabernet region in the USA.

However, it is easy to show that the Napa cabernet grape prices are way out of line even given Napa's reputation for high-quality cabernet wines.


The data I will use to show this come from the AAWE Facebook page: Average price of cabernet sauvignon grapes in California 2016. This shows the prices from last year's Grape Crush Report for each of 17 Grape Pricing Districts and Counties in California. The idea here is to use these data to derive an "expected" price for the Napa district based on the prices in the other 16 districts, so that we can compare this to the actual Napa price.

As for my previous modeling of prices (eg. The relationship of wine quality to price), the best-fitting economic model is an Exponential model, in this case relating the grape prices to the rank order of those prices. This is shown in the first graph. The graph is plotted with the logarithm of the prices, which means that the Exponential model can be represented by a straight line. Only the top five ranked districts are labeled.

Prices of California cabernet sauvignon grapes in 2016

As shown, the exponential model accounts for 98% of the variation in price among the other 16 grape districts, which means that this economic model fits the data extremely well. For example, if the Sonoma & Marin district really does produce better cabernet grapes than the Mendocino district, then the model indicates that their grapes are priced appropriately.

Clearly the Napa district does not fit this economic model at all. The model (based on the other 16 districts) predicts that the average price of cabernet grapes in 2016 should have been $3,409 per ton for the top ranked district. The Napa grapes, on the other hand, actually cost an average of $6,846, which is almost precisely double the expected price. This is what we mean when we say that something is "completely out of line"!
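For anyone who wants to repeat this sort of analysis, here is a sketch of the procedure. The per-ton prices below are invented placeholders rather than the actual Grape Crush Report figures; only the Napa average of $6,846 comes from the data discussed above.

import numpy as np

# Sketch of the analysis: fit an Exponential model (a straight line on a log scale) to
# price vs. price rank for the 16 non-Napa districts, then compare the extrapolated
# rank-1 prediction with the observed Napa price. The district prices below are
# hypothetical placeholders; only the Napa average ($6,846) is from the actual data.

other_prices = np.array([3100, 2500, 2100, 1800, 1500, 1300, 1150, 1000,
                          900,  800,  700,  620,  560,  500,  450,  400])  # $/ton, ranks 2-17
ranks = np.arange(2, 18)          # Napa, the most expensive district, would be rank 1

slope, intercept = np.polyfit(ranks, np.log(other_prices), 1)   # linear fit to log(price)
predicted_rank1 = np.exp(intercept + slope * 1)

napa_price = 6846
print(f"Predicted price for the top-ranked district: ${predicted_rank1:,.0f} per ton")
print(f"Observed Napa price: ${napa_price:,} per ton "
      f"({napa_price / predicted_rank1:.1f} times the prediction)")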

In conclusion, 16/17 districts have what appear to be fair average prices for their cabernet sauvignon grapes, given the current rank ordering of their apparent quality. Only one district is massively over-pricing itself. Even given the claim that Napa produces the highest quality cabernet wines in California, the prices of the grapes are much higher than we expect them to be. If we bought exactly these same grapes from any other grape-growing region then we would pay half as much money — the "Napa" name alone doubles the price. Something really has gotten out of hand — we are paying as much for the name as for the grapes.

Part of the issue here is the identification of prime vineyard land, for whose grapes higher prices are charged (see As the Grand Crus are identified, prices will go even higher). The obvious example in Napa is the To Kalon vineyard (see The true story of To-Kalon vineyard). Here, the Beckstoffer "pricing formula calls for the price of a ton of To Kalon Cabernet grapes to equal 100 times the current retail price of a bottle" of wine made from those grapes (The most powerful grower in Napa). This is a long-standing rule of thumb, and it explains why your average Napa cabernet tends to cost at least $70 per bottle instead of $35.
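That rule of thumb is easy to run backwards: if a ton of grapes is priced at 100 times the bottle price, then the grape price implies a bottle price of roughly one-hundredth of the per-ton price, as this trivial sketch shows.

# The "100 times" rule of thumb run backwards: the bottle price implied by the grape price.
# The per-ton prices are the 2016 averages quoted above.

def implied_bottle_price(price_per_ton):
    return price_per_ton / 100          # ton price = 100 x bottle price

print(implied_bottle_price(6846))       # actual Napa average  -> 68.46, i.e. ~$70 per bottle
print(implied_bottle_price(3409))       # modeled "fair" price -> 34.09, i.e. ~$35 per bottle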

Anyway, those people who are recommending that we look to Sonoma for value-for-money cabernet wines seem to be offering good advice.

Vineyard area

While we are on the topic of California cabernets, we can also briefly look at the vineyard area of the grapes. I have noted before that concern has been expressed about the potential domination of Napa by this grape variety (see Napa versus Bordeaux red-wine prices), but here we are looking at California as a whole.

A couple of other AAWE Facebook pages provide us with the area data for the most commonly planted red (Top 25 red grape varieties in California 2015) and white (White wine grapes in California 2015) grape varieties in 2015. I have plotted these data in the next two graphs. Note that the graphs are plotted with the logarithm of both axes. Only the top four ranked varieties are labeled.

Area of red grape varieties in California in 2015
Area of white grape varieties in California in 2015

On the two graphs I have also shown a Power Law model, as explained in previous posts (eg. Do sales by US wine companies fit the proverbial "power law"?). This Power model is represented by a straight line on the log-log graphs. As shown, in both cases the model fits the data extremely well (accounting for 97% and 98% of the variation, respectively), but only if we exclude the three most widespread grape varieties. Note, incidentally, that there is slightly more chardonnay state-wide than there is cabernet sauvignon.
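A sketch of this kind of fit, using invented acreages rather than the actual 2015 data: a Power Law becomes a straight line once both area and rank are log-transformed, and the fit excludes the three most widespread varieties, as described above.

import numpy as np

# Sketch of a Power Law fit on a log-log scale (invented acreages, not the 2015 data).
# area = a * rank^b  becomes  log(area) = log(a) + b * log(rank), i.e. a straight line.

areas = np.array([78000, 75000, 45000, 39000, 24000, 18000, 14000, 11000,
                   9000,  7500,  6500,  5600,  5000,  4400,  4000,  3600])  # hypothetical acres
ranks = np.arange(1, len(areas) + 1)

# Fit the model while excluding the three most widely planted varieties, as in the post.
b, log_a = np.polyfit(np.log(ranks[3:]), np.log(areas[3:]), 1)

predicted = np.exp(log_a) * ranks.astype(float) ** b
for r in (1, 2, 3, 4):
    print(f"rank {r}: observed {areas[r - 1]:>6} acres, Power-Law prediction {predicted[r - 1]:,.0f} acres")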

The poor fit of the three most widespread varieties thus implies that there is a practical limit to how much area can readily be devoted to any one grape variety — we cannot simply keep increasing the area indefinitely, as the simple Power model would predict. The data suggest that this limit is c. 40,000 acres, at least for red grape varieties (ie. the increase in vineyard area slows once this limit is reached).

Both chardonnay and cabernet sauvignon have twice this "limit" area, which emphasizes their importance in the California grape-growing economy. However, the Power-law model indicates that we cannot yet claim that the domination by these grapes is anything unexpected.

Monday, 3 July 2017

Awarding 90 quality points instead of 89

I have written before about the over-representation, by most wine commentators, of certain wine-quality scores compared to others. For example, I have discussed this for certain wine professionals (Biases in wine quality scores) and for certain semi-professionals (Are there biases in wine quality scores from semi-professionals?); and I have discussed it for the pooled scores from many amateurs (Are there biases in community wine-quality scores?). It still remains for me to analyze some data for the pooled scores of professionals as a group. This is what I will do here.


The data I will look at come from the compilation provided by Suneal Chaudhary and Jeff Siegel in their report entitled Expert Scores and Red Wine Bias: a Visual Exploration of a Large Dataset. I have discussed these data in a previous post (What's all this fuss about red versus white wine quality scores?). The data are described this way:
We obtained 14,885 white wine scores and 46,924 red wine scores dating from the 1970s that appeared in the major wine magazines. They were given to us on the condition of anonymity. The scores do not include every wine that the magazines reviewed, so the data may not be complete, and the data was not originally collected with any goal of being a representative sample.
This is as big a compilation of wine scores as is readily available, and presumably represents a wide range of professional wine commentators. It is likely to represent widespread patterns of wine-quality scores among the critics, even today.

In my previous analyses, and those of Alex Hunt, who has also commented on this (What's in a number? Part the second), the most obvious and widespread bias when assigning quality scores on a 100-point scale is the over-representation of the score 90 and under-representation of 89. That is, the critics are more likely to award 90 than 89, when given a choice between the two scores. A similar thing often happens for the score 100 versus 99. In an unbiased world, some of the "90" scores should actually have been 89, and some of the "100" scores should actually have been 99. However, assigning wine-quality scores is not an unbiased procedure — wine assessors often have subconscious biases about what scores to assign.

It would be interesting to estimate just how many scores are involved, as this would quantify the magnitude of these two biases. Since we have at hand a dataset that represents a wide range of commentators, analyzing this particular set would tell us about general biases, not just those specific to each individual commentator.

Estimating the biases

As in my earlier posts, the analysis involves frequency distributions. The first two graphs show the quality-score data for the red wines and the white wines, arranged as two frequency distributions. The height of each vertical bar in the graphs represents the proportion of wines receiving the score indicated.

Frequency histogram of red wine scores

Frequency histogram of white wine scores

The biases involving 90 versus 89 are clear in both graphs; and the bias involving 100 is clear in the graph for the red wines (we all know that white wines usually do not get scores as high as red wines do — see What's all this fuss about red versus white wine quality scores?).

For these data, the "expectation" is that, in an unbiased world, the quality scores would show a relatively smooth frequency distribution, rather than having dips and spikes in the frequency at certain score values (such as 90 or 100). Mathematically, the expected scores would come from an "expected frequency distribution", also known as a probability distribution (see Wikipedia).

In my earlier post (Biases in wine quality scores), I used a Weibull distribution (see Wikipedia) as being a suitable probability distribution for wine-score data. In that post I also described how to use this as an expectation to estimate the degree of bias in our red- and white-wine frequency distributions.
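Before looking at the actual results, here is a sketch of how such an expectation can be constructed, using simulated scores in place of the real dataset. Fitting scipy's weibull_min to (100.5 − score), so that the long tail points towards the lower scores, is my own assumption about how to set up such a fit, not necessarily the method used in that earlier post; a small artificial "89 scored as 90" bias is injected so that the comparison has something to detect.

import numpy as np
from scipy import stats

# Sketch of building an "expected" (unbiased) score distribution. The scores are simulated,
# and fitting a Weibull to (100.5 - score) is an assumption about how to set up the fit,
# not necessarily the method used in the original analysis.

rng = np.random.default_rng(1)
scores = np.clip(np.round(100 - 8 * rng.weibull(1.6, size=20000)), 70, 100).astype(int)

# Inject an artificial "89 scored as 90" bias, so the comparison below has something to find.
bump = (scores == 89) & (rng.random(scores.size) < 0.35)
scores[bump] = 90

reflected = 100.5 - scores                              # long tail now points towards low scores
shape, loc, scale = stats.weibull_min.fit(reflected, floc=0)

grid = np.arange(70, 101)
# Expected proportion for each integer score = probability mass of its unit-wide bin.
expected = (stats.weibull_min.cdf(101 - grid, shape, loc, scale)
            - stats.weibull_min.cdf(100 - grid, shape, loc, scale))
observed = np.array([(scores == s).mean() for s in grid])

for s in (89, 90):
    i = s - 70
    print(f"score {s}: observed {observed[i]:.3f}, expected {expected[i]:.3f}, "
          f"ratio {observed[i] / expected[i]:.2f}")
# The ratio at 90 should come out well above 1, and at 89 below 1, mirroring the injected bias.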

The resulting frequency distributions are shown in the next two graphs. In these graphs, the blue bars represent the (possibly biased) scores from the critics, and the maroon bars are the unbiased expectations (from the model). Note that the mathematical expectations both form nice smooth distributions, with no dips or spikes. Those quality scores where the heights of the paired bars differ greatly are the ones where bias is indicated.

Frequency histogram of modeled red wine scores

Frequency histogram of modeled white wine scores

We can now estimate the degree of bias by comparing the observed scores to their expectations. For the red wines, a score of "90" occurs 1.53 times as often as expected, and for the white wines 1.44 times as often. So, we can now say that there is a consistent bias among the critics, whereby a score of "90" occurs c.50% more often than it should. This is not a small bias!

For a score of "100" we can only refer to the red-wine data. These data indicate that this score occurs more than 8 times as often as expected from the model. This is what people are referring to when they talk about "score inflation" — the increasing presence of 100-point scores. It might therefore be an interesting future analysis to see whether we can estimate any change in 100-point bias through recent time, and thereby quantify this phenomenon.
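The step from these ratios to the conclusions below is simple arithmetic: if a score is observed r times as often as expected, then (r - 1) / r of its occurrences are in excess of the expectation.

# Converting the over-representation ratios into "what fraction of these scores are excess".
# If a score occurs r times as often as expected, then (r - 1) / r of its occurrences are excess.

ratios = {"score 90, reds": 1.53, "score 90, whites": 1.44, "score 100, reds": 8.0}

for label, r in ratios.items():
    print(f"{label}: {(r - 1) / r:.0%} of the awarded scores are in excess of expectation")

# score 90, reds:   35%  (roughly one in three)
# score 90, whites: 31%
# score 100, reds:  88%  (at least, since the ratio is "more than 8")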

Finally, having produced unbiased expectations for the red and white wines, we can now compare their average scores. These are c.91.7 and c.90.3 for the reds and whites, respectively. That is, on average, red wines get 1⅓ more points than do the whites. This is much less of a difference than has been claimed by some wine commentators.

Conclusion

Personal wine-score biases are easy to demonstrate for individual commentators, whether professional or semi-professional. We now know that there are also general biases shared among commentators, whether they are professional or amateur. The most obvious of these is a preference for over-using a score of 90 points, instead of 89 points. I have shown here that one in every three 90-point wines from the professional critics is actually an 89-point wine with an inflated score. Moreover, the majority of the 100-point wines of the world are actually 99-point wines that are receiving a bit of emotional support from the critics.