Monday, July 31, 2017

Do winery quality scores improve through time?

The Michael David Winery is perhaps the best-known wine producer in Lodi, California. It has recently been in the news, as it was named the 2015 "Winery of the Year" by the Unified Wine & Grape Symposium; and its Seven Deadly Zins old-vine zinfandel topped Shanken’s Impact newsletter's “Hot Brand” award list in both 2015 and 2016. Not bad for a Central Valley winery producing 500,000 cases per year, with more than two dozen different wine labels.

The Wine Industry Advisor notes for the Seven Deadly Zins (a blend of grapes from seven Lodi growers) that: "As sales grew so did the scores, as it achieved 90+ ratings from both Robert Parker and Wine Enthusiast Magazine over the past three vintages." Indeed, the first graph shows the Wine Enthusiast scores for eight vintages of seven different wines from the Michael David Winery. With two exceptions (in 2008), the scores do seem to increase through the vintages.

Wine Enthusiast scores for seven Michael David wines

Is this pattern real? Obviously, it depends on what we mean by "real", but in this case it might best be interpreted as: do other wine assessors agree that there has been an increase in quality? There are not many of these to choose from, as most wine critics ignore Lodi wines, preferring instead to look further west when tasting California wines. However, Cellar Tracker is more useful in this regard, since its scores come from wine drinkers rather than professional wine tasters, and so it does have scores for these wines.

So, this next graph shows the Cellar Tracker scores for exactly the same wines and vintages. I do not see any general time trend here, although a couple of the wines do increase for their most recent vintage. That is, the scores are very consistent except for one vintage of Seven Deadly Zins and one of Earthquake Petite Sirah.

Cellar Tracker scores for seven Michael David wines

So, what is going on here? Why the discrepancy between the Wine Enthusiast and Cellar Tracker? The answer appears when we look at who actually provided the Wine Enthusiast scores.

To show this, I have simply averaged both the Wine Enthusiast and Cellar Tracker scores across the available wines for each vintage, thus producing a consensus score for the winery through time. These averages are shown in the next graph, for both sources of wine scores. Note that the Cellar Tracker scores show the Michael David Winery to be a consistent 88-point winery, with only a slight increase in score through time.

Average scores for Michael David wines

However, the Wine Enthusiast scores are another matter entirely. At least three different reviewers have provided the scores, as shown at the bottom of the graph; and it is clear that these people have very different scoring systems. This is explained in more detail in the post on How many 100-point wine-quality scales are there?

So, in this case the apparent increase in Wine Enthusiast scores through time is illusory. It just so happens that each successive reviewer has had a greater tendency to give high wine scores than the one before, while the Cellar Tracker scores show no such change. The quality of the Michael David wines is not in doubt, but what scores their wines should be given is clearly not yet settled.
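The averaging step (the mean of the available wine scores for each vintage), and the way a change of reviewer can manufacture a time trend, can be illustrated with a small simulation. This is a minimal Python sketch using entirely hypothetical numbers: the wines are assumed to have a constant underlying quality of 88 points, and three hypothetical reviewers (A, B and C), with made-up personal scoring offsets, score successive vintages. None of these values come from the actual Wine Enthusiast or Cellar Tracker data.

```python
from statistics import mean

# All values below are hypothetical, chosen only to illustrate the effect.
TRUE_QUALITY = 88                       # assumed constant underlying quality
WINE_EFFECTS = [-2, -1, 0, 0, 1, 1, 1]  # assumed per-wine deviations (seven wines)
REVIEWER_FOR_VINTAGE = {2007: "A", 2008: "A", 2009: "A",
                        2010: "B", 2011: "B", 2012: "B",
                        2013: "C", 2014: "C"}
REVIEWER_OFFSET = {"A": -2, "B": 0, "C": 3}  # assumed personal scoring offsets


def vintage_average(scores):
    """Consensus score for one vintage: the mean of the available (non-missing) scores."""
    available = [s for s in scores if s is not None]
    return mean(available) if available else None


critic, community = {}, {}
for vintage, reviewer in REVIEWER_FOR_VINTAGE.items():
    offset = REVIEWER_OFFSET[reviewer]
    # One critic scores every wine of the vintage, adding a personal offset.
    critic[vintage] = vintage_average([TRUE_QUALITY + e + offset for e in WINE_EFFECTS])
    # Community scores are assumed to reflect the underlying quality only.
    community[vintage] = vintage_average([TRUE_QUALITY + e for e in WINE_EFFECTS])

for vintage in sorted(critic):
    print(vintage, REVIEWER_FOR_VINTAGE[vintage], critic[vintage], community[vintage])
```

In this toy example the critic averages rise from 86 to 91 points even though nothing about the wines has changed; only the reviewer has. Missing scores (None) are simply skipped when averaging, which matters when not every wine has a score for every vintage.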

This is a general issue that we should be wary of when studying time trends. We first need to ensure that the data are comparable between the different years, before we start comparing those years.

Thanks to Bob Henry for first bringing the Michael David wines to my attention. Incidentally, my favorite of these wines is the Petite Petit, a ripe-fruited blend of petite sirah and petit verdot, which is good value for money.

Monday, July 24, 2017

Wine tastings: the winning wine is often the least-worst wine

At organized wine tastings, the participants often finish by putting the wines in some sort of consensus quality order, from the wine most-preferred by the tasting group to the least-preferred. This is especially true of wine competitions, of course, but trade and home tastings are often organized this way, as well.

The question is: how do we go about deciding upon a winning wine? Perhaps the simplest way is for each person to rank the wines, and then to find a consensus ranking for the group. This is not necessarily as straightforward as it might seem.

To illustrate this idea, I will look at some data involving two separate blind tastings, in late 1995, of California cabernets (including blends) from the 1992 vintage. The first tasting had 18 wines and 17 tasters, and the second had 16 wines and 16 tasters. In both cases the tasters were asked, at the end of the tasting, to put the wines in their order of preference (ie. a rank order, ties allowed).

The first tasting produced results with a clear "winner", no matter how this is defined. The first graph shows how many of the 17 tasters ranked each wine in first place (vertically) compared to how often that wine was ranked in the top three places (horizontally). Each point represents one of the 18 wines.

Results of first tasting

Clearly, 15 of the 18 wines appeared in the top 3 ranks at least once, so that only 3 of the wines did not particularly impress anybody. Moreover, 6 of the wines got ranked in first place by at least one of the tasters — that is, one-third of the wines stood out to at least someone. However, by consensus, one of the wines (from Screaming Eagle, as it turns out) stood out head and shoulders above the others, and can be declared the "winner".
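The two quantities plotted in these graphs (the number of first places, and the number of top-three places) are simple counts over the tasters' rank orders. A minimal Python sketch, using a small hypothetical set of rankings rather than the 1995 tasting data, might look like this:

```python
# Hypothetical rankings: one dict per taster, mapping wine -> rank (1 = most preferred).
# Ties are allowed, so two wines may share a rank.
RANKINGS = [
    {"W1": 1, "W2": 2, "W3": 3, "W4": 4},
    {"W1": 1, "W2": 3, "W3": 2, "W4": 4},
    {"W1": 2, "W2": 1, "W3": 3, "W4": 3},  # tie for third place
    {"W1": 1, "W2": 4, "W3": 2, "W4": 3},
]


def count_at_or_better(rankings, k):
    """For each wine, count how many tasters placed it at rank k or better."""
    wines = rankings[0].keys()
    return {w: sum(1 for r in rankings if r[w] <= k) for w in wines}


firsts = count_at_or_better(RANKINGS, 1)  # vertical axis of the graphs
top3 = count_at_or_better(RANKINGS, 3)    # horizontal axis (use k=5 for the second tasting)
for wine in firsts:
    print(wine, firsts[wine], top3[wine])
```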

However, this situation might be quite rare. Indeed, the second tasting seems to be more typical. The next graph shows how many of the 16 tasters ranked each wine in first place (vertically) compared to how often that wine was ranked in the top five places (horizontally). Each point represents one of the 16 wines.

Results of second tasting

In this case, the tasters' preferences are more evenly spread among the wines. For example, every wine was ranked in the top 3 at least once, and in the top 4 at least twice, so that each of the wines was deemed worthy of recognition by at least one person. Furthermore, 10 of the 16 wines got ranked in first place by at least one of the tasters — that is, nearly two-thirds of the wines stood out to at least someone.

One of these wines, the Silver Oak (Napa Valley) cabernet, looks like it could be the winner, since it was ranked first 3 times and in the top five 7 times. However, the Flora Springs (Rutherford Reserve) wine appeared in the top five 10 times, even though it was ranked first only 2 times; so it is also a contender. Indeed, if we take all of the 16 ranks into account (not just the top few) then the latter wine is actually the "winner", and is shown in pink in the graph. Its worst ranking was tenth, so that no-one disliked it, whereas the Silver Oak wine was ranked last by 2 of the tasters.

We can conclude from this that being ranked first by a lot of people will not necessarily make a wine the top-ranked wine of the evening. "Winning" the tasting seems to be more about being the least-worst wine! That is, winning is as much about not being last for any taster as it is about being first.

This situation is not necessarily unusual. For example, on my other blog I have discussed the 10-yearly movie polls conducted by Sight & Sound magazine. In the 2012 poll Alfred Hitchcock's film Vertigo was ranked top, displacing Citizen Kane, which had held the top spot for the previous 50 years; and yet, 77% of critics polled did not even list this film in their personal top 10. Nevertheless, more critics (23%) did put Vertigo on their top-10 list than did so for any other film, and so this gets Vertigo the top spot overall. From these data, we cannot conclude that Vertigo is "the best movie of all time", but merely that it is chosen more often than the other films (albeit by less than one-quarter of the people). Preferences at wine tastings seem to follow this same principle.
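The exact method used above for combining all of the ranks is not spelled out, so the following Python sketch rests on an assumption: the consensus is taken to be each wine's mean rank across all tasters (lower is better). With hypothetical rankings for four wines, the wine that collects the most first places can still lose to one that nobody ranks badly.

```python
from statistics import mean

# Hypothetical rankings (1 = best) from five tasters for four wines.
RANKINGS = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 1, "B": 2, "C": 4, "D": 3},
    {"A": 1, "B": 3, "C": 2, "D": 4},
    {"A": 4, "B": 1, "C": 2, "D": 3},
    {"A": 4, "B": 2, "C": 1, "D": 3},
]


def mean_ranks(rankings):
    """Assumed consensus measure: each wine's average rank over all tasters."""
    return {w: mean(r[w] for r in rankings) for w in rankings[0]}


def first_places(rankings):
    """Number of tasters who ranked each wine first."""
    return {w: sum(1 for r in rankings if r[w] == 1) for w in rankings[0]}


consensus = mean_ranks(RANKINGS)
winner = min(consensus, key=consensus.get)
print(first_places(RANKINGS))  # wine A has the most first places...
print(consensus, winner)       # ...but wine B has the best (lowest) mean rank
```

Here wine A is ranked first by three of the five tasters but last by the other two, while wine B is never ranked below third; under this assumed measure B is the "least-worst" winner.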

Finally, we can compare the seven wines that were common to the two tastings discussed above. Did these wines appear in the same rank order at both tastings?

In this case, we can calculate the consensus rank for each tasting by summing the points awarded by the participants, giving 3 points for a first-place ranking, 2 points for second, and 1 point for third. The result of this calculation is shown in the third graph, where each plotted point represents one of the seven wines, and the axes indicate the ranking for the two tastings.

Comparison of the two tastings

The two groups of tasters agree on the bottom three wines in their rankings. However, they do not agree on the "winning" wine among these seven. More notably, they disagree quite strongly about the Silver Oak cabernet. In the second tasting this wine received 3 firsts and 2 thirds (from the 16 tasters), while in the first tasting it received 1 third ranking only (out of 17 people). The consensus ranking of this wine thus differs quite markedly between the tastings. This may reflect differences in the type of participants at the tastings, there being a broader range of wine expertise in the second tasting.
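The 3-2-1 points scheme described above is easy to express in code. This minimal Python sketch applies it to the Silver Oak results reported for the two tastings; tasters who did not place the wine in their top three are represented as None (their actual ranks are not given here), since they contribute no points under this scheme.

```python
# Points per rank, as described above: 3 for first, 2 for second, 1 for third.
POINTS = {1: 3, 2: 2, 3: 1}


def consensus_points(ranks):
    """Sum the points for one wine over all tasters; ranks outside the top three score 0."""
    return sum(POINTS.get(rank, 0) for rank in ranks)


# Silver Oak, second tasting: 3 firsts and 2 thirds from 16 tasters.
second_tasting = [1, 1, 1, 3, 3] + [None] * 11
# Silver Oak, first tasting: a single third place from 17 tasters.
first_tasting = [3] + [None] * 16

print(consensus_points(second_tasting))  # 3*3 + 2*1 = 11
print(consensus_points(first_tasting))   # 1
```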

Monday, July 17, 2017

Yellow Tail and Casella Wines

Some weeks ago I posted a discussion of whether wine imports into the USA fit the proverbial "power law". I concluded that US wine imports in 2012, in terms of number of bottles sold, did, indeed, fit a Power Law. This included the best-selling imported wine, Yellow Tail, from Casella Wines, of Australia.

However, bottle sales are not the complete picture, since ultimately it is the dollar sales value that determines profitability. Statista reports (Sales of the leading table wine brands in the United States in 2016) that Yellow Tail US sales were worth $281 million in 2016, which ranks it at no. 5 overall, behind the domestic brands Barefoot ($662 million), Sutter Home ($358 million), Woodbridge ($333 million) and Franzia ($330 million). Moreover, in July 2016, The Drinks Business placed Yellow Tail at no. 6 in its list of the Top 10 biggest-selling wine brands in the world, based on sales in 2015.

It is interesting to evaluate just how profitable Yellow Tail has been for Casella Wines. This is a family-owned company founded in 1969 (see Casella Family Brands restructures to ensure family ownership), currently ranked fourth in Australia by total revenue but second by total wine production (see Winetitles Media). This makes the Casella family members seriously rich, and even in a "bad" year they are each paid millions of dollars by the company.

As a registered company (ABN 96 060 745 315), Casella Wines Pty Ltd must lodge its accounts with the Australian Securities and Investments Commission (the corporate regulator) for each financial year, which ends on June 30. This next graph shows (in Australian $) the reported profit/loss for each financial year since the first US shipment of Yellow Tail in June 2001. (Note: the 2015-2016 financial report has apparently not yet been submitted.)

Casella Wines profit since launching the Yellow Tail wines

The economics of Yellow Tail rely almost entirely on the exchange rate between the Australian $ and the US $. The company is reported as being "comfortable" with the A$ trading up to US85¢, and "happy" with anything below US90¢, as the cost of making the wine (in Australia, in A$) is then more than compensated by the sales price (in US$, in the USA). When the brand was first launched, the Australian dollar was trading at around US57¢, and the wine thus made a tidy profit for the winery; and also for the distributor, Deutsch Family Wine and Spirits (see The Yellow Tail story: how two families turned Australia into America’s biggest wine brand).

However, Casella then suffered badly when the A$ rose in value over the next few years. The A$ reached parity with the US$ in October 2010; and this is the reason for the unprofitable years shown in the graph. The increased profit in 2010-2011 was apparently due to some successful currency hedging, rather than to any improvement in the exchange rate.

Casella refused to change the bottle price of the Yellow Tail wines during the "bad times", stating that they did not want to risk losing their sales momentum by imposing a price hike. Instead, the company used its accumulated profits and, most importantly, re-negotiated its loans, in order to wait for a better exchange rate. They reported that every 1¢ fall in the value of the A$ equated to around A$2 million in extra sales revenue.
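The currency arithmetic behind this is simple: the wine is sold in US$, but the accounts are kept in A$, so the A$ value of a fixed US$ sales figure is that figure divided by the exchange rate (expressed as US$ per A$1). The following Python sketch uses a purely hypothetical US$100 million of sales, not Casella's actual figures, to show how strongly the A$ revenue depends on the rate when the US shelf price is held fixed.

```python
def aud_revenue(usd_revenue, usd_per_aud):
    """A$ value of a US$ sales figure, given the exchange rate in US$ per A$1."""
    return usd_revenue / usd_per_aud


USD_SALES = 100e6  # hypothetical annual US sales, in US$ (not Casella's real figure)

for rate in (0.57, 0.85, 0.90, 1.00):
    print(f"A$1 = US{rate * 100:.0f}c -> A${aud_revenue(USD_SALES, rate) / 1e6:.1f} million")

# Extra A$ revenue from a 1c fall in the A$, starting from US85c.
delta = aud_revenue(USD_SALES, 0.84) - aud_revenue(USD_SALES, 0.85)
print(f"A${delta / 1e6:.2f} million per 1c")
```

Under these assumptions, the same US$ sales are worth about 50% more in A$ at US57¢ than at US85¢, which is why a rising A$ hurt so much while the bottle price stayed unchanged.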

However, realizing the economic risks of relying on currency exchange-rates for profits, Casella embarked on a premiumization strategy in 2014. The idea is that "to be sustainable over the long term" requires a full portfolio of wines (see John Casella – newsmaker and visionary). The company has since bought a number of vineyards in premium Australian wine-making regions, mainly in South Australia, as well as acquiring some top-notch wine companies, including Peter Lehmann Wines, Brand's Laira, and Morris Wines. This strategy is continuing to this day (see Bloomberg).

Finally, for those of you who might be concerned about these things, while the winery does have some vegan wines, the three Casella brothers are reported to all be keen shooters, one of them has actually owned an ammunition factory, and the winery is the largest corporate sponsor of the Sporting Shooters Association. Moreover, Marcello Casella has made a number of court appearances concerning his ammunition factory (Bronze Wing Ammunition factory to remain closed after WorkCover court win) and alleged involvement in drugs (see NSW South Coast drug kingpin Luigi Fato jailed for 18 years), to which he recently pleaded guilty (Wine kingpin pleads guilty to concealing thousands of marijuana plants).

Monday, July 10, 2017

Napa cabernet grapes are greatly over-priced, even for Napa

There have been a number of recent comments on the Web about the increasing cost of cabernet sauvignon grapes from the Napa viticultural district (eg. Napa Cabernet prices at worryingly high levels). These comments are based on the outrageously high prices of those grapes compared to similar grapes from elsewhere in California. On the other hand, some people seem to accept these prices, based on the idea that Napa is the premier cabernet region in the USA.

However, it is easy to show that the Napa cabernet grape prices are way out of line even given Napa's reputation for high-quality cabernet wines.

The data I will use to show this come from the AAWE Facebook page: Average price of cabernet sauvignon grapes in California 2016. This shows the prices from last year's Grape Crush Report for each of 17 Grape Pricing Districts and Counties in California. The idea here is to use these data to derive an "expected" price for the Napa district based on the prices in the other 16 districts, so that we can compare this to the actual Napa price.

As for my previous modeling of prices (eg. The relationship of wine quality to price), the best-fitting economic model is an Exponential model, in this case relating the grape prices to the rank order of those prices. This is shown in the first graph. The graph is plotted with the logarithm of the prices, which means that the Exponential model can be represented by a straight line. Only the top five ranked districts are labeled.

Prices of California cabernet sauvignon grapes in 2016

As shown, the exponential model accounts for 98% of the variation in grape prices among the 16 districts, which means that this economic model fits the data extremely well. For example, if the Sonoma & Marin district really does produce better cabernet grapes than the Mendocino district, then the model indicates that their grapes are priced appropriately.

Clearly the Napa district does not fit this economic model at all. The model (based on the other 16 districts) predicts that the average price of cabernet grapes in 2016 should have been $3,409 per ton for the top ranked district. The Napa grapes, on the other hand, actually cost an average of $6,846, which is almost precisely double the expected price. This is what we mean when we say that something is "completely out of line"!
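Fitting an Exponential model to ranked prices amounts to an ordinary least-squares straight line through log(price) versus rank, and the "expected" Napa price is that line extrapolated to the top rank. The Python sketch below illustrates the mechanics under stated assumptions: Napa is taken as rank 1 and the other 16 districts as ranks 2 to 17, and the 16 prices are synthetic values constructed to lie exactly on an exponential curve, not the actual Grape Crush Report figures (which are not reproduced here).

```python
import math


def fit_exponential(ranks, prices):
    """Least-squares fit of log(price) = a + b * rank; returns (a, b, r_squared)."""
    ys = [math.log(p) for p in prices]
    n = len(ranks)
    mx, my = sum(ranks) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in ranks)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ranks, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(ranks, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Synthetic prices ($ per ton) for ranks 2..17 (hypothetical, for illustration only).
ranks = list(range(2, 18))
prices = [3400 * math.exp(-0.14 * (r - 1)) for r in ranks]

a, b, r2 = fit_exponential(ranks, prices)
expected_top = math.exp(a + b * 1)  # model's expectation for the rank-1 district
napa_actual = 6846                  # average Napa price quoted above
print(round(expected_top), round(napa_actual / expected_top, 2), round(r2, 3))
```

With the real data, the same comparison gives the figures quoted above: an expected $3,409 against an actual $6,846, a ratio of about 2.01.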

In conclusion, 16/17 districts have what appear to be fair average prices for their cabernet sauvignon grapes, given the current rank ordering of their apparent quality. Only one district is massively over-pricing itself. Even given the claim that Napa produces the highest quality cabernet wines in California, the prices of the grapes are much higher than we expect them to be. If we bought exactly these same grapes from any other grape-growing region then we would pay half as much money — the "Napa" name alone doubles the price. Something really has gotten out of hand — we are paying as much for the name as for the grapes.

Part of the issue here is the identification of prime vineyard land, for whose grapes higher prices are charged (see As the Grand Crus are identified, prices will go even higher). The obvious example in Napa is the To Kalon vineyard (see The true story of To-Kalon vineyard). Here, the Beckstoffer "pricing formula calls for the price of a ton of To Kalon Cabernet grapes to equal 100 times the current retail price of a bottle" of wine made from those grapes (The most powerful grower in Napa). This is a long-standing rule of thumb, and it explains why your average Napa cabernet tends to cost at least $70 per bottle instead of $35: applied to the average prices above, the rule implies about $68 per bottle for wine from Napa grapes at $6,846 per ton, but only about $34 per bottle at the expected price of $3,409.

Anyway, those people who are recommending that we look to Sonoma for cabernet wines seem to be offering good advice.

Vineyard area

While we are on the topic of California cabernets, we can also briefly look at the vineyard area of the grapes. I have noted before that concern has been expressed about the potential domination of Napa by this grape variety (see Napa versus Bordeaux red-wine prices), but here we are looking at California as a whole.

A couple of other AAWE Facebook pages provide us with the area data for the most commonly planted red (Top 25 red grape varieties in California 2015) and white (White wine grapes in California 2015) grape varieties in 2015. I have plotted these data in the next two graphs. Note that the graphs are plotted with the logarithm of both axes. Only the top four ranked varieties are labeled.

Area of red grape varieties in California in 2015
Area of white grape varieties in California in 2015

On the two graphs I have also shown a Power Law model, as explained in previous posts (eg. Do sales by US wine companies fit the proverbial "power law"?). This Power model is represented by a straight line on the log-log graphs. As shown, in both cases the model fits the data extremely well (accounting for 97% and 98% of the variation), but only if we exclude the three most widespread grape varieties. Note, incidentally, that there is slightly more chardonnay state-wide than there is cabernet sauvignon.

The departure of the most widespread varieties from the model thus implies that there is a practical limit to how much area can be devoted readily to any one grape variety: we cannot simply keep increasing the area indefinitely, as the simple Power model would lead us to expect. The data shown suggest that this limit is c. 40,000 acres, at least for red grape varieties (ie. increase in vineyard area slows once this limit is reached).

Both chardonnay and cabernet sauvignon have twice this "limit" area, which emphasizes their importance in the California grape-growing economy. However, the Power-law model indicates that we cannot yet claim that the domination by these grapes is anything unexpected.
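A Power model is a straight line on log-log axes, so it can be fitted by least squares on log(area) versus log(rank), leaving out the most widespread varieties as described above; the model's expectation for those top ranks can then be compared with their actual areas. In this Python sketch the areas are hypothetical: ranks 4 to 25 are generated from an assumed power law, and the top three are set below the line to mimic a practical ceiling. They are not the AAWE figures.

```python
import math


def fit_power_law(ranks, areas):
    """Least-squares fit of log(area) = log(c) + b * log(rank); returns (c, b)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(a) for a in areas]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return math.exp(my - b * mx), b


# Hypothetical areas (acres) by rank: ranks 4-25 follow 200,000 * rank ** -1.3 exactly;
# the top three are set lower, as if growth in area had levelled off.
areas = {1: 90_000, 2: 48_000, 3: 42_000}
areas.update({r: 200_000 * r ** -1.3 for r in range(4, 26)})

fit_ranks = [r for r in areas if r > 3]  # exclude the three largest varieties
c, b = fit_power_law(fit_ranks, [areas[r] for r in fit_ranks])
for r in (1, 2, 3):
    print(r, areas[r], round(c * r ** b))  # observed area vs Power-model expectation
```

In this toy example the three largest varieties all fall short of the Power-model expectation, which is the pattern that suggests a ceiling rather than an unexpected domination.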

Olena Sambucci and Julian M. Alston (2017. Estimating the value of California wine grapes. Journal of Wine Economics 12: 149-160) have subsequently pointed out that the price estimates are likely to be underestimates, as they apply only to grapes sold (not to grapes crushed by the grower). This would probably increase the average prices of the grapes in the more expensive areas, because the retained grapes are more likely to be of better quality than the sold grapes.

Monday, July 3, 2017

Awarding 90 quality points instead of 89

I have written before about the over-representation, by most wine commentators, of certain wine-quality scores compared to others. For example, I have discussed this for certain wine professionals (Biases in wine quality scores) and for certain semi-professionals (Are there biases in wine quality scores from semi-professionals?); and I have discussed it for the pooled scores from many amateurs (Are there biases in community wine-quality scores?). It still remains for me to analyze some data for the pooled scores of professionals as a group. This is what I will do here.

The data I will look at come from the compilation provided by Suneal Chaudhary and Jeff Siegel in their report entitled Expert Scores and Red Wine Bias: a Visual Exploration of a Large Dataset. I have discussed these data in a previous post (What's all this fuss about red versus white wine quality scores?). The data are described this way:
We obtained 14,885 white wine scores and 46,924 red wine scores dating from the 1970s that appeared in the major wine magazines. They were given to us on the condition of anonymity. The scores do not include every wine that the magazines reviewed, so the data may not be complete, and the data was not originally collected with any goal of being a representative sample.
This is as big a compilation of wine scores as is readily available, and presumably represents a wide range of professional wine commentators. It is likely to represent widespread patterns of wine-quality scores among the critics, even today.

In my previous analyses, and those of Alex Hunt, who has also commented on this (What's in a number? Part the second), the most obvious and widespread bias when assigning quality scores on a 100-point scale is the over-representation of the score 90 and under-representation of 89. That is, the critics are more likely to award 90 than 89, when given a choice between the two scores. A similar thing often happens for the score 100 versus 99. In an unbiased world, some of the "90" scores should actually have been 89, and some of the "100" scores should actually have been 99. However, assigning wine-quality scores is not an unbiased procedure — wine assessors often have subconscious biases about what scores to assign.

It would be interesting to estimate just how many scores are involved, as this would quantify the magnitude of these two biases. Since we have at hand a dataset that represents a wide range of commentators, analyzing this particular set would tell us about general biases, not just those specific to each individual commentator.

Estimating the biases

As in my earlier posts, the analysis involves frequency distributions. The first two graphs show the quality-score data for the red wines and the white wines, arranged as two frequency distributions. The height of each vertical bar in the graphs represents the proportion of wines receiving the score indicated.

Frequency histogram of red wine scores

Frequency histogram of white wine scores

The biases involving 90 versus 89 are clear in both graphs; and the bias involving 100 is clear in the graph for the red wines (we all know that white wines usually do not get scores as high as red wines do; see What's all this fuss about red versus white wine quality scores?).

For these data, the "expectation" is that, in an unbiased world, the quality scores would show a relatively smooth frequency distribution, rather than having dips and spikes in the frequency at certain score values (such as 90 or 100). Mathematically, the expected scores would come from an "expected frequency distribution", also known as a probability distribution (see Wikipedia).

In my earlier post (Biases in wine quality scores), I used a Weibull distribution (see Wikipedia) as being a suitable probability distribution for wine-score data. In that post I also described how to use this as an expectation to estimate the degree of bias in our red- and white-wine frequency distributions.

The resulting frequency distributions are shown in the next two graphs. In these graphs, the blue bars represent the (possibly biased) scores from the critics, and the maroon bars are the unbiased expectations (from the model). Note that the mathematical expectations both form nice smooth distributions, with no dips or spikes. Those quality scores where the heights of the paired bars differ greatly are the ones where bias is indicated.

Frequency histogram of modeled red wine scores

Frequency histogram of modeled white wine scores

We can now estimate the degree of bias by comparing the observed scores to their expectations. For the red wines, a score of "90" occurs 1.53 times as often as expected, and for the white wines it is 1.44 times as often. So, we can now say that there is a consistent bias among the critics, whereby a score of "90" occurs c.50% more often than it should. This is not a small bias!

For a score of "100" we can only refer to the red-wine data. These data indicate that this score occurs more than 8 times as often as expected from the model. This is what people are referring to when they talk about "score inflation" — the increasing presence of 100-point scores. It might therefore be an interesting future analysis to see whether we can estimate any change in 100-point bias through recent time, and thereby quantify this phenomenon.

Finally, having produced unbiased expectations for the red and white wines, we can now compare their average scores. These are c.91.7 and c.90.3 for the reds and whites, respectively. That is, on average, red wines get 1⅓ more points than do the whites. This is much less of a difference than has been claimed by some wine commentators.
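The fitting procedure itself is described in the earlier post (Biases in wine quality scores); what follows is only a rough Python sketch of the general idea, under assumptions that are not necessarily those of that post. It assumes that the distance below the top of the scale (100.5 minus the score) follows a Weibull distribution, converts this into an expected proportion for each integer score, fits the two Weibull parameters by a simple grid search that ignores the suspect scores (89, 90, 99 and 100), and then reports the observed/expected ratio at 90 and the mean of the modeled scores. The "observed" data are synthetic: they are generated from a Weibull with hypothetical parameters, after which half of the 89s are moved to 90.

```python
import math

SCORES = range(70, 101)
SUSPECT = {89, 90, 99, 100}  # scores left out of the fit (assumed to be biased)


def expected_proportions(k, lam):
    """Expected proportion at each integer score, assuming (100.5 - score) ~ Weibull(k, lam)."""
    def cdf(d):
        return 1 - math.exp(-((d / lam) ** k))
    return {s: cdf(101 - s) - cdf(100 - s) for s in SCORES}


def fit_weibull(observed):
    """Grid search for (k, lam) minimizing squared error over the non-suspect scores."""
    best = None
    for i in range(41):          # k from 1.0 to 3.0
        k = 1.0 + 0.05 * i
        for j in range(151):     # lam from 5.0 to 20.0
            lam = 5.0 + 0.1 * j
            exp_p = expected_proportions(k, lam)
            sse = sum((observed[s] - exp_p[s]) ** 2 for s in SCORES if s not in SUSPECT)
            if best is None or sse < best[0]:
                best = (sse, k, lam)
    return best[1], best[2]


# Synthetic "observed" data: 10,000 scores from a hypothetical Weibull(k=2, lam=11),
# then half of the 89s are moved to 90 to mimic the bias.
counts = {s: round(10_000 * p) for s, p in expected_proportions(2.0, 11.0).items()}
moved = counts[89] // 2
counts[89] -= moved
counts[90] += moved
total = sum(counts.values())
observed = {s: c / total for s, c in counts.items()}

k, lam = fit_weibull(observed)
model = expected_proportions(k, lam)
ratio_90 = observed[90] / model[90]
model_mean = sum(s * p for s, p in model.items()) / sum(model.values())
print(round(k, 2), round(lam, 1), round(ratio_90, 2), round(model_mean, 1))
```

The ratio at 90 plays the role of the 1.53 and 1.44 figures above, and the mean of the smooth modeled distribution plays the role of the unbiased average score.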


Personal wine-score biases are easy to demonstrate for individual commentators, whether professional or semi-professional. We now know that there are also general biases shared among commentators, whether they are professional or amateur. The most obvious of these is a preference for over-using a score of 90 points, instead of 89 points. I have shown here that one in every three 90-point wines from the professional critics is actually an 89-point wine with an inflated score. Moreover, the majority of the 100-point wines of the world are actually 99-point wines that are receiving a bit of emotional support from the critics.
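The "one in three" figure follows directly from the observed/expected ratios reported above. If a score occurs r times as often as expected, then the excess makes up (r - 1)/r of all the wines given that score, that is, 1 - 1/r. A small Python check, using the ratios from this post:

```python
def inflated_fraction(ratio):
    """Share of wines at a given score that exceed the model's expectation: 1 - 1/ratio."""
    return 1 - 1 / ratio


for label, ratio in [("90, red", 1.53), ("90, white", 1.44), ("100, red", 8)]:
    print(label, round(inflated_fraction(ratio), 2))
```

That is, roughly 31% to 35% of the 90-point scores, and at least seven-eighths of the 100-point red-wine scores (since that ratio is more than 8), are in excess of the model's expectation.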