Monday, July 29, 2019

Quantifying the Parkerization of the wine world

“Parkerization” involves two features of the wine world: a shift to producing wines that the consumer arm-wrestles with rather than pairs with food;1 and the use of numbers to describe wine quality. Here, I will look at the latter characteristic.

It has been pointed out before now that awarding quality points to wines may well be a straightforward way to express a critical opinion about those wines, but it does nothing to inspire people about the wines themselves.2 Indeed, points seem to be more useful for wine collectors, investors and status seekers than for wine drinkers — high-scoring wines increase in value through time and can be endlessly talked about.3


Therefore, there has clearly been a feedback loop between the scores and the investors, where the points have crept upwards in response to collectors investing only in high-scoring wines. Nowhere has this issue been more obvious than in the scores from the wine-advice newsletter The Wine Advocate (started in 1978), particularly those scores coming directly from the man who invented the 100-point scale: Robert M. Parker Jr.4

This issue was briefly discussed by Blake Gray back in 2013 (Grade inflation at a glance: a look at Robert Parker's 1987 Wine Buyer's Guide). He quoted some of the scores and comments from the very first edition of The Wine Buyer's Guide (as it was then called), published in 1987; and he emphasized that the scores are much lower than they became in later editions of that book.

I thought that it might be interesting to do a more thorough job of this comparison, if only for the record. So, let’s compare “Early Parker” with “Late Parker”.


One part of the comparison can be taken from Alex Hunt’s 2013 article What's in a number? Part the Second (on JancisRobinson.com). There, he provided a frequency histogram of 43,094 of Parker’s wine-quality scores from the online edition of The Wine Advocate, spanning the previous four years. We can use these data to represent the scoring paradigm of “Late Parker” wine assessment.

Here is that histogram. The quality scores are along the horizontal axis, with the vertical axis showing what percentage each score is of the total number of scores. I have discussed this graph before (Biases in wine quality scores), and noted two biases: a score of 89 occurs less often than we might expect while a score of 90 occurs more often; and similarly for scores of 99 and 100.


It now remains to provide a comparable sample of the scoring paradigm of “Early Parker” wine assessment. I have done this by going through the first (1987) edition of Parker's The Wine Buyer's Guide, manually transcribing the 3,310 scores recorded therein.5 I then produced my own frequency histogram of these data.

Here is the histogram. In this case, the vertical axis shows the counts for each score, rather than the percentage, while the horizontal axis runs all the way from 50 to 100. Comparing the two graphs, it is clear that “Parkerization” has involved a shift from an average quality score of 83–84 points to one of 89–90. That 6-point shift is hardly insignificant.
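For anyone wanting to reproduce this sort of comparison, the calculation is straightforward. Here is a minimal Python sketch; the scores in it are invented purely for illustration, not taken from either dataset:

```python
from collections import Counter

def percent_histogram(scores):
    """Tally each quality score and express the counts as
    percentages of the total number of scores."""
    total = len(scores)
    return {score: 100 * n / total
            for score, n in sorted(Counter(scores).items())}

# Illustrative scores only -- not the actual transcribed data.
early = [82, 83, 84, 84, 85, 85, 86, 87]
late = [88, 89, 90, 90, 91, 92, 93, 95]

print(percent_histogram(early))
print(f"Early mean: {sum(early) / len(early):.1f}")
print(f"Late mean:  {sum(late) / len(late):.1f}")
```

Plotting the two dictionaries side by side (percentages for one, raw counts for the other) reproduces the kind of comparison shown in the two histograms.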


In this case we cannot necessarily determine cause and effect. Is the shift caused by allocating higher scores to the later wines? Or is it caused by progressively selecting better wines through time, while ignoring lower-scoring wines? Or did the winemakers progressively make wines that better suited Parker's palate? In truth, it is probably all three.

What is obvious, though, is that Parker’s original plan was abandoned somewhere along the line. In a 1989 interview with the Wine Times he noted:
Mine from the very beginning is a 50-point system. If you start at 50 and go to 100, it is clear it's a 50-point system, and it has always been clear. Mine is basically two 20-point systems with a 10-point cushion on top for wines that have the ability to age.
The second 20-point system (70–90 points) can be seen in both histograms, but the first 20-point system (50–70 points) is completely absent from “Late Parker”. Actually, the first histogram shows that “Late Parker” ended up as a single 20-point system (80–100 points).

Moving on, we can note quite a few other features of the “Early Parker” scores.

First, Parker described his 1987 book’s scoring system with these words:
50-69 is a D [grade]; it is a sign of an imbalanced, flawed, terribly dull or diluted wine
70-79 represents a C [grade], or average mark
80-89 is equivalent to a B [grade] and such a wine, particularly in the 85–89 range, is very, very good
90-100 is equivalent to an A [grade] and is given for an outstanding or excellent special effort.
In the book, fully 74% of the wines get a B grade, with only 11% of them getting an A grade, and 2% getting a D. Indeed, one-quarter (26%) of the wines score either 84 or 85 points, and 60% are in the range 82–87.
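Translating raw scores into Parker's letter grades is a simple bucketing exercise. A sketch in Python of how such grade percentages could be computed (the scores here are made up for illustration, not the 3,310 transcribed ones):

```python
from collections import Counter

def grade(score):
    """Parker's 1987 letter-grade bands."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "D"

# Illustrative scores only.
scores = [65, 72, 75, 78, 82, 84, 84, 85, 85, 86, 87, 91]
dist = Counter(grade(s) for s in scores)
for g in "ABCD":
    print(f"{g}: {100 * dist[g] / len(scores):.0f}%")
```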

Second, there is a distinct preference for even numbers, along with the odd multiples of five (55, 65 and 75). Indeed, the small values are mainly: 55, 60, 65, 70, 72, 75, 78 and 80. Put the other way around, some numbers are not used at all (53, 54, 57, 63, 64, 66) and some are hardly used (51, 61, 71, 81, 91). Clearly, there is no pretense of any precision in the scores for C and D grade wines.

Third, the preference for a score of 90 over 89 is even more marked in the “Early Parker” graph than for “Late Parker”. Indeed, there are 2.8 times as many scores of 90 as there are of 89! On the other hand, the excess of 100 over 99 is similar in both graphs.

Finally, we could also look at the scores for the different wine-making regions. The book provides scores for 12 different regions, 6 of them in France.


The final graph shows each of the 12 frequency histograms summarized as a separate box-and-whisker plot. The scores for each plot are shown horizontally, as expected, with several characteristic summaries illustrated above them. The boxed area shows the range for the middle 50% of the scores, with the vertical center-line indicating the median (50% of the scores are above the median and 50% below). The horizontal line (whisker) on each side of the box indicates the range of most of the rest of the scores. However, unusual (outlying) values are shown by individual symbols.
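The summaries behind such a plot can be computed directly from the scores. Here is a sketch using Python's standard library; note that the 1.5 × IQR cut-off for whiskers and outliers (Tukey's convention) is an assumption here, since plotting packages vary:

```python
import statistics

def box_summary(scores):
    """Five-number summary plus outliers -- the ingredients of a
    box-and-whisker plot, using Tukey's 1.5 * IQR fences."""
    s = sorted(scores)
    q1, median, q3 = statistics.quantiles(s, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": median,
        "box": (q1, q3),  # the middle 50% of the scores
        "whiskers": (min(x for x in s if x >= lo),
                     max(x for x in s if x <= hi)),
        "outliers": [x for x in s if x < lo or x > hi],
    }
```

Running `box_summary` on the scores for each region would give one row of the final graph: the box, the whiskers, and any individually plotted outlying values.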

Comparing these 12 plots tells us several notable things. For example, most of the low scores come from the Rhône and California wines. At the other extreme, most of the top scores come from the Bordeaux wines (no surprise there!). The Provence wines all get pretty much the same score, as do the wines from Tuscany. On the other hand, the wines from Champagne and Spain cover the widest ranges of points. The Port wines do best on average, although this presumably comes from tasting only vintage wines, which have already been selected as best by the producers.

Conclusion

The “Parkerization” of the wine scores through time is definitely a real phenomenon (c. 6 points), not just a figment of people’s imagination (or anti-Parker sentiment).



1 David Shaw in 1987 (Wine writers: squeezing the grape for news): “Frank Prial of the New York Times is almost universally regarded as the best wine writer on any American newspaper. Indeed, many wine makers say Prial helped alter the course of California wine making when he wrote in 1981 that most California wines were "too aggressive, too alcoholic ... clumsy, overpowering" — too big and heavy to properly complement food.” A few years later, Robert Parker reversed this trend towards food wines.

2 David Shaw in 1987 (Wine critics influence of writers can be heady): “… others call Parker's 100-point scale a "gimmick" that inflates customers' expectations, exploits the insecure consumer’s desire for a simple buying guide and, worse, reduces the subjective, sensual experience of drinking wine to an objective, numerical standard far more rigid than any palate, even Parker’s, can justify.”

Elin McCoy in 2005 (The Emperor of Wine: The Rise of Robert Parker, Jr. and the Reign of American Taste): “I find scoring wine with numbers a joke in scientific terms and misleading in thinking about either the quality or pleasure of wine, something that turns wine into a contest instead of an experience.”

Eric Asimov in 2019 (It’s time to rethink wine criticism): “The post-Parker era actually began a decade ago, as more critical voices and points of view began to be heard and heeded. It’s time to re-examine the nature of American wine criticism today, a methodology that Mr. Parker helped both to popularize and to institutionalize. And it’s time to consider a better model that might be more useful to consumers, a system that would empower them to make their own choices rather than tether them endlessly to critics’ bottle-by-bottle reviews.”

3 David Shaw in 1999 (He sips and spits — and the world listens): “[Parker’s] detractors say he's played a significant role in skyrocketing wine prices and in what they see as the homogenization of many of the world’s wines into a single dense, overly concentrated "international style." These wines, they say, lack elegance and finesse, don’t age well and sacrifice the individual and indigenous character of many vineyards and winemakers.”

4 Parker from a 1989 interview with the Wine Times: “The newsletter was always meant to be a guide, one person’s opinion. The scoring system was always meant to be an accessory to the written reviews, tasting notes. That’s why I use sentences and try and make it interesting. Reading is a lost skill in America. There’s a certain segment of my readers who only look at numbers, but I think it is a much smaller segment than most wine writers would like to believe. The tasting notes are one thing, but in order to communicate effectively and quickly where a wine placed vis-à-vis its peer group, a numerical scale was necessary. If I didn't do that, it would have been a sort of cop-out.”

5 I did this in bits and pieces over a week. I cannot forget getting Repetitive Strain Injury back in the mid 1980s, when I spent months manually entering my field data (tens of thousands of numbers) into a computer for most of each weekday.

Monday, July 22, 2019

Can we find out whether we really can tell red wine from white?

In an article in the Los Angeles Times way back in 1987, David Shaw noted (He sips and spits — and the world listens) this about Robert Parker:
More than once he [has been] asked if he’d be willing to demonstrate his consistency. Would he taste and score five or six wines “blind” — without knowing what they are — and then taste and score them again a day or two later? “No,” he says. “I'm not doing trained dog tricks. I’ve got everything to lose and nothing to gain.”
It seems that Mr Parker neither respects scientific experiments nor understands their use — experiments are not dog tricks. Sadly, though, this is often the response of experts, who apparently see their authority as being challenged by any objective assessment of their (alleged) expertise.


A more general case of this in the wine industry is the ability to recognize whether a wine tasted blind is red or white. It has sometimes been claimed that most people cannot tell the difference, when the wines are tasted under identical conditions.

This topic is worthy of some discussion, because it seems to me (as a professional scientist) that: (i) it has never actually been shown that people cannot tell; and (ii) testing this claim scientifically would actually be rather hard.

Potential problems

Most of the so-called experiments reported in the general media strike me as being rather inadequate. The main issue is comparability. This is a basic requirement of experiments, that the things being compared are directly comparable. If the things being compared differ in several ways, then how can you know which one is causing the main effect?

In order to see the issue at hand, consider this point:
White wines are usually consumed at a lower temperature than red wines. If you compare a red and a white at the same temperature, then one or the other wine is being tasted sub-optimally. Your ability to correctly identify a refreshing white when chilled does not help you recognize that same wine at room temperature, where it may be rather bland.1 Similarly, that red wine that is so redolent of forest fruits at room temperature may be completely closed if you chill it. How could we make the comparison of reds and whites directly comparable?
Similarly, experience plays a great role in being able to make consistent comparisons. If you have no experience of the comparison, then you may not know what difference you are being asked to detect. For example:
If you are not used to drinking expensive wines (or to having really cheap ones!), how do you recognize their differences? If you have experienced a wide range of wine prices, then you may have a chance, but if your wines usually come out of a cardboard box, then that $100 bottle you were given as a present may seem “different”, but you might not necessarily say that it is “obviously expensive”.2
If you are not familiar with certain wine styles, then how do you recognize their different characteristics? A Barossa Shiraz (from South Australia) is quite different from an Etna Rosso (from Sicily), but if you have never had one or the other, then how can you correctly tell which is which?
This should make it clear that experimental comparisons are tricky things. Most so-called experiments conducted by non-experts therefore usually have holes in them that you could drive a car through. This comes quite simply from not trying to make the comparisons directly comparable, in a way that would answer the experimental question.


Previous reports about wine

Having set the scene, we can look at the media reports.

Far and away the most popularly mentioned work allegedly relating to red wine versus white is reported endlessly by web sites wanting to debunk wine tasting, but also by some of the reputedly more responsible press (eg. You are not so smart: why we can't tell good wine from bad; Does all wine taste the same?). The work was done by Frédéric Brochet in 2001. He is a French cognitive psychologist studying what he calls “perceptive expectation”. Brochet showed that many people given a white wine that has been dyed red will describe it as they would a red wine; and that the same wine served in two different bottles of supposedly different quality wines will also be assessed differently.

However, none of this has anything to do with telling red wine from white, as explained clearly at the Urban Legends of Science blog (About that wine experiment) — all it does is show that what we see is important for our brain. The visual cue is, indeed, important for wine, which we already know from professional wine descriptions, if nothing else; it is also the basis of black (opaque) tasting glasses. But what has that got to do with telling red from white when we are trying to do so? Moreover, there is not enough information in the reports to tell how comparable the comparisons were.

You can read more about the relationship of taste, smell and sight at Slate (Do you taste what I taste?). (You might find interesting the discussion of butyric acid, which is in perspiration, vomit and parmesan cheese.3) See also: The taste of wine isn't all in your head, but your brain sure helps and The red and the white — taste is partly expectation. Scientists are, of course, working on this — see: How expectation influences perception.

Moving on to actual blind-tasting comparisons of red wines and white wines, Thomas Matthews at the Wine Spectator reported in 2002 on his own experiment (Can you tell red from white?). There were 6 wines, either red or white, and 7 people tasting, so that there were 42 attempts to “guess right”. He reports 40 correct results (95%). This hardly debunks the idea of telling red from white!

There is one other report that seems worthwhile, from Erik Rasmussen at his American in Spain blog (Blindfolded wine taste test: can you distinguish white from red? Cheap from expensive?). He chose four wines of different types, as listed in the table below, and four people tasted each wine at room temperature, while blindfolded. Here, the comparison has been taken seriously, because the wines all came from the same region (Rioja), the tasters had reasonable expertise, there were equal numbers of men and women, etc.


The table shows the results for the four wines (the rows), the four choices (the columns), and the four tasters (the cell counts). The boxes show the correct guesses (7 / 16 = 44% correct) — because each taster had to match the four wines to the four choices, there is only a 25% probability of getting any one wine right by random chance, and only about a 4% probability (1 in 24) of getting all four right.

In spite of the small experimental sample, the results here are statistically significant (using the Fisher Exact Test). For example, no-one identified the white as a red, and the expensive red was always identified as a red. So, these people, at least, did better than guessing, with three people each getting two wines right. Most interestingly, the ones they got right were generally related to their expertise with the wine styles concerned.
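The chance figures quoted above are easy to verify, assuming (as here) that each taster assigns the four wines to the four choices one-to-one, i.e. as a permutation. A brute-force check in Python:

```python
from itertools import permutations
from math import factorial

# Count how many of the 24 possible orderings of four wines
# yield each number of correct matches.
n = 4
counts = {k: 0 for k in range(n + 1)}
for perm in permutations(range(n)):
    correct = sum(guess == truth for truth, guess in enumerate(perm))
    counts[correct] += 1

total = factorial(n)  # 24 equally likely orderings
for k in range(n + 1):
    print(f"{k} correct: {100 * counts[k] / total:.1f}%")
```

All four correct happens in exactly 1 of the 24 orderings (about 4%), and since each choice is correct for exactly one of the four wines, the per-wine chance is 25%.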

Finally, we could also consider the case of blind winemakers. One such person is David Hunt, from Hunt Cellars (see A man of taste: keen sensory perception and meticulous blending help David Hunt achieve his winemaking vision). The difficulty for our purposes here is that he does know what he is tasting before he tastes it — he just can't use his eyes to see it.

Conclusion

So, we may never know whether we can tell reds from whites, although the work done so far suggests that we can do much better than a random guess. The next time you read a claim that we cannot tell, then you will know that the person is bluffing. Besides, it is quite certain that “tasting” involves our mouth, our nose and our eyes — the brain needs all three to decide what it is drinking, and whether it likes it.


Bonus notes: experimental requirements

Experiments are all about getting the comparisons right. So, what do we need to do?

There are actually three ways that you can try to make your experimental “treatments” comparable:
  • Incorporate all of the variation into your comparisons (eg. choose several types of wines, from several regions, and taste them at several temperatures, etc) — this leads to a very large and complicated experiment, but it will be very informative because you have studied everything
  • Try to control the conditions to minimize variation (eg. similar wine-making styles, and grapes, similar tasting expertise, all done at the same time) — this usually leads to a small experiment, and the results cannot be generalized beyond the specific conditions that you studied
  • Randomize the variation (eg. choose lots of arbitrary wines and temperatures) — you end up with lots of variation in the results, and you thus need a very large experiment.
These approaches are not mutually exclusive, and they may all appear in the same experiment. For example, you might incorporate a few types of variation (eg. temperature and wine style), while you control others (eg. all tasters are experts).

What sorts of variation might we be concerned about when comparing white and red wines? Here are some possibilities:
  • grape type — green skins all the way through to dark red skins
  • skin contact — for most grapes the color is in the skin, so maceration time is important
  • tannin level — some wines are naturally full of tannins, while others receive oak treatment
  • fruitiness — some wines are prized for their fruitiness, others for their savoriness
  • sweetness — residual sweetness is often preferred by consumers
  • level of maturity — young wines are different from mature wines
  • quality — quality is usually related to grape-growing and wine-making effort
  • temperature — the temperature for optimal sense detection when tasting
  • the way we sample the wine — sniff; swirl and sniff; cover, swirl then sniff; sip; swallow.
I am sure that you can all think of more possibilities. The point, though, is how do we deal with these in an experiment, using one or more of the above approaches? You can see that it will not be easy. Indeed, you might consider it to actually be impractical. That is why I have never tried it, for this case.



1 I clearly recall several times drinking white wines in Sicily that became more and more uninteresting as they warmed up on the restaurant table. Fiano wines are very good when kept cool!

2 In 2008, the American Association of Wine Economists Working Paper 16 reported:
In a sample of more than 6,000 blind tastings, we find that the correlation between price and overall rating is small and negative, suggesting that individuals on average enjoy more expensive wines slightly less. For individuals with wine training, however, we find indications of a positive relationship between price and enjoyment.
3 I still have a clear memory of a traumatic experience as a child, in which the restaurant waiter put parmesan on my spaghetti, and I couldn't eat it because to me it smelled like vomit.

Monday, July 15, 2019

The role of Wine Influencers — more of the same

The experts are no longer just “in the stands”. These days, everyone’s a public expert, because social media (eg. Facebook, Instagram, Twitter, Youtube) allow us to trumpet our own opinions to the entire world. If you happen to have thousands of followers, then you are also an Influencer.

This means that when LeBron James drinks a wine it will sell a lot better. The idea seems to be: “You may not be able to play like LeBron, but you can drink like him.”


To some people, this seems like a new phenomenon — a product of modern social media (eg. The LeBron Factor: who drives wine trends today). However, it is simply an extension of things that have always happened. For example, there have always been fads in personal names based on who is in fashion at the moment — virtually no-one was called “Kylie” until Kylie Minogue came along (see the Baby Name Wizard). The so-called LeBron Factor is simply a quick-acting version of the same thing.

However, given that LeBron seems to prefer cult wines, I suspect that he actually knows squat about wine, but is instead what we call a “label drinker”. This is perfect for Instagram, of course, since that social medium is all about pictures, and the only part of a wine that ever appears in most pictures is the label.


Wine influence

Where does this leave the wine industry? It leaves us with one version of the field of Influencer Marketing — a form of social media marketing involving endorsements from Influencers (eg. Influencer marketing is trending right now because it can work). You don’t put an ad in a magazine, instead you put a bottle in the hand of an Influencer. This could be cheaper, except that the Influencer may charge you more than the magazine ever possibly could. This is called Pay to Play, of course.

On the other hand, the wine industry is still pretty much where it always was, because we have always had what are now called Micro- or Nano-Influencers:
Social media marketing involving endorsements from influencers, people and organizations who possess an expert level of knowledge as well as social influence in their respective field.
The key difference here (Micro influencers on Instagram) is that:
Micro-influencers aren’t big-name celebrities that you’ll find in the tabloids. Instead, they are [people] with anywhere from 1,000 to 100,000 followers. They’ve built their audience thanks to their niche knowledge and authenticity. Most importantly, their followers trust them and engage with them at a much higher rate than other [influencers]. Indeed, micro-influencers have been found to be among the most effective sellers on the market.
So, marketers may get better responses from sponsored posts when they’re published by Nano-influencers — they have a smaller reach, but their followers are highly dedicated.

Once again, there is nothing fundamentally new here. We have always had media personalities with wine expertise, and with many followers — in the old days they just used magazines and newsletters (and blogs!) rather than Facebook, Instagram, Twitter, and Youtube. As noted by David Shaw way back in 1987 (Wine writers: squeezing the grape for news):
A wine writer is a physician or a lawyer with a bottle of wine and a typewriter, looking to see his or her name in print, looking for an invitation to a free lunch, and a way to write off the wine cellar.
The most famous and influential of these has been Robert M. Parker Jr, who was a corporate lawyer before he became a Wine Influencer. In another 1987 article, Shaw noted (Wine critics: influence of writers can be heady):
Parker is the most influential wine writer on any publication, by such a wide margin that there really isn’t anyone in second place ... Parker’s visibility and impact have been magnified far beyond that of his newsletter readership [The Wine Advocate]. Several other wine newsletters and newspaper and magazine wine writers also trigger consumer demand when they write favorable reviews, but none approaches Parker’s extraordinary influence.
Parker, himself, seems to have always downplayed his own role, at least in public, referring instead to his “alleged power”. He set out to be a consumer advocate (hence the name of his newsletter), but instead he became principally a marketing tool. I would find this annoying, if it had happened to me.

Anyway, there is nothing new under the sun, just new ways to do the same old things. There are a number of online lists to tell us who are the current wine Micro-influencers (eg. Here are some of the top wine influencers that you need to check out in 2019; Top ten influential wine experts in the beverage industry).


The consequences of influence

There have been recent media comments about the potential downsides of the use of social media for influence. Is this a new phenomenon?

There have always been questions about potential conflicts of interest in the wine media. Indeed, David Shaw’s articles quoted above were intended to reveal what was then presumably unknown to much of the reading public — many, if not most, newspaper and magazine wine writers were paid very little money, and relied on wine producers and marketers in a way that could easily be seen as a conflict of interest.

Shaw seemed to take it a bit too far, mind you, because he tried to apply what he calls “ethical standards” for journalists to a bunch of amateurs who were (and are) emphatically and self-admittedly not professional journalists. However, the debate continues (see Is “pay to play” wrecking wine criticism?); and wine commentators continue to be questioned about their real motives for writing about wine (and in the past people such as Nathan Chroman and Jay Miller have left their publishers, under a cloud).

To what extent, then, are potential conflicts of interest relevant in the modern social media?

I noted above that Pay to Play is an established part of Influencer Marketing. Indeed, a simple web search will lead you to several sites that list Influencers who are available for a fee. At least it is out in the open in this case.

Marketers come under the same scrutiny, of course. For example, some Australian wine companies have recently been accused of using social media influencers to promote their products without disclosing their sponsorship (Australian wine companies accused of influencing influencers). Clearly, transparency has always been the key when conflicts of interest are under the microscope. The attempts by the Australian wine industry to penetrate the Chinese wine market are well known, and this year they openly hosted a group of Chinese wine influencers in Australia (China’s wine imports continue to slide as Australia overtakes France).

There is also the matter of having too much influence. Quoting Shaw from 1987 again:
The major criticism of Parker is that he has become too influential, too powerful a force in the industry. Indeed, many wine makers worry that his influence is so pervasive and his preferences so clear — he generally seems to like big, robust wines more than lighter, more elegant wines — that he is influencing wine makers as well as wine buyers.
This level of influence has not yet happened in the modern social media.


Reactions to Influencers

One of the more pointed comments comes from Outwines:
Unfortunately, wine seems to be one of a growing number of subjects that an individual can “influence” and have minimum actual knowledge about it. Snap a ton of pictures with wine bottles and smile, make sure to comment constantly and generically on others posts (something like “That looks like some good wine!” or “I’ll have to try that!”), join multiple pods, and boom – you’re on your way to being an influencer. For those in the wine industry, some of these influencer accounts can be exasperating since they tend to focus way more on engagement as opposed to actual wine education or experience. In order to be a true “wine influencer” – shouldn’t an individual ideally have some combination of all three?
Indeed, a couple of years ago Miquel Hudin discussed Why social media doesn’t sell wine, suggesting that places like Instagram aren't particularly well-suited to communicating about wine education or experience.

More recently, Miquel also discussed the potential for fraudulent use of social media (How that “wine influencer” might very well be a fraud). That blog post engendered a range of quite strong comments.

Finally, there are extreme examples of what happens in the world of Influencers. For example, recently an Instagram “star” posted a suicide note online, which has led to Denmark planning government regulation of Influencers. Sadly, the note apparently got 30,000 likes before it was taken down.

Conclusion

Perhaps this is all a storm in a teacup. A recent report notes: Sponsored posts from Instagram influencers are driving less engagement (engagement is measured by comparing the average number of likes on each Instagram post to the number of followers of the account). Maybe it will all simply go away again, like a fad. After all, there are other members of The real influencers of the wine world.

Monday, July 8, 2019

The sources of wine quality-score variation

Wine-quality scores are coming under increasing pressure these days, not least because they seem to miss the idea that wine might be something special — see, for example: It’s time to rethink wine criticism. Wine is, after all, more than just numbers!

On the other hand, I have also commented in this blog on the characteristics of wine scores as numbers, noting that almost all scores are biased, whether they come from professionals, semi-professionals, or the general wine community. As noted elsewhere (Wine ratings might not pass the sobriety test):
A rating system that draws a distinction between a cabernet scoring 90 and one receiving an 89 implies a precision of the senses that even many wine critics agree that human beings do not possess. Ratings are quick judgments that a single individual renders early in the life of a bottle of wine that, once expressed numerically, magically transform the nebulous and subjective into the authoritative and objective.

The main issue, as I see it, is the lack of repeatability of the ratings between tasters. I have previously noted (The poor mathematics of wine-quality scores):
Most wine commentators’ wine-quality scores are personal to themselves. That is, the best we can expect from each commentator is that their wine scores can be compared among themselves so that we can work out which wines they liked and which ones they didn’t.
This has also been discussed in this post: In their own words, this is how seven professional wine writers and critics go about rating a bottle.

On the other hand, little has been said about repeatability by the same taster, when they re-taste a wine. However, even at the top, Robert M. Parker Jr once commented:
How often do I go back and re-taste a wine that I gave 100 points and repeat the score? Probably about 50% of the time.
When the Points Guru tells you that even his scores are not repeatable, you should believe him! *

It therefore seems to be of some interest to illustrate a few specific examples where we can clearly see the lack of repeatability of wine-quality scores, and the source of the variation in scores. This is what I do below.

Between magazines

Let's start by looking at the same wines as scored by two different wine magazines, in this case the Wine Spectator and the Wine Advocate. I have used these data as part of several earlier posts (e.g. How large is between-critic variation in quality scores?).

In the following graph, each point represents one wine, with the Spectator wine-quality score shown vertically and the Advocate score shown horizontally. The wines are from the top Bordeaux chateaux (Latour, Lafite, Margaux, Mouton, and Haut-Brion) for the vintages 1975-2014. There are a total of 195 wines. Points that lie on the line scored the same from both magazines, whilst those above the line did better from the Spectator, and those below the line did better from the Advocate.

Wine Spectator versus Wine Advocate scores for the top Bordeaux chateaux

Note, first, that 36 of the points lie on the line (18.5%), showing that only one-fifth of the wines were evaluated identically. The remaining wines differ by up to 14 quality points, with an average difference of 2.8 points. A correlation analysis shows that, overall, 65% of the variation in scores is shared between the two magazines, which we can interpret as the magazines sharing two-thirds of their opinions about the wines.

The second thing to note is that none of the wines score 100 points from both magazines simultaneously, although there are 7 perfect scores from the Spectator and 13 from the Advocate. So, 10% of the wines (20 of 195) are considered to be of potentially the very top quality, although there is no agreement on which wines they actually are.

Within a magazine

Most magazines have several people tasting their wines, often covering different geographical areas, although these usually overlap.

The next graph shows the scores from 9 of the people who tasted wines for the Wine Spectator, covering the period 2006–2015. Each of them tasted 5,000–25,000 wines during that time (the data come from Wineinformatics: a quantitative analysis of wine reviewers). In the graph, the quality scores are grouped horizontally, with the percent of scores for each group shown vertically, for each taster.

Wine-quality scores from the Wine Spectator tasting team

Obviously, most of the people scored their wines in the 85–89 range, except for Bruce Sanderson, who preferred 90–94 scores. Also, very few of the wines scored 95–100, from any taster.

However, there are some very different patterns here. For example, compared to his colleagues, James Molesworth greatly preferred the 80–84 and 85–89 ranges, at the expense of the 90–94 range. However, the two people who were most different from their colleagues are: MaryAnn Worobiec, who preferred the 85–89 range much more than did her colleagues, and the 90–94 less than they did; and Bruce Sanderson, who showed the strongest preference for a score of 90–94 over 85–89. Harvey Steiman and James Laube preferred scores of 90–94 over 80–84, although they may both claim that their wines justify those scores. The other tasters showed patterns that were fairly similar to each other.

Repeat tastings by one person

Finally, it is worth noting that, while wine critics sometimes do retrospective tastings of particular wines, there are very few published data about attempts to re-taste (and re-score) wines not long after they were originally tasted. One person who has done this is Rusty Gaffney (Quick trigger: are reviews done too soon?).

The following graph shows his scores for 21 Pinot noir wines, with each point representing one wine. The original score is shown horizontally (note that all of the wines scored ≥ 90 points), and the score when tasted again 16–26 months later is shown vertically. Points that lie on the line scored the same on both occasions, whilst those above the line did better at the second tasting, and those below the line did better the first time.

Rusty Gaffney re-tasting 21 pinot noirs

Note that only 3 wines got the same score on both occasions, with 10 doing better at the re-tasting and 8 doing worse. The maximum difference was 4 points.

So, about half of the wines were better and half were the same or worse when re-tasted 2 years later, which is what might be expected from random chance. While bottle variation may be a factor here, it is unlikely to change the results (although it might determine which wines did better or worse).
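That "random chance" reading can be checked with a simple sign test: ignoring the 3 ties, 10 of the 18 changed scores went up. Under the assumption that a re-tasted wine is equally likely to score higher or lower, the probability of a split at least this uneven works out at roughly 0.8, which is nowhere near distinguishable from a coin toss. The sketch below is my own illustration (standard-library Python, not anything from Gaffney's article):

```python
from math import comb

def sign_test_p(ups: int, n: int) -> float:
    """Two-sided sign test: probability of a split at least as uneven
    as ups-vs-(n-ups), if each change is a fair coin flip."""
    k = max(ups, n - ups)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Gaffney's re-tastings: 10 wines scored higher, 8 lower (3 ties ignored)
p = sign_test_p(10, 18)
print(f"p = {p:.2f}")  # p ~ 0.81: entirely consistent with random chance
```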

Conclusion

All three datasets show that variation in wine-quality scores is substantial, and that it arises from several sources. When you combine these sources of variation, it is difficult to attribute any mathematical precision to the use of numbers for wine commentary.

So, why aren't wines given a range of points, rather than a single score? It would make much more sense, given the mathematical reality of the situation.
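To make the suggestion concrete: one could publish each score as a band whose half-width reflects the observed variability, for instance the 2.8-point average between-critic difference from the Bordeaux data above. The `score_band` helper below is purely my own illustrative sketch, not a scheme that any magazine actually uses:

```python
def score_band(score: int, spread: float = 2.8,
               lo: int = 50, hi: int = 100) -> tuple:
    """Turn a single point score into a range, using a typical
    between-critic difference (2.8 points in the Bordeaux data)
    as the half-width, clipped to the 50-100 scale."""
    half = round(spread)
    return (max(lo, score - half), min(hi, score + half))

print(score_band(91))   # (88, 94)
print(score_band(99))   # (96, 100), clipped at the top of the scale
```

A wine reported as "88–94" rather than "91" would at least be honest about the mathematical reality.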




* Perhaps more tellingly, in a 1999 article for the Los Angeles Times, David Shaw (He sips and spits — and the world listens) noted of Parker:
More than once he’ll be asked if he’d be willing to demonstrate his consistency. Would he taste and score five or six wines “blind” — without knowing what they are — and then taste and score them again a day or two later? “No,” he says. “I'm not doing trained dog tricks. I’ve got everything to lose and nothing to gain.”
Apparently Parker neither respects scientific experiments nor understands their use.

Monday, July 1, 2019

These countries drink wine but don't import it

Many countries produce wine, while many don't; and many countries consume wine, while many don't. Since the places of production and consumption are not always the same, there is a lot of wine movement around the globe (see the post on Where does all of this wine come from and go to?).

An interesting question, then, is: Which countries consume mostly their own domestic product and which countries import their wine? This is the topic of this post.

The information comes from The International Spirit and Wine Record.* The graph below shows the data for those 80 countries with detectably non-zero production but where estimated wine consumption exceeded 1 million 9-L cases (a dozen bottles each) for the year 2017. The vertical bars show the estimated number of imported wine cases as a percentage of the total number of cases consumed. [Note: only every second country is labeled.]

Wine imports as a percentage of total consumption by country

Globally, imported wine consumption comprises an average of 33% of the wine market, which means that an awful lot of wine is being moved around internationally. However, there is apparently one country that does consume a lot of wine but imports next to nothing: Argentina (0.04% import).

There were actually 24 countries that each consumed >500,000 cases of wine in 2017 and yet imported <15% of their domestic consumption [Note: there really were no countries between 8.3% and 15%]:
Argentina: 0.0%
Tunisia: 0.2%
South Africa: 0.2%
Chile: 0.5%
Moldova: 0.5%
Italy: 0.6%
Georgia: 0.6%
Greece: 0.8%
Macedonia: 1.1%
Romania: 1.9%
Bulgaria: 2.2%
Turkey: 2.3%
Uzbekistan: 2.3%
Hungary: 3.1%
Spain: 3.2%
Slovenia: 3.2%
Morocco: 3.3%
Armenia: 3.3%
Serbia: 5.2%
Croatia: 5.6%
Uruguay: 5.7%
Portugal: 6.2%
Azerbaijan: 6.8%
Egypt: 8.3%

Clearly, several of these countries have populations that mostly do not consume alcohol, and therefore domestic production is quite sufficient for their small needs. However, quite a few of the listed countries do have a large amount of wine production, and are well known as wine exporters as well as consumers (South Africa, Chile, Italy, Spain, Portugal ...).

Note that Spain and Italy are two of the world's biggest producers, so it is unsurprising that their populations drink mostly the domestic product. On the other hand, France is listed at 21.2% imported wine, but this is likely the result of the many tanker loads of Spanish wine that the French import and blend into their own production.

Australia is another well-known wine producer with relatively low levels of import (17.4%). Other well-known wine-producing countries with somewhat larger import levels include New Zealand (25.2%) and Germany (51.2%), both of which produce a fairly restricted range of wine types, and therefore presumably need to import the rest (the muscly reds, for example).** There is also the USA (27.3%), which imports mainly premium wines (see The USA imports more expensive wines than anywhere else).

At the other end of the spectrum, there are 16 countries that each consumed >500,000 cases of wine in 2017 and yet nominally imported 100% of their domestic consumption (in decreasing order of consumption):
Netherlands: 100%
Sweden: 100%
Denmark: 100%
Ireland: 100%
Norway: 100%
Finland: 100%
Hong Kong: 100%
Namibia: 100%
Taiwan: 100%
Estonia: 100%
Singapore: 100%
Malaysia: 100%
Haiti: 100%
Guadeloupe: 100%
Martinique: 100%
Macao: 100%

Obviously, none of these countries are well-known for wine production; and yet several of them do, indeed, have commercial operations, albeit small.

For example, Vineyards in the EU lists these countries as having <500 ha of vineyards each, so they are not included in the European Union statistics: Belgium, Denmark, Estonia, Finland, Ireland, Latvia, Lithuania, Malta, the Netherlands, Poland and Sweden. I can personally confirm the existence of vineyards, and of very nice wines, in Sweden, Belgium and the Netherlands (see also the blog post on Bizarre wine data).

As a final point, it is worth noting that wine imports can have very specific goals in terms of servicing the domestic market. As but one example, 19% of the Australian wines imported into Sweden are labeled “organic”, and 49% of Australia's exported “organic” wines end up in Sweden (see Organic wine — a sustainable trend?). Indeed, organic wine is apparently an increasing import trend throughout the Nordic market (Sweden, Norway, Finland and Denmark). For example, c. 50% of Argentinean “organic” wine exports also go to Denmark and Sweden.



* The data can also be accessed through the Wine Australia Market Explorer.

** In 2018, 75% of all New Zealand wine was made from Sauvignon blanc, mostly from the Marlborough region (see The question of supply). Germany is not quite such a vinous monoculture.