Monday, February 25, 2019

How much difference can there be between critics?

I have previously written a couple of posts in which I looked at wine-quality scores from critics who tasted exactly the same bottles of wine at the same time. In those cases, the quality scores differed to one extent or another. However, what I have not yet considered is just how big those differences can get between any given pair of professional wine tasters. Here, I present an example where the differences are very big indeed.


When looking at variation in wine-quality scores, it is important to eliminate the effects of different bottles and tasting conditions, by having the scores produced from the same bottles at the same time. This is, of course, what happens at most group wine tastings. Wine Spectator magazine is a useful source of this sort of information, as it has occasionally held direct tasting comparisons between pairs of its critics, in addition to the tastings conducted by each of its regional experts on their own.

The exercise

I have previously used data concerning James Laube and James Suckling, who have both provided wine-quality scores to Wine Spectator regarding Cabernet wines (Laube versus Suckling — their scores differ, but what does that mean for us?). This time, I will compare James Laube with Per-Henrik Mansson, as they have both provided scores for Chardonnay wines, with Laube as the California expert and Mansson as the Burgundy expert. Mansson has subsequently moved on from the magazine.*

Photos: James Laube and Per-Henrik Mansson

The dataset I will use here is from the "Chardonnay Challenge" of 1997 (see Wine Spectator for November 15, 1997, pp. 46–70), in which our two critics tasted 10 California Chardonnays and 10 white Burgundies, each from both the 1990 and 1995 vintages (40 wines in total).** However, there were only 39 bottles of wine with which to compare their scores, as one of the 1990 Burgundies was not available in time for the tasting.

The data are shown in the first graph, with Laube's scores vertically and Mansson's horizontally. Each point represents one of the 39 bottles.

Mansson vs. Laube for 1990 and 1995 Chardonnays

This does not look too good to me — in fact, it looks terrible. There is a wide spread of points in the graph (note, also, that Mansson's scores cover a bigger range than Laube's). The mathematical correlation indicates only 3% agreement between the two sets of scores, which is almost no agreement at all. To make this clear, the solid pink line shows what agreement would look like — for bottles whose points are on this line, the two critics perfectly agreed with each other. Clearly, this is true for only 2 of the 39 bottles. The Laube score exceeds the Mansson score 22 times, and 15 times it is the other way around.

The two dashed lines in the graph show us ±2 points from perfect agreement — for bottles between the two lines, the two sets of point scores were within 2 points of each other. This allows for the approximate nature of expert opinions — technically, we are allowing for the fact that the scores are presented with 1-point precision (e.g. 88 vs. 89 points) but the experts cannot actually be 1-point accurate in their assessments.

There are only 10 of the 39 bottles (26%) between the dashed lines. So, even when we allow for the approximate nature of expert opinions, there is much more disagreement here than there is agreement.
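For anyone who wants to replicate this sort of check on their own data, here is a minimal Python sketch of the calculations. The score arrays are hypothetical stand-ins for the actual 39 pairs, and I am reading the "3% agreement" as a squared correlation (r squared), which is my interpretation rather than something the magazine states.

```python
import numpy as np

# Hypothetical paired scores -- stand-ins for the actual 39 bottles.
laube   = np.array([88, 90, 85, 92, 87, 89, 91, 86])
mansson = np.array([84, 93, 88, 80, 87, 95, 83, 90])

# "Agreement" read as the squared Pearson correlation, as a percentage.
r = np.corrcoef(laube, mansson)[0, 1]
print(f"correlation agreement: {100 * r ** 2:.0f}%")

diff = laube - mansson
print("identical scores:", int(np.sum(diff == 0)))          # points on the solid line
print("within 2 points: ", int(np.sum(np.abs(diff) <= 2)))  # between the dashed lines
print("Laube higher:    ", int(np.sum(diff > 0)))
print("Mansson higher:  ", int(np.sum(diff < 0)))
```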

Another way of dealing with the approximate nature of expert scores is to greatly reduce the number of score categories, so that all the experts need to do to agree is pick the same category. The Wine Spectator does it this way:
95 – 100   Classic: a great wine
90 – 94    Outstanding: a wine of superior character and style
85 – 89    Very good: a wine with special qualities
80 – 84    Good: a solid, well-made wine
75 – 79    Mediocre: a drinkable wine that may have minor flaws
50 – 74    Not recommended

Mansson vs. Laube for 1990 and 1995 Chardonnays

So, I have shown this scheme in the second graph. For bottles within the boxes, the two critics' point scores agree as to the word categories of wine quality. This happens for only 6 of the 39 wines (15%). So, even this broad-brush approach to wine-quality assessment provides only about one-sixth agreement between the two critics.
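To see how the category check works in practice, here is a rough sketch (again with made-up scores rather than the published data): each point score is binned into the magazine's word categories, and two critics agree on a bottle whenever both scores fall in the same bin.

```python
def ws_category(score: int) -> str:
    """Map a 50-100 point score to Wine Spectator's word category."""
    if score >= 95:
        return "Classic"
    if score >= 90:
        return "Outstanding"
    if score >= 85:
        return "Very good"
    if score >= 80:
        return "Good"
    if score >= 75:
        return "Mediocre"
    return "Not recommended"

# Hypothetical (Laube, Mansson) score pairs.
pairs = [(88, 84), (90, 93), (85, 88), (92, 80)]
matches = sum(ws_category(a) == ws_category(b) for a, b in pairs)
print(f"category agreement: {matches} of {len(pairs)} bottles")
```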

For comparison, the Laube versus Suckling Cabernet tasting (mentioned above) produced much better agreement. Their mathematical correlation was 29% (compared to only 3% this time); there were 5 out of 40 bottles on the solid line (2 out of 39 this time), 23 out of 40 bottles between the dashed lines (10 out of 39 this time), and 25 out of 40 bottles within the boxes (6 out of 39 this time). Suckling and Laube did not agree much with each other, but Mansson and Laube hardly agreed at all.

To make this point clear, the third graph illustrates the differences in the paired scores, expressed as the Mansson score minus the Laube score (horizontally) against the number of bottles with each difference (vertically). Clearly, the scores differ by up to 10 points (Mansson greater than Laube) and 13 points (Laube greater than Mansson). I have rarely seen scores differ by this much — 13 points is a lot of quality-score difference. It is pertinent, I think, to ask whether these two people were actually tasting the same wines!

Mansson vs. Laube for 1990 and 1995 Chardonnays
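Tabulating the paired differences behind a graph like this one is straightforward; the sketch below uses placeholder score pairs, not the published data.

```python
from collections import Counter

# Hypothetical (Mansson, Laube) score pairs.
pairs = [(84, 88), (93, 90), (88, 85), (80, 92), (95, 85)]
diffs = [m - l for m, l in pairs]  # Mansson minus Laube, as in the graph

histogram = Counter(diffs)
for d in sorted(histogram):
    print(f"{d:+3d} | {'#' * histogram[d]}")
print("extremes:", min(diffs), "to", max(diffs))
```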

As an aside, it is worth noting the overall low scores given to the wines. Only 16 of the wines scored >90 points, even though they were all quite expensive. This is quite comparable to the previous year's Cabernet tasting, where only 17 wines scored >90 points.

What does this mean for us?

Obviously, we should be asking what is going on here. The magazine presents these scores as representing some sort of Wine Spectator standard of quality, but clearly this is not an objective standard of quality. The scores are personal (but expert) judgments by the individual critics, who may have very little in common.

In this case, the situation is illustrated in the final graph, which shows the average scores for each critic for the four types of wine — California versus Burgundy, for both the 1990 and 1995 vintages. Put simply, James Laube preferred the California wines in both years, and Per-Henrik Mansson particularly liked the 1995 Burgundies. The only wines they agreed about were the 1990 Burgundies.

Mansson vs. Laube for 1990 and 1995 Chardonnays
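Group averages of this kind are a one-line calculation with pandas; the scores below are invented purely to show the shape of the computation, with two wines per group instead of the real ten.

```python
import pandas as pd

# Hypothetical long-format data: one row per (wine, critic) score.
df = pd.DataFrame({
    "region":  (["California"] * 4 + ["Burgundy"] * 4) * 2,
    "vintage": [1990, 1990, 1995, 1995] * 4,
    "critic":  ["Laube"] * 8 + ["Mansson"] * 8,
    "score":   [91, 90, 92, 93, 88, 87, 86, 88,   # Laube
                85, 86, 84, 85, 88, 87, 95, 94],  # Mansson
})

# Mean score per critic for each of the four wine groups.
print(df.groupby(["region", "vintage", "critic"])["score"]
        .mean()
        .unstack("critic"))
```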

Mansson's preference for the 1995 Burgundies is explained in his notes:
I looked beyond aromas and flavors for what I think are the two most important factors in determining a great Chardonnay: a seamless, silky texture in the mid-palate and a clean, elegant, balanced finish ... I often find that young California Chardonnays taste overly oaky and acidic. After a glass or two, they seem heavy, even dull and flat. The 95s reinforced this negative impression; compared to the beautifully balanced, elegant, supple yet succulent white Burgundies, the California whites tasted slightly bitter to me, with a few notable exceptions.
Laube's consistent preference for the California wines, however, is not explicitly explained. His published notes are almost entirely about how much better value for money the California wines were compared to the Burgundies — the Burgundies cost up to 10 times as much but were no better. However, since the wines were tasted blind, this cannot explain his scores. His only brief comment is:
California Chardonnays tend to be fruitier, white Burgundies a shade earthier.
This is consistent with his notes for the previous Cabernet tasting:
I like my wines young, rich, highly concentrated and loaded with fruit flavors.
The problem for us is that these critics' quality scores are not really comparable. They give us a rank order of preference for each critic, but any attempt to directly compare them makes little sense. Unfortunately, comparing them is precisely what the magazine actually asks us to do (and I did!).



* I bet his name was Månsson before he left Sweden.

** Thanks to Bob Henry for sending me a copy of the magazine.

6 comments:

  1. There are many variables in play between wine critics. One is their genetic make-up: are they super-tasters, normal tasters, or non-tasters? This would determine whether certain wines are liked or disliked more or less. A second is personal preference: everybody has a style that they like more than others. As professional wine critics, they should be objective about all wines, but if they have a favorite there might be a little bias in that score. As a wine consumer, the job is to find a wine critic whose tastes and ideals about wine match your own. Those are then the scores that you should focus on and take to heart.

    1. The point of the post is that the magazines do not agree with you — they present the comparative tastings as though the critics can be compared.

  2. For some background information on the debate about "supertasters," see the Wine Enthusiast (January 29, 2018) article titled:

    "Do Your Genes Predict Your Wine Preference?;
    The science on supertasters is a work in progress, but one aspect is clear: both nature and nurture help shape your taste in wine."

    URL: https://www.winemag.com/2018/01/29/supertasters-wine-preference/

    Related research can be found in the "neuroenology" work of Yale neuroscientist Gordon Shepherd.

    The transcript of his 2017 interview with National Public Radio (for the benefit of Europeans, our closest U.S. counterpart to BBC Radio) is titled:

    "The Taste Of Wine Isn't All In Your Head, But Your Brain Sure Helps"

    URL: https://www.npr.org/sections/thesalt/2017/04/03/521415892/the-taste-of-wine-isnt-all-in-your-head-but-your-brain-sure-helps

    Predating Shepherd's 2016 book is his 2015 study titled:

    "Neuroenology: how the brain creates the taste of wine"

    URL: https://flavourjournal.biomedcentral.com/articles/10.1186/s13411-014-0030-9

  3. One detail that has been lost to the fog of time . . .

    In this article, white Burgundy reviewer Per-Henrik Mansson awarded a "100 point" "perfect" score to a wine.

    No, not French.

    This one: 1990 Talbott Monterey Chardonnay.

    Higher even than California reviewer James Laube's score (93 points).

    That put the Talbott on the same qualitative level as Domaine de la Romanee-Conti Montrachet.

    Would that have been Wine Spectator's first "100 point" score for a California Chardonnay? First "100 point" score for ANY California wine?

    (Recall that we are talking 20-plus years ago, when wine score "grade inflation" was not a troubling phenomenon.)

    Footnote:

    A third opinion on the 1990 Talbott Monterey Chardonnay, courtesy of acclaimed sommelier Larry Stone, writing for the Chicago Tribune (June 4, 1992):

    "Talbott 1990, Monterey: The 1990 is rich and soft, with apple, pineapple, lime and a supple vanilla palate, yet it is not overtly fruity but spicy and vinous, emphasizing texture and a mineral character. Harmonious, perhaps the most classically structured wine of concentration in America. Almost as good as the amazing 1989. $38. ((STAR)(STAR)(STAR)(STAR)/93 [points])"

    URL: https://www.chicagotribune.com/news/ct-xpm-1992-06-04-9202190616-story.html

  4. "Wine arbitrage" observation: the 1990 Talbott Monterey Chardonnay was offered on WineBid for the princely sum of $15 in an online auction that closed September 18, 2005. Shockingly cheap for a so-called "perfect" wine. Kudos to the eagle-eyed bidder with a long memory who claimed the two bottles for sale.

    URL: https://www.winebid.com/BuyWine/Item/Auction/38487/1990-Talbott-Chardonnay

  5. Let me introduce into the discussion this Wall Street Journal "Table Talk" column by Bee Wilson titled "Textures That Delight and Disgust" ("Review" section, February 23-24, 2019, Page C5).

    URL: https://www.wsj.com/articles/textures-that-delight-and-disgust-11550762895

    Excerpt:

    "I’ve been thinking about texture a lot more since reading a remarkable book called 'Mouthfeel: How Texture Makes Taste' by two Danes, the food scientist Ole G. Mouritsen and the chef Klavs Styrbaek. . . ."

    Recall the two Wine Spectator cover stories comparing California wines to French wines.

    The European wine reviewers were focused on texture and mouthfeel:

    James Suckling:

    "I was more concerned with the texture and aftertaste of the wines than with their aromatic qualities or flavor characteristics."

    Per-Henrik Mansson:

    "I looked beyond aromas and flavors for what I think are the two most important factors in determining a great Chardonnay: a seamless, silky texture in the mid-palate and a clean, elegant, balanced finish ... I often find that young California Chardonnays taste overly oaky and acidic. After a glass or two, they seem heavy, even dull and flat. The 95s reinforced this negative impression; compared to the beautifully balanced, elegant, supple yet succulent white Burgundies, the California whites tasted slightly bitter to me, with a few notable exceptions."

    The California wine reviewer focused on aroma and flavor:

    James Laube:

    "I like my [red] wines young, rich, highly concentrated and loaded with fruit flavors."

    "California Chardonnays tend to be fruitier, white Burgundies a shade earthier."
