Monday, August 5, 2024

Wine ratings involve both the accuracy and bias of the raters

Even a quick glance at the wine industry will convince anyone that putting scores on wines is big business, whether those scores run from 80–100, 10–20, or 1–5 stars. I am not a big fan of actually doing this myself, but I do use the information when it is available, especially in relation to the price of the wines (i.e. evaluating value for money). I have written about this topic a number of times in this blog (see the link Wine scores to the right).

Recently, a couple of papers have appeared in the academic literature evaluating the procedures for rating wines, and I thought that I might briefly summarize one of them here.

Commentator score sheets

The basic issue with any score or rating system is that each individual reviewer varies in what we might refer to as their “accuracy” (how close they get to some underlying wine quality) and their “bias” (systematically rating higher or lower than other people, due to personal preferences). The rest of us therefore do not really know how to interpret any given individual score or rating. This is an inherent limitation of every scoring system.
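
To make this distinction concrete, here is a minimal sketch (in Python, with invented reviewers and made-up numbers) of the way the situation is usually modelled: each rating is the wine's underlying quality, shifted by the reviewer's personal bias, plus random noise whose spread reflects their (in)accuracy.

    import numpy as np

    rng = np.random.default_rng(42)

    # A hypothetical model: rating = true quality + reviewer bias + noise,
    # where the spread of the noise is the inverse of the reviewer's "accuracy".
    true_quality = {"Wine A": 92.0, "Wine B": 88.0}

    reviewers = [
        # (description, bias, noise standard deviation); all values invented
        ("generous but erratic", +3.0, 4.0),
        ("harsh but precise",    -2.0, 0.5),
    ]

    for wine, q in true_quality.items():
        for who, bias, sd in reviewers:
            rating = q + bias + rng.normal(0.0, sd)
            print(f"{wine} (true quality {q}): the {who} reviewer gives {rating:.1f}")

Averaging such ratings simply mixes the biases and the noise into the final number, which is exactly the problem that the paper discussed below tries to untangle.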

An example of commentary on this issue from within the wine industry itself is: Why the hell don’t you ever see a 100 point Chablis? (Part 1, Part 2). This is definitely worth reading. In Part 2, the essential subjective component of all scores is put simply by Patrick Piuze:
Here is the recipe [for a perfect Chablis]: be in a very good mood with people you love and that are ready to open themselves to a great moment. Then and only then do the wines come into play. Right place, right people, right wine.
It therefore comes as no surprise whatsoever that the academic world has also had a look at this topic. One of the recent publications (May 2024) to which I referred above is:
Finding the wise and the wisdom in a crowd: estimating underlying qualities of reviewers and items, by Nicolas Carayol & Matthew O. Jackson. The Economic Journal ueae045.
This paper has been quite some time coming: the earliest version I can find online is from September 2019, with a slightly different title, and with further revisions appearing online elsewhere in 2020. I interpret this as meaning that someone somewhere was not too happy with it, possibly because of the re-evaluation of the 1976 Judgment of Paris wine tasting, or the evaluation of 19 experts’ ratings of “en primeur” Bordeaux wines (in both cases, people are named).

Wine scores

The formal summary of the paper reads like this:
Consumers, businesses and organisations rely on others’ ratings of items when making choices. However, individual reviewers vary in their accuracy and some are biased — either systematically over- or under-rating items relative to others’ tastes, or even deliberately distorting a rating. We describe how to process ratings by a group of reviewers over a set of items and evaluate the individual reviewers’ accuracies and biases, in a way that yields unbiased and consistent estimates of the items’ true qualities. We provide Monte Carlo simulations that showcase the added value of our technique even with small data sets, and we show that this improvement increases as the number of items increases. Revisiting the famous 1976 wine tasting that compared Californian and Bordeaux wines, accounting for the substantial variation in reviewers’ biases and accuracies results in a ranking that differs from the original average rating. We also illustrate the power of this methodology with an application to more than 45,000 ratings of ‘en primeur’ Bordeaux fine wines by expert critics. Those data show that our estimated wine qualities significantly predict prices when controlling for prominent experts’ ratings and numerous fixed effects. We also find that the elasticity of a wine price in an expert’s ratings increases with that expert’s accuracy.
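
The authors' actual estimator is considerably more sophisticated than anything I can reproduce here, but the basic idea can be sketched as a simple alternating scheme (my own toy simplification, not the paper's method): start with the raw averages as the quality estimates, estimate each reviewer's bias as their average deviation from those qualities and their accuracy from the spread of their remaining residuals, then re-estimate the qualities as precision-weighted averages of the de-biased ratings, and repeat.

    import numpy as np

    def debias_ratings(ratings, n_iter=50):
        """Toy alternating estimator of item qualities, reviewer biases, and
        reviewer noise variances. `ratings` is a (reviewers x items) array,
        with NaN marking a missing rating. Not the paper's actual estimator."""
        r = np.asarray(ratings, dtype=float)
        rated = ~np.isnan(r)
        quality = np.nanmean(r, axis=0)           # start from the raw averages
        bias = np.zeros(r.shape[0])
        var = np.ones(r.shape[0])
        for _ in range(n_iter):
            resid = r - quality[None, :]          # deviations from current qualities
            bias = np.nanmean(resid, axis=1)      # each reviewer's systematic shift
            bias -= bias.mean()                   # identification: biases average to zero
            var = np.nanvar(resid - bias[:, None], axis=1) + 1e-6  # reviewer noise level
            # precision-weighted average of the de-biased ratings, per item
            weight = np.where(rated, 1.0 / var[:, None], 0.0)
            debiased = np.where(rated, r - bias[:, None], 0.0)
            quality = (weight * debiased).sum(axis=0) / weight.sum(axis=0)
        return quality, bias, var

    # Invented example: two biased reviewers, plus one unbiased but noisy one.
    ratings = [[95.0, 91.0, 89.0],   # rates everything about 3 points too high
               [89.0, 85.0, 83.0],   # rates everything about 3 points too low
               [98.0, 81.0, 92.0]]   # no systematic bias, but very noisy
    quality, bias, var = debias_ratings(ratings)
    print(np.round(quality, 1), np.round(bias, 1), np.round(var, 2))

The noisy reviewer ends up down-weighted, so the quality estimates lean towards the two consistent reviewers once their opposite biases cancel out. Something of this general flavour is what produces the re-ranking discussed next.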
If we take the Judgment of Paris wine tasting, and look at what the authors come up with for their new quality estimate (i.e. after addressing the potential problems with accuracy and bias), it looks like the following graph. The horizontal axis shows the original (average) taster score, the vertical axis shows the new score, and each point represents one of the 10 wines. Note that seven of the points lie on a straight line, which indicates that the authors’ revised score does not differ in interpretation from the original tasters’ score (i.e. accuracy and bias were not a problem for these wines). However, three of the wines change their place in the quality ranking, as shown by the labels: Château Montrose and Mayacamas move up the ranking (with Montrose becoming the new “winner”), and Clos du Val moves down. This is why the authors conclude that there was “substantial variation in the reviewers’ biases and accuracies” in this famous assessment (well, for three of the ten wines, anyway).

I am not going to take sides here, but I will say that the authors do present a strong case.


The authors actually spend most of their paper on an analysis of the Bordeaux en primeur wines (based on >45,000 expert ratings of Bordeaux wines of the vintages from 1998 to 2015), highlighting which experts show a measurable bias towards the Right Bank wines (e.g. Robert Parker, Jeff Leve, James Suckling, Chris Kissack, Wine Spectator, Yves Beck) and which towards the Left Bank (e.g. Jancis Robinson, Decanter, Jacques Dupont, La Revue du Vin de France, Wine Enthusiast). This does, of course, explain a lot of the disagreement between the scores produced by the different review sources.

It has, however, been suggested that these days in the wine industry we are mostly just Talking to ourselves. Nevertheless, if you want a set of scores, then there is this one:
            The Official 2024 Wine Vintage Chart
Alternatively, you could focus on value for money, in which case there is:
            The best value bargain en primeur Bordeauxs.

1 comment:

  1. In my humble opinion, one of the best articles written on wine scoring / rating came from former Caltech professor / lecturer (on "randomness") Leonard Mlodinow, who questioned whether wine competition judges / wine magazine reviewers can replicate their scores / ratings for the same wine when it is presented to them "single blind."

    From The Wall Street Journal “Weekend” Section
    (November 20, 2009, Page W6):

    “A Hint of Hype, A Taste of Illusion;
    They pour, sip and, with passion and snobbery, glorify or doom wines.
    But studies say the wine-rating system is badly flawed.
    How the experts fare against a coin toss.”

    URL: https://www.wsj.com/articles/SB10001424052748703683804574533840282653628

    Essay by Leonard Mlodinow Ph.D.

    (Bob's aside: if you run into the newspaper's "pay wall," then I encourage you to take out a single month's online subscription to The Journal for the princely sum of . . . U.S.D. $1.00 or U.S.D. $2.00? "Binge" on Journal articles for the next 30 days. Then decide whether you wish to "opt-out" of renewing for another calendar month.)

    There is an oft-repeated saying:

    "Information wants to be free."

    However, few acknowledge the very next sentence of that saying:

    “Information Wants to Be Expensive”

    Read here (with a Journal subscription?):

    From The Wall Street Journal “Opinion” Section
    (February 23, 2009, Page A13):

    “Information Wants to Be Expensive”

    ["Take-away": Newspapers need to act like they’re worth something]

    URL: https://www.wsj.com/articles/SB123534987719744781

    By L. Gordon Crovitz
    “Information Age” Column
