Monday, February 19, 2018

Wine-quality scores for premium wines are not consistent through time

When dealing with professional wine-quality scores, the usual attitude seems to be: "one wine, one score". We have all seen wine retailers where, for each wine, only one quality score is advertised from each well-known wine critic or magazine. This is often either the most recent score that has been provided, or it is the highest score that has been given to that particular wine.

However, we all know that this is overly simplistic. The score assigned to a wine by any given taster can vary through time for one or more of several reasons, including: bottle variation, tasting conditions, personal vagaries, and the age of the wine. So, one score is actually of little practical use, even though that is usually all we get from retailers.

The point about the age of the wine is of particular interest to wine lovers, since there is a perception that premium wines should increase in quality through time (that's why we cellar the wine), before descending slowly to a mature old age (the wine, as well as us). It is therefore of interest to find out whether this is actually so. When wine critics repeatedly taste the same vintage of the same wine, do their assigned quality scores show any particular pattern through time? Or do they correctly assess the wine when it is young, so that it continues to get the same score as it matures?

This turns out not to be an easy question to answer, because in very few cases do critics taste a single wine often enough for us to be able to get a worthwhile answer; and even when they do conduct repeat tastings, they do not always publish all of the results. I have previously looked at the issue of repeated tastings by comparing pairs of tastings for several wines (Are the quality scores from repeat tastings correlated?), but I have not looked at single wines through time.

Some data

So, I have searched around, and found as many examples as I could of situations where a single critic has publicized scores for the same wine (single winery and vintage) at least six different times since 2003. I got my data from CellarTracker, WineSearcher and 90Plus Wines (as described in a previous post).

It turns out that very few people have provided quality scores for more than five repeats of any one wine (who can afford to?). It also turns out that the most likely place to find such scores is among the icon wines from the top Bordeaux châteaux. The critics I found are: Jeff Leve (27 wines), Richard Jennings (3 wines), Jancis Robinson (2 wines) and Jean-Marc Quarin (1 wine).

The graphs are tucked away at the bottom of this post, and I will simply summarize here what they show. They all show roughly the same thing: a lot of variation in scores through time, with a spread of points for any one wine never being less than 2; and the scores generally show a slight decrease through time.
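To make the notion of "spread" concrete: for each wine it is simply the difference between the highest and lowest score across the repeat tastings. A minimal sketch, using made-up scores (not the actual data behind the graphs below):

```python
# Hypothetical repeat-tasting scores on the 100-point scale.
# These values are illustrative only, not taken from the graphs.
scores = {
    "Latour 1990": [95, 96, 98, 97, 96, 95],
    "Margaux 2000": [100, 98, 96, 97, 95, 96],
}

for wine, tastings in scores.items():
    # The spread is the range of the scores: max minus min.
    spread = max(tastings) - min(tastings)
    print(f"{wine}: spread = {spread} points over {len(tastings)} tastings")
```

A spread of 2 or more points, as seen in all of the wines discussed here, means that no single advertised score can represent the wine reliably.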

The first four graphs are from Jeff Leve (at the Wine Cellar Insider). The first graph is for seven vintages of Château Latour. The scores generally stay within 2-3 points for each wine; and only the 1990 could be considered to show any sort of increase in score through time. The second graph is for Château Lafite-Rothschild, Château Mouton-Rothschild and Pétrus — the first two generally stay within 2 points, but the last is all over the place. The third graph covers seven vintages of Château Margaux, which rarely stay within 2 points, and the 2000 vintage shows a strong decrease in score through time. The fourth graph covers nine vintages of Château Haut-Brion. The scores often do not stay within 2 points, especially for the 1961 vintage; and only the 1998 vintage increases slightly through time.

The fifth graph is for Richard Jennings (from RJ on Wine). All three of the vintages covered show a decrease in score through time. Finally, the sixth graph shows a couple of wines of Château Latour from Jancis Robinson and one from Jean-Marc Quarin, both of whom use a 20-point quality scale. Their scores range by at least 2 points per wine; and Quarin's wine strongly decreases in score through time.


I think that it might be stretching a point to claim that any of these wines show a consistent score through time — they go up and down by at least 2 points, and often more. We certainly can't claim that the scores increase with repeated tastings — if anything, the general trend is more often downwards.

There are a couple of possible explanations for this variation, in addition to the obvious one that the critics don't have much idea what they are doing.

The classic explanation is "bottle variation" (rather than "taster variation"). For example, Robert Parker once wrote (Wine Advocate #205, March 2013): "I had this wine four different times, rating it between 88 and 92, so some bottle variation (or was it my palate?) seems at play." Parker's results would fit perfectly into the graphs below. As confirmation of this point, the widely reported 2010 results of the Australian Wine Research Institute’s Closure Trial certainly indicated a very large amount of bottle variation for cork-closed bottles (see Wine Spectator, Wine Lovers).

If this is the explanation, then the consistently erratic nature of the results, combined with the expected high quality of the wines, does make me wonder about the advisability of buying expensive wines. Huge bottle variation for cheap wines might be expected, but it is hardly acceptable for the supposedly good stuff, if only for financial reasons. This topic is discussed in more detail by, among others, Wilfred van Gorp, Jamie Goode, and Bear Dalton.

At the extreme, bottle variation can refer to flawed wines, of course. In the graph for Richard Jennings, one of the scores for Château Haut-Brion is missing, because he scored it as "flawed". Indeed, he did this for 3 of the 188 Grand Cru wines for which he provided scores (1.6%). James Laube estimates the rate of flawed wine as 3-4%. The other tasters may also have encountered flawed wines, but not reported this, as recently discussed by Oliver Styles.

Another point is the extent to which the tasters may have taken into account how old the wine was at the time they tasted it. If the wines are not tasted blind, then this remains a serious question mark hanging over the quality scores assigned.

Anyway, there is certainly a lot of leeway for retailers to select the score(s) they report on their shelf talkers and web pages. The Wine Searcher database addresses this issue by simply reporting the most recent score available.


Jeff Leve:

Jeff Leve's scores for Château Latour

Jeff Leve's scores for the Rothschilds and Pétrus

Jeff Leve's scores for Château Margaux

Jeff Leve's scores for Château Haut-Brion

Richard Jennings:

Richard Jennings' scores

Jancis Robinson and Jean-Marc Quarin:

Scores from Jancis Robinson and Jean-Marc Quarin

1 comment:

  1. Quoting from Wine Spectator (March 15, 1994, page 90) :

    "How We Do the Tastings . . . . Ratings are based on potential [future] quality, on how good the wines will be when they are at their peaks. . . . ."

    [Bob's aside: One would infer that wines tasted in their youth have lower scores than wines tasted years later as they reach their maturity "peaks."]

    Quoting from Jancis Robinson MW's website ("How to Score Wine" circa 2002):

    "I like the five-star system used by Michael Broadbent and Decanter magazine. Wines that taste wonderful now get five stars. Those that will be great may be given three stars with two in brackets for their potential. . . ."

    [Bob's aside: So Decanter magazine also projects a wine's "potential" future quality.

    When Wine Spectator and Decanter magazine conduct (say) 10th anniversary retrospective tastings, how often do the wines hit their "potential" future quality marks?]