Monday, May 6, 2024

What happens when you score wines at comparative tastings?

I rarely write about specific wines in this blog. However, I first learned about wine by doing comparative tastings in the early 1980s. My local “bottle shop” (or “liquor store” or “off-licence”, etc.) in Australia would have a few bottles open on Thursday evening (late-night shopping night). These had been left as samples by distributors, in the hope of getting the owner to stock them. We customers would comment.

Eventually, the proprietor realized that there were enough of us doing this that he could organize special comparative tastings on weekends, for which we would pay. I would often bring along a special old bottle in lieu of paying cash. I now do comparative tastings of my own, nearly 40 years later, to which I invite my vinous friends. *

Bordeaux vineyard.

It is therefore worth looking at the results of these sorts of comparisons, to see what we can learn. Bob Henry, who was introduced a couple of weeks ago (in the post: Some new notes on Rudy Kurniawan and his activities), has conducted many such tastings while “moonlighting” on weekends on the sales staff of leading wine stores around Los Angeles, and as an organizer of wine cellars for private parties in Los Angeles. I will look at a few tastings that he conducted in the mid-to-late 1990s. In particular, I will look at tastings that compared two distinct groups of wines on the same tasting occasion.

As a starter, there was a tasting of seven Bordeaux chateau wines from both the 1989 and 1990 vintages, tasted in the winter of 1994:
  • Château L’Angélus
  • Château Pichon Longueville Baron
  • Château La Mission Haut-Brion
  • Château Palmer
  • Château La Conseillante
  • Château Montrose
  • Château Léoville-Las Cases.
There were 16 participants, who tasted each of the 14 wines blind (i.e. the bottle was in a paper bag, although the order of the wines was not random). Each person was asked to rank-order their three preferred wines from the seven for each vintage, which were then assigned points: 3 points for 1st preference, 2 points for 2nd preference, and 1 point for 3rd preference (yielding a total of 96 points per vintage: 16 tasters × 6 points each).
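The points scheme just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a record of the tasting. The ballots and wine names ("Wine A" and so on) are hypothetical placeholders, and the function name `tally` is my own.

```python
from collections import Counter

# 3-2-1 scheme: points awarded for 1st, 2nd and 3rd preference
POINTS = {1: 3, 2: 2, 3: 1}

def tally(ballots):
    """Sum the points per wine; each ballot lists one taster's top three, best first."""
    scores = Counter()
    for ballot in ballots:
        for rank, wine in enumerate(ballot, start=1):
            scores[wine] += POINTS[rank]
    return scores

# Hypothetical ballots (not the real data): 16 tasters, alternating between two opinions
ballots = [["Wine A", "Wine B", "Wine C"], ["Wine B", "Wine A", "Wine D"]] * 8

scores = tally(ballots)
total = sum(scores.values())
print(scores.most_common())
print("Total points in the flight:", total)
```

Whatever the ballots look like, each taster hands out 3 + 2 + 1 = 6 points, so a flight with 16 tasters always sums to 96.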

It is important to note that these results are relative only to each other, and there is thus no assessment of the wines on any absolute scale (e.g. a score out of 100). The results are summarized in this first graph.

Flight scores for the two vintages.

Clearly, the “best” (La Conseillante) and “worst” (La Mission Haut-Brion) wines were consistent across the two vintages; but otherwise there are some notable issues here. For example, three of the wines got the same score for the 1990 wine (13 points) but differed greatly from each other for the 1989 wine: 7 (Pichon-Baron), 11 (Léoville-Las Cases), and 24 (L’Angélus) points. Vinous things were apparently not consistent for these chateaux in those days.

This tells us nothing about how the two vintages compared, of course. So, at the end the participants were also asked to rank all 14 wines simultaneously, with points once again for their first, second and third preferences. Only then were the wine identities revealed. These results are summarized in this next graph, where the score from graph 1 is plotted horizontally and the overall score is plotted vertically. The 1989 wines are in blue and the 1990 wines in pink.

Overall score compared to flight score.

Clearly, the 1989 wines were preferred to the 1990 wines, at this particular tasting, when the wines were still fairly young. Subsequent assessments usually place 1990 slightly ahead of 1989 (e.g. Bordeaux vintage chart 1959 to today), although both vintages were streets ahead of the half-dozen Bordeaux vintages before and after them. [Bob has noted that the 1989 La Conseillante remains one of his all-time favorite wines in the world; and he has consistently declared that the 1989 red Bordeaux vintage ranks among the very best in half a century.]

However, the interesting thing is the apparent inconsistency that arises: wines do not always get an overall score that matches their score within their own vintage flight. ** For example, three wines all scored 13 in the 1990 flight and yet got different overall scores (0, 1, 4). Even worse, several of the wines did better when compared overall (across the two vintages) than they did within their own vintage flight.
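To see why a higher overall score is surprising, here is a minimal simulation sketch. It rests on an assumption that is mine, not Bob's: that every taster holds a single fixed preference ordering of all 14 wines and uses it for both the flight ballots and the overall ballot. The orderings are random, and the wines are just (vintage, number) placeholders. Under that assumption, a wine in a taster's overall top three must rank at least as high within its own flight, so its overall points can never exceed its flight points.

```python
import random

POINTS = (3, 2, 1)  # points for 1st, 2nd and 3rd preference

def top3_points(ordering):
    """Points one taster awards to the first three wines in `ordering`."""
    return dict(zip(ordering, POINTS))

def simulate(n_tasters=16, seed=0):
    """Score flights and overall ballots from one fixed ordering per taster."""
    rng = random.Random(seed)
    wines = [(vintage, i) for vintage in (1989, 1990) for i in range(7)]
    flight = {w: 0 for w in wines}
    overall = {w: 0 for w in wines}
    for _ in range(n_tasters):
        preference = wines[:]        # hypothetical taster: random but fixed ordering
        rng.shuffle(preference)
        for wine, pts in top3_points(preference).items():
            overall[wine] += pts     # top three across all 14 wines
        for vintage in (1989, 1990):
            within = [w for w in preference if w[0] == vintage]
            for wine, pts in top3_points(within).items():
                flight[wine] += pts  # top three within each vintage flight
    return flight, overall

flight, overall = simulate()
violations = [w for w in flight if overall[w] > flight[w]]
print("Wines scoring higher overall than in their flight:", violations)
```

With perfectly consistent tasters the list of violations is always empty. So, any wine that did better overall than within its flight implies that at least one taster changed their mind between the two rounds of ranking.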

These results are quite illogical. This appears to be the answer to the question posed in the post’s title!



* “Baby Boomers prefer the luxury and educational experience of wine” (Millennials and Gen X want a wine vacation, not an education).

** What’s a wine flight and why is it called that?

1 comment:

  1. At my winetasting luncheons I never asked participants to assign a rating / score to any wine, be it "thumbs up / thumbs down" or "3 stars / 3 puffs" or "5 stars" or (U.C. Davis / Decanter magazine) "20 points" or (Wine Advocate / Wine Spectator / Wine Enthusiast / Wine & Spirits magazine) "100 points."

    I knew each participant had her or his own interpretation of some undefined rating system / scoring scheme / scoring scale. I didn't wish to be the Rosetta Stone that "converted" one person's rating / score into another's.

    See David's 2018 post titled "Why comparing wine-quality scores might make no sense."

    Lead sentence: "There is no mathematical meaning to comparing wine-quality scores between different critics."

    URL: https://winegourd.blogspot.com/2018/04/why-comparing-wine-quality-scores-might.html

    I did believe that rank ordering (1st, 2nd, 3rd) would fairly elicit the degree of preference each participant found in the wines.

    (Aside: to break ties between rank ordered wines, I asked this hypothetical question: "Imagine Bob will let you take home -- for free -- only one of your tied preference wines. Which bottle goes home?"

    That never failed to nudge participants to choose . . . and not endlessly dither.)
