When looking at variation in wine-quality scores, it is important to eliminate the effects of different bottles and tasting conditions, by having the scores produced from the same bottles at the same time. This is, of course, what happens at most group wine tastings. Wine Spectator magazine is a useful source of this sort of information, as it has occasionally held direct tasting comparisons between pairs of its critics, in addition to the tastings conducted by each of its regional experts on their own.
The exercise
I have previously used data concerning James Laube and James Suckling, who have both provided wine-quality scores to Wine Spectator regarding Cabernet wines (Laube versus Suckling — their scores differ, but what does that mean for us?). This time, I will compare James Laube with Per-Henrik Mansson, as they have both provided scores for Chardonnay wines, with Laube as the California expert and Mansson as the Burgundy expert. Mansson has subsequently moved on from the magazine.*
The dataset I will use here is from the "Chardonnay Challenge" of 1997 (see Wine Spectator for November 15, 1997, pp. 46–70), in which our two critics tasted 10 California Chardonnay wines and 10 Burgundy white wines from both the 1990 and 1995 vintages.** However, there were only 39 bottles of wine with which to compare their scores, as one of the Burgundies from 1990 was not available in time for the tasting.
The data are shown in the first graph, with Laube's scores vertically and Mansson's horizontally. Each point represents one of the 39 bottles.
This does not look good, to me — in fact, it looks terrible. There is a wide spread of points in the graph (note, also, that Mansson's scores cover a bigger range than Laube's). The mathematical correlation indicates only 3% agreement between the two sets of scores, which is almost no agreement at all. To make this clear, the solid pink line shows what perfect agreement would look like — for bottles whose points are on this line, the two critics agreed exactly with each other. Only 2 of the 39 bottles lie on this line. The Laube score exceeds the Mansson score 22 times, and the reverse holds 15 times.
The two dashed lines in the graph show us ±2 points from perfect agreement — for bottles between the two lines, the two sets of point scores were within 2 points of each other. This allows for the approximate nature of expert opinions — technically, we are allowing for the fact that the scores are presented with 1-point precision (e.g. 88 vs. 89 points), although the experts cannot actually be 1-point accurate in their assessments.
There are only 10 of the 39 bottles (26%) between the dashed lines. So, even when we allow for the approximate nature of expert opinions, there is much more disagreement here than there is agreement.
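For readers who want to reproduce these agreement measures, here is a minimal sketch in Python. It assumes that the "3% agreement" figure is the squared Pearson correlation (r²) of the paired scores; the score lists are placeholders of my own, not the actual 39 tasting pairs.

```python
# A minimal sketch of the three agreement measures discussed above,
# assuming that "3% agreement" is the squared Pearson correlation (r^2)
# of the paired scores. The score lists are placeholders, not the
# actual 39 tasting pairs.
import numpy as np

laube   = np.array([88, 90, 85, 92, 87])   # hypothetical Laube scores
mansson = np.array([85, 93, 88, 84, 90])   # hypothetical Mansson scores

r = np.corrcoef(laube, mansson)[0, 1]      # Pearson correlation
diff = np.abs(laube - mansson)             # absolute score differences

print(f"agreement (r^2):     {r**2:.0%}")
print(f"perfect agreement:   {np.sum(diff == 0)} bottles")
print(f"within +/- 2 points: {np.sum(diff <= 2)} bottles")
```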
Another way of dealing with the approximate nature of expert scores is to greatly reduce the number of score categories, so that all the experts need to do to agree is pick the same category. The Wine Spectator does it this way:
95–100  Classic: a great wine
90–94   Outstanding: a wine of superior character and style
85–89   Very good: a wine with special qualities
80–84   Good: a solid, well-made wine
75–79   Mediocre: a drinkable wine that may have minor flaws
50–74   Not recommended
So, I have shown this scheme in the second graph. For bottles within the boxes, the two critics' point scores agree as to the word category of wine quality. The result is poor: only 6 of the 39 wines (15%) fall within the boxes. So, even this broad-brush approach to wine-quality assessment provides agreement between the two critics for barely one-sixth of the wines.
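As an illustration, here is a small sketch of how such a category-agreement count can be made, using the published Wine Spectator ranges above. The category() helper and the score lists are my own placeholders, not the magazine's data or code.

```python
# A small sketch of the category-agreement count, using the published
# Wine Spectator ranges above. The category() helper and the score
# lists are placeholders, not the magazine's data or code.
def category(score):
    if score >= 95: return "Classic"
    if score >= 90: return "Outstanding"
    if score >= 85: return "Very good"
    if score >= 80: return "Good"
    if score >= 75: return "Mediocre"
    return "Not recommended"

laube   = [88, 90, 85, 92, 87]   # hypothetical scores
mansson = [85, 93, 88, 84, 90]   # hypothetical scores

matches = sum(category(l) == category(m) for l, m in zip(laube, mansson))
print(f"category agreement: {matches} of {len(laube)} wines")
```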
For comparison, the Laube versus Suckling Cabernet tasting (mentioned above) produced much better agreement. Their mathematical correlation was 29% (compared to only 3% this time), there were 5 out of 40 bottles on the solid line (2 out of 39 this time), 23 out of 40 bottles between the dashed lines (10 out of 39 this time), and 25 out of 40 bottles within the boxes (6 out of 39 this time). Suckling and Laube did not agree much with each other, but Mansson and Laube hardly agreed at all.
To make this point clear, the third graph illustrates the differences in the paired scores, expressed as the Mansson score minus the Laube score (horizontally), with the count of bottles shown vertically. Clearly, the scores differ by up to 10 points in one direction (Mansson greater than Laube) and 13 points in the other (Laube greater than Mansson). I have rarely seen scores differ by this much — 13 points is a lot of quality-score difference. It is pertinent, I think, to ask whether these two people were actually tasting the same wines!
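The tally behind such a histogram can be sketched like this, again with hypothetical scores standing in for the real data:

```python
# A sketch of the tally behind such a histogram, again with
# hypothetical scores standing in for the real 39 pairs.
from collections import Counter

laube   = [88, 90, 85, 92, 87]   # hypothetical scores
mansson = [85, 93, 88, 84, 90]   # hypothetical scores

diffs = Counter(m - l for l, m in zip(laube, mansson))
for d in sorted(diffs):
    print(f"{d:+3d}: {'#' * diffs[d]}")   # simple text histogram
```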
As an aside, it is worth noting the overall low scores given to the wines. Only 16 of the wines scored >90 points, even though they were all quite expensive. This is quite comparable to the previous year's Cabernet tasting, where only 17 wines scored >90 points.
What does this mean for us?
Obviously, we should be asking what is going on here. The magazine presents the scores as representing some sort of Wine Spectator standard of quality, but clearly this is not an objective standard. The scores are personal (but expert) judgments by individual critics, who may have very little in common.
In this case, the situation is illustrated in the final graph, which shows the average scores for each critic for the four types of wine — California versus Burgundy, for both the 1990 and 1995 vintages. Put simply, James Laube preferred the California wines in both years, and Per-Henrik Mansson particularly liked the 1995 Burgundies. The only wines they agreed about were the 1990 Burgundies.
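A sketch of the simple averaging behind such a graph, assuming plain means per critic within each wine group (the numbers are invented for illustration):

```python
# A sketch of the averaging behind the final graph, assuming simple
# means per critic within each wine group; the numbers are invented.
from statistics import mean

scores = {  # group -> (Laube scores, Mansson scores), all hypothetical
    "California 1990": ([90, 88], [85, 84]),
    "California 1995": ([91, 89], [83, 86]),
    "Burgundy 1990":   ([86, 87], [86, 88]),
    "Burgundy 1995":   ([85, 84], [92, 91]),
}

for group, (l, m) in scores.items():
    print(f"{group:16s}  Laube {mean(l):.1f}  Mansson {mean(m):.1f}")
```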
Mansson's preference for the 1995 Burgundies is explained in his notes:
"I looked beyond aromas and flavors for what I think are the two most important factors in determining a great Chardonnay: a seamless, silky texture in the mid-palate and a clean, elegant, balanced finish ... I often find that young California Chardonnays taste overly oaky and acidic. After a glass or two, they seem heavy, even dull and flat. The 95s reinforced this negative impression; compared to the beautifully balanced, elegant, supple yet succulent white Burgundies, the California whites tasted slightly bitter to me, with a few notable exceptions."

Laube's consistent preference for the California wines, however, is not explicitly explained. His published notes are almost entirely about how much better value for money the California wines were compared to the Burgundies — the Burgundies cost up to 10 times as much but were no better. However, since the wines were tasted blind, this cannot explain his scores. His only brief comment is:
"California Chardonnays tend to be fruitier, white Burgundies a shade earthier."

This is consistent with his notes for the previous Cabernet tasting:
"I like my wines young, rich, highly concentrated and loaded with fruit flavors."

The problem for us is that these critics' quality scores are not really comparable. They give us a rank order of preference for each critic, but any attempt to directly compare them makes little sense. Unfortunately, comparing them is precisely what the magazine actually asks us to do (and I did!).
* I bet his name was Månsson before he left Sweden.
** Thanks to Bob Henry for sending me a copy of the magazine.