I have previously presented an example showing that two people evaluating the same wines can come to completely different conclusions (When critics disagree). One suggested way of addressing this issue is to have groups of people do the assessing, instead of single individuals. Here, I show an example where this does not quite work out, either.
On 9 June 2002, under the auspices of the VieVinum wine exhibition in Vienna, there was a tasting of Grüner Veltliner wines, now acknowledged to be Austria's premier white grape variety. The intention was to showcase these wines by comparing them to a set of world-class Chardonnays (as had also been done 4 years earlier). It was organized by fine wine dealer Jan-Erik Paulson, a Swede living in Germany, who reported on the tasting on his web site: Grüner Veltliner - the worlds greatest white wines?
The jury apparently consisted of 39 experienced tasters from 13 different countries. As Paulson noted: "The sensation of the tasting was how excellent the Grüner Veltliner showed and how badly some of the burgundies did."
Jancis Robinson also reported on this tasting event on her own web site, reposting Paulson's blog post as: Chardonnay v Grüner Veltliner, a knockout contest. She also noted: "The results are fascinating, and so surprising that I feel the need to participate in a similar taste-off in which I get to choose the Chardonnays." Well, this is exactly what happened.
This second tasting occurred 5 months after the first, on 30 October 2002, at the Groucho Club, in London. It was also organized by Jan-Erik Paulson, but this time was hosted by Jancis Robinson and Tim Atkin. The jury consisted of 18 wine journalists, importers and sommeliers. The results are presented in the blog post: Grüner Veltliner - distinctly groovy grape.
There was some overlap between the wines chosen for these two tastings, and so we can directly compare the resulting scores. In the first tasting there were 36 wines: 7 Grüner Veltliner, 6 Burgundies, and 23 other Chardonnays from around the world. In the second tasting there were 35 wines: 11 Grüner Veltliner, 6 Burgundies, and 18 other Chardonnays. There were 16 wines that were the same in both tastings, plus a few more that differed only in vintage. The scores for the identical wines are shown in the first graph.
In this graph the average scores for the first tasting are on the horizontal axis, using the 100-point scale, while the average results of the second tasting are shown vertically, using the 20-point scale. Each point represents one wine, located according to its two scores.
If the two juries had judged each wine with perfect consistency, then the points would lie exactly on the dashed line; only two of the wines do so. Since it is unrealistic to expect the scores to be identical in practice, we can allow a half-point leeway, in which case the points would lie between the two dotted lines.
Based on this criterion, four of the 16 wines were scored differently by the two juries. One wine in particular, at the middle-left of the graph, was scored very differently. This was the 1997 Hamilton Russell Chardonnay, from South Africa. The Vienna jury rated it 85 / 100 (= Above average) while the London jury rated it much higher, at 16.5 / 20 (= A cut above superior) — a difference of two quality categories.
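For those who like to see the criterion made explicit: once both sets of scores are expressed on a common 20-point scale, the check is simply whether the two juries' averages for a wine differ by more than half a point. Here is a minimal sketch; the function name and the scores shown are made up for illustration, not the actual tasting data:

```python
def inconsistent(scores, leeway=0.5):
    """Return the wines whose two jury scores (both on the 20-point
    scale) differ by more than the allowed leeway."""
    return {wine: pair
            for wine, pair in scores.items()
            if abs(pair[0] - pair[1]) > leeway}

# Hypothetical example scores: (first tasting, second tasting)
scores = {
    "Wine A": (17.0, 17.4),   # within half a point: consistent
    "Wine B": (15.0, 16.5),   # differs by 1.5 points: inconsistent
}

print(inconsistent(scores))  # → {'Wine B': (15.0, 16.5)}
```

On the first graph, the same test is done visually: a wine is consistent exactly when its point falls between the two dotted lines.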
Nevertheless, the two tastings did produce the same general result — the Grüner Veltliner wines excelled and the Burgundies did very poorly. This is shown in the second graph, where the Vienna scores have been converted to the same 20-point scale as used in London.
In this graph, the 71 bottles of wine are ranked in decreasing order of their score, with the three categories of wine distinguished by color. Note that most of the Grüner Veltliner wines are at the top of the graph while most of the Burgundies are at the tail end.
It is thus not at all clear why these Burgundies usually cost so much, since they are clearly not good value for money.