In the previous post (How many wine-quality scales are there?) I discussed the range of ratings systems for describing wine quality that use 20 points. However, perhaps of more direct practical relevance to most wine drinkers in the USA is the range of systems that use 100 points (or, more correctly, 50-100 points).
The 100-point scale is used by the most popular sources of wine-quality scores, including the Wine Spectator, Wine Advocate and Wine Enthusiast; and so wine purchasers encounter their scores almost every time they try to purchase a bottle of wine. But how do these scores relate to each other? Using the metaphor introduced in the previous post, how similar are their languages? And what do we have to do to translate between languages?
All three of these popular scoring systems have been publicly described, although I contend that it might be a bit tricky for any of the rest of us to duplicate the scores for ourselves. However, there are plenty of other wine commentators who provide scores without any explicit indication of how they derive those scores. So, a simple comparison of some of the different systems is in order.
As explained in the last post, in order to standardize the various scales for direct comparison, we need to translate the different languages into a common language. I will do this in the same manner as last time, by converting the different scales to a single 100-point scale, as used by the Wine Advocate. I will also compare the quality scales based on their scores for the five First Growth red wines of the Left Bank of Bordeaux, as I did last time.
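For anyone who wants to attempt a similar translation themselves, it can be sketched as piecewise-linear interpolation between pairs of matched scores. The calibration pairs below are purely hypothetical numbers for illustration, not the actual data behind my graph:

```python
# Hypothetical calibration pairs: (a critic's score, the Wine Advocate score)
# for wines judged to be of equivalent quality. These numbers are
# illustrative only, not the data used in the post.
CALIBRATION = [(80, 78), (85, 84), (90, 90), (95, 95), (100, 99)]

def to_advocate_scale(score):
    """Translate a critic's score to the common (Wine Advocate) scale
    by piecewise-linear interpolation between the calibration pairs."""
    pts = sorted(CALIBRATION)
    if score <= pts[0][0]:
        return float(pts[0][1])
    if score >= pts[-1][0]:
        return float(pts[-1][1])
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= score <= x1:
            # Linear interpolation within this segment of the scale
            return y0 + (score - x0) * (y1 - y0) / (x1 - x0)

print(to_advocate_scale(92))   # 92.0 for this illustrative calibration
print(to_advocate_scale(100))  # 99.0 (cf. Tanzer's treatment of 100-point wines)
```

In practice the calibration pairs would come from the critics' published scores for the same wines, as I have done here with the First Growth reds.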
The scales for nine different scoring systems are shown in the graph. The original scores are shown on the horizontal axis, while the standardized score is shown vertically. The vertical axis represents the score that the Wine Advocate would give a wine of the same quality. If the critics were all speaking the same language to express their opinions about wine quality, then the lines would be sitting on top of each other; and the further apart they are, the more different are the languages.
There are lots of different lines here, which indicates that each source of scores uses a different scheme, and thus is speaking a different language. Many of the lines are fairly close, however, and thus many of the languages are not all that different from each other. Fortunately for us, they are most similar to each other in the range 85-95 points.
First, note that the line for the Wine Spectator lies exactly along the diagonal of the graph. This indicates that the Wine Advocate and the Wine Spectator are using exactly the same scoring system — they are speaking the same language. In other words, a 95-point wine from either source means exactly the same thing. If they give different scores to a particular wine, then they are disagreeing only about the quality of the wine — this is not true for any other pair of commentators, because in their case a different score may simply reflect the difference in language.
It is worth noting that almost all of the Wine Advocate scores came from Robert Parker, while most of the Wine Spectator's were from James Suckling, along with a few from Thomas Matthews, James Molesworth and Harvey Steiman (who have all reviewed the red wines of Bordeaux for that magazine), plus some that were unattributed.
Second, the line for the Wine Enthusiast always lies below the diagonal of the graph. This indicates that the Wine Enthusiast scores are slightly greater than those of the Wine Advocate (and Wine Spectator) for an equivalent wine. For example, if the Enthusiast gives a score of 80 then Parker would give (in the Advocate) 78-79 points for a wine of the same quality. This situation has been noted in Steve De Long's comparison of wine scoring systems, although it is nowhere near as extreme as he suggests.
Third, the line for Stephen Tanzer always lies above the diagonal of the graph, indicating that his scores are usually slightly less than those of the Wine Advocate (and Wine Spectator). Indeed, a 100-point Parker wine would get only 98-99 points from Tanzer.
All of the other lines cross the diagonal at some point. This indicates that sometimes their scores are above those of the Advocate and sometimes they are below. Interestingly, most of these systems converge at roughly 91 points, as indicated by the dashed line on the graph. So, a 91-point wine means more-or-less the same thing for most of these commentators (except Tanzer and the Enthusiast) — it is the only common "word" in most of the languages!
The most different of the scoring schemes is that of James Suckling, followed by those of Jeannie Cho Lee and Richard Jennings (which are surprisingly similar). Suckling is a former editor of Wine Spectator, and he actually provided most of the scores used here for that magazine — this makes his strong difference in scoring system on his own web site particularly notable, as it implies that he has changed language since departing from the Spectator.
Finally, it is important to recognize that all I have done here is evaluate the similarity of the different scoring systems. Whether the scores actually represent wine quality in any way is not something I can test, although I presume that they do represent something about the characteristics of the wines. Nor can I evaluate whether the scores reflect wines that any particular consumer might like to drink, or whether they can be used to make purchasing decisions. Nor can I be sure exactly what would happen if I chose a different set of wines for my comparisons.
The short answer to the question posed in the title is: pretty much one for each commentator, although some of them are quite similar. Indeed, the Wine Spectator and the Wine Advocate seem to use their scores to mean almost the same thing as each other, while the Wine Enthusiast gives a slightly higher score for a wine of equivalent quality.
While there are not as many wine-quality rating systems as there are languages, the idea of translating among them is just as necessary in both cases, if we are to get any meaning. That is, every time a wine retailer plies us with a combination of critics' scores, we have to translate those scores into a common language, in order to work out whether the critics are agreeing with each other or not. Different scores may simply reflect differences in scoring systems, not differences in wine quality; and similarity of scores does not necessarily represent agreement on quality.
Averaging the scores from the different critics, as is sometimes done (notably by Wine-Searcher and 90plus Wines), is unlikely to be mathematically valid. Given the results from this and the previous post (How many wine-quality scales are there?), calculating an average score across commentators would be like trying to calculate an average language. Jean-Marie Cardebat and Emmanuel Paroissien (American Association of Wine Economists Working Paper No. 180, 2015) have correctly pointed out that the different scoring systems need to be converted to a common score (i.e. a common language) before any mathematics can validly be applied to them.
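The problem with raw averaging can be shown with some made-up numbers. Suppose two critics give the same wine 92 and 89 points, but the second critic's scale runs lower, so that their 89 actually means 92 on the common scale (the three-point offset here is invented purely for illustration):

```python
# Hypothetical scores for one wine from two critics with different scales.
raw_scores = {"critic_A": 92, "critic_B": 89}

def b_to_common(score):
    """Illustrative translation of critic B's scale to the common scale.
    The three-point offset is an assumption for this example only."""
    return score + 3

# Averaging the raw scores mixes two different "languages".
raw_average = sum(raw_scores.values()) / len(raw_scores)
print(raw_average)  # 90.5 -- understates the consensus

# Translating to a common scale first shows the critics actually agree.
common = [raw_scores["critic_A"], b_to_common(raw_scores["critic_B"])]
common_average = sum(common) / len(common)
print(common_average)  # 92.0
```

The two critics agree perfectly about the wine's quality, yet the raw average suggests a middling disagreement, which is exactly why the scores must be standardized before any averaging is done.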