One possible reaction to this situation has been to deride the scores, or even to reject the very idea that scores are useful in the wine industry at all. For example, back in 2005 Elin McCoy (The Emperor of Wine: The Rise of Robert Parker, Jr. and the Reign of American Taste) expressed the view of many people:
I find scoring wine with numbers a joke in scientific terms and misleading in thinking about either the quality or pleasure of wine, something that turns wine into a contest instead of an experience.

We can thus ask ourselves whether wine-quality scores will play as big a part in the future, after Parker’s retirement, as they have over the past three decades (see After Parker: wine in the ‘Age of Re-Discovery’).
However, let us suppose for a moment that they will. If we take this road, then we need to evaluate the wine scores themselves, to try to understand how scores from different critics relate to each other. If we are going to use numbers, then those numbers need to be interpretable, preferably without us having to know about the unique facets of each and every wine critic.
That is, we need to ask: is there some common basis to wine scores? In other words, is there a shared scale (see Denton Marks, A critique of wine ratings as psychophysical scaling)? We need this to be so, beyond the trivial knowledge that a higher number is better than a lower number. A shared scale would be particularly valuable in situations where multiple critics are employed, such as at widely published wine magazines. Surely we can do better than the old fall-back of saying: “Find a critic you like and follow their advice”. After all, we can do that without any numbers at all.
The basic issue when trying to compare critics is finding circumstances under which a direct comparison between them would be fair. Speaking as a professional scientist, I can say that it has long been established that, for a comparison to be valid, all of the circumstances need to be identical except for the one characteristic being studied. In the case of wine scores, this means that the critics should ideally be tasting the same wines, at the same time and in the same place.
This does not usually happen. Either different critics are tasting different wines, or they are doing so at very different times (months or years apart) and in very different places (even on different continents). I do know, however, of one situation where exactly the same wines do get tasted at almost the same time and place. This is worth looking at.
I have noted before (Wine monopolies, and the availability of wine) that the Swedish alcohol retail monopoly Systembolaget, along with its general assortment of wines, releases small quantities of wines 20 times per year (c. 60-90 products per release). These wines are tasted by various media commentators shortly before their release. So, while these critics are not actually in the same room doing the tasting, this situation may be as close as we can expect to find in practice.
So, in order to address the question posed in this blog post’s title, I will compare the data from 2019 for two of these media sources. I am well aware that comparing only two critics is rather limited, especially as most of you will never have heard of either of them. After all, in 2018 Morten Scholer (Coffee and Wine: Two Worlds Compared) listed 44 different sources of 100-point wine-scoring schemes and 18 different 20-point schemes, plus 16 others, none of them from modern social media; and neither of the two sources discussed here was included.
Both score sources use a 20-point scale. The first source is one I have used before, from Jack Jakobsson at BKWine Magazine. I deleted the data for beer, cider, sake, fortified wines, and spirits, leaving the reds, whites, and rosés. The points are provided in 0.5-point increments.
The second source is from Johan Edström at Vinbanken. The scores are reported separately for reds and whites, with rosés included with the whites. The points are usually provided in 0.5-point increments, although occasionally finer divisions appear.
There were 1,034 wines scored by both sources during 2019, with another 20 solely by Vinbanken and 15 solely by BKWine. This makes for a healthy sample size. The direct comparison between them is shown in the first graph. Each point represents one or more wines (depending on how many wines got the same scores), with the Vinbanken score shown horizontally and the BKWine score shown vertically.
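For readers who like to see the mechanics, here is a minimal sketch of how the two score lists could be paired up, assuming two hypothetical CSV files and column names of my own invention (neither source publishes its data in this form):

```python
import pandas as pd

# Hypothetical input files: one row per wine, an identifier plus a 20-point score.
vb = pd.read_csv("vinbanken_2019.csv")   # assumed columns: wine_id, vb_score
bk = pd.read_csv("bkwine_2019.csv")      # assumed columns: wine_id, bk_score

# An outer join keeps the wines scored by only one source, so the overlap can be counted.
merged = vb.merge(bk, on="wine_id", how="outer")
both = merged.dropna(subset=["vb_score", "bk_score"])

print(len(both))                          # wines scored by both sources (1,034 in 2019)
print(merged["bk_score"].isna().sum())    # wines scored only by Vinbanken (20)
print(merged["vb_score"].isna().sum())    # wines scored only by BKWine (15)
```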
The dashed line is the line of equal point scores for the wines. Clearly, many of the points lie below this line, indicating that the BKWine score is often lower than the Vinbanken score for the same wine. Indeed, the average difference is 0.57 points; this is shown by the solid line, which runs through the center of the distribution of points.
Another way of seeing this same pattern is shown in the next graph, which displays the counts of the difference in points for each wine (Vinbanken score minus BKWine score). It shows that the Vinbanken score ranges from 2.5 points less than the equivalent BKWine score to 4 points greater. However, for most of the wines (71%) the two scores are either equal or the Vinbanken score is at most 1 point greater. So, the evaluations of wine quality are in broad agreement.
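Continuing the sketch above, the mean offset and the share of wines on which the two sources broadly agree come straight from the per-wine differences:

```python
# Difference per wine: Vinbanken score minus BKWine score.
diff = both["vb_score"] - both["bk_score"]

print(diff.mean())                        # average difference, c. 0.57 points
print(diff.min(), diff.max())             # range of disagreement, -2.5 to +4 here
print(diff.value_counts().sort_index())   # the counts behind the second graph

# Wines where the scores are equal, or Vinbanken is at most 1 point higher.
agree = ((diff >= 0) & (diff <= 1)).mean()
print(round(100 * agree))                 # c. 71% of the wines
```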
However, the amount of information shared by the two sets of scores is c. 55% (i.e. R² = 0.55), and the other 45% is unique to one set of scores or the other; the proverbial glass is both half full and half empty. In one sense this is quite good, because R² values this high are relatively rare for subjective (hedonic) judgments. On the other hand, I suspect that most wine drinkers are expecting better than this. If critics only half agree, then the consumer may not be much better off with them than without them.
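The “shared information” figure is simply the squared correlation between the two sets of scores (again continuing the sketch above):

```python
import numpy as np

# Shared information: the squared Pearson correlation between the two score sets.
r = np.corrcoef(both["vb_score"], both["bk_score"])[0, 1]
print(round(r ** 2, 2))    # c. 0.55 for the 2019 wines
```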
Note, also, that the differences in points are more pronounced for smaller point scores; that is, there is more variation in points at the left of the first graph. Indeed, the biggest variation is at 15 Vinbanken points. So, there seems to be more agreement for the better-quality wines than for the lower-quality ones.
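A quick way to check this in the data is to group the per-wine differences by the Vinbanken score and look at their spread at each level; as before, this is only a sketch using the assumed column names:

```python
# Spread of the score differences at each Vinbanken score level.
spread_by_score = diff.groupby(both["vb_score"]).std()
print(spread_by_score)    # larger standard deviations at the lower score levels
```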
Finally, it is worth considering the relationship between the assessed quality scores and the prices of the wines (see my prior post The relationship of price to wine-quality scores). Based on the exponential relationship used in my previous posts, the BKWine scores correlate slightly better with the prices (54%) than do the Vinbanken scores (51%).
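The exponential model amounts to regressing the logarithm of price against the score. A hedged sketch, assuming a hypothetical price column in the merged table:

```python
# Exponential price-quality model: log(price) as a linear function of score.
log_price = np.log(both["price"])          # "price" is an assumed column name

def fit_r2(score, target):
    """R-squared of a straight-line fit of target against score."""
    slope, intercept = np.polyfit(score, target, 1)
    residuals = target - (slope * score + intercept)
    return 1 - residuals.var() / target.var()

print(fit_r2(both["bk_score"], log_price))   # c. 0.54 for the BKWine scores
print(fit_r2(both["vb_score"], log_price))   # c. 0.51 for the Vinbanken scores
```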
However, this is still the same situation as above (a glass both half full and half empty). Wine prices are only partly associated with wine quality, which means that there are both good-value wines and complete rip-offs. Nevertheless, for one-third of the wines the “expected” prices, based on their assessed quality under each of the two scoring systems, are within US$3 of each other, so that either set of scores could be used to identify wines that are selling for below their assessed worth.
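To make that final point concrete, each fitted model gives an “expected” price for every wine, which can then be compared across the two score systems; the US$3 threshold is the one quoted above, and the assumption that the prices are expressed in US dollars is mine:

```python
# "Expected" price for each wine, back-transformed from each fitted model.
def expected_price(score, target):
    slope, intercept = np.polyfit(score, target, 1)
    return np.exp(slope * score + intercept)

exp_bk = expected_price(both["bk_score"], log_price)
exp_vb = expected_price(both["vb_score"], log_price)

# Wines whose two expected prices agree to within US$3 of each other.
print(((exp_bk - exp_vb).abs() < 3).mean())    # c. one-third of the wines

# Candidate bargains: wines selling for less than either assessed worth.
bargains = both[(both["price"] < exp_bk) & (both["price"] < exp_vb)]
```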