This year has been an awkward one for buying wine, as many people have commented in the media. We can all trust a global pandemic to disrupt international, national and local events.
Here in Sweden, our national liquor chain (Systembolaget) has had to change the way it releases new wines, in order to maintain social distancing among the staff in its main warehouse. This has meant that the wine release schedule has changed drastically from its previous steady state.
In turn, this change has affected both the wine commentators and the wine customers. In particular, almost all of the commentators have at some point this year had problems tasting new wines and publishing their subsequent quality assessments. For example, neither BKWine nor Vinbanken published their usual wine-quality scores for the new releases during May and June (I have used these score sources in previous posts; e.g. Are wine scores from different reviewers correlated with each other?).
While compiling this year's new-release scores for an upcoming post, I noticed that in the BKWine reports a few of the wines appeared more than once (i.e. in the reports for different months). Moreover, some of these repeated wines did not receive the same scores. This situation allows us to comment quantitatively on the repeatability of wine scores from the same person.
This is a topic that I have commented on before, notably in my post The sources of wine quality-score variation. There, I briefly discussed a dataset from Rusty Gaffney, who re-tasted 21 Pinot noir wines 16-26 months after first tasting them.
Well, in the current case, the time period is much shorter than that; and this gives us a much better insight into the process of scoring wines, which is, after all, rather subjective. The August and September BKWine commentaries were published much later than usual, in the middle of the month rather than at the beginning. This presumably reflects pandemic-induced problems, which led to an apparently unintended mix-up. The same person (Jack Jakobsson) was responsible for all of the actual wine scores.
This graph shows us the scores for those 16 wines that were repeated in both the August and September wine commentaries. Note that the scores have a maximum of 20 points (not 100).
Only 4 of the 16 wines received the same score on both occasions, although 12 of them are within half a point (the smallest possible scoring difference). However, 3 of the wines differ by 1 point, and 1 wine differs by 1.5 points. Nine of the wines received an increased score on the second occasion, while only 3 decreased. About 39% of the variation in scores (the squared correlation, r² ≈ 0.39) is shared between the two occasions.
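For readers who want to check this sort of calculation themselves: "shared variation" is the squared Pearson correlation (r²) of the paired scores from the two occasions. The sketch below computes it from scratch; the score vectors are purely hypothetical ones on the 20-point scale, not the actual BKWine data.

```python
# Minimal sketch: shared variation between two tasting occasions,
# computed as the squared Pearson correlation (r^2) of paired scores.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical 20-point scores for the same wines on two occasions
# (NOT the real BKWine data; half-point steps, as in the commentaries).
august    = [15.0, 16.5, 17.0, 15.5, 16.0, 14.5, 17.5, 16.0]
september = [15.5, 16.5, 17.5, 15.0, 16.5, 15.5, 17.5, 16.5]

r = pearson_r(august, september)
print(f"r = {r:.2f}, shared variation r^2 = {r * r:.2f}")
```

The same number can be obtained from any statistics package (e.g. squaring the output of `scipy.stats.pearsonr`); the point is only that "x% of variation shared" refers to r², not to r itself.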
The differences in scores are somewhat disappointing. Although the agreement is much better than would be expected by random chance (p < 0.01), we are still faced with a situation where the differences slightly outweigh the similarities.
However, this situation is vastly better than the previous one that I reported (see above), where only 6% of the score variation was shared between the two occasions (which were, of course, much further apart in time). Tastings close together in time are expected to be more consistent; so we at least get that.