Last week, a scientific paper was published (Crowdsourcing the assessment of wine quality: Vivino ratings, professional critics, and the weather) that generated some wine-media attention (Vivino’s crowd reviews gain credibility in Cambridge study). As a scientist, the paper raises some questions in my mind, and I will discuss them here. I am not claiming that there is necessarily anything wrong, but merely some things that I think are worth looking into.
The published paper basically touts the idea that community wine scores are a valuable source of information regarding wine quality. It does this as follows: “We assess the validity of aggregated Vivino ratings based on two criteria: correlation with professional critics’ ratings and sensitivity to weather conditions affecting the quality of grapes”. This is pretty straightforward, and not necessarily problematic.
The issue that I have is that community scores are not necessarily what they seem to be at face value — they can be biased by the people providing them, perhaps deliberately. For example, the scores and reviews provided on sites like Amazon and eBay are well known for sometimes being fake and/or biased (see below). So, why not community wine scores?
My point here is NOT that the wine scores are necessarily “gamed” in any way, but simply that the authors of the published paper appear to have made no attempt whatsoever to assess them — they have simply taken the scores at face value. This is very naive (in spite of 6 authors). The authors have perhaps assumed that the sheer volume of scores will overcome any issues of bias (39,035 ratings for 371 wines ≈ 105 scores per wine). This may actually be true, but it is still a naive assumption.
So, the authors are dealing with “crowdsourced ratings from large communities of wine consumers on platforms such as Vivino and Cellartracker” ... and also Wine-Searcher. The authors make only these sorts of comments about the scores: that the provision of the scores is “potentially creating the conditions for crowd wisdom to be accrued”, and that “the judgment errors of different individuals tend to cancel each other out”.
However, there is no reason to think that any crowd-sourced scores are any better than any others until they are shown to be so. As Tom Cannavan has explained:
The current thinking seems to be that the “wisdom of crowds” (CellarTracker, TripAdvisor, Amazon Reviews) is the most reliable way to judge something; but that thinking is deeply flawed.

The authors focus on the subjectivity of the scores, not on their possible systematic and deliberate biasing by the people providing them — “using consensus as the sole arbiter for evaluating the validity of a judgment”. The providers of the scores are not selling wines directly, but their scores could still be worth biasing, because people will use them to make their own wine-buying decisions, and so the ultimate sellers will benefit if those scores are systematically raised.
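To illustrate why this distinction matters, here is a minimal simulation sketch in Python (the numbers are invented, not taken from the paper or from Vivino): random disagreement between individual tasters does average away as the number of ratings grows, but a block of deliberately inflated scores shifts the average and stays there, no matter how large the crowd becomes.

```python
import numpy as np

rng = np.random.default_rng(1)

true_quality = 3.8      # hypothetical "true" Vivino-style rating for one wine (1-5 scale)
noise_sd = 0.5          # random disagreement between individual tasters

for n in (100, 10_000):
    # honest but noisy raters — individual judgment errors are random
    honest = np.clip(true_quality + rng.normal(0, noise_sd, n), 1, 5)

    # the same ratings, but with 15% of them replaced by deliberate 5-star "shill" scores
    shilled = honest.copy()
    shilled[: int(0.15 * n)] = 5.0

    print(f"n={n:6d}  honest mean={honest.mean():.2f}  shilled mean={shilled.mean():.2f}")

# The honest mean converges to ~3.8 as n grows; the shilled mean stays ~0.2 points
# higher, however many ratings are collected — sheer volume does not remove the bias.
```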
The authors note that: “Vivino ratings correlate substantially with those of professional critics, but these correlations are smaller than those among professional critics. This difference can be partly attributed to differences in scope: Whereas amateurs focus on immediate pleasure, professionals gauge the wine’s potential once it has matured.”
So, the authors do come up with one main explanation for the discrepancy between the community scores and the professional scores. My point is that it could also be partly attributed to biased community scores, in some way, and the authors have not addressed this possibility.
Indeed, the highest correlation that the authors find for the community scores is r = 0.50 (with the Wine Advocate), which amounts to only 25% shared variance (r² = 0.25). However, the authors consider this to be “substantial”. What about the other 75%, eh? That is not “substantial”.
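For anyone wanting the arithmetic behind that figure: the proportion of variance that two sets of scores share is the square of their correlation, so r = 0.50 means r² = 0.25. A quick sketch with simulated scores (these are not the paper's data; the numbers are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 371                        # the number of wines in the study

# Simulate professional scores, and community scores correlated with them at about r = 0.5.
# The community scores are on an arbitrary standardized scale; correlation is scale-free.
pro = rng.normal(90, 3, n)
community = 0.5 * (pro - 90) / 3 + np.sqrt(1 - 0.5**2) * rng.normal(0, 1, n)

r = np.corrcoef(pro, community)[0, 1]
print(f"r   = {r:.2f}")        # about 0.5
print(f"r^2 = {r**2:.2f}")     # about 0.25 — only ~25% of the variance is shared
```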
Anyone who has doubts about the issue I am raising here does not have to spend long investigating the reviews provided on sales sites like Amazon and eBay. Amazon has had a commitment to authentic reviews from the start, but it has been plagued by fake-review brokers (i.e. people who will provide the fake reviews for you, if you pay them), and it has been involved in lawsuits to deal with these people:
Inside the underground market for fake Amazon reviews
How Amazon takes action to stop fake reviews
Amazon and Google file parallel lawsuits against a fake review website
On auction sites, the basic problem is shill bidding rather than fake reviews:
How eBay's review system is promoting fake, counterfeit and even dangerous products
There are also plenty of YouTube videos discussing the issue over the past few years, and telling you how to spot the fake reviews:
Amazon and the problem of fake reviews (Financial Times)
Why Amazon has a fake review problem (CNBC)
Fake Amazon reviews are more prevalent than you think (CTV News)
I myself have discussed, in quite a few posts in this blog, the issues of bias and subjectivity with wine quality scores (note in particular #6 in the list):
Biases in wine quality scores
Are there biases in community wine-quality scores?
Are there biases in wine quality scores from semi-professionals?
Are the quality scores from repeat tastings correlated? Sometimes!
Awarding 90 quality points instead of 89
CellarTracker wine scores are not impartial
Do community wine-quality scores converge to the middle ground?
Why comparing wine-quality scores might make no sense
Are wine scores from different reviewers correlated with each other?
How bad are wine scores, really?
Be wary of "Second Chance Offers" on eBay
If I had evaluated this paper (all professional research papers are evaluated by at least one expert before publication) then I would have sent the authors back for a re-write. One possible way forward might be to compare the Vivino, Cellartracker and Wine-Searcher scores for systematic differences and similarities.
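As a rough sketch of what such a comparison could look like, assuming one could assemble the average community score for each wine from each platform into a single table (the column names, scales and numbers below are invented purely for illustration):

```python
import pandas as pd

# Hypothetical table: one row per wine, with the average community score from
# each platform (the values and scales are made up, not real data).
scores = pd.DataFrame({
    "wine":          ["Wine A", "Wine B", "Wine C", "Wine D", "Wine E"],
    "vivino":        [4.2, 3.9, 4.5, 3.6, 4.0],     # 5-point scale
    "cellartracker": [91, 89, 93, 88, 92],           # 100-point scale
    "wine_searcher": [90, 88, 94, 87, 89],           # 100-point scale
}).set_index("wine")

# Standardize each platform's scores so the different scales can be compared.
z = (scores - scores.mean()) / scores.std()

# Similarities: do the three communities rank the wines in the same order?
print(z.corr())

# Differences: which wines does one community rate relatively higher than another?
print((z["vivino"] - z["cellartracker"]).sort_values())
```

Standardizing first puts the platforms' different scales on a common footing, so the correlations show how similarly the communities rank the wines, and the per-wine differences flag wines that one community rates systematically higher than another.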
As a final note, I should point out that I rely on professional reviews for my own wine purchases, as described in this post:
Calculating value for money wines
I have, however, also bought some quite nice older wines (from 1945–2000) on eBay in the past. Buyer beware!