Monday, January 13, 2020

Quality scores changed the wine industry, and created confusion

Towards the end of last year there was another round of the ongoing discussion about the pros and cons of wine-quality scores, especially when used as a marketing tool (involving Simon Solis-Cohen, Tom Wark, and Roger Morris).

It seems to me that there was something missing from the discussion, which I wish to highlight here. The fundamental problem with quality scores is that they are an opinion but they get treated as a piece of mathematics. Until this issue is resolved (which it may never be), confusion will continue (along with the discussion), which cannot be good for the wine industry.

If wine evaluation stayed as an opinion (as Tom Wark correctly puts it: A wine rating is an adjective, not a calculation) then, indeed, things would be alright — evaluations would function as a marketing tool, just like any other adjectival opinion. But they don’t stay that way — they implicitly and explicitly have mathematical operations performed on them, such as averaging, which never happen to real adjectives. There is a contradiction here, between what we should do (in theory) and what is happening (in practice).


Wine evaluation

At heart, the process of wine evaluation involves three characteristics, which may be only tenuously connected: (i) the physical wines, which can vary in many chemical ways, some of which are generally considered to be desirable; (ii) the physiology of the tasters, who may vary greatly in their ability to distinguish smells and tastes; and (iii) the psychology of the tasters, who may have very different wine preferences. The sticking point is the last one. We expect that experts can truly detect the wine differences (points i and ii), but can they tell us whether we will like them (point iii)? Note 1

Wine commentators use the 20-point and 100-point quality scales in an apparent attempt to be seen to be objective, rather than merely opinionated. They tell you their opinion on the chemistry of the wine based on their expert physiology, and then express that as a number. However, we can never justify the difference between a score of 89 and 91, let alone deduce from that number that we will like the wine enough to pay for the marketing difference between those two scores.

In that sense, the benefit to the wine business is very parochial — it makes the marketing easier but does not necessarily benefit the consumer. The score has a one-way effect. This is often the way with any marketing technique, of course, but we should not necessarily condone it (unless we are a marketer).

So, the relation between scores and marketing is a one-edged sword, which is why people disagree about scores being used as a marketing tool. Simon Solis-Cohen dislikes it (Wine scores are the worst marketing technique — so stop it!), while Tom Wark (Why reviews and wine scores ARE good winery marketing) and Roger Morris (How scores changed the wine industry — for the better) can see some benefits.

One thing is for sure: if a wine gets a high score from an accepted expert then its price will immediately go up a long way (see Bob Henry's example in my post How many 100-point scores do critics really give?). This is very effective marketing.


Wine scores

My point here is that if quality scores are adjectives then they should not be treated as numbers, because the only purpose of numbers is their mathematical properties, not their linguistic ones. I have discussed the rather bizarre mathematical properties of wine-quality scores in some of my previous posts (see the Wine scores link in the Labels For Posts list at the right of this page).

For my purpose here, the pertinent issue is the idea that the numbers express more than merely rank order. We expect that a score of 90 is better than a score of 89; their rank order should mean something. But we do not know how much better a 90 wine is than an 89, nor do we know what criteria were used to decide on this difference. So, the quantitative difference has no explicit meaning.
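
A small sketch may make the rank-order point concrete. The scores are invented for illustration, and the `stretch` function is just one hypothetical rescaling that preserves rank order. If scores carry only rank-order information, then any such rescaling is as legitimate as the original numbers, yet it can reverse which wine has the higher average.

```python
from statistics import mean, median

# Invented scores from three hypothetical critics (not real data).
WINE_A = [88, 88, 95]
WINE_B = [90, 90, 90]


def stretch(score):
    """Hypothetical order-preserving rescaling that compresses scores above 90."""
    return score if score <= 90 else 90 + (score - 90) * 0.2


def preferred(summary, a_scores, b_scores):
    """Return which wine has the higher summary value ('A', 'B', or 'tie')."""
    a, b = summary(a_scores), summary(b_scores)
    if a == b:
        return "tie"
    return "A" if a > b else "B"


# Apply the same rank-preserving rescaling to every score.
rescaled_a = [stretch(s) for s in WINE_A]
rescaled_b = [stretch(s) for s in WINE_B]

for name, summary in [("mean", mean), ("median", median)]:
    print(name,
          "raw:", preferred(summary, WINE_A, WINE_B),
          "rescaled:", preferred(summary, rescaled_a, rescaled_b))
```

With these made-up numbers the mean favors wine A before rescaling and wine B afterwards, while the median (which depends only on order) favors wine B both times. Averaging, in other words, quietly assumes that the distances between scores mean something.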

This becomes a problem when we try to compare the scores to other characteristics that are also expressed as numbers. The obvious one is price, which is always a very precise number (although it may be flexible from time to time, at the whim of the sellers). We anticipate that there will be some sort of reasonably strong relationship between score and price — as price goes up the score should also go up, and vice versa. I have illustrated this general phenomenon a number of times, many of them listed in my post on The relationship of price to wine-quality scores.

However, can we really do very much with this expectation? Can we evaluate whether a wine’s price is a rip-off (the price is way too high for the assigned quality score)? Can we evaluate whether a wine is a bargain (the price is pretty low for the score)? We can certainly try, but this does assume that the scores have more than merely rank-order meaning. In other words, it depends on their mathematical properties.

This sort of analysis is applied in many ways in the wine industry. For example, wine investors use this when proffering advice (see Fine wine investment: super anomalies). In this process, the investment advisor is looking for wines that are anomalous, in the sense that their price is very low for their assigned score, particularly wines with almost identical scores yet very different pricing. Anomalous wines are seen as a good investment (whether they are also a good drink is beside the point).
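
As a rough sketch of what such a screen might look like, the code below flags wines that are priced well below other wines with the same score. This is not the method used in the article cited above. The wine records, the `find_anomalies` function, and the 0.5 threshold are all assumptions invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Invented (name, score, price) records; not real wines or prices.
wines = [
    ("Wine A", 95, 120.0),
    ("Wine B", 95, 110.0),
    ("Wine C", 95, 45.0),
    ("Wine D", 92, 60.0),
    ("Wine E", 92, 55.0),
    ("Wine F", 92, 58.0),
]


def find_anomalies(records, ratio=0.5):
    """Flag wines priced below `ratio` times the median price at the same score.

    The default ratio of 0.5 is an arbitrary assumption, chosen for illustration.
    """
    prices_by_score = defaultdict(list)
    for _name, score, price in records:
        prices_by_score[score].append(price)
    # The "typical" price for each score is taken to be the median price.
    typical = {score: median(prices) for score, prices in prices_by_score.items()}
    return [name for name, score, price in records
            if price < ratio * typical[score]]


print(find_anomalies(wines))
```

Note that the screen treats a 95 for one wine as exactly equivalent to a 95 for another. That is precisely the assumption under question here.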

In this sort of procedure, there is a mathematical formula (or algorithm) that connects all of the numbers, whether they be scores, prices, aging factors, or whatever. The advisor is then a slave to this algorithm, which brooks no argument. Any formula combines numbers to produce a final number, and that number is THE answer. We will either invest or not. But what if one of the numbers (the score) has little in the way of mathematical usefulness? Surely the output must be nonsense. Where is our adjectival opinion now? It has disappeared in a flurry of calculating.

The problem with mathematics is that you can do almost anything with it. Maybe we should give points for people’s opinions about the usefulness of wine scores? If you want to see how ridiculous this sort of thing can get, turning words into numbers, then a tolerably well-known example is included below. Note 2

Scores may be good marketing, but they are bad mathematics. It is truly said that there cannot be too much data; but there can be too many opinions about those data. And there can be too many opinions that are not based on any data. Wine evaluation is subjective, not objective. So, why do we pretend otherwise? Marketing, of course!




Note 1.

Denton Marks has recently had this to say about wine quality scores (If this wine got 96 out of 100 points, what is wrong with me? A critique of wine ratings as psychophysical scaling. AAWE Working Paper No. 239, 2019):
Dispassionate expert wine evaluation to educate consumers might seem aimed at increasing market efficiency, since consumer ignorance likely inhibits wine market growth ... Considerable research has explored how ratings correlate with transaction prices, testing whether they help “explain” willingness to pay (WTP); that could suggest that they do indeed educate consumers.
While mixed results say that wine ratings are not necessarily reliable guides to wine quality and WTP, a more fundamental structural difficulty is that, as a form of hedonic quality index, they involve questionable interpersonal comparisons — say, between experts or between an expert and oneself. For example, different tasters may differ over a taste’s appeal (e.g., the existence of “supertasters”). Saying that experts can tell consumers what they will like so they can reliably determine relative enjoyment and willingness to pay is flawed logic.
Well-established critiques of hedonic scaling illuminate the difficulties and raise fundamental questions about the reliability of ratings — in effect, adopting someone else’s preferences as one’s own — and the interpretation of any price-rating correlation.



Note 2.

There is an old joke that periodically does the rounds of the internet. The earliest version that I know of dates from 2004 (What does it mean to give MORE than 100%?). It goes like this:

Ever wonder about those people who say they are giving more than 100%? We have all been to those meetings where someone wants you to give more than 100%. How about achieving 103%?

If:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
is represented as:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

Then:

H-A-R-D-W-O-R-K
8+1+18+4+23+15+18+11 = 98%

K-N-O-W-L-E-D-G-E
11+14+15+23+12+5+4+7+5 = 96%

A-T-T-I-T-U-D-E
1+20+20+9+20+21+4+5 = 100%

B-U-L-L-x-x-x-x
2+21+12+12+19+8+9+20 = 103%

x-x-x-K-I-S-S-I-N-G
1+19+19+11+9+19+19+9+14+7 = 127%

So, one can then conclude, with mathematical certainty, that while Hard Work and Knowledge will get you close, and Attitude will get you there, it is other things that will put you over the top.
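
For anyone who wants to check the arithmetic, the letter-to-number scheme above can be sketched in a few lines. The function name `letter_sum` is my own invention, and only the three fully spelled-out words are included.

```python
def letter_sum(word):
    """Sum the alphabet positions (A=1 ... Z=26) of the letters in `word`.

    Characters other than the letters A-Z (such as hyphens) are ignored.
    """
    return sum(ord(ch) - ord("A") + 1
               for ch in word.upper()
               if "A" <= ch <= "Z")


for word in ["HARDWORK", "KNOWLEDGE", "ATTITUDE"]:
    print(word, letter_sum(word))
```

This gives 98, 96 and 100, matching the sums above. The calculation is perfectly correct, and perfectly meaningless.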

7 comments:

  1. None of this is specific to wine. Every product category has the same issues. For example, a 1% difference in rating can push a Yelp review to 3.5 stars (not going there) or up to 4.0 stars (seems like a good place, let's go).

    What's the difference between a 3.9 and 4.0 star running shoe on Amazon? Collectively billions of dollars in sales... even though something like a running shoe is at least as subjective as wine.

    And why do price and rating need to be highly correlated? They're not for non-wine products.

    It's not reasonable to look at a number as the singularity. It's just another data point that consumers can combine with all the other evaluation criteria (price, discount, region, wine style, sensory characteristics, backstory, appropriateness for consumption context, etc., etc.) to make purchase decisions.

    1. I agree that numerical scores of all sorts have the same issues. Amazon scores are averaged by the company's algorithm, but this seems to mean nothing mathematically, especially as one cannot give a score of 0.

      A single score has no practical use, and so the scores need to be compared. The problem is trying to compare their mathematical properties, since these are more complex than people seem to realize.

      If price does not correlate with score, then it is difficult to identify something as "a bargain". So, I think that such a correlation would be useful, in the practical sense. Indeed, it might be the most useful thing a score could do for us.

      Otherwise, a score is just an adjective, like all the other adjectives (style, sensory characteristics, etc). It should therefore be treated as one, as you say.

    2. Scoring is contextual, even if it's not supposed to be. Otherwise, everyone should prefer a 5 star burrito over a 4 star Kobe steak.

      I don't know that there's a huge problem with the interpretation of ratings. A 92 is the same as a 93 and almost as good as a 94 or 95. A more significant dimension to the problem is that the sparsity of reviews means that consumers are exposed to ratings by different reviewers. So now we have to figure out why a wine has a Vinous 90, a Suckling 93, a Decanter 96 and a Luca Maroni 99 point score. Since there are no consensus scoring models that compensate for these biases, we don't even have a starting point!

    3. My point is not that there is a problem with interpretation, but that there is a problem with comparing and combining scores as though score differences matter. My post on "Laube versus Suckling — their scores differ, but what does that mean for us?" discusses scores from different observers.

    4. All we know is that Luca Maroni never goes below 98 - the world's most generous wine reviewer ;-)

  2. It would be interesting to see if there are any types, styles, or regions of wine with reasonably close correlation between price and average score.

    1. Indeed so. I will see whether I have sufficient consistent data to do this for any region.
