Monday, May 24, 2021

Is big data always needed in the wine industry?

We all know that there are plenty of things that a person can do easily but machines find more challenging. For example, when the police are directing traffic they often pick gaps between groups of cars to be the "stop" points, whereas arranging this behavior using traffic lights is much more complicated. We have all driven along a main road and been stopped at every single [expletive deleted] light, haven't we? (If you haven't, then just you wait ... your time will come.)

Well, the same thing applies when looking at or using data, whether in academia or in industry. Sometimes, the analysis is way too complicated for the simple outcome, which anyone could easily have worked out for themselves, unaided.

Jon Carter —

Given my advanced age, data analysis has obviously changed a lot during my professional career, not least because computers became prevalent (yes, I just manage to pre-date personal computers) and their computational power began to be appreciated. So, a lot of new-fangled computational ideas were developed during my lifetime, some of which I learned about and used. One or two of them have appeared in this blog, as I have applied them to wine-industry data (e.g. The study of grape-vine leaves is harder than you might think).

However, there are others that seemed to me to border on nonsense, in the sense that they applied very complicated mathematical ideas in ways that seemed to produce little of practical value. Looking at data analysis in the wine industry leads me to a similar feeling. So, I thought that I might mention it here.

Classic data analysis

Many of these analyses involve the use of what is called Big Data. We have all encountered small datasets, where someone does a survey of some sort, and we then wonder whether the results are worthwhile, because the sampling of people looks a bit restricted. The idea of collecting massive datasets is supposed to be to circumvent this issue. I therefore have nothing against big datasets, because they can be useful for many things. For example, Netflix uses big data for its recommendation engine (see Netflix amped up recommendations with its own big data. What that means for wine), which seems to work quite well. YouTube, on the other hand, simply recommends the same videos to me over and over again, irrespective of whether I have already viewed them; they might like to check out what Netflix is doing.

However, Big Data for its own sake is not necessarily useful.

This comment was occasioned by a recent research paper:
Wineinformatics: Using the full power of the Computational Wine Wheel to understand 21st century Bordeaux wines from the reviews. Beverages 7: 3 (2021).
I am not criticizing this particular piece of research, on its own. To me, it is simply a classic example of doing something a bit odd — applying complex mathematical techniques to something that a human can do in a few seconds. This may be computationally interesting, but it has little practical value for the rest of us.

In this case, the objective is to get the computer to break down written wine reviews into their component parts: there are 985 binary wine characteristics and 34 continuous characteristics, which are used to describe 14,349 wines. That is, the words become numbers, and can thus be processed mathematically. The claimed usefulness is "to build a model for wine grade category prediction".
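For anyone wondering what "the words become numbers" might look like in practice, here is a minimal sketch. It is not the paper's method: the six-word vocabulary, the four reviews and the two grade labels are all made up for illustration, and the classifier is a bare-bones Bernoulli Naïve Bayes written from scratch rather than anything the authors describe.

```python
import math
from collections import defaultdict

# Hypothetical vocabulary: a tiny stand-in for the paper's 985 binary attributes.
VOCAB = ["cherry", "oak", "tannin", "thin", "elegant", "flat"]

# Made-up reviews and grade categories, purely for illustration.
REVIEWS = [
    "Elegant cherry fruit with fine tannin.",
    "Ripe cherry, oak, elegant finish.",
    "Thin and flat.",
    "Flat, thin, too much oak.",
]
LABELS = ["90+", "90+", "89-", "89-"]


def encode(review):
    """Turn a review into a 0/1 vector: 1 if the vocabulary word appears."""
    cleaned = review.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    return [1 if w in words else 0 for w in VOCAB]


def train(reviews, labels):
    """Fit a Bernoulli Naive Bayes model with add-one smoothing."""
    counts = defaultdict(lambda: [0] * len(VOCAB))
    totals = defaultdict(int)
    for review, label in zip(reviews, labels):
        totals[label] += 1
        for i, bit in enumerate(encode(review)):
            counts[label][i] += bit
    model = {}
    for label, total in totals.items():
        log_prior = math.log(total / len(labels))
        # Smoothed probability that each word appears in a review of this class.
        probs = [(c + 1) / (total + 2) for c in counts[label]]
        model[label] = (log_prior, probs)
    return model


def predict(model, review):
    """Return the label with the highest log-probability for this review."""
    bits = encode(review)

    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(
            math.log(p if b else 1 - p) for b, p in zip(bits, probs)
        )

    return max(model, key=score)


model = train(REVIEWS, LABELS)
print(encode("Elegant cherry fruit with fine tannin."))
print(predict(model, "Elegant wine with cherry and tannin"))
```

Each review ends up as a row of ones and zeros, and the model simply asks which grade category makes that row most probable. With only four toy reviews this proves nothing about wine, of course; it just shows the mechanics.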

Now, no-one is going to admit that they can write a professional wine note in 20 seconds, but every professional can work out what they are going to say about any given wine in pretty short order. Why do we need a big-data computer analysis, with a Naïve Bayes classification algorithm and a Support Vector Machine, to build a model for this? I guess we will find out, eventually.

Support Vector Machine structure

Mind you, humans themselves do not necessarily communicate well, even when using words; so there may well be something in the idea of reducing those words to numbers.

Take the recent example propounded by Jane Coaston (We need to have a national conversation about wine descriptions) and Esther Mobley (Can Pinot Grigio 'express' a concept? Here's why wine language drives me nuts).

The reported fuss is about this particular sentence:
“Lagaria wines express varietal character and terroir within a classic and modern concept.”
Various theories are proposed about how this allegedly strange sentence arose. It seems to me that the simplest explanation is that these two readers are not reading it right. It does not say "Pinot Grigio express a concept". It says that all things exist within some concept (as they must do), and that Pinot Grigio expresses its character within a modern one (as opposed to, say, an old-fashioned one). So, Pinot Grigio expresses varietal character — it does not express a concept. This seems to be a quite straightforward thing to say.

The problem is: if human beings can get so tied in knots with language, how are we ever going to train a computer to make sense of it?


  1. For readers of The Wall Street Journal (USA Edition), this weekend's newspaper published reviews of various books on "artificial intelligence" and "deep learning."

    Titled "They Think They're So Smart" by David A. Shaywitz (a physician-scientist who lectures at Harvard Medical School), it critiques these five books:

    Genius Makers by Cade Metz (Dutton, 370 pages, U.S. $28)

    A Brief History of Artificial Intelligence by Michael Wooldridge (Flatiron, 262 pages, U.S. $27.99)

    The Myth of Artificial Intelligence by Erik Larson (Belknap/Harvard, 312 pages, U.S. $29.95)

    Atlas of AI by Kate Crawford (Yale, 327 pages, U.S. $28)

    Futureproof by Kevin Roose (Random House, 217 pages, U.S. $27)

  2. David writes that "when looking at or using data, whether in academia or in industry. Sometimes, the analysis is way too complicated for the simple outcome, which anyone could easily have worked out for themselves, unaided."

    As Abraham Maslow observed in 1966 regarding his "Law of the Instrument":

    "I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail."

    Artificial intelligence and deep learning tools are not a panacea.