Well, the same thing applies when looking at or using data, whether in academia or in industry. Sometimes, the analysis is way too complicated for the simple outcome, which anyone could easily have worked out for themselves, unaided.
Given my advanced age, data analysis has obviously changed a lot during my professional career, not least because computers became prevalent (yes, I just manage to pre-date personal computers), and their computational power began to be appreciated. So, a lot of new-fangled computational ideas were developed during my life-time, some of which I learned about, and used. One or two of them have appeared in this blog, as I have applied them to wine-industry data (e.g. The study of grape-vine leaves is harder than you might think).
However, there are others that seemed to me to border on nonsense, in the sense that they applied very complicated mathematical ideas in ways that seemed to produce little of practical value. Looking at data analysis in the wine industry leads me to a similar feeling. So, I thought that I might mention it here.
Many of these analyses involve the use of what is called Big Data. We have all encountered small datasets, where someone does a survey of some sort, and we then wonder whether the results are worthwhile, because the sampling of people looks a bit restricted. Collecting massive datasets is supposed to circumvent this issue. I therefore have nothing against big datasets, because they can be useful for many things. For example, Netflix uses them for its recommendation engine (see Netflix amped up recommendations with its own big data. What that means for wine), which seems to work quite well. YouTube, on the other hand, simply recommends the same videos to me over and over again, irrespective of whether I have already viewed them; they might like to check out what Netflix is doing.
However, Big Data for its own sake is not necessarily useful.
This comment was occasioned by a recent research paper:
Wineinformatics: Using the full power of the Computational Wine Wheel to understand 21st century Bordeaux wines from the reviews. Beverages 7: 3 (2021).
I am not criticizing this particular piece of research, on its own. To me, it is simply a classic example of doing something a bit odd: applying complex mathematical techniques to something that a human can do in a few seconds. This may be computationally interesting, but it has little practical value for the rest of us.
In this case, the objective is to get the computer to break down written wine reviews into their component parts: there are 985 binary wine characteristics and 34 continuous characteristics, which are used to describe 14,349 wines. That is, the words become numbers, and can thus be processed mathematically. The claimed usefulness is "to build a model for wine grade category prediction".
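For anyone wondering what "the words become numbers" looks like in practice, here is a minimal Python sketch of the general idea. It is an illustration only, and not the paper's actual procedure: the four-attribute vocabulary, the synonym lists and the function name encode_review are all hypothetical stand-ins for the 985 binary characteristics mentioned above.

```python
import re

# Hypothetical mini-vocabulary (an assumption for illustration): each binary
# attribute is switched on if any of its listed terms occurs in the review.
VOCABULARY = {
    "blackberry": ("blackberry", "blackberries"),
    "oak": ("oak", "oaky"),
    "tannins": ("tannin", "tannins", "tannic"),
    "long finish": ("long finish", "lengthy finish"),
}


def encode_review(review: str) -> dict:
    """Return a 0/1 value per attribute for one written review."""
    text = review.lower()
    encoded = {}
    for attribute, terms in VOCABULARY.items():
        # Whole-word matching, so that "oak" does not fire on "soak".
        found = any(
            re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms
        )
        encoded[attribute] = int(found)
    return encoded


review = "Ripe blackberries and firm tannins, with a long finish."
print(encode_review(review))
# {'blackberry': 1, 'oak': 0, 'tannins': 1, 'long finish': 1}
```

Each review ends up as a row of ones and zeros, and a table of such rows is the kind of thing a classification algorithm can then be trained on.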
Now, no-one is going to admit that they can write a professional wine note in 20 seconds, but every professional can work out what they are going to say about any given wine in pretty short order. Why do we need a big-data computer analysis, with a Naïve Bayes classification algorithm and a Support Vector Machine to build a model for this? I guess we will find out, eventually.
Mind you, humans themselves do not necessarily communicate well, even when using words; so there may well be something in the idea of reducing those words to numbers.
Take the recent example propounded by Jane Coaston (We need to have a national conversation about wine descriptions) and Esther Mobley (Can Pinot Grigio 'express' a concept? Here's why wine language drives me nuts).
The reported fuss is about this particular sentence:
“Lagaria wines express varietal character and terroir within a classic and modern concept.”
Various theories are proposed about how this allegedly strange sentence arose. It seems to me that the simplest explanation is that these two readers are not reading it right. It does not say "Pinot Grigio express a concept". It says that all things exist within some concept (as they must do), and that Pinot Grigio expresses its character within a modern one (as opposed to, say, an old-fashioned one). So, Pinot Grigio expresses varietal character; it does not express a concept. This seems to be a quite straightforward thing to say.
The problem is: if human beings can get so tied in knots with language, how are we ever going to train a computer to make sense of it?