Monday, June 8, 2020

The study of grape-vine leaves is harder than you might think

A couple of weeks ago a new research manuscript appeared online:
Daniel H. Chitwood (2020) The shapes of wine and table grape leaves: an ampelometric study inspired by the methods of Pierre Galet.
This was recently highlighted on the American Association of Wine Economists’ Facebook page, which drew it to my attention.

I found this paper fascinating, because a whole swag of fancy data-analysis techniques were combined, in order to do something very challenging — study the variation in leaf shape among commercial grape varieties. This is of practical as well as theoretical importance, because it is this variation that has traditionally been used to identify the varieties.


I won’t bore you with the details, which really do require some expertise to understand. However, one thing did stand out to me as a bit of a worry. Figure 2 of the paper presents the results of an analysis that forces the data into a particular type of pattern, and this is not necessarily a good thing to do. There is a better alternative.

Having described the leaves of 60 grape varieties, in the manner illustrated by the red lines shown above, we can summarize the complex data by calculating a measure of the morphological “distance” between the varieties — a greater distance indicates less similarity between the varieties. This is what we want, because it is the distance that will help us study the leaf variation.
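To make the idea of a morphological distance concrete, here is a minimal sketch. It assumes that each leaf is recorded as a list of matching (x, y) landmark points, and that distance is plain Euclidean distance between those points. The variety names and coordinates are invented for illustration, and the paper's own distance measure may well differ.

```python
import math

# Hypothetical landmark coordinates (x, y) for three made-up varieties.
# Real data would have many more landmarks, and the leaves would first
# need to be aligned to a common position, orientation and scale.
leaves = {
    "Variety A": [(0.0, 1.0), (1.0, 0.5), (0.5, -1.0)],
    "Variety B": [(0.0, 1.1), (1.0, 0.4), (0.5, -1.0)],
    "Variety C": [(0.0, 2.0), (2.0, 0.5), (1.0, -2.0)],
}

def shape_distance(a, b):
    """Euclidean distance between two leaves with matching landmarks."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)))

def distance_matrix(shapes):
    """Pairwise distances, keyed by (name, name)."""
    names = sorted(shapes)
    return {(p, q): shape_distance(shapes[p], shapes[q])
            for p in names for q in names}

distances = distance_matrix(leaves)
```

In this toy data, Variety A is much closer to Variety B than to Variety C, which is all that "greater distance means less similarity" amounts to.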

However, the analysis chosen in the paper to study these distances was what is called a cluster analysis, which aggregates the leaves into groups or clusters. This is risky, because we do not actually know that the leaves will fall into groups, in the first place. What if they don’t? We then have a result that is misleading (i.e. groups that do not exist).
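The risk can be shown with a small sketch. It assumes average-linkage hierarchical clustering, which may not be the variant used in the paper, and it uses invented one-dimensional data: ten evenly spaced values, which form a smooth gradient with no gaps. If we ask for two groups, we still get two groups.

```python
def average_linkage(points, n_groups):
    """Merge the two closest clusters repeatedly until n_groups remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_groups:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance between all members of the two clusters.
                d = sum(abs(a - b) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Ten evenly spaced values: a gradient with no gaps, hence no real groups.
gradient = [float(x) for x in range(10)]
groups = average_linkage(gradient, n_groups=2)
print(groups)
```

The procedure always returns the number of groups requested, whether or not the data contain them. The output alone cannot tell us which situation we are in.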

We can examine this possibility by using a network analysis, instead, as described in my post Summarizing multi-dimensional wine data as graphs, Part 2: networks. This analysis makes no prior assumption about the existence of groups. If they exist then the analysis will find them, but if they don’t then it will show how complex the non-group relationships are.

My own network analysis of the distances, as provided in the paper, is shown in this graph.

Network analysis of grape-vine leaf shapes

The original cluster analysis found two main groups (I and II), with four outlying varieties that were in neither group. However, the network does not show us any clear groups, at all.

This does not mean that the network analysis finds fault with the cluster analysis, but merely that the cluster results are too simple — the grape varieties cannot be grouped so neatly.

I have marked the Group I varieties in red, and the Group II varieties in blue, with the outliers in black. These two groups are actually well represented in the network, as they aggregate at opposite ends of the graph. So, the cluster results are not surprising. If we are going to put the 60 varieties into two groups, then the two groups found by the cluster analysis are as good as any. The main fault of the cluster analysis, however, is that the relationships between the grape-vine leaves are more complex than this: there are actually any number of ways of clustering the varieties into groups, and the cluster analysis simply chooses one of the many.

The most obvious example of this problem is the Grenache variety, which the network analysis associates with Group I, not Group II. In the cluster analysis, Grenache is shown as the outlier in Group II, which suggests that its placement is equivocal. Unfortunately, a cluster analysis cannot explicitly indicate equivocation, because every variety must be placed somewhere. A network analysis, on the other hand, is specifically designed to show equivocation, where it exists.
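One simple way to see equivocation in a distance matrix, without any network software, is to compare a variety's mean distance to each of the two groups. The sketch below is only an illustration of that idea, not the network method used for the graph above. The variety "X", the group members and all of the distances are hypothetical, and none of the numbers come from the paper.

```python
def group_margin(variety, group_one, group_two, dist):
    """Mean distance to each group, and the difference between the two."""
    mean_one = sum(dist[variety, v] for v in group_one) / len(group_one)
    mean_two = sum(dist[variety, v] for v in group_two) / len(group_two)
    return mean_one, mean_two, mean_two - mean_one

# Hypothetical distances from variety "X" to members of two groups.
dist = {
    ("X", "A1"): 0.40, ("X", "A2"): 0.50,
    ("X", "B1"): 0.45, ("X", "B2"): 0.55,
}

mean_one, mean_two, margin = group_margin("X", ["A1", "A2"], ["B1", "B2"], dist)
print(mean_one, mean_two, margin)
```

A margin close to zero means that the variety sits almost equally far from both groups, so that assigning it to either one is a close call. A cluster diagram hides this, because it has to put the variety somewhere.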

Equally interesting is that the network analysis shows that the four outlying varieties have, in fact, very little relationship to each other; that is, they do not appear together in the network. Burger and Chasselas do appear near each other in the graph, but the latter has a very long edge, indicating that its leaves are the most unusual within the collection of 60. Gewürztraminer seems to be rather similar to White Riesling, while Zinfandel is associated with Gamay and Müller-Thurgau.

Conclusion about leaf shapes?

Leaf shapes have been an essential component of previous methods for the identification of grape varieties, but they seem to be a bit more like an art than a science. This is why the modern world now uses DNA sequencing for that same purpose.
