## Monday, February 6, 2017

### Misinterpreting statistical averages — we all do it

Looking at a mass of data is often confusing at best and daunting at worst. So, our natural tendency is to summarize the data in some way, thus reducing the mass down to something more manageable. This is usually considered to be A Good Thing, but it does run the risk of being misleading.

A summary must lose information, by definition (that's what "reduction" means). What if the lost information causes us to misinterpret the summary? A summary cannot be perfect, and so our interpretation cannot be perfect, either. We need to summarize the important part of the data, and not all summaries are created equal.

Perhaps the biggest potential problem occurs when we calculate an average as our data summary. People then tend to focus on the average itself, and not on the variation of the original data around that average. This can lead to misinterpretations of the original data.
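To make this concrete, here is a minimal Python sketch using two small sets of hypothetical quality scores out of 20 (invented for illustration, not taken from any of the data discussed below). Both sets have exactly the same average but very different variation around it, so a summary that reports only the average cannot tell them apart.

```python
import statistics

# Two hypothetical sets of quality scores (out of 20), invented for illustration
steady = [15.0, 15.5, 16.0, 16.5, 17.0]
erratic = [12.0, 14.0, 16.0, 18.0, 20.0]


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


for name, scores in [("steady", steady), ("erratic", erratic)]:
    mean, sd = summarize(scores)
    print(f"{name}: mean={mean:.1f}, sd={sd:.2f}")
# steady: mean=16.0, sd=0.79
# erratic: mean=16.0, sd=3.16
```

Reporting the standard deviation (or simply showing the raw points) alongside the mean is what reveals the difference.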

Here, I will use a few examples from the world of wine data to illustrate this point. In fact, I will show that apparent patterns in data can arise from changes in either the average or the variation, or both. We need to be aware of this in practice.

#### Introduction

Consider this first graph, which shows the differences in the average quality ratings of the wines from the different types of Bordeaux chateaux (first to fifth growths). We immediately see a pattern of decreasing average score as we proceed from the first to the third growths, which then do not differ much from the fourth and fifth growths.

Each point in the graph represents an average of the quality scores from a number of chateaux, but we cannot see the variation of these scores (we have only the averages). So, is the apparent decreasing pattern among the points caused by differences in average alone or differences in variation?

There are three ways that averages can show a decreasing pattern:
• all of the data points decrease
• the larger values become fewer
• the data become less variable
The reverse of each of these applies to increasing patterns. In the next three sections I show an example of each possibility.

#### Increase in average because all of the data points increase

This next graph shows the pattern through time of the monetary worth of alcohol exports from Australia, from 1988 to 2017. The original data are the small black points, connected by a black line.

It is clear that there is variation within and between years, and indeed this variation increases through time. It is common to summarize this type of time pattern with a running average, as shown by the thick red line — and this helps "smooth out" the pattern by averaging adjacent groups of data points. This summary is simple to interpret in this case, because all of the data points follow roughly the same pattern — they increase through time.

In this case, the summary is not misleading. I could delete the black points and line, and the red line would still be a good representation of the original data. So, presenting only the data summary would be a good way to simplify the data pattern here.
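A running average like the red line can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the window size and the choice of a centered (rather than trailing) window are guesses, since the graph does not state how its line was computed, and the export values are hypothetical numbers, not the real data.

```python
def running_average(values, window=5):
    """Centered running (moving) average.

    Each output value is the mean of `window` adjacent input values,
    centered on the corresponding point. Points too close to either end
    to have a full window are returned as None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)  # not enough neighbours on one side
        else:
            chunk = values[i - half : i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Hypothetical yearly export values (arbitrary units), not the real data
exports = [10, 12, 11, 14, 15, 13, 17, 18, 16, 20]
smoothed = running_average(exports, window=3)
print([None if v is None else round(v, 2) for v in smoothed])
# [None, 11.0, 12.33, 13.33, 14.0, 15.0, 16.0, 17.0, 18.0, None]
```

A wider window smooths more aggressively. Because every value in this series rises together, the smoothed line tells much the same story as the raw points, which is why the summary works here.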

#### Increase in average because the smaller values become fewer

Now let's look at a potentially misleading case.

This next graph shows the pattern through time of the vintage quality scores of Bordeaux red wine, from 1934 to 2010. Quality is measured on a 20-point scale. The original data are the blue squares, while the red line is a running average.

The data summary (running average) shows a general upward trend, which we might interpret as a general increase through time of the quality of Bordeaux red wines, particularly since the early 1970s.

But we would be wrong — that's not what the data show. The pattern in the data is that the lowest data points "disappear" as we move from left to right across the graph. This is emphasized by the added box in this next version of the graph — there are no data points within this box, but data points do occur immediately to the left of the box.

So, the original data show that there were no vintages with a score of less than 10 out of 20 from the 1970s onward, but there were quite a few such vintages before that. The higher average wine quality thus arises because the poor vintages no longer occur, not because the quality of the other vintages increases. Top quality vintages occurred before and after 1970, but poor quality vintages died out.
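The mechanism can be reproduced with a few hypothetical scores (invented for illustration, not the Bordeaux data). In this Python sketch the good and top vintages are identical in both periods; the only difference is that the early period also contains some poor vintages scoring below 10 out of 20.

```python
import statistics

# Hypothetical vintage scores out of 20, invented for illustration only.
# Both periods share the same good and top vintages.
good_vintages = [12, 14, 15, 16, 17, 18]
early_period = good_vintages + [7, 8, 9]  # also includes poor vintages (< 10)
late_period = list(good_vintages)         # the poor vintages no longer occur

print(f"early mean: {statistics.mean(early_period):.2f}")  # early mean: 12.89
print(f"late mean:  {statistics.mean(late_period):.2f}")   # late mean:  15.33
print(max(early_period) == max(late_period))  # True: the best vintage is unchanged
```

The average rises by more than two points even though no individual good vintage got any better; the rise comes entirely from the disappearance of the low values.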

In this case, the summary is misleading. That is, the red line on its own would not be a good summary of the data. I cannot usefully delete the blue points and black line.

#### Increase in average because the data become more variable

Averages can mislead in another way, as well.

The next graph also concerns Bordeaux red wines. Each point represents a vintage (from 1940 to 1995), with the vintage quality score shown horizontally (scale 1-7) and the vintage volume (in hectoliters) shown vertically.

A data summary would show a general upward trend across the graph from left to right, which we might interpret as a general increase in production as wine quality increases. This particular interpretation has certainly appeared in the literature on wine production.

But we would be wrong, because that's not what the data show. The pattern in the data is that for vintages with low quality there is little wine production, but for high-quality vintages the production volume can vary dramatically. This is emphasized by the added line in this next version of the graph: there are few data points above the line, and these would represent poor-quality vintages with big production.

So, we rarely get big production from poor-quality vintages. This means that the apparent pattern (that there is higher average wine production as vintage quality increases) occurs because high-quality vintages can be associated with big production but poor-quality vintages cannot. Vintage production does not increase with vintage quality. Instead, variation in production increases with vintage quality — production may be big or small when the quality is high, but it is usually small when quality is low.
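Here is a minimal Python sketch of this pattern, using hypothetical (quality, volume) values invented for illustration rather than the data in the graph. Every quality level includes small harvests, and the smallest volume is the same at all levels; only the high-quality years also include big ones.

```python
import statistics

# Hypothetical volumes (arbitrary units) grouped by quality score (1-7 scale),
# invented for illustration only.
vintages = {
    2: [20, 22, 21],
    4: [20, 30, 40],
    6: [20, 45, 70],
}

for quality, volumes in vintages.items():
    mean = statistics.mean(volumes)
    spread = max(volumes) - min(volumes)
    print(f"quality {quality}: mean={mean:.1f}, min={min(volumes)}, range={spread}")
# quality 2: mean=21.0, min=20, range=2
# quality 4: mean=30.0, min=20, range=20
# quality 6: mean=45.0, min=20, range=50
```

The mean climbs steadily with quality, yet the smallest harvest never changes; what grows is the range. A summary line through the means would suggest that better vintages produce more wine, when the data only say that they *can*.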

In this case, the summary would be misleading. It is thus a good thing that no summary line is shown on the graph.

#### Conclusion

Any time we are looking at a data summary, we need to bear in mind that apparent patterns in that summary can be caused by any one of three underlying patterns in the data, as illustrated above. These different causes lead to different interpretations of the summary. Be wary of data summaries unless you can also see the original data.

#### Data sources

First graph:
Gary M. Thompson, Stephen A. Mutkoski, Youngran Bae, Liliana Ielacqua, Se B. Oh (2008) An analysis of Bordeaux wine ratings, 1970-2005: implications for the existing classification of the Médoc and Graves. Cornell Hospitality Report 8(11): 6-17.

Second graph:
Wikimedia Commons

Third pair of graphs is modified from:
Pablo Almaraz (2015) Bordeaux wine quality and climate fluctuations during the last century: changing temperatures and changing industry. Climate Research 64: 187-199.

Fourth pair of graphs is modified from:
Gregory V. Jones, Robert E. Davis (2000) Climate influences on grapevine phenology, grape composition, and wine production and quality for Bordeaux, France. American Journal of Enology and Viticulture 51: 249-261.

#### 1 comment:

1. "Robinson uses only six scores for Grange vintages (16.5-19) ..."

Excerpts from the Jancis Robinson MW website (circa 2002):

“How to Score Wine”

. . .

I would be much happier in my professional life if I were never required to assign a score to a wine. I know so well how subjective the whole business of wine appreciation is and, perhaps more importantly, how much the same wine can change from bottle to bottle and week to week, if not day to day. I frequently find myself re-tasting a wine at the same stage in its life. So far I have rarely marked more than 0.5 points out of 20 differently on the two occasions, but it wouldn't surprise me at all if I did.
And as for tasting the same wine at different stages in its life, this is even less likely to yield identical scores. Quite apart from bottle variation there are differences in tasters' moods and vast differences in how wines mature in bottle.

Even I have to admit, however, that scores have their uses. The most obvious is to help the reader-in-a-hurry . . .

I like the five-star system used by Michael Broadbent and Decanter magazine. Wines that taste wonderful now get five stars. Those that will be great may be given three stars with two in brackets for their potential. But Brits being as polite, or just plain cowardly, as we are, almost all the wines get between three and five stars in Decanter so it's not an especially nuanced scoring system -- although I have been known to use it for wines likely to be very close together in quality such as de luxe Champagnes or mature vintage Ports.

When even I have to admit that I really need a numerical scoring system is when tasting a wide range of wines of the same sort when readers, or subscribers to jancisrobinson.com, need a shorthand reference to my favourite wines. . . .

I know that Americans are used to points out of 100 from their school system so that now they, and an increasing number of wine drinkers around the world, use points out of 100 to assess wines. Like many Brits, I find this system difficult to cope with, having no cultural reference for it.

So, I limp along with points and half-points out of 20, which means that the great majority of wines (though by no means all) are scored somewhere between 15 and 18.5, which admittedly gives me only eight possible scores for non-exceptional wines -- an improvement on the five star system but not much of one. (I try when tasting young wines to give a likely period when the wine will be drinking best, so I do cover the aspect of its potential for development.)

But, perhaps strangely for someone who studied mathematics at Oxford, I'm not a great fan of the conjunction of numbers and wine. Once numbers are involved, it is all too easy to reduce wine to a financial commodity rather than keep its precious status as a uniquely stimulating source of sensual pleasure and conviviality.