Monday, October 31, 2016

Precision and accuracy of numbers — getting it right

Every day we are bombarded with numbers, usually from the media. Unfortunately, the people presenting these numbers often do not understand the relationship between the precision and the accuracy of their numbers, and so they are prone to mislead both themselves and their readers.

In my experience as a scientist, this is true even in the professional literature; and my recent experience in the professional literature of the wine world suggests that it is no different there, either. So, it is worth explaining this situation, to see if I can't encourage people to get it right.

The accuracy of a number refers to how close it is to the truth. If I claim that something costs \$10 when it actually costs \$15, then I am not being very accurate.

The precision of a number refers to how many digits I am using, or how many decimal places I present. If I claim that something costs \$10.11 rather than \$10, then I am being more precise (I've used four digits rather than two).

This distinction is often illustrated using the idea of shooting at a target, as shown above. Precision refers to how close repeated shots are to one another, while accuracy refers to how close the shots are to the center.

A problem occurs when the precision of any number is greater than its accuracy, because that will be misleading. For example, if I claim that something costs \$10.11 when it actually costs \$15, then the precision of my number (to the nearest cent) gives a spurious sense of accuracy (I am not even accurate to the nearest dollar). This is bad; and it can be easily avoided.

I can illustrate this using the following example from the recent wine literature. In this case, the data summarize some of the characteristics of 48 people who were sampled. When presented as percentages, the numbers cannot be more accurate than to the nearest 2%. After all, the only values possible are 0 people out of 48 = 0%, 1 / 48 ≈ 2%, 2 / 48 ≈ 4%, ..., 47 / 48 ≈ 98%, 48 / 48 = 100%.
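To make the arithmetic concrete, here is a minimal Python sketch that lists the percentages a sample of 48 people can produce. The function name `attainable_percentages` is my own, chosen only for illustration.

```python
def attainable_percentages(n):
    """All percentages that k out of n people can produce, for k = 0..n."""
    return [100 * k / n for k in range(n + 1)]


values = attainable_percentages(48)

# The gap between neighbouring values is 100 / 48, i.e. roughly 2%.
step = 100 / 48
print(round(step, 2))  # 2.08

# First three and last two attainable values, rounded to whole percents.
print([round(v) for v in values[:3]], [round(v) for v in values[-2:]])
# [0, 2, 4] [98, 100]
```

There are only 49 possible values, so no percentage between two neighbouring ones can ever appear in the data.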

However, as shown in the first table, the numbers in the paper were presented to the nearest 0.1%, which implies a resolution of 1 in 1000, not 1 in 48. The 60.4% actually refers to 29 out of 48 people, not to 604 out of 1000. This is misleading.
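The count behind a reported percentage can be recovered with a line of arithmetic. The sketch below assumes only the two numbers quoted here (60.4% and a sample of 48); the helper name `implied_count` is hypothetical.

```python
def implied_count(percentage, n):
    """Whole number of people, out of n, closest to the reported percentage."""
    return round(percentage / 100 * n)


count = implied_count(60.4, 48)
print(count)  # 29

# Converting back: one decimal place reproduces the published figure,
# but the whole percent carries the same information about 29 people.
print(round(100 * count / 48, 1))  # 60.4
print(round(100 * count / 48))     # 60
```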

In this case, presenting the numbers to the nearest 1% (i.e., dropping the decimal places) would be better, because the precision would more nearly represent the accuracy.

As an alternative example, the next table shows two different sample sizes, 136 and 50. A sample size of 136 may well justify an accuracy of one decimal place but not two; and a sample size of 50 probably does not justify even one decimal place. Just because we can calculate a number to many decimal places (lots of precision) does not mean that the accuracy justifies this.
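One possible rule of thumb, which is my own assumption rather than an established standard, is to use the fewest decimal places for which the rounding unit is no larger than the step between neighbouring attainable percentages (100 / n). The sketch below applies that rule; the name `justified_decimals` is hypothetical, and the rule looks only at sample size, ignoring any other source of inaccuracy.

```python
def justified_decimals(n):
    """Fewest decimal places d such that 10**-d <= 100 / n.

    Assumed heuristic: the rounding unit should not be finer than
    necessary to tell neighbouring counts apart. The comparison is
    rearranged to integers (n > 100 * 10**d) to avoid float rounding.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    d = 0
    while n > 100 * 10**d:
        d += 1
    return d


for n in (136, 50, 48):
    print(n, justified_decimals(n))
# 136 1
# 50 0
# 48 0
```

Under this assumption, 136 people allow one decimal place and 50 or 48 people allow none, which agrees with the judgments given above.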

These situations are easy to avoid — precision should simply never exceed the accuracy.

Note: I have not identified the authors of either of the examples illustrated here. I agree with Bjørn Andersen (in his book Methodological Errors in Medical Research) that we should not "pillory a few for errors which many commit with impunity".