Monday, February 26, 2018

Social bots and the problems they create for wine marketing

In the 21st century, anyone involved in advertising or selling products needs to be savvy with regard to social media; and this includes the wine industry. Contact with customers via social media (Facebook, Twitter, Instagram, Pinterest, blogs, etc) has not replaced traditional forms of contact (shops, tasting rooms, print reviews, etc), but it is definitely a major new form of interaction.

Take this 2010 quote from Patrick Goldstein:
Virtually every survey has shown that younger audiences have zero interest in critics. They take their cues for what movies to see from their peers, making decisions based on the buzz they've heard on Facebook, Twitter or some other form of social networking.

There have been many discussions of social media and its commercial use, both in and outside the wine industry; and much of this is overly enthusiastic and uncritical. We are simply told that the future is already here, with the availability of Big Data. If nothing else, we are told, we may be able to avoid the seemingly endless layers of "middle men" standing between the producer and the customer.

In an attempt to provide a somewhat more temperate discussion, I have already provided one blog post about the limitations of social media in the wine industry (The dangers of over-interpreting Big Data); and I have also noted that community wine-quality scores are no more impartial than are scores from individuals (Are there biases in community wine-quality scores?).

Here, as another sobering thought, I discuss a further issue that seems to me to be of importance, but which has not received much obvious attention, at least in the wine industry. This is the matter of what are known as social robots, or usually just "bots" for short. Like all human developments, bots can be exploited as well as used responsibly; and we need to understand the consequences of their possible misuses, if we are going to use social media effectively in the wine industry.


Bots have existed since the beginning of computing. They are simply computer programs that were originally developed to take care of the computer house-keeping when the volume (or speed) of activity gets too much for humans to handle.

Unsurprisingly, they have increased dramatically in number since the advent of the internet. For example, the most prevalent of the so-called "good bots" are the web crawlers and scanners — every web search engine (Google, Yahoo, Bing, DuckDuckGo, Yandex, etc) has a mass of bots crawling the web, gathering data for use in the databased indexes that make speedy web searches possible.

Social robots, on the other hand, operate in the social media, and therefore potentially interact directly with human beings. The good news is that they can address some of the potentially overwhelming aspects of dealing with Big Data (ie. thousands of Facebook pages, tens of thousands of Instagram pictures, millions of Twitter tweets, etc). Let's start with a couple of obvious examples of potentially useful social bots, just to set the scene:
  • trading bots are involved in the automatic buying and selling of investment stocks, shares and cryptocurrencies — see Nathan Reiff (December 2017)
  • bots are also involved in the automatic buying of online entertainment tickets — see Donna Fuscaldo (March 2017).
Unfortunately, on the other side of the coin we have the so-called "bad bots", which can seriously disrupt human activities. For example, it has been suggested (along with considerable evidence) that the erratic price of cryptocurrencies in recent times has, at least partly, been manipulated by the activities of certain trading bots (eg. those named Spoofy and Picasso) — see Brian Yahn (January 2018).

Bots have, of course, also become prevalent in the world of blogs, Facebook, Instagram and Twitter, and their ilk; and here we potentially have widespread problems. In particular, these bots can wreak havoc with any attempts to make use of social media data for economic purposes. The Big Data ends up being massively misleading, because the web metrics being measured are inflated by the bots' activities, in unpredictable ways. I have discussed the important issue of such biased data before (Why do people get hung up about sample size?).

Problems with bots

According to the 2016 Imperva Incapsula Bot Traffic Report, c. 48% of web traffic is by humans, 23% is by good bots, and 29% is by bad bots; so we are not talking about a small problem. To look at the activity of some of the bad bots, let's take blogs first.


I first became aware of the scourge of bots with my professional blog, The Genealogical World of Phylogenetic Networks. I used to keep track of the number of visitors to that site, but this has now become a worthless activity, simply because of the number of bots that make visits to the blog's pages. When I see 2,000 visits from the Ukraine in a couple of days, I know that I am not seeing visits from large numbers of English-speaking Ukrainian scientists! Instead, I am seeing referral spam, from so-called "spam bots". Even the blog you are currently reading is prone to get 200 visits from Russia on some days.

These bots are trying to create referral traffic from other web sites, so that Google and similar search engines will record their visits, and thus increase the Page Rank of the referring site. From the point of view of the blogger, these spambot visits completely distort the blog's Analytics Referral Data, which is one measure of the success of the blog as part of the world's social media network.
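To give a concrete (if simplified) idea, about the best a blogger can do is to filter the exported referral data against a list of known spam domains before reading the metrics. This is only a sketch — the domain names and visit counts are invented for illustration:

```python
# Sketch: strip known spam referrers from exported analytics rows.
# The domain names and visit counts below are invented, for illustration.
SPAM_REFERRERS = {"buttons-for-website.example", "best-seo-offer.example"}

def filter_referrals(rows):
    """Keep only rows whose referring domain is not a known spam bot."""
    return [r for r in rows if r["referrer"] not in SPAM_REFERRERS]

visits = [
    {"referrer": "google.com", "count": 120},
    {"referrer": "buttons-for-website.example", "count": 2000},
    {"referrer": "twitter.com", "count": 45},
]
clean = filter_referrals(visits)
print(sum(r["count"] for r in clean))  # prints 165 (the human-ish total)
```

Of course, this only catches the spam domains you already know about, which is exactly why the problem persists.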

In other words, these bots stuff up the Big Data. According to the above-mentioned Bot Traffic Report, bot traffic depends on the "size" of the visited site, in terms of the number of daily visits from humans. For my professional blog, the report estimates that visits are likely to be about 25% good bots and 45% bad bots. I would not disagree with those estimates — that is, only 30% of the visits are from actual human beings, who might be reading the blog posts. Sadly, it seems to be impossible to get rid of bot traffic from Blogger blogs.

This is a similar (but distinct) issue to that faced by marketers when bots generate "clicks" on online ads, thus artificially increasing traffic to the advertised site. One third of all digital advertising is suspected to be fraudulent, in this sense. For more information, see: A 'crisis' in online ads; and Google issuing refunds to advertisers over fake traffic.


Moving on to Facebook, there are thousands of bots; see Facebook Messenger has 11K bots: should you care? Most of these are so-called "chat bots", which are supposed to function as some sort of personal assistant for users, helping to gather information (eg. aggregate content from various online sources, such as news feeds) and/or conduct e-commerce transactions. They try to keep the users interacting within the Facebook environment, rather than having them leave to use another computer program (eg. in order to access content or conduct transactions).

This is all well and good, but what about the bad bots? These have become very obvious over the past half-dozen years, and they try to emulate the behavior of humans, and possibly alter the humans' behavior. They do this through the use of fake identities within the social media world, which is rapidly becoming big news.

Several studies of social bot infiltration of Facebook (eg. Krombholz et al., Fake identities in social media: a case study on the sustainability of the Facebook business model) have shown that more than 20% of legitimate users will accept "friendship" requests indiscriminately, and that more than 60% will automatically accept requests from accounts with at least one contact in common. This makes it very easy to use fake identities for any purpose whatsoever, including the false appearance of social media popularity and influence.

It has therefore been obvious for some years that people have been Buying followers on social media sites. As noted above, this completely alters the social media analytics, and reduces the usefulness of the Big Data. The extent of this problem was recently discussed in The New York Times (The follower factory), which noted:
The Times reviewed business and court records showing that Devumi [a well-known "follower factory"] has more than 200,000 customers, including reality television stars, professional athletes, comedians, TED speakers, pastors and models. In most cases, the records show, they purchased their own followers. In others, their employees, agents, public relations companies, family members or friends did the buying.
It matters not how many Facebook friends you have for your winery or wines — instead, we must ask: how many of them are real? Facebook "likes" may not be worth much, any more. Sophisticated bots can create personas that appear to be very credible followers, and they thus are very hard for both people and automated filtering algorithms to detect (see Varol et al., Online human-bot interactions: detection, estimation, and characterization).


Moving on to Twitter now, it has been observed that Twitter is an ideal environment for bots. Early social media bots were mainly designed for the automatic posting of content, and Twitter is the most effective place for that; see Twitter may have 45 million bots on its hands. Estimates put the number of bots on Twitter at 10-15% of the accounts.

So, in addition to the fake-identity problem outlined above, Twitter has an extra, very large problem — the rapid spread of misleading information (see Shao et al., The spread of fake news by social bots). As Ferrara et al. (The rise of social bots) have noted:
These bots mislead, exploit, and manipulate social media discourse with rumors, spam, malware, misinformation, slander, or even just noise. This may result in several levels of damage to society.
It is obvious that emotions are contagious in the social media; and Twitter bots seem to be particularly active in the early spread of viral claims, hoaxes, click-bait headlines, and fabricated reports. A recent article from the Media Insight Project discusses How millennials get news: inside the habits of America’s first digital generation, and it is now clear that the social media are of prime importance. So, the contagious spread of emotive false news is a really big issue.

As an aside, it is worth pointing out that this Twitter phenomenon is not actually new, it is simply magnified these days. In the old days, it was the internet newsgroups that were the primary online mechanism for spreading commentary. One classic example of their effect was the furore that arose over the 1994 release of the original flawed Intel Pentium microprocessor (see The Pentium Chip story: an internet learning experience). Intel did not anticipate the speed of the news spread, nor deal with it effectively.

However, our primary concern in this blog post is with the serious alteration of social media analytics that comes from the presence of Twitter bots. What is the following of your wine or winery on Twitter worth? How much of it comes from automated accounts? Once again, the use of Big Data becomes problematic when we cannot rely on its authenticity.


Instagram is apparently the favorite social media of many wine professionals. However, this post is already long enough, so I will skip any detailed discussion here. Instead, you can read: How bots are inflating Instagram egos. The issue is basically the same one I have been discussing — biased metrics arise from inflated "likes" created by bad bots.


As far as the adoption of social media is concerned, I feel that we are still being given the hard sell. To take an analogy, it is like we are being told to buy "a quality used car" — but what sort of quality? Good quality or poor quality? High quality or low quality? Everything has some sort of quality!

We need to think critically about both the pros and the cons of the social media and its associated Big Data. Enthusiasm is all very well, in its place, but it cannot substitute for careful thought about how we use social media in the wine industry. I don't think that the social media gurus have come to terms with bots yet, in terms of analyzing Big Data. What use is Big Data that have been biased by the behavior of bots?

In particular, it seems that the most important practical role for social media is that it can help publicize the existence of companies and their products or services. This makes it an information channel; but this does not necessarily make it a sales channel. We need to keep these two ideas distinct. Bots are not necessarily a problem for the mere advertising of a product, because we do not need to measure web metrics, which they can distort. But selling is a different matter, because we need to assess how effective is the reach of social media, in terms of successful sales (see Social media’s disappointing numbers game). Here, bots are potentially a serious problem.


A number of the ideas here, and some of the information, came from discussions with Bob Henry, who also directed me to some of the online literature.

Monday, February 19, 2018

Wine-quality scores for premium wines are not consistent through time

When dealing with professional wine-quality scores, the usual attitude seems to be: "one wine, one score". We have all seen wine retailers where, for each wine, only one quality score is advertised from each well-known wine critic or magazine. This is often either the most recent score that has been provided, or it is the highest score that has been given to that particular wine.

However, we all know that this is overly simplistic. The score assigned to a wine by any given taster can vary through time for one or more of several reasons, including: bottle variation, tasting conditions, personal vagaries, and the age of the wine. So, one score is actually of little practical use, even though that is usually all we get from retailers.

The point about the age of the wine is of particular interest to wine lovers, since there is a perception that premium wines should increase in quality through time (that's why we cellar the wine), before descending slowly to a mature old age (the wine, as well as us). It is therefore of interest to find out whether this is actually so. When wine critics repeatedly taste the same vintage of the same wine, do their assigned quality scores show any particular pattern through time? Or do they correctly assess the wine when it is young, so that it continues to get the same score as it matures?

This turns out not to be an easy question to answer, because in very few cases do critics taste a single wine often enough for us to be able to get a worthwhile answer; and when they do do repeat tastings, they do not always publish all of the results. I have previously looked at the issue of repeated tastings by comparing pairs of tastings for several wines (Are the quality scores from repeat tastings correlated?), but I have not looked at single wines through time.

Some data

So, I have searched around, and found as many examples as I could of situations where a single critic has publicized scores for the same wine (single winery and vintage) at least six different times since 2003. I got my data from CellarTracker, WineSearcher and 90Plus Wines (as described in a previous post).

It turns out that very few people have provided quality scores for more than five repeats of any one wine (who can afford to?). It also turns out that the most likely place to find such scores is among the icon wines from the top Bordeaux châteaux. The critics I found are: Jeff Leve (27 wines), Richard Jennings (3 wines), Jancis Robinson (2 wines) and Jean-Marc Quarin (1 wine).

The graphs are tucked away at the bottom of this post, and I will simply summarize here what they show. They all show roughly the same thing: a lot of variation in scores through time, with a spread of points for any one wine never being less than 2; and the scores generally show a slight decrease through time.
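For clarity, the "spread" I refer to is simply the difference between the highest and lowest score a critic has given to the same wine. A minimal sketch of the calculation, using invented scores rather than any critic's actual data:

```python
# Sketch: the per-wine "spread" is max minus min of the repeated scores.
# The score lists are invented, for illustration only.
def score_spread(scores):
    return max(scores) - min(scores)

repeat_tastings = {
    "vintage A": [96, 98, 95, 97, 96, 98],   # spread = 3
    "vintage B": [99, 96, 97, 95, 94, 96],   # spread = 5
}
for wine, scores in repeat_tastings.items():
    print(wine, score_spread(scores))
```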

The first four graphs are from Jeff Leve (at the Wine Cellar Insider). The first graph is for seven vintages of Château Latour. The scores generally stay within 2-3 points for each wine; and only the 1990 could be considered to show any sort of increase in score through time. The second graph is for Château Lafite-Rothschild, Château Mouton-Rothschild and Pétrus — the first two generally stay within 2 points, but the latter is all over the place. The third graph covers seven vintages of Château Margaux, which rarely stay within 2 points, and the 2000 vintage shows a strong decrease in score through time. The fourth graph covers nine vintages of Château Haut-Brion. The scores often do not stay within 2 points, especially for the 1961 vintage; and only the 1998 vintage increases slightly through time.

The fifth graph is for Richard Jennings (from RJ on Wine). All three of the vintages covered show a decrease in score through time. Finally, the sixth graph shows a couple of wines of Château Latour from Jancis Robinson and one from Jean-Marc Quarin, both of whom use a 20-point quality scale. Their scores range by at least 2 points per wine; and Quarin's wine strongly decreases in score through time.


I think that it might be stretching a point to claim that any of these wines show a consistent score through time — they go up and down by at least 2 points, and often more. We certainly can't claim that the scores increase with repeated tastings — if anything, the general trend is more often downwards.

There are a couple of possible explanations for this variation, in addition to the obvious one that the critics don't have much idea what they are doing.

The classic explanation is "bottle variation" (rather than "taster variation"). For example, Robert Parker once wrote (Wine Advocate #205, March 2013): "I had this wine four different times, rating it between 88 and 92, so some bottle variation (or was it my palate?) seems at play." Parker's results would fit perfectly into the graphs below. As confirmation of this point, the widely reported 2010 results of the Australian Wine Research Institute’s Closure Trial certainly indicated a very large amount of bottle variation for cork-closed bottles (see Wine Spectator, Wine Lovers).

If this is the explanation, then the consistently erratic nature of the results, and the expected high quality of the wines, does make me wonder about the advisability of buying expensive wines. Huge bottle variation for cheap wines might be expected, but cannot be acceptable for the supposedly good stuff, even if only for financial reasons. This topic is discussed in more detail by, among others, Wilfred van Gorp, Jamie Goode, and Bear Dalton.

At the extreme, bottle variation can refer to flawed wines, of course. In the graph for Richard Jennings, one of the scores for Château Haut-Brion is missing, because he scored it as "flawed". Indeed, he did this for 3 of the 188 Grand Cru wines for which he provided scores (1.6%). James Laube estimates the rate of flawed wine as 3-4%. The other tasters may also have encountered flawed wines, but not reported this, as recently discussed by Oliver Styles.

Another point is the extent to which the tasters may have taken into account how old the wine was at the time they tasted it. If the wines are not tasted blind, then this remains a serious question mark over the quality scores assigned.

Anyway, there is certainly a lot of leeway for retailers to select the score(s) they report on their shelf talkers and web pages. The Wine Searcher database addresses this issue by simply reporting the most recent score available.


Jeff Leve:

Jeff Leve's scores for Château Latour

Jeff Leve's scores for the Rothschilds and Pétrus

Jeff Leve's scores for Château Margaux

Jeff Leve's scores for Château Haut-Brion

Richard Jennings:

Richard Jennings' scores

Jancis Robinson and Jean-Marc Quarin:

Scores from Jancis Robinson and Jean-Marc Quarin

Monday, February 12, 2018

California grapes: quantity versus quality

Grape production is a balancing act between quantity and quality — producing a greater quantity is usually assumed to result in a reduction in quality. Therefore, attempts by grape growers to increase grape quality are usually associated with trying to decrease quantity. The relationship works both ways.

It is therefore of interest to look at the big picture of this supposed relationship. This quality/quantity relationship is usually investigated only at the micro level — for example, individual growers might decide to increase their grape quality by thinning their crop. But what happens at the macro level, across all growers? This question seems rarely to have been asked.

One simple way to start looking at this topic is to compare the production area of particular grape types with the amount of fruit they produce. We might anticipate that the highest quality varieties produce less fruit than do lower quality varieties. This is a simplistic approach, of course, because there are many factors that affect fruit production, most notably the weather; but if we restrict ourselves to a particular viticultural area, then it might be a useful place to start.

So, I decided to look at the California grape data provided by the United States Department of Agriculture (USDA). The latest report is from April 2017, which shows the acreage of productive vines (in each US state), for both red varieties and white varieties. I then compared these data to the data for the 2016 California grape crush provided by the American Association of Wine Economists (AAWE), for the top reds and top whites.

The data are shown in the two graphs, one for each type of grape. Within each graph, each point represents a single grape variety in California, showing its bearing acreage horizontally and its grape crush vertically. The lines on the graphs are best-fit linear regressions, illustrating the "average" production expected from each variety based on its acreage. In both cases the lines fit the data quite well, explaining c. 85% of the variation in the data.

California red grapes by area and crush

The first graph shows the data for the red varieties, where Cabernet sauvignon is by far the most widely planted grape variety, as well as the one most highly esteemed by winemakers. I therefore calculated the regression line, as shown, without including this variety, so that the line is fitted only to the other varieties — this then tells us what production to "expect" from Cabernet, based on the observed data for the other varieties. We can see that, as anticipated for the top variety, Cabernet sauvignon produces a much smaller crop than do the other varieties.
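For those interested, the calculation behind this "hold out the top variety" approach can be sketched as follows: fit an ordinary least-squares line to all of the other varieties, then compare the top variety's actual crush to the fitted expectation. All of the acreage and crush numbers here are invented, purely for illustration:

```python
# Sketch: fit a straight line (least squares) to all varieties except the
# top one, then compare the top variety's actual crush to the fitted
# expectation. All numbers here are invented, for illustration only.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # slope, intercept

# (acreage, crush) pairs for the "other" varieties, top variety held out
others = [(10_000, 60_000), (20_000, 115_000), (30_000, 180_000),
          (40_000, 235_000), (50_000, 300_000)]
slope, intercept = fit_line([a for a, _ in others], [c for _, c in others])

top_acres, top_crush = 90_000, 400_000    # hypothetical top variety
expected = slope * top_acres + intercept
print(top_crush < expected)  # True: crop below expectation, as in the post
```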

Interestingly, both Zinfandel and Rubired (labeled on the graph) produce a larger grape quantity than we might expect from their acreage, whereas all of the other varieties are close to their expectation. This is notable because Zinfandel is the second most widely planted red grape, and it is usually considered to also be a premium variety. Other common premium varieties, such as Pinot Noir, Merlot and Syrah, produce crops at about their expected level in California.

California white grapes by area and crush

A similar pattern is seen when we look at the white grape varieties, as shown in the second graph. Indeed, the regression lines in both graphs have almost the same slope (and intercept), indicating that red and white production both have the same relationship to area.

Chardonnay is both the most widely planted white grape variety and the one most highly esteemed by winemakers. It is obvious from the graph that Chardonnay produces less quantity than is expected based on the other white varieties, as anticipated.

Interestingly, three of the next four most widely planted white varieties (labeled on the graph) produce a larger grape quantity than we might expect from their acreage, whereas all of the other varieties are close to their expectation. This matches the pattern observed for the red varieties, where only the top variety has a reduced crop.

Finally, the California Department of Food and Agriculture's California Grape Crush Report Preliminary 2017 allows us to look at the broad-scale economics of wine-grape production. The graph below shows the inflation-adjusted price per ton of wine grapes (vertically) versus the grape crush tonnage (horizontally). Each point represents the crop for one year from 1989 to 2016, inclusive.

Price per ton of California grapes through 20 years

As can be seen, for the white-wine grapes their price is unrelated to the crop size — prices do not go up or down when the crop is large. On the other hand, for the red-wine grapes the price has a tendency (ie. with a few exceptions) to rise when the crop is large (correlation = 33%). In both cases, price is not related to scarcity, which is the important point. This implies that voluntarily restricting crop size does not affect the overall economics — the reduction in crop is likely to be compensated by increased price.
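For reference, the correlation quoted here is an ordinary Pearson correlation between annual crush tonnage and price per ton, which can be computed as follows. The two short series below are invented, simply to show the calculation, not to reproduce the California figures:

```python
# Sketch: Pearson correlation between annual crush tonnage and price.
# The two series are invented, for illustration only.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

tonnage = [2.1, 2.4, 2.0, 2.8, 3.0, 2.6]   # million tons, hypothetical
price   = [610, 640, 600, 650, 700, 620]   # $/ton, hypothetical
print(round(pearson(tonnage, price), 2))   # a positive correlation
```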


So, Cabernet sauvignon and Chardonnay are the most widely planted and most esteemed red and white grape varieties, respectively, in California, and they both produce smaller crops than might be expected based on the production levels of other varieties. Furthermore, the situation differs for some of the other widely planted varieties, which produce larger crops than might be expected. This seems to match what is anticipated from the suggested relationship between quantity and quality — quantity is less when quality is at its very highest. For California grapes, less is more.

Monday, February 5, 2018

Where does all of this wine come from and go to?

A few weeks ago I commented on some of those countries that are importing expensive versus cheap wine (The USA imports more expensive wines than anywhere else). This leads us inevitably to consider, globally, where all the wines are coming from and going to.

The data I will use to explore this come from Comtrade, the United Nations International Trade Statistics Database. I accessed all of the data available for 2016 in the category: "Wine; still, in containers holding more than 2 litres" (code 22042). This may include pretty much anything (bulk or otherwise), except import/export of single bottles of wine, but excludes sparkling or fortified wines.

I have plotted the results in the graph, which shows the total reported exports (in kg, which ≈ liters) horizontally, and the total reported imports vertically, with each point representing a single country (as recognized by the UN). Some of the countries are labeled, but most are not. Note that both axes have logarithmic scales, so that the most active countries are dealing with up to 1 million tons of wine annually.

Exports and imports of wine by country

For those countries above the line, their imports exceed their exports, while for those below the line, exports exceed imports. Obviously, most countries are net importers of wine. For the USA, imports exceed exports by c. 50%.
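That "c. 50%" figure is just the percentage by which the reported import total exceeds the export total — a trivial calculation, sketched here with invented kg figures rather than the actual Comtrade numbers:

```python
# Sketch: by how much do a country's imports exceed its exports?
# The kg figures are invented, for illustration only.
def import_excess_pct(imports_kg, exports_kg):
    return 100 * (imports_kg - exports_kg) / exports_kg

print(round(import_excess_pct(450_000_000, 300_000_000)))  # prints 50
```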

Those countries that are large net exporters of wine are well known, including Spain, Chile, Italy, Australia and South Africa. France is not in this list because, according to the data, it imports nearly three times as much wine as it exports. Portugal is another absentee, as it imports more than twice as much wine as it exports. I discussed these two issues in the previous post (The USA imports more expensive wines than anywhere else).

The next group of net exporters includes (in order) Moldova, New Zealand, Macedonia, Myanmar (Burma) and Argentina, followed by Hungary, Israel, Morocco and Bulgaria. For Myanmar 96% of the wine goes to Suriname, and for Morocco 89% goes to France, which is why you have never tasted either of these wines. Macedonian wine principally goes to Germany (41%) and Serbia (34%), while Hungarian wine goes to Germany (30%) and Czechia (23%). The USA takes the largest share of the Israeli wine (46%), although France (11%) and the UK (10%) get their share, as well. Moldova sends its wines mainly to Belarus (40%) and the Ukraine (19%), while Bulgarian wine goes to Poland (56%) and Sweden (25%).

The biggest net importers are generally (but not always) those countries with large populations but with only a relatively small wine industry: Germany, the United Kingdom, France, Russia and China — that's right, France is the third biggest net importer of wine in the world! These countries are followed by (in order) Iceland, Sweden, Czechia, USA, Belgium, the Netherlands and Switzerland.

I have commented before that there seem to be a number of countries that are credited with exporting far more wine than they actually produce (Bizarre wine data). In the Comtrade dataset, these countries include: Denmark, Finland, Belgium, the Netherlands, Sweden, Thailand, Norway, Luxembourg, Singapore, Hong Kong and Iceland. Normally, I would conclude that these data involve re-exports of imported wine; however, "re-export" is officially a separate category of data in the Comtrade database. It therefore seems to me that the data might not be organized as well as we would like.