How Victor Hugo Can Predict Ebola and Help a Business Succeed

[From my post on Harvard Business Review]

There’s no doubt that our world faces complex challenges, from a warming climate to violent uprisings to political instability to outbreaks of disease. The number of these crises currently unfolding – in combination with persistent economic uncertainty – has led many leaders to lament the rise of volatility, uncertainty, complexity, and ambiguity. Resilience and adaptability, it seems, are our only recourse.

But what if such destabilizing events could be predicted ahead of time? What actions could leaders take if early warning signs were easier to spot? Just this decade, we have finally reached the critical mass of data and computing power needed to create such tools.

“What is history? An echo of the past in the future,” wrote Victor Hugo in The Man Who Laughs. Although future events have unique circumstances, they typically follow familiar past patterns. Advances in computing, data storage, and data science algorithms allow those patterns to be seen.

A system whose development I’ve led over the past seven years harvests large-scale digital histories, encyclopedias, social and real-time media, and human web behavior to calculate real-time estimates of the likelihood of future events. Essentially, our system combines 150 years of New York Times articles, the entirety of Wikipedia, and millions of web searches and web pages to model the probability of potential outcomes against the context of specific conditions. The algorithm generalizes sequences of historical events extracted from these massive datasets, automatically trying all possible cause-effect combinations and finding statistical correlations.
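As a rough illustration of what "trying all cause-effect combinations and finding statistical correlations" can look like, here is a toy sketch in Python. It is not the system described above: the function name `pair_lift`, the event labels, and the six timelines are invented for the example, and the score used (lift, the ratio of P(effect after cause | cause) to P(effect)) is just one simple correlation measure chosen for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_lift(timelines):
    """Score every ordered (earlier, later) event pair by its lift.

    timelines: list of chronologically ordered lists of event labels.
    A lift above 1.0 means the later event shows up more often after
    the earlier one than it does overall.
    """
    n = len(timelines)
    event_count = Counter()   # timelines containing each event
    pair_count = Counter()    # timelines where `cause` precedes `effect`
    for timeline in timelines:
        event_count.update(set(timeline))
        seen_pairs = set()
        for i, j in combinations(range(len(timeline)), 2):
            if timeline[i] != timeline[j]:
                seen_pairs.add((timeline[i], timeline[j]))
        pair_count.update(seen_pairs)

    lift = {}
    for (cause, effect), both in pair_count.items():
        p_effect = event_count[effect] / n
        p_effect_given_cause = both / event_count[cause]
        lift[(cause, effect)] = p_effect_given_cause / p_effect
    return lift

# Invented toy data: each inner list is one region's event history.
timelines = [
    ["drought", "storm", "cholera"],
    ["drought", "storm", "cholera"],
    ["storm"],
    ["drought"],
    ["election"],
    ["election", "storm"],
]

scores = pair_lift(timelines)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

A real system would also have to extract events from text, generalize across similar events, and guard against spurious correlations, none of which this sketch attempts.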

For instance, recently my fellow data scientists and I developed algorithms that accurately predicted the first cholera outbreak in 130 years. The pattern that our system inferred was that cholera outbreaks in land-locked areas are more likely to occur following storms, especially when preceded by a long drought up to two years before. The pattern only occurs in countries with low GDP that have a low concentration of water in the area. This is extremely surprising, as cholera is a waterborne disease and one would expect it to happen in areas with a high water concentration. (One possible explanation might lie in how cholera infections are treated: if prompt rehydration treatment is supplied, cholera mortality rates drop from 50% to less than 1%. Therefore, it might be that in areas with enough clean water the epidemic did not break out.)
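Written out as an explicit rule, the pattern might look like the sketch below. Everything in it is an assumption made for illustration: the `Region` fields, the boolean flags standing in for "low GDP" and "low water concentration", and the choice to treat the preceding drought as required (the pattern above says only that a drought makes an outbreak more likely).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical cutoff: the drought ended at most two years before the storm.
MAX_GAP_DAYS = 2 * 365

@dataclass
class Region:
    landlocked: bool
    low_gdp: bool
    low_water_concentration: bool
    last_drought_end: Optional[date]  # None if no drought on record
    last_storm: Optional[date]        # None if no storm on record

def matches_cholera_pattern(region: Region) -> bool:
    """Return True if the region fits the drought-then-storm pattern."""
    if not (region.landlocked and region.low_gdp
            and region.low_water_concentration):
        return False
    if region.last_drought_end is None or region.last_storm is None:
        return False
    gap = (region.last_storm - region.last_drought_end).days
    return 0 <= gap <= MAX_GAP_DAYS

# Invented example regions.
at_risk = Region(True, True, True, date(2011, 3, 1), date(2012, 6, 15))
too_late = Region(True, True, True, date(2008, 3, 1), date(2012, 6, 15))
coastal = Region(False, True, True, date(2011, 3, 1), date(2012, 6, 15))

print(matches_cholera_pattern(at_risk))   # True
print(matches_cholera_pattern(too_late))  # False
print(matches_cholera_pattern(coastal))   # False
```

A statistical system would attach a probability to each condition rather than a hard yes/no, but the rule form makes the inferred pattern easy to inspect.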

The implication of such predictions, automatically inferred by an ever-updating statistical system, is that medical teams can be alerted as much as two years in advance that there’s a risk of a cholera epidemic in a specific location, and can send in clean water and save lives.

Other epidemics can be predicted in a similar way. Ebola is still rare enough that statistical patterns are tough to infer. Nevertheless, using human causality knowledge mined from medical publications, in conjunction with recurring events, a prominent pattern for Ebola outbreaks does emerge.

Several publications have reported a connection between both the current and the previous Ebola outbreaks and fruit bats. But what causes the fruit bats to come into contact with humans?

The first Ebola outbreaks occurred in 1976 in Zaire and Sudan. A year before that, a volcano erupted in the area, leading many to look for gold and diamonds. Those actions caused deforestation. Our algorithm inferred, from encyclopedias and other databases, that deforestation causes animal migration – including the migration of fruit bats.
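One simple way to picture this kind of chained inference is as a search over a graph of cause-effect links. The sketch below is a toy, not our algorithm: the `CAUSES` dictionary is hand-written from the chain described above (in practice such links would be mined from text), and `causal_chain` is a hypothetical helper that runs a plain breadth-first search.

```python
from collections import deque

# Hand-written cause -> effects links, taken from the chain in the text.
CAUSES = {
    "volcanic eruption": ["mining"],
    "mining": ["deforestation"],
    "deforestation": ["fruit bat migration"],
    "fruit bat migration": ["Ebola outbreak risk"],
}

def causal_chain(graph, start, goal):
    """Return the shortest chain of links from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for effect in graph.get(node, []):
            if effect not in visited:
                visited.add(effect)
                queue.append(path + [effect])
    return None

chain = causal_chain(CAUSES, "volcanic eruption", "Ebola outbreak risk")
print(" -> ".join(chain))
```

The search only follows links in the cause-to-effect direction, so asking for a chain from "deforestation" back to "mining" returns None.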

We have used the same approach to model the likelihood of outbreaks of violence. Our system predicted riots in Syria and Sudan, and their locations, by noticing a pattern: riots are more likely in non-democratic regions with growing GDPs yet low per-person income when the price of a previously subsidized product is raised, causing student riots and clashes with police.

The algorithm also predicted genocide by identifying that those events happen with higher probability if leaders or prominent people in the country dehumanize the minority, specifically when they refer to minority members as pests. One such example is the genocide in Rwanda. Years before 4,000 Tutsis were murdered in Kivumu, Hutu leaders such as Kivumu mayor Gregoire Ndahimana referred to the minority Tutsis as inyenzi (cockroaches). From this and other historical data, our algorithm inferred that genocide probability almost quadruples if: a) a person or a group describes a minority group (as defined by census and UN data) as either a non-mammal or as a disease-spreading animal, such as mice, and b) the speaker does so 3-5 years before they’ve been reported in the news a minimum of a few dozen times and have a local-language Wikipedia entry about them.

After an empirical analysis of thousands of events happening in the last century, we’ve observed that our system identifies 30%-60% of upcoming events with 70%-90% accuracy. That’s no crystal ball. But it’s far, far better than what humans have had before.
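One way to read those two figures is as recall (the share of real events that were flagged in advance) and precision (the share of flagged events that actually happened). That reading is an assumption on the reader's side, and the event names and counts below are invented purely to show how the two numbers are computed.

```python
def recall_and_precision(predicted, actual):
    """Compare a set of predicted events with the set that occurred."""
    hits = predicted & actual
    recall = len(hits) / len(actual) if actual else 0.0
    precision = len(hits) / len(predicted) if predicted else 0.0
    return recall, precision

# Invented example: 10 events occurred, 5 were predicted, 4 correctly.
actual = {f"event-{i}" for i in range(10)}
predicted = {"event-0", "event-1", "event-2", "event-3", "false-alarm"}

recall, precision = recall_and_precision(predicted, actual)
print(f"recall={recall:.0%} precision={precision:.0%}")
```

In this toy case the predictor catches 40% of events and is right 80% of the time it raises an alarm, which falls inside the ranges quoted above.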

What would it mean to NGOs, construction companies, and health organizations to know that droughts followed by storms can lead to cholera? What would it mean to mining companies, regulators, environmental organizations, and government leaders to know that mining leads to deforestation, and that deforestation leads to fruit bat migrations, and that fruit bat migrations may increase the risk of an Ebola outbreak? And what would we all do with the information that certain linguistic choices and policy changes can result in widespread violence? How might we all start thinking about risk differently?

Yes, “big data” and sophisticated analytics do allow companies to improve their profit margins considerably. But combining the knowledge obtained from mining millions of news articles, thousands of encyclopedia articles, and countless websites to provide a coherent, cause-and-effect analysis has much more potential than just increasing sales. It can allow us to automatically anticipate heretofore unpredictable crises, think more strategically about risk, and arm humanity with insight about the future based on lessons from the relevant past. It means we can do something about the volatility, uncertainty, complexity, and ambiguity surrounding us. And it means that the next time there’s a riot or an outbreak, leaders won’t be blindsided.
