[From my post on Harvard Business Review]
The White House recently released a report about the dangers of big data in our lives. Its main focus was the familiar topic of how big data can hurt consumer privacy. The Federal Trade Commission and National Telecommunications and Information Administration have also expressed concerns about consumer privacy, as have PwC and the Wall Street Journal. However, big data holds many other risks. Chief among these, in my mind, is the threat to free market competition.
Today, we see companies building their IP not solely on technology, but rather on proprietary data and its derivatives. As ever-increasing amounts of data are collected by businesses, new opportunities arise to build new markets and products based on this data. This is all to the good. But what happens next? Data becomes the barrier-to-entry to the market and thus prevents new competitors from entering. As a result of the established player’s access to vast amounts of proprietary data, overall industry competitiveness suffers. This hurts the economy. Federal government regulators must ask themselves: Should data that only one company owns, to the extent that it prevents others from entering the market, be considered a form of monopoly?
The search market is a perfect example of data as an unfair barrier-to-entry. Google revolutionized the search market in 1996 when it introduced a search-engine algorithm based on the concept of website importance: the famous PageRank algorithm. But search algorithms have evolved significantly since then, and today most modern search engines are based on machine learning algorithms that combine thousands of factors, only one of which is the PageRank of a website. The most prominent factors now are historical search query logs and their corresponding search result clicks. Studies show that historical search data improves search results by up to 31%. In effect, today’s search engines cannot reach high-quality results without this historical user behavior.
This creates a reality in which new players, even those with better algorithms, cannot enter the market and compete with the established players and their deep records of previous user behavior. The new entrants are almost certainly doomed to fail. This is the exact challenge Microsoft faced when it decided to enter the search market years after Google: how could it build a search technology with no past user behavior? The solution came one year later when it formed an alliance with Yahoo search, gaining access to Yahoo’s years of user search behavior data. But Bing still lags far behind Google.
This dynamic isn’t limited to internet search. Given the importance of data to every industry, data-based barriers to entry can affect anything from agriculture, where equipment data is mined to help farms improve yields, to academia, where school performance and census data are mined to improve education. Even in medicine, hospitals specializing in certain diseases become the sole owners of the medical data that could be mined for a potential cure.
While data monopolies hurt both small start-ups and large, established companies, it’s the biggest corporate players who have the biggest data advantage. McKinsey calculates that in 15 out of 17 sectors in the U.S. economy, companies with more than 1,000 employees store, on average, over 235 terabytes of data, more than is contained in the entire U.S. Library of Congress. Data is a strategy, and we need to start thinking about it as one. It should adhere to the same competitive standards as other business strategies. Data monopolists’ ability to block competitors from entering the market is not markedly different from that of the oil monopolist Standard Oil or the railroad monopolist Northern Securities Company. Perhaps the time has come for a Sherman Antitrust Act, but for data.
Unsure where you come down on this issue? Consider this: studies have shown that around 70% of organizations still aren’t doing much with big data. If that’s your company, you’ve probably already lost to the data monopolists.