Big Data Drives Excitement Over Cloud Computing

Stephen DeAngelis

July 5, 2012

Reuven Cohen, a self-proclaimed “digital provocateur,” believes that the excitement surrounding the emergence of cloud computing is really more about access to Big Data than anything else. “Big Data,” he writes, “is the new cloud.” [“It’s Not The Size of Your Data, It’s How You Use It,” Forbes, 5 June 2012] He reports, “Big Data is all about the rapidly increasing information being created on a moment-by-moment basis.” But access to all that data also raises a number of questions in Cohen’s mind: “What does this phenomenon mean? Is the increasing volume of data simply evidence of an increasingly data driven world? Is privacy no longer even an option in a world where your every step can be analyzed in real-time? How does the ever-increasing volume of data change the landscape for today’s businesses? More importantly, what is the economic impact?”

Cohen states that a report published “by The McKinsey Global Institute (MGI) titled ‘Big data: The next frontier for innovation, competition, and productivity’ attempts to answer some of these questions. The report notes, ‘like other essential factors of production such as hard assets and human capital, it is increasingly the case that much of modern economic activity, innovation, and growth simply couldn’t take place without data.’” I mentioned the MGI report in a previous post (The Age of Big Data: Is It Coming or has It Arrived?) and noted that the MGI report implies that the era of Big Data is coming but has not yet arrived. I also cited a number of pundits who believe that the age of Big Data is already here.

I’m not sure that Cohen is among those who believe that we have fully entered the age of Big Data. I say that because he doesn’t believe that the term “Big Data” is well enough defined to be meaningful. He writes:

“My next question is what exactly is Big Data? How big is big? A Gigabyte? A Terabyte? More? The reality is the term, like cloud computing, is loosely defined and interpreted by those who claim it as part of their marketing.”

Cohen admits that most analysts agree that Big Data has three dimensions: volume, velocity, and variety. He explains:

“In a 2001 research report, Gartner analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional: increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, has adopted this model for describing Big Data. Although I prefer a simpler and generally accepted definition: data sets so large and complex that they become awkward to work with using traditional database management tools.”

One thing that Cohen doesn’t question is the fact that Big Data is becoming big business. He cites a 2010 article from The Economist that estimates the Big Data industry is already generating over $100 billion a year and is growing fast. He then rhetorically asks, “What’s driving these massive numbers?” He states, “The answer can be found in emerging markets.”

“The Economist noted, ‘Between 1990 and 2005 more than 1 billion people worldwide entered the middle class. As they get richer they become more literate, which fuels information growth. The amount of digital information increases tenfold every five years.’ … [Computer technologies that were] once limited to the largest corporations are now available to anyone with a credit card. … It has gone global in massively expanding connected world. No longer are we constrained by a single chip, device or computer. What the PC revolution started, the Internet has super charged – information creation. The combined explosion of data, immense data creation and cheap, globally available, on-demand computation has enabled a new rapid transformation of raw data into usable information on scale we’re never seen before. More information is now being created in the time it takes me to write this article than was previously created in all of humankinds combined history. Unfortunately the majority of the information we’ve created has until recently, not been accessible. Most of this raw data or knowledge has been sitting in various silos — be it a library, a single desktop, a server, a database or even a data center. Big Data is changing this.”

The premise of Cohen’s article, however, is that the use (not the size) of data is what makes a difference. He reports that a number of companies are already putting that data to good use. He writes:

“A survey by Avanade digs deeper, ‘An overwhelming majority of companies (73 percent) have already leveraged data to increase revenue. Of those companies that have already increased revenue, 57 percent used data to increase an existing revenue stream. Notably, the remaining 43 percent used data to create entirely new sources of revenue.’ … Avanade sees Big Data at a tipping point with technologies used to manage, analyze, report and make business decisions from large amounts of data quickly becoming easier to use and are more widely available to employees in companies large and small. It’s no longer the quantity or size of your data that limits you so much as what you do with it.”

Jason Waxman, the General Manager of Intel’s Data Center Group, told Cohen that “openness and standards” are required if the potential of Big Data is to be realized. Despite some implied skepticism, Cohen concludes:

“Big Data is changing everything. Those who will be the most successful will be the ones who are able to understand and adapt to the various real-time flows of information. Data has always been the oil powering the information age, but now this power is available to anyone, anywhere.”

Quentin Hardy agrees that Big Data is changing everything, but he argues that gaining access to “real data” is not as easy as Cohen implies. He writes, “Data quality from new diverse sources is still a big problem.” [“How Big Data Gets Real,” New York Times, 4 June 2012] As a result, a number of businesses have been established to help improve the quality of data being analyzed. Hardy continues:

“[In addition to businesses improving newly created data,] another data-improving business consists of moving the world’s older data online. A company called Captricity aims to couple image-capture from things like cellphone cameras with cheap workers in Amazon.com’s Mechanical Turk service, in order to put older handwritten documents into digital databases. The company’s early business is from government and charity sites in Africa and India, but there is no reason why it shouldn’t be valuable for most medical records. If someone took the trouble to write it down, the company figures, that is a good way to assume it is valuable data.”

One of the most important ways to make data “real” is to present it in a visually understandable way. Hardy reports that there are also businesses specializing in that field. He continues:

“There are other businesses trying to take the arcane side of Big Data into the mainstream, with easy-to-use statistical tools and new ways of visualizing data that make it easier to understand. Companies like ClearStory and Platfora ‘want to make it possible for businesses, for history majors, to use,’ said Ben Werther, chief executive of Platfora. ‘We’re in the pre-industrial age of Big Data.’ Martin Wattenberg, creator of a well-known wind map and who is now at Google, talked about a necessary revolution in design of data outcomes that have yet to become widespread.”

Hardy believes that Big Data is on the upslope of “the classic industrial curve.” He explains:

“There is the first discovery of something big, leading to establishing principles like scientific rules. Science moves toward engineering as a means to manufacturing, resulting in mass deployment. Then things really change.”

Although we are already seeing beneficial results from the analysis of Big Data, I find it exciting to contemplate the fact that pundits like Cohen and Hardy believe that even bigger things lie ahead. It should be noted, however, that if contemplating what lies ahead thrills some people, it raises apprehension in others. Dennis Overbye writes, “Big Data probably knows more about us than we ourselves do, but is there stuff that Big Data itself doesn’t know it knows? Big Data is watching us, but who or what is watching Big Data?” [“Mystery of Big Data’s Parallel Universe Brings Fear, and a Thrill,” New York Times, 4 June 2012] He continues:

“In the era of what is called Big Data, in which more and more information about our lives — where we shop and what we buy, indeed where we are right now — [data] tumbles faster and faster through bigger and bigger computers down to everybody’s fingertips, which are holding devices with more processing power than the Apollo mission control. It is perhaps time to be afraid. Very afraid, suggests the science historian George Dyson, author of a recent biography of John von Neumann, one of the inventors of the digital computer. In ‘A Universe of Self-Replicating Code,’ a conversation published on the Web site Edge, Mr. Dyson says that the world’s bank of digital information, growing at a rate of roughly five trillion bits a second, constitutes a parallel universe of numbers and codes and viruses with its own ‘physics’ and ‘biology.’ There are things going on inside that universe that we don’t know about, he points out — except when it produces unpleasant surprises, as it did during the ‘flash crash’ of the stock market in May 2010. And we had better find out what they are.”

For every “scary” anecdote that critics can raise about runaway algorithms, there are dozens of more benign success stories where algorithms analyzing Big Data have helped companies grow and communities flourish. Overbye continues:

“There is something both spooky and grand about the idea that our lives are part of patterns and currents still invisible to us, like climate cycles yet undetected in the geological record. … Surprises — what the complexity theorists call emergent properties — are part of the game. Do ants know they are in an anthill?”

To find answers to some of his nagging questions, Overbye reached out to J. Doyne Farmer, a physicist and complexity theorist at the Santa Fe Institute in New Mexico and a founder of the Prediction Company, which is now owned by UBS, the giant Swiss bank. He continues:

“Dr. Farmer said classical economics had failed miserably to provide the right data for us to understand ourselves. He and others have begun to develop so-called agent-based models of the economy, asking in effect how the seemingly random behavior of individual ants can give rise to anthills with all their pulsing purpose, form and intelligence. It works great for ants, and it’s pretty to think that we might have something to learn about ourselves from our little six-legged friends as they carry off the crumbs from another picnic. Even if it means there is nothing more profound than a 22nd-century beer-can opener in our future.”

To learn more about what ants can teach us, read my posts entitled “Swarm Behavior and Artificial Intelligence,” Part 1 and Part 2. The bottom line is that analyzing Big Data can have profound effects on businesses, governments, organizations, and individuals. It is inherently neither good nor evil, just potentially game changing.