Big Data and Trend Watching

Stephen DeAngelis

July 3, 2012

“The business of Big Data, which involves collecting large amounts of data and then searching it for patterns and new revelations,” writes Quentin Hardy, “is the result of cheap storage, abundant sensors and new software. It has become a multibillion-dollar industry in less than a decade. Growing at speed like that, it is easy to miss how much remains to do before the industry has proven standards. Until then, lots of customers are probably wasting much of their money.” [“How Big Data Gets Real,” New York Times, 4 June 2012] With that much money flowing into a new industry, you can rest assured that a lot of companies are trying to get into the income stream. As a result, Todd L. Michaud warns, “The Big Data space is filled with so many posers, fakers and wannabes it’s ridiculous. Everybody is trying to catch the Big Data wave by getting their name attached to this hot new trend.” [“Big Data Is Exactly What You Think It Isn’t,” Storefront Backtalk, 16 May 2012] Hardy and Michaud aren’t arguing that companies shouldn’t make the jump onto the Big Data bandwagon; they are arguing that companies need to look before they leap. Hardy writes:

“There is essential work to be done training a core of people in very hard problems, like advanced statistics and software that ensures data quality and operational efficiency. Broad-based literacy in the uses of data should probably happen too, along with new kinds of management, better tools for reading the information, and privacy safeguards for corporate and personal information. That such a huge number of tasks are taking place is a good indicator that, even with the hype, Big Data is a big deal.”

Big Data is a big deal because it can be used to address so many different kinds of challenges. As Hardy puts it, “In some ways, Big Data is about managing all kinds of weird new data, like social media updates from a mobile phone.” He continues:

“It is hard to categorize in the first place, and may be used in lots of different ways, from advertising to traffic management. The so-called unstructured database of choice is by now pretty clearly Hadoop. … Data quality from new diverse sources is still a big problem, as is persuading companies and organizations to let others see data that might be more valuable in a commonly shared algorithm. ‘I’ve tried paying money for it, but it’s easier for companies to decide not to share,’ said Gil Elbaz, the founder of Factual, a company that seeks to hold lots of online data. ‘The only way that works is to get them to take risks in exchange for data that is valuable to them.’ Much of the fear about exposing data, he said, has to do with competitors learning secrets. Mr. Elbaz thinks there is a good business in developing ‘de-identifiers’ that can make data anonymous, and privacy insurers specializing in covering the costs of exposure.”

Sharing sensitive information securely, even between consenting partners, is challenging and, as Hardy points out, is likely to remain so. He concludes:

“What Big Data is seeing now looks like the classic industrial curve. There is the first discovery of something big, leading to establishing principles like scientific rules. Science moves toward engineering as a means to manufacturing, resulting in mass deployment. Then things really change.”

Michaud writes that if you are one of the few business people who don’t care about Big Data, “You should.” He explains:

“All of a sudden, Web logs that were kept simply for troubleshooting purposes can now be mined to determine valuable information about customers’ preferences. Logs that are created by physical machines can now be analyzed en masse to look for information to help advance a business. Data from social networks can now be mined for customer sentiment. These problems were too big and too complex before. But now, answers are within reach.”

Michaud reports that answers are now within reach because “a bunch of really smart people have solved the two biggest challenges when it comes to analyzing data of large size and complexity.” Those solutions are:

“First, they have removed the need for giant, expensive, specialized hardware platforms (instead using a large number of small ‘commodity servers’ or even cloud servers). And second, they have also removed the need to structure the data in a given format prior to running analysis. These two technological advancements (and the dozens of other underlying technologies that support them) have unleashed a tremendous number of possibilities when it comes to gaining insight from information that would have previously required millions of dollars worth of hardware and a staff of data experts to process.”
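To make Michaud's two points a little more concrete, here is a minimal sketch in Python. It is not Hadoop and not anything Michaud describes in detail; the log format, the paths, and the function names (`map_chunk`, `reduce_counts`, `count_product_views`) are all my own assumptions, chosen only for illustration. The raw text is interpreted at the moment it is read rather than loaded into a predefined structure, and the work is split into chunks that could, in principle, be handed to separate commodity machines before the partial results are merged.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical raw web log lines; this format is an assumption for illustration.
RAW_LOGS = [
    '10.0.0.1 - - [03/Jul/2012:10:00:01] "GET /products/shoes HTTP/1.1" 200',
    '10.0.0.2 - - [03/Jul/2012:10:00:05] "GET /products/hats HTTP/1.1" 200',
    'malformed line that matches no pattern',
    '10.0.0.1 - - [03/Jul/2012:10:01:12] "GET /products/shoes HTTP/1.1" 200',
    '10.0.0.3 - - [03/Jul/2012:10:02:40] "GET /about HTTP/1.1" 404',
]

# Structure is imposed here, at read time, not when the data was stored.
REQUEST = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def map_chunk(lines):
    """Map step: count successful product page views in one chunk of raw text."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # Lines that do not fit the pattern are simply skipped.
        if match["status"] == "200" and match["path"].startswith("/products/"):
            counts[match["path"]] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial tallies into one."""
    return left + right


def count_product_views(lines, chunk_size=2):
    """Split the log into chunks, tally each one, then merge the tallies."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Each chunk is independent, so each could run on a different machine.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())


views = count_product_views(RAW_LOGS)
print(views.most_common())
```

Here the chunks are processed one after another on a single machine. The point is only that no step requires the whole data set, or a fixed schema, to be in one place at one time.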

What I’d like to turn to now is the topic of how financial services companies are gathering unstructured data and trying to discern coming trends. Paul Hawtin, chief executive of Derwent Capital Markets, told Ariana Eunjung Cha that “analyzing mathematical trends on the Web delivers insights and news faster than traditional investment approaches.” [“‘Big data’ from social media, elsewhere online redefines trend-watching,” Washington Post, 6 June 2012] Cha reports that a study released by the World Economic Forum stated that “business boundaries are being redrawn” as a result of Big Data. She writes that the report also concluded that “companies with the ability to mine the data are becoming the most powerful.” She continues:

“While the human brain cannot comprehend that much information at once, advances in computer power and analytics have made it possible for machines to tease out patterns in topics of conversation, calling habits, purchasing trends, use of language, popularity of sports, spread of disease and other expressions of daily life. ‘This is changing the world in a big way. It enables us to watch changes in society in real time and make decisions in a way we haven’t been able to ever before,’ said Gary King, a social science professor at Harvard University and a co-founder of Crimson Hexagon, a data analysis firm based in Boston.”

Although Cha’s article focuses on how financial services firms use Big Data to identify emerging trends, it doesn’t require a clairvoyant to see how public health agencies could use the same techniques to identify emerging health concerns. Cha notes that politicians are using these techniques to determine voter sentiment in key states. The possibilities seem endless. She continues:

“Many questions about big data remain unanswered. Concerns are being raised about personal privacy and how consumers can ensure that their information is being used fairly. Some worry that savvy technologists could use Twitter or Google to create false trends and manipulate markets. Even so, sociologists, software engineers, economists, policy analysts and others in nearly every field are jumping into the fray. And nowhere has big data been as transformative as it has been in finance. Wall Street is all about information advantage. Every little bit could mean the difference between a bonanza or a devastating loss, and so big data is being fed into computers to power high-frequency trading algorithms — and directly to traders in every way imaginable.”

Cha reports that “hedge funds are experimenting with scanning comments on Amazon product pages to try to predict sales. Banks are tallying job listings on Monster as an indicator of hiring. Investment firms are conducting computer analyses of the financial statements of public companies to search for signs of a bankruptcy.” Analysts told her that it is no longer necessary to wait for government-released data to spot emerging trends. In fact, government-analyzed data is now considered late when compared to data that can be gleaned “by analyzing publicly available data online.” Cha continues:

“Five years ago, only 2 percent of investment firms were incorporating Twitter analysis and other forms of ‘unstructured’ data into their trading decisions, according to a report by Adam Honore, a research director at Aite, a financial services consulting group based in Boston. By 2010, the share of companies experimenting with this technology jumped to 35 percent. Today, Honore said, that number is closer to 50 percent. ‘Big data is fundamentally changing how we trade,’ Honore said.”

Richard Tibbetts, chief technology officer at StreamBase, told Cha that analysts are refining how to examine “data in motion.” Cha explains:

“The trick is to be able to find the digital smoke signals amid all the other stuff. Traders who were analyzing Twitter for unusual activity, for instance, were able to get the news of Osama bin Laden’s death and a massacre in Norway hours before the information was officially confirmed, giving them a significant jump on their colleagues who learned of the events through traditional news sources. ‘The new generation of trader expects to have dozens of tools at their fingertips instead of just a Bloomberg terminal,’ Tibbetts said.”
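Cha doesn't say how traders actually pick out those smoke signals, so the following is only a hedged sketch of one simple way to flag "unusual activity" in a stream of messages. It compares each minute's mention count with the average of the preceding few minutes and flags any count that sits far above that baseline. The function name `find_spikes`, the window size, the threshold, and the sample counts are all hypothetical; real systems are certainly more sophisticated.

```python
from statistics import mean, stdev


def find_spikes(counts, window=5, threshold=3.0):
    """Return indexes whose count is more than `threshold` standard
    deviations above the mean of the preceding `window` counts."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            spread = 1.0  # Avoid dividing by zero when the history is flat.
        if (counts[i] - baseline) / spread > threshold:
            spikes.append(i)
    return spikes


# Hypothetical mentions per minute of a single keyword; the figures are invented.
mentions_per_minute = [12, 15, 11, 14, 13, 12, 16, 95, 180, 13]
spikes = find_spikes(mentions_per_minute)
print(spikes)  # [7, 8]
```

A detector like this says only that something is happening, not what it is or whether it is true, which is one reason the manipulation concerns raised in Cha's article matter.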

Because non-traditional sources of data, like Twitter, can be manipulated, Hawtin told Cha that he warns his clients “that there is a high level of risk” in making investment decisions based on such data. Even so, Cha reports that “interest in his project was so great that in April he began offering his technology to retail investors.” She continues:

“In addition to its efforts to gauge the collective mood of the world, the company now examines messages on Twitter, Facebook and other social-media outlets to create measures for individual stocks and commodities. … The numbers support Hawtin’s strategy — at least so far. His investors beat the main London stock index by seven-fold in the first quarter of this year. But programs such as Hawtin’s are only as good as the data being entered, and a growing backlash against big data may threaten the flow of that information.”

In fact, the elephant in the room in every discussion of Big Data is privacy and how rising concerns could limit access to critical data. Cha reports:

“Companies and governments are pushing the envelope in the use and reuse of data in ways not originally intended, and privacy groups are pushing back. Even the basic definition of personal data varies widely from one country to another, making it unclear how it can be used. The regulatory framework has not caught up with the technology. Tim Berners-Lee, a founder of the World Wide Web, has become so concerned about the misuse of personal information by companies and governments that he has warned people to be cautious about what they put online. The data sets are so large that they are normally analyzed in aggregate, but privacy advocates worry that information can still be tied to individuals. Civil liberties groups have sued to stop a U.S. government program that monitors social media data for national security threats, arguing that it could be used to unjustly label people as bad credit risks — or even terrorists — and chill free speech.”

Analysts worry that if data is filtered or restricted, results obtained from that data may be skewed or misleading. Another concern is that some “parties have an unfair advantage because they have better information than others — a phenomenon that some have argued shakes the foundation of a market economy.” That’s why the World Economic Forum has declared data a new class of asset. Companies have always attempted to gain an edge over their competitors, and mining Big Data is going to be another battleground on which they compete.