Data Lakes and Other Big Data Analytics Trends You Need to Know About

Stephen DeAngelis

October 27, 2014

You’ve undoubtedly heard of cloud computing, but do you know anything about data lakes? Robert L. Mitchell (@rmitch) warns, “Big data technologies and practices are moving quickly” and there are at least eight trends you need to know “to stay ahead of the game.” [“8 Big Trends in Big Data Analytics,” Computerworld, October 2014] Bill Loconzolo, Vice President of Data Engineering at Intuit, told Mitchell that big data analytical tools are still emerging and are not yet where they need to be; nevertheless, “the disciplines of big data and analytics are evolving so quickly that businesses need to wade in or risk being left behind.” Loconzolo stated, “In the past, emerging technologies might have taken years to mature. Now people iterate and drive solutions in a matter of months — or weeks.” As his headline states, Mitchell identifies eight significant trends in big data and analytics that he believes business executives need to know about: big data analytics in the cloud; Hadoop; big data lakes; predictive analytics; SQL on Hadoop; NoSQL; deep learning; and in-memory analytics. It’s a good list. Let’s briefly discuss each of these trends.

Big Data Analytics in the Cloud

Mitchell points out that Hadoop was created as “a framework and set of tools for processing very large data sets … on clusters of physical machines.” With the rapid emergence of cloud computing, a significant share of big data analytics has moved to the cloud. Brian Hopkins (@practicingEA), an analyst at Forrester Research, told Mitchell, “The future of big data will be a hybrid of on-premises and cloud.” Fears of data breaches have caused some companies to move cautiously in the area of cloud computing. Last year, however, Luth Research and Vanson Bourne surveyed companies that are currently using cloud services to see how those services are working out. “Results were revealing — the benefits of using cloud computing are by all accounts living up to the hype, and many of the predicted problems are not actually being reported.” [“TechInsights Report: Cloud Succeeds. Now What?” CA Technologies, 10 December 2013] Other findings included:

  • Nearly 100% overall satisfaction with cloud results (innovation, cost, revenue, performance)
  • The longer a company used cloud, the more likely that cloud exceeded expectations (71% who have been in the cloud for 4+ years said it exceeded expectations)
  • Around 50% of respondents have already moved mission-critical apps to the cloud
  • 98% of respondents reported that the cloud met or exceeded their expectations for security

Hadoop: The New Enterprise Data Operating System

Hopkins told Mitchell that “distributed analytic frameworks, such as MapReduce, are evolving into distributed resource managers that are gradually turning Hadoop into a general-purpose data operating system.” As a result, Hopkins says, “You can perform many different data manipulations and analytics by plugging them into Hadoop as the distributed file storage system. …The ability to run many different kinds of [queries and data operations] against data in Hadoop will make it a low-cost, general-purpose place to put data that you want to be able to analyze.” Timo Elliott (@timoelliott) adds, “Algorithms such as MapReduce and projects such as Hadoop have introduced new opportunities for storing and analyzing data that was previously ignored because of technology limitations.” [“The Top 10 Trends In Analytics 2013,” Business Analytics, 22 April 2013]
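
For readers who have not seen MapReduce, the classic illustration is a word count. The sketch below simulates its three phases (map, shuffle, reduce) in a single Python process. It is a teaching aid under stated assumptions, not Hadoop’s API: the function names map_words, shuffle, reduce_counts and word_count are hypothetical, and on a real cluster the framework distributes the map and reduce work across machines and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Map phase: emit a (word, 1) pair for every word in one line of input.
def map_words(line: str) -> Iterator[tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1

# Shuffle phase: group every emitted value by its key.
# (In Hadoop the framework does this step across the cluster.)
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: collapse each key's list of values into a single result.
def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

if __name__ == "__main__":
    docs = ["big data lakes", "big data in the cloud", "Hadoop and big data"]
    print(word_count(docs))
```

Because map and reduce each work on independent pieces of data, the same two functions can run on many machines at once, which is what lets frameworks built on this model scale to very large data sets.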

Big Data Lakes

Mitchell writes, “Traditional database theory dictates that you design the data set before entering any data. A data lake, also called an enterprise data lake or enterprise data hub, turns that model on its head.” For years companies have looked for ways to break down information silos in order to create a single version of the truth from which insights can be drawn and corporate strategies aligned. Data lakes provide part of the answer. The use of a data lake helps democratize access to data. Ramin Dutt (@rimin) explains why democratizing access to data is important. “Business analytics have largely been focused on tools, technologies and approaches for accessing, managing, storing, modelling and optimizing for analysis of structured data,” she writes. “This is changing as organizations strive to gain insights from new and diverse data sources. The potential value of harnessing and acting upon insights from these new and previously untapped sources of data, coupled with the significant market hype around big data, has fueled new product development to deal with a data variety across existing information management stack vendors and has spurred the entry of a flood of new approaches for relating, correlating, managing, storing and finding insights in varied data.” [“Business intelligence and analytics need to scale up: Gartner,” IT World, 24 January 2013]
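
The difference Mitchell describes is often summarized as “schema-on-write” (a traditional database) versus “schema-on-read” (a data lake). The minimal Python sketch below illustrates the latter under simplifying assumptions: the lake is just a list of raw JSON strings, and read_with_schema is a hypothetical helper, not part of any data lake product. Records are stored exactly as they arrive, and each analysis imposes its own structure only when it reads the data.

```python
import json
from typing import Any

# A toy "lake": raw records are appended as-is; no schema is enforced on write.
lake: list[str] = [
    '{"customer": "A12", "amount": "19.99", "channel": "web"}',
    '{"customer": "B07", "amount": 5, "store_id": 42}',
    '{"sensor": "T-3", "reading": 21.5}',
]

def read_with_schema(raw_records: list[str], schema: dict[str, type]) -> list[dict[str, Any]]:
    """Schema-on-read: apply field types at query time, skipping records that don't fit."""
    rows = []
    for raw in raw_records:
        record = json.loads(raw)
        if not all(field in record for field in schema):
            continue  # this record belongs to a different "view" of the lake
        rows.append({field: convert(record[field]) for field, convert in schema.items()})
    return rows

# Two different analyses impose two different schemas on the same raw data.
sales = read_with_schema(lake, {"customer": str, "amount": float})
sensors = read_with_schema(lake, {"sensor": str, "reading": float})
print(sales)
print(sensors)
```

Note that the sales records disagree on how "amount" is stored (a string in one, a number in the other); the schema applied at read time reconciles them, which is the flexibility, and the governance burden, that a data lake trades for an up-front design.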

More Predictive Analytics

Predictive analytics is one area where cognitive computing takes big data analytics to the next level. Justin Lyon, CEO of Simudyne, explains, “Big data platforms use pattern recognition to turn data into information about what happened, where and why. A cognition platform is more predictive, turning that historical information into knowledge of what might happen in the future. It gives you the ability to test your decision in the knowledge that if it doesn’t work it doesn’t matter because you just hit ‘reset’.” [“Artificial Intelligence for Risk Managers,” Lloyd’s, 4 April 2014] Mitchell adds that predictive analytics, using a combination of big data and computing power, “lets analysts explore new behavioral data throughout the day, such as websites visited or location.” Cognitive computing is discussed further in the section on deep learning below.

SQL on Hadoop: Faster, Better

Structured Query Language (SQL) is a programming language designed for managing data contained in a relational database management system. If you want to conduct analysis on data, you need a way to query it. Mitchell writes, “That’s where SQL for Hadoop products come in. … Tools that support SQL-like querying let business users who already understand SQL apply similar techniques to that data.” Hopkins told Mitchell that SQL on Hadoop “isn’t going to replace data warehouses, at least not anytime soon, … but it does offer alternatives to more costly software and appliances for certain types of analytics.”

More, Better NoSQL

SQL is good for querying structured relational databases, but not all data sets are structured. That’s where NoSQL (“Not Only SQL”) databases come into play. NoSQL databases store and retrieve data that is modeled by means other than the tabular relations used in relational databases, such as documents, key-value pairs, wide columns or graphs. Chris Curran (@cbcurran), Principal and Chief Technologist at PwC U.S. Advisory Practice, told Mitchell that NoSQL databases “are rapidly gaining popularity as tools for use in specific kinds of analytic applications, and that momentum will continue to grow.”

Deep Learning

Hopkins told Mitchell that deep learning “is still evolving but shows great potential for solving business problems.” Mitchell defines deep learning as “a set of machine-learning techniques based on neural networking.” It should be pointed out, however, that not all deep learning (or cognitive computing) systems require neural networking. “Deep learning,” Hopkins states, “enables computers to recognize items of interest in large quantities of unstructured and binary data, and to deduce relationships without needing specific models or programming instructions.” Accenture’s latest technology vision, entitled “From Digitally Disrupted to Digital Disrupter,” predicts that cognitive computing will be the “ultimate solution” for big data analytics. It explains:

“What if … machines could be taught to leverage data, learn from it, and, with a little guidance, figure out what to do with it? That’s the power of machine learning — which is a major building block of the ultimate long-term solution: cognitive computing. Rather than being programmed for specific tasks, machine learning systems gain knowledge from data as ‘experience’ and then generalize what they’ve learned in upcoming situations. Cognitive computing technology builds on that by incorporating components of artificial intelligence to convey insights in seamless, natural ways to help humans or machines accomplish what they could not on their own.”

As President and CEO of a cognitive computing company, Enterra Solutions®, I was pleased that the Accenture study discusses a case study in which Enterra plays a significant role (see my article entitled “Cognitive Computing and the Digital Business”).

In-memory Analytics

The final trend discussed by Mitchell involves the use of in-memory databases. He writes, “The use of in-memory databases to speed up analytic processing is increasingly popular and highly beneficial in the right setting. … In fact, many businesses are already leveraging hybrid transaction/analytical processing (HTAP) — allowing transactions and analytical processing to reside in the same in-memory database.” Elliott adds:

“In-memory computing is providing an opportunity to rethink information systems from scratch. According to Gartner, in-memory: ‘isn’t only about SAP HANA, isn’t new, isn’t unproven, isn’t only about big companies, and isn’t only about analytics’: ‘In-memory computing will have a long-term disruptive impact by radically changing users’ expectations, application design principles, and vendor’s strategy.'”
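
To make the HTAP idea concrete, the sketch below uses Python’s built-in sqlite3 module with an in-memory database (":memory:") to run transactional writes and an analytical aggregate against the same live table. This is only a toy illustration of the concept Mitchell describes: SQLite is not an HTAP product, and the orders table and its columns are hypothetical.

```python
import sqlite3

# ":memory:" keeps the entire database in RAM for the life of the connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# Transactional side: the connection's context manager commits this batch of
# inserts as one transaction (or rolls it back if an exception is raised).
with conn:
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("east", 120.0), ("west", 80.0), ("east", 40.0)],
    )

# Analytical side: aggregate over the same table, with no copy to a separate warehouse.
totals = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
)
print(totals)
```

The point of the design is the absence of an extract-transform-load step: the analytical query sees each transaction as soon as it commits. Production HTAP systems add the columnar storage, concurrency control and scale that this toy example leaves out.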

Mitchell concludes, “With so many emerging trends around big data and analytics, IT organizations need to create conditions that will allow analysts and data scientists to experiment.” Curran added, “You need a way to evaluate, prototype and eventually integrate some of these technologies into the business.” I agree that the best way to evaluate big data projects is through the use of pilot projects. By working together, vendors and customers can create an analytic solution that is tailored and scalable to the exact requirements of the organization.