Cognitive Computing Can Help with Data Scientist Shortage

Stephen DeAngelis

December 14, 2015

“There’s a problem within big data,” writes Rick Delgado (@ricknotdelgado). “The problem is that there’s too much information and not enough talent to manage it. The supply of analysts and data scientists can’t keep up with the ever growing demand for this type of talent. This shortage presents a problem, because even the most advanced data platforms are useless without experienced professionals to operate and manage them.”[1] Carlton Doty, an analyst at Forrester Research, along with his colleagues Brian Hopkins and Jennifer Belissent, predicts, “Firms will try to come to terms with data science scarcity. Two-thirds of firms will have built predictive systems capability by mid-2016, but will struggle to find data science talent.”[2] Michael Fitzgerald states the problem bluntly, “It’s gospel that companies everywhere want to hire data scientists, and can’t find them.”[3] You get the idea — there’s a data scientist shortage. The question is: What can be done about it? Delgado asked that very question and, using a little sideways thinking, asked, “What if there was another solution? What if instead we trained computers to do the work for us, or at least make it easier to manage data tools? Improvements in cognitive computing are making that an approaching reality.” He’s right. Fitzgerald calls this approach “data scientist in a can.”


Fitzgerald insists the data-scientist-in-a-can approach is necessary because the skills gap is too large to close through education and training. He points to predictions from McKinsey & Company and Gartner that put the skills shortage anywhere from hundreds of thousands to millions of jobs over the next few years. He goes on to note that technology has always stepped in when skills gaps persist. A prime example involves the telephone system. When it was first developed, the telephone system relied on human operators who manually worked switchboards to route calls. Pundits predicted that every available woman in America would be required to man the switchboards as the system grew. That didn’t happen, of course, because automatic switches were developed. Delgado and Fitzgerald are among a growing number of analysts who predict that cognitive computing systems will help fill the gap between job openings and data scientists. There are, of course, skeptics. Sridhar Ramaswamy, who commented on Fitzgerald’s article, wrote:

“Data Analytics involves several steps in my opinion. It requires understanding the business problem, gathering relevant data, data cleansing, solving and deriving business inference out of the solution. I am skeptical any data analytics software could do these steps as ‘Data Scientist in a Can’ without human intervention. It requires an SME to put forth a business problem, data scientist to translate into a relevant analytical problem, identify relevant data and processes, besides intelligently coming up with relevant algorithms.”

To some extent, Ramaswamy is correct. Cognitive computers will not eliminate the need for data scientists; but they will help fill the gap. Ram Akella, a professor of information systems and technology at the University of California, Berkeley and UC Santa Cruz, told Fitzgerald, “There are some kinds of problems that carry over across industries and lend themselves to automation — either through software or through a consulting model, where data scientists develop tools to solve a specific set of problems — other kinds of analytics are too complicated to be automated. … Even if human participation gets reduced to a sliver, it needs to be there, or the algorithm won’t learn.” Fitzgerald put it this way, “Somebody will always have to open the can.” Companies should view cognitive computing systems as a way of closing the data analysis skills gap, not as a replacement for data scientists. The Enterra Enterprise Cognitive System™ is a good example of the kind of technology involved. When we talk to clients who have an analytic problem, they see the approach to addressing it in much the same way as Ramaswamy. They typically have to assemble a team of three experts:


  • A business domain expert — the customer of the analysis who can help explain the drivers behind data anomalies and outliers.
  • A statistical expert — to translate what the business expert wants to study into the correct statistical studies, framing the data in terms that will detect the desired phenomena.
  • A data expert — the data expert understands where and how to pull the data from across multiple databases or data feeds.
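In simple cases, part of the statistical expert’s role — scanning for anomalies and outliers worth the business expert’s attention — can be approximated in code. The sketch below is purely illustrative (the z-score method, threshold, and sales figures are assumptions for this example, not Enterra’s actual technique):

```python
# Illustrative z-score outlier scan: flag values that deviate strongly
# from the mean, so a business expert only reviews the flagged points.
from statistics import mean, stdev

def flag_outliers(series, z_threshold=3.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return []  # no variation, nothing to flag
    return [(i, x) for i, x in enumerate(series)
            if abs(x - mu) / sigma > z_threshold]

weekly_sales = [102, 98, 101, 99, 100, 97, 55, 103]  # invented data
print(flag_outliers(weekly_sales, z_threshold=2.0))  # flags the week-6 dip
```

Automating even this small step means the business expert starts from a short list of flagged anomalies rather than raw data, which is the spirit of the approach described below.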


Having three experts involved dramatically lengthens the time required to analyze, tune, re-analyze, and interpret the results. Enterra’s approach empowers the business expert by automating the statistical expert’s and data expert’s knowledge and functions, so the ideation cycle can be dramatically shortened and more insights can be auto-generated. Even some of the business expert’s logic is automated to help tune and re-analyze the data. In an article written in 2014, Mark Gibbs (@quistuipater) described the Enterra approach.[4] He wrote:

“Enterra Solutions, a key competitor in the big data analytics market, has a solution that is completely different in that it can automatically mine data exhaustively and intelligently to draw conclusions based on natural language queries. Enterra Solutions can ingest huge amounts of data and using natural language processing transform it into knowledge using a generalized ontology to discover the meanings of words in context along with the implicit rules and relationships as used by humans. Then, when a question is asked in what is more or less natural language, the database of knowledge is accessed by Enterra’s Hypothesis Engine. The Hypothesis Engine is an artificial intelligence system that applies common sense and domain-specific ontologies to further structure the knowledge. Next, using Enterra’s Rules-Based Inference System it can determine an objective and find the facts to support that objective (backward chaining) as well as using facts to determine objectives (forward chaining) as determined by the knowledge found and its significance. Other engines in the system weigh results, formulate database queries, and analyze assets and all of these components pass data back and forth between themselves based on rules and inferences to derive conclusions.”
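The backward and forward chaining Gibbs mentions can be illustrated with a minimal sketch. The rules and facts below are invented for illustration; a production inference engine like the one Gibbs describes operates over a rich ontology, not string atoms:

```python
# Toy rules-based inference: each rule is (premises, conclusion).
RULES = [
    ({"sales_dropped", "promo_ran"}, "promo_ineffective"),
    ({"promo_ineffective"}, "review_pricing"),
]

def forward_chain(facts):
    """Forward chaining: derive every conclusion reachable from known facts."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def backward_chain(goal, facts):
    """Backward chaining: work from a goal back to supporting facts."""
    if goal in facts:
        return True
    for premises, conclusion in RULES:
        if conclusion == goal and all(backward_chain(p, facts) for p in premises):
            return True
    return False

facts = {"sales_dropped", "promo_ran"}
print(forward_chain(facts))                      # derives both conclusions
print(backward_chain("review_pricing", facts))   # True
```

Forward chaining answers “given these facts, what follows?” while backward chaining answers “is this objective supported by the facts?” — the two directions Gibbs attributes to the Rules-Based Inference System.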

Gibbs concluded, “The application of artificial intelligence to Big Data analytics is one of the hottest areas in data science and its ability to make up for the shortfall of human data scientists (which is likely to be a long term problem) means Enterra Solutions has a very rosy future.” Although I appreciate Gibbs’ prediction about Enterra’s future, what Gibbs was really predicting was a rosy future for cognitive computing. He’s not alone. Delgado concludes, “Without a doubt, advancements in cognitive computing, and infusing AI and machine learning into big data platforms and tools, will not only enable less experienced staff to handle the complexities of data analytics, but also improve the quality of results. With so much information to handle, transferring much of the work to our machines will enable us to react quicker and turn real-time analytics into real-time decisions.” If your company already has a team of talented data scientists, hold on to them — they’re corporate treasures. But even they can benefit from the capabilities offered by cognitive computers.


Footnotes
[1] Rick Delgado, “Cognitive Computing: Solving the Big Data Problem?” KDnuggets, June 2015.
[2] Carlton Doty, “2016 Predictions: All That Data Will Finally Drive Business Action,” Information Management, 17 November 2015.
[3] Michael Fitzgerald, “Data Scientist In a Can?” MIT Sloan Management Review, 18 December 2014.
[4] Mark Gibbs, “Not enough data scientists? Use AI instead,” Network World, 7 March 2014.