Data Scientists and Machine Learning

Stephen DeAngelis

August 19, 2019

Most articles about artificial intelligence (AI) focus on the specific area of machine learning. Machine learning is one of the most obvious ways to make sense of the oceans of data being created in today’s business world. John Rakowski (@momskij), Director of Product Marketing for Application Performance Management and analytics at AppDynamics, asserts, “A lack of understanding about machine learning is holding enterprises back from adopting this emerging technology.”[1] He adds, “Machine learning has the potential to transform the way organizations interact with the world, to move faster and to provide better customer experience. But while machine learning’s long-term potential certainly looks bright, its adoption in the enterprise may advance more slowly than originally thought.” When understanding is lacking, a little caution is not a bad thing. Dev Kundaliya writes, “Algorithms don’t know how to say ‘the data is not clear’ or ‘I don’t know’.”[2] Kundaliya points to research by Rice University statistician Dr. Genevera Allen, “who has found that the results produced by machine learning algorithms are often misleading or wrong.” Reading that, the Ray Parker Jr. lyrics from Ghostbusters might be floating in your brain, with a twist:


If there’s something strange in your neighborhood
Who you gonna call? (data scientists)
If there’s something weird
And it don’t look good
Who you gonna call? (data scientists)


To ensure insights derived by machine learning are accurate and valuable, you need the services of a data scientist. As Dr. Allen notes, “Researchers must keep questioning the reproducibility of the predictions or the findings made by machine learning techniques until new computational systems are developed, which are able to critique their own results.” According to Mark van Rijmenam (@VanRijmenam), founder of Datafloq, machine learning is important because it “deals with the information world.”[3] He explains, “Machines use data to learn, and machine learning aims to derive meaning from that data.”


Data and seven steps to machine learning


As van Rijmenam noted, “The objective of machine learning is to derive meaning from data.” He adds, “Therefore, data is the key to unlock machine learning. There are seven steps to machine learning, and each step revolves around data.” Those steps are:


Step 1. Data collection. Van Rijmenam writes, “Machine learning requires training data, a lot of it (either labelled, meaning supervised learning or not labelled, meaning unsupervised learning).”


Step 2. Data preparation. Van Rijmenam notes, “Raw data alone is not very useful. The data needs to be prepared, normalized, de-duplicated and errors and bias need to be removed. Visualization of the data can be used to look for patterns and outliers to see if the right data has been collected or if data is missing.”


Step 3. Choosing a model. According to van Rijmenam, “There are many models that can be used for many different purposes. Upon selecting the model, you need to make sure that the model meets the business goal.” At Enterra Solutions® we utilize the Massive Dynamics Representational Learning Machine™ (RLM). The RLM can help determine what type of analysis is best-suited for the data involved in a high-dimensional environment.


Step 4. Training. “Training your model is the bulk of machine learning,” van Rijmenam explains. “The objective is to use your training data and incrementally improve the predictions of the model. Each cycle of updating the weights and biases is one training step. In supervised machine learning, the model is built using labelled sample data, while unsupervised machine learning tries to draw inferences from non-labelled data (without references to known or labelled outcomes).”


Step 5. Evaluation. Van Rijmenam explains, “After training the model comes evaluating the model. This entails testing the machine learning against an unused control dataset to see how it performs.”


Step 6. Parameter tuning. “After evaluating your model,” van Rijmenam writes, “you should test the originally set parameters to improve the AI. Increasing the number of training cycles can lead to more accurate results.”


Step 7. Prediction. “Once you have gone through the process of collecting data, preparing the data, selecting the model, training and evaluating the model and tuning the parameters,” writes van Rijmenam, “it is time to answer questions using predictions. These can be all kinds of predictions, ranging from image recognition to semantics to predictive analytics.”
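
For readers who think in code, the seven steps above can be sketched in a few dozen lines of Python. This is only an illustrative toy, not van Rijmenam’s method or anything Enterra uses. The data is synthetic (a hypothetical relationship y = 3x + 2 plus noise), the model is a one-variable linear regression trained by gradient descent, and every function name is made up for this example. It assumes nothing beyond the Python standard library.

```python
import random

def collect_data(n=200, seed=0):
    """Step 1: gather labelled data (here: synthetic, y = 3x + 2 plus noise)."""
    rng = random.Random(seed)
    rows = [(x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.5))
            for x in (rng.uniform(0.0, 10.0) for _ in range(n))]
    return rows + rows[:10]  # simulate duplicated records in the raw data

def prepare(rows):
    """Step 2: de-duplicate, then min-max normalize x into [0, 1]."""
    unique = sorted(set(rows))
    xs = [x for x, _ in unique]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in unique]

def split(rows, seed=1):
    """Hold back validation and test sets the model never trains on."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    a, b = int(len(shuffled) * 0.6), int(len(shuffled) * 0.8)
    return shuffled[:a], shuffled[a:b], shuffled[b:]

def train(rows, epochs, lr=0.1):
    """Steps 3 and 4: pick a linear model y = w*x + b and fit it.
    Each loop iteration is one training step that updates weight and bias."""
    w, b, n = 0.0, 0.0, len(rows)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in rows) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in rows) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def mse(model, rows):
    """Step 5: mean squared error on data not used for training."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)

def predict(model, x):
    """Step 7: answer a question with the trained model."""
    w, b = model
    return w * x + b

data = prepare(collect_data())
train_rows, val_rows, test_rows = split(data)

# Step 6: tune one parameter (the number of training cycles) on validation data.
val_error = {epochs: mse(train(train_rows, epochs), val_rows)
             for epochs in (10, 100, 1000)}
best_epochs = min(val_error, key=val_error.get)

model = train(train_rows, best_epochs)
test_error = mse(model, test_rows)    # final check on untouched data
prediction = predict(model, 0.5)      # normalized x = 0.5, mid-range input
print(val_error, best_epochs, test_error, prediction)
```

Even in this toy, most of the code deals with collecting, cleaning and splitting data. The model itself takes only a few lines, which is consistent with van Rijmenam’s point that each step revolves around data.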


One thing should have become apparent during the discussion of van Rijmenam’s seven steps: the need for a data scientist (or team of data scientists) throughout. Josh Thompson, an educational specialist, explains, “Data scientists are big data wranglers. They take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics and programming to clean, manage and organize them. Then they apply all their analytic powers — industry knowledge, contextual understanding, skepticism of existing assumptions — to uncover hidden solutions to business challenges.”[4]


What is a data scientist?


Thompson writes, “Found at the cross section of business and information technology, a data scientist is a professional with the capabilities to gather large amounts of data to analyze and synthesize the information into actionable plans for companies and other organizations. Data scientists are analytical data experts who utilize their skills in both technology and social science to find trends and manage the data around them. With the growth of big data integration in business, they have evolved at the forefront of the data revolution.” He continues:

“On any given day, a data scientist is a mathematician, a statistician, a computer programmer and an analyst equipped with a diverse and wide-ranging skill set, balancing knowledge in different computer programming languages with advanced experience in data mining and visualization. Technical skills are not all that count, however. Data scientists often exist in business settings and are charged with making complex data-driven organizational decisions. As a result, it is highly important for them to be effective communicators, leaders and team members as well as high-level analytical thinkers. They are highly sought after in today’s data and tech heavy economy, and their salaries and job growth very clearly reflect that.”

If you’re interested in becoming a data scientist, I suggest reading Thompson’s entire article. You might also be interested in reading an article at My Caribbean Jobs that discusses “5 online data science courses for tech-curious professionals.”


Concluding thoughts


Rakowski insists, “Companies need to accept that they need to move faster as a digital business, and machine learning and automation is a prerequisite for success. Data is at the heart of machine learning, and those companies that culturally react to the importance of real-time insight that can be trusted and acted upon quickly, are those that will succeed and thrive.” Van Rijmenam adds, “[Machine learning] will augment many, if not all, business processes in the coming years. As such, machine learning will become an integral part of the automated organization of tomorrow. Thanks to increasingly faster hardware, we will see more powerful models offering better predictions.” If his vision is achieved, thank a data scientist.


[1] John Rakowski, “Why are enterprises slow to adopt machine learning?” Techradar, 10 October 2018.
[2] Dev Kundaliya, “Machine-learning techniques often produce misleading or wrong results, researcher warns,” Computing, 18 February 2019.
[3] Mark van Rijmenam, “How to Prepare for an Automated Future: 7 Steps to Machine Learning,” Datafloq, 25
[4] Josh Thompson, “How to Become a Data Scientist in 2019,” Master’s in Data Science, 2019.