Machine Learning: The Hardest Part is Getting Started

Stephen DeAngelis

December 11, 2019

Data is now considered the world’s most valuable resource, displacing oil for that title. Unlike oil, whose supply is diminishing, the pool of data grows every second of every day. The Digital Age is all about data. As the pool of data became an ocean of data (aka Big Data), people needed a way to make sense of it all. It is no coincidence that artificial intelligence (AI) platforms matured rapidly in response to the growth of data. AI platforms need large amounts of data in order to learn. Although there are several technologies covered by the umbrella term “artificial intelligence,” machine learning is probably the most discussed. There are myriad ways organizations can leverage machine learning. Louis Columbus (@LouisColumbus), a Principal at IQMS, reports that Dresner Advisory Services’ “2019 Data Science and Machine Learning Market Study” found, “Marketing and Sales prioritize AI and machine learning higher than any other department in enterprises today. In-memory analytics and in-database analytics are the most important to Finance, Marketing, and Sales when it comes to scaling their AI and machine learning modeling and development efforts. [And] R&D’s adoption of AI and machine learning is the fastest of all enterprise departments in 2019.”[1] In spite of the numerous ways machine learning can be used, some organizations don’t know where to start.


Machine learning implementation requires patience and collaboration


In a world often characterized by the desire for instant gratification, some business leaders lose patience or become frustrated with their efforts to leverage machine learning. Yaron Haviv (@yaronhaviv), Founder and CTO at iguazio, recommends stepping back and taking a deep breath. He notes, “It takes time and effort to move from a decent machine learning model to the next level of incorporating it into a live business application.”[2] Jessica Davis (@jessicadavis) adds, “Inside IT organizations, getting machine learning technologies from pilot into production is one of the hot topics of 2019. At this point, many organizations have run successful pilots. Yet many more still haven’t achieved the value promised by machine learning because it isn’t integrated into organizational processes. … A Gartner study showed that only 47% of machine learning models are making it into production.”[3] Virtually every organization suffers growing pains when trying to implement machine learning projects. Macy Bayern (@macybayern) reports, “Nearly eight out of 10 organizations engaged in AI and machine learning said that projects have stalled, according to a Dimensional Research report. The majority (96%) of these organizations said they have run into problems with data quality, data labeling necessary to train AI, and building model confidence.”[4]


When I talk with clients about implementing cognitive computing solutions (which include machine learning), I recommend a “crawl, walk, run” approach. Such an approach allows solutions to be tweaked as they scale. It also requires a little patience and lots of collaboration. Davis reports this kind of collaborative approach is sometimes called MLOps. She explains, “The name MLOps, inspired by the DevOps practice and philosophy, is an approach to put machine learning into operations. DevOps itself has many definitions, depending on who you talk to, but overall it is about making software development work better for everyone by enabling collaboration among multiple groups, incorporating Agile-type processes, and making use of feedback for new iterations of the work. MLOps follows a similar path, but for implementing machine learning in organizations.” The practice has been gaining traction in enterprises, and many of its followers congregated at the recent MLOps NYC conference. David Aronchick, Microsoft’s head of open source machine learning strategy, told Davis, “Getting machine learning into production is the No. 1 problem. That’s why there are initiatives around MLOps. That’s exactly what it is designed for.”


Getting started with machine learning


Sherry Tiao (@SherryTiao), a Big Data expert at Oracle, asserts, “At your company, you can create the most elegant machine learning model anyone has ever seen. It just won’t matter if you never deploy and operationalize it.”[5] She offers seven tips for getting started:


1. Don’t Forget to Actually Get Started. As with most journeys, the first step is often the most important step. Tiao writes, “The truth is that at this point in machine learning, many people never get started at all. This happens for many reasons.” Like others, she recommends starting slowly so you can learn as you go. She adds, “The learning you gain from this will be invaluable.”


2. Start with a Business Problem Statement and Establish the Right Success Metrics. An organization should never experiment with a technology unless its decision makers believe it will profit the company. Tiao explains, “Starting with a business problem is a common machine learning best practice. But it’s common precisely because it’s so essential and yet many people de-prioritize it.”


3. Don’t Move Your Data – Move the Algorithms. Machine learning requires data; however, data can be a problem. Tiao explains, “The Achilles heel in predictive modeling is that it’s a 2-step process. First you build the model, generally on sample data that can run in numbers ranging from the hundreds to the millions. And then, once the predictive model is built, data scientists have to apply it. However, much of that data resides in a database somewhere. … Growing your equations from inside the database has significant advantages. Running the equations through the kernel of the database takes a few seconds, versus the hours it would take to export your data. Then, the database can do all of your math too and build it inside the database. This means one world for the data scientist and the database administrator.”


4. Assemble the Right Data. Haviv notes, “Machine learning models are rarely trained over raw data. Data preparation is required to form feature vectors which aggregate and combine various data sources into more meaningful datasets and identify a clear pattern.” Tiao adds, “The right way [to assemble the right data] is to work backward from the solution, define the problem explicitly, and map out the data needed to populate the investigation and models. And then, it’s time for some collaboration with other teams.”


5. Create New Derived Variables. People often overlook the importance of variables. One of the benefits of a cognitive computing platform is that it can handle many more variables than previously possible. Tiao writes, “Creating new derived variables can help you gain much more insightful information.”


6. Consider the Issues and Test Before Launch. Proof-of-concept projects help you flesh out issues and test models before launching them. As Tiao notes, “Not only will you know how you’re doing it right, but you’ll also be able to feel more confident knowing that you’re doing it right. But going further than thorough testing, you should also have a plan in place for when things go wrong.”


7. Deploy and Automate Enterprise-Wide. A system is only effective if it is used. Tiao explains, “Once you deploy, it’s best to go beyond the data analyst or data scientist. … Always, always think about how you can distribute predictions and actionable insights throughout the enterprise. It’s where the data is and when it’s available that makes it valuable; not the fact that it exists.”
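
Tips 3 through 5 can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for an enterprise database, and the orders table, its column names, and the sample rows are hypothetical and do not come from Tiao's or Haviv's articles. It shows the general idea of pushing the computation to the database (rather than exporting raw rows), aggregating raw records into one feature row per customer, and creating derived variables along the way.

```python
import sqlite3

# Hypothetical raw data: (customer, order amount, items in order).
# These names and values are invented purely for illustration.
ORDERS = [
    ("alice", 120.0, 2),
    ("alice", 80.0, 1),
    ("bob", 40.0, 4),
    ("bob", 60.0, 1),
    ("bob", 20.0, 5),
]


def build_feature_vectors(conn):
    """Aggregate raw orders into one feature row per customer.

    The aggregation runs inside the database, so only the small
    per-customer result set is returned to the caller.
    """
    query = """
        SELECT customer,
               COUNT(*)                 AS n_orders,
               SUM(amount)              AS total_spend,
               SUM(amount) / COUNT(*)   AS avg_order_value,  -- derived variable
               SUM(amount) / SUM(items) AS spend_per_item    -- derived variable
        FROM orders
        GROUP BY customer
        ORDER BY customer
    """
    return conn.execute(query).fetchall()


def demo():
    """Load the sample rows into an in-memory database and build features."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT, amount REAL, items INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    features = build_feature_vectors(conn)
    conn.close()
    return features


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In this sketch, avg_order_value and spend_per_item are the derived variables: neither exists in the raw table, but both may be more informative to a model than the raw amounts alone. A production database with in-database analytics would offer far richer functions than the simple aggregates assumed here.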


Tiao concludes, “The hardest part of this is launching your machine learning project.” Executive coach Joe Sabah once stated, “You don’t have to be good to start … you just have to start to be good!” That’s good advice when it comes to implementing machine learning solutions.


[1] Louis Columbus, “State Of AI And Machine Learning In 2019,” Forbes, 8 September 2019.
[2] Yaron Haviv, “Why is it So Hard to Integrate Machine Learning into Real Business Applications?” Towards Data Science, 8 July 2019.
[3] Jessica Davis, “A Path from Pilot to Machine Learning Production,” InformationWeek, 2 October 2019.
[4] Macy Bayern, “96% of organizations run into problems with AI and machine learning projects,” TechRepublic, 24 May 2019.
[5] Sherry Tiao, “7 Machine Learning Best Practices for Business,” Oracle Big Data Blog, 5 September 2019.