Big Data and the Hunt for New Drugs

Stephen DeAngelis

September 8, 2017

Controlling healthcare costs remains a hot topic, especially in the United States. Although no single strategy for controlling costs exists, many people point to the high cost of drugs as one area where savings can be made. Pharmaceutical firms often justify the prices they charge by noting how expensive it is to research, test, and bring a new drug to market. Few dispute that this cost is enormous. Exacerbating the challenge, the potential market for new drugs is becoming narrower and more specialized. Streamlining processes, prioritizing investments, and improving outcomes are critical not only to the industry but also to individual patients. Big data offers hope. Ryan Copping, global head of data science analytics at Genentech, and Dr. Michael Li (@tianhuil), founder and executive director of The Data Incubator, write, “The emergence of big data, as well as advancements in data science approaches and technology, is providing pharmaceutical companies with an opportunity to gain novel insights that can enhance and accelerate drug development. It will increasingly help government health agencies, payers, and providers to make decisions about such issues as drug discovery, patient access, and marketing.”[1] Andrii Buvailo (@ABuvailo), Head of e-commerce at Enamine Ltd, adds, “Players in the biopharmaceutical industry are looking toward AI to speed up drug discovery, cut R&D costs, decrease failure rates in drug trials and eventually create better medicines.”[2]

Big Data, Cognitive Computing, and Drug Discovery

Although leveraging big data and cognitive computing for drug discovery remains in its infancy, Debasmita Banerjee reports, “A research team from the Computational Biology department of Carnegie Mellon University has developed a machine-learning based experimental set up for analyzing the effects of drugs on protein patterns. The new fully automated robotic system is a first of its kind and reduces human effort by a staggering seventy percent. This novel approach towards drug-protein interaction will also be a boon for the medical industry by significantly lowering the cost of discovery of drugs.”[3] To appreciate how complicated such efforts can be, consider just the protein side of drug-protein interactions. The staff at Genetic Engineering & Biotechnology News report, “Without computer modeling, it would be extraordinarily difficult to predict protein structures simply through the analysis of genome sequence data. In fact, the analysis of such data might be as helpful as reading tea leaves. … There are close to 15,000 protein families in the database Pfam. For nearly a third (4752) of these protein families, there is at least one protein in each family that already has an experimentally determined structure. For another third (4886) of the protein families, comparative models could be built with some degree of confidence. For the final third (5211), however, no structural information exists.”[4] In the Carnegie Mellon experiment, “The algorithm studied the possible interactions between 96 drugs and 96 cultured mammalian cell clones with separate fluorescently tagged proteins and involved a total of 9,216 experiments. The primary goal of this process was to learn how drugs reacted with proteins without having to actually test all of them. It was expected that these results would aid the system in identifying potentially new phenotypes on its own.” Writing about the Carnegie Mellon experiments, Chris Wood observes, “When working on a new drug, scientists have to determine its effects to ensure that it’s both an effective treatment and not harmful to patients. This is hugely time-consuming, and it’s simply not practical to perform experiments for every possible set of biological conditions.”[5] He continues:

“A total of 30 rounds of testing were undertaken by the automated system, with 2,697 experiments completed out of the possible 9,216. The rest of the outcomes were predicted by the machine, to an impressive accuracy rate of 92 percent. The researchers believe that their work proves that machine learning techniques are viable for use in medical testing, and could have a big impact on both the practical and financial issues faced by the field.”
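To put those figures in perspective, the arithmetic behind them can be checked in a few lines. The following is a minimal sketch that uses only the numbers quoted above; the function name `summarize` and the dictionary keys are illustrative choices, not anything taken from the Carnegie Mellon work.

```python
# Figures quoted in the reports on the Carnegie Mellon experiment.
DRUGS = 96
CELL_CLONES = 96
ROUNDS = 30
EXPERIMENTS_RUN = 2_697


def summarize(drugs, clones, rounds, run):
    """Return simple counts derived from the reported figures.

    `summarize` is a hypothetical helper written for this illustration.
    """
    total = drugs * clones      # every drug paired with every cell clone
    predicted = total - run     # outcomes left to the model to predict
    return {
        "total": total,
        "predicted": predicted,
        "fraction_run": run / total,
        "avg_per_round": run / rounds,
    }


summary = summarize(DRUGS, CELL_CLONES, ROUNDS, EXPERIMENTS_RUN)
print(f"Possible experiments:  {summary['total']:,}")
print(f"Left to prediction:    {summary['predicted']:,}")
print(f"Share actually run:    {summary['fraction_run']:.1%}")
print(f"Average per round:     {summary['avg_per_round']:.1f}")
```

In other words, the system ran a little under 30 percent of the possible experiments, averaging about 90 per round, and left the remaining 6,519 outcomes to prediction.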

Machine learning systems require data, and lots of it, on which to train. Humankind is generating data at a remarkable rate, but not all of it is useful for specific purposes like drug discovery. Fortunately, Buvailo reports that a lot of drug-related data does exist. “AI has exciting opportunities to prosper in the biopharmaceutical field,” he writes. “The advances in combinatorial chemistry in the 1990s generated many millions of novel chemical compounds for testing as possible drugs. This stimulated the development of different high-throughput screening (HTS) techniques to perform such testing in relatively short terms, generating numerous public and private databases of compound bioactivities and toxicities. Simultaneously, a rapid progress in biology unfolded in the 1990s with advances in gene sequencing and ‘multi-omics’ studies leading to the accumulation of billions of data points describing genes, proteins, metabolites and mapping interconnections between different biochemical processes and their phenotype manifestations.” The next step is putting all of that data to work.

When enough of the right data is available, amazing things can happen. Ben Hirschler (@reutersBenHir) reports, “Artificial intelligence robots are turbo-charging the race to find new drugs for the crippling nerve disorder ALS, or motor neurone disease.”[6] Developing drugs for specific illnesses like ALS can be prohibitively costly because demand is often too small for manufacturers to recoup their investment. It’s a vicious circle in which too many patients get trapped. Hirschler reports, “There are only two drugs approved by the U.S. Food and Drug Administration to slow the progression of ALS (amyotrophic lateral sclerosis), one available since 1995 and the other approved just this year. About 140,000 new cases are diagnosed a year globally and there is no cure for the disease, famously suffered by cosmologist Stephen Hawking.” Hirschler adds that Richard Mead of the Sheffield Institute of Translational Neuroscience is using artificial intelligence to speed up his work on finding drugs to treat ALS. Hirschler explains:

“[AI] robots — complex software run through powerful computers — work as tireless and unbiased super-researchers. They analyze huge chemical, biological and medical databases, alongside reams of scientific papers, far quicker than humanly possible, throwing up new biological targets and potential drugs. One candidate proposed by AI machines recently produced promising results in preventing the death of motor neurone cells and delaying disease onset in preclinical tests in Sheffield. … In Arizona, the Barrow Neurological Institute last December found five new genes linked to ALS by using IBM’s Watson supercomputer. Without the machine, researchers estimate the discovery would have taken years rather than only a few months. Mead believes ALS is ripe for AI and machine-learning because of the rapid expansion in genetic information about the condition and the fact there are good test-tube and animal models to evaluate drug candidates.”

Although big data and cognitive computing have the potential to speed up drug discovery and reduce costs, Buvailo reminds us we remain in the early stages of this endeavor. “As of today,” he writes, “there are no AI-inspired, FDA-approved drugs on the market. Also, it is important to realize that while AI-based data analytics can bring innovation at every stage of drug discovery and during the development process, this data will not magically serve as a substitute for chemical synthesis, laboratory experiments, trials, regulatory approvals and production stages. What AI can do, though, is optimize and speed up R&D efforts, minimize the time and cost of early drug discovery, and help anticipate possible toxicity risks or side effects at late-stage trials to hopefully avoid tragic incidents in human trials. It can help incorporate knowledge derived from genomics and other biology disciplines into drug discovery considerations to come up with revolutionary ideas for drugs and therapies.” As technology advances and more data accumulates, I predict significant breakthroughs will be made in the pharmaceutical field, and we’ll all be the better for it.

Footnotes
[1] Ryan Copping and Michael Li, “The Promise and Challenge of Big Data for Pharma,” Harvard Business Review, 29 November 2016.
[2] Andrii Buvailo, “Artificial Intelligence In Drug Discovery: A Bubble Or A Revolutionary Transformation?” Forbes, 3 August 2017.
[3] Debasmita Banerjee, “Machine Learning System Could Remarkably Ease The Process Of Drug Discovery,” Crazy Engineers, 11 February 2016.
[4] Staff, “Big Data Predicts Structures for Hundreds of Protein Families,” Genetic Engineering & Biotechnology News, 23 January 2017.
[5] Chris Wood, “Machine-learning robot could streamline drug development,” New Atlas, 10 February 2016.
[6] Ben Hirschler, “How AI robots hunt new drugs for crippling nerve disease,” Reuters, 10 August 2017.
