Big Data and Better Health

Stephen DeAngelis

May 8, 2013

In late 2011, you might have read that IBM and healthcare provider WellPoint Inc., the nation’s largest insurer by membership, were teaming together to leverage the computing power of “IBM’s Watson supercomputer to diagnose medical illnesses and to recommend treatment options for patients.” [“IBM’s Watson supercomputer to give instant medical diagnoses,” by Duke Helfand, Los Angeles Times, 12 September 2011] That partnership has continued to develop over the succeeding months. Ian Steadman reports, “The first stages of a planned wider deployment, IBM’s business agreement with the Memorial Sloan-Kettering Cancer Center in New York and American private healthcare company Wellpoint will see Watson available for rent to any hospital or clinic that wants to get its opinion on matters relating to oncology. Not only that, but it’ll suggest the most affordable way of paying for it in America’s excessively-complex healthcare market. The hope is it will improve diagnoses while reducing their costs at the same time.” [“IBM’s Watson is better at diagnosing cancer than human doctors,” Wired, 11 February 2013]

Matthew J. Becker, Senior Director and Global Head of Statistical Programming at inVentiv Health Clinical, told participants at a recent SAS Global Forum that the primary goal of big data analysis “remains simple for those in health care and life sciences: cure disease and improve health outcomes.” [“What is the future of big data in health care?,” by Becky Graebe, SAS Voices, 6 May 2013] This is good news for patients and a great application of big data in the healthcare field beyond dealing with health records. To read more about how big data can help detect fraud using health records, read my post entitled Illuminating (and Eliminating) Fraud using Dark Data. One big challenge, according to Becker, is the fact that the “four big data pools” in the healthcare sector are owned by different groups and there has been very little sharing. That, he says, is a big problem. Graebe explains:

“Four distinct big data pools exist in the US health care domain, and Becker said there is very little overlap in ownership and integration of these pools, though that will be critical in making big strides with big data in health care:

Pharmaceutical R&D data

Clinical data

Activity (claims) and cost data

Patient behavior and sentiment data

“Becker went on to describe challenges and trends related to big data in health care, the evolving role of the data scientist and roadblocks to success. His parting words of advice cautioned conference attendees from taking on too much, too soon when dealing with big data.”

The fact is big data is likely to play an increasingly important role in the detection, diagnosis, and treatment of diseases. It comes as no surprise that the field of oncology is one of the first to benefit from big data analytics. The “Big C” probably raises more fear in patients than almost any other disease. Gina Kolata reports another exciting breakthrough in this field. She writes, “Scientists have discovered that the most dangerous cancer of the uterine lining closely resembles the worst ovarian and breast cancers, providing the most telling evidence yet that cancer will increasingly be seen as a disease defined primarily by its genetic fingerprint rather than just by the organ where it originated.” [“Cancers Share Gene Patterns, Studies Affirm,” New York Times, 1 May 2013] Dr. David P. Steensma, a leukemia researcher at the Dana-Farber Cancer Institute, welcomed the results of this research. He told Kolata, “This is exploring the landscape of cancer genomics. Many developments in medicine are about treatments or tests that are only useful for a certain period of time until something better comes by. But this is something that will be useful 200 years from now. This is a landmark that will stand the test of time.”

As in many fields, medical practitioners often specialize. They get so focused on their area of study that they have neither the time nor the inclination to step back and see the bigger picture. That is where big data analytics, as Watson is showing, can play such an important role. Kolata calls this latest effort, which is sponsored by the National Institutes of Health, “a sprawling, ambitious project … to scrutinize DNA aberrations in common cancers.” Elaine Grant notes, “For some time, DNA sequencing has held big data’s starring role—after all, a single human genome consists of some 3 billion base pairs of DNA.” [“The promise of big data,” HSPH News, Spring/Summer 2012]

Winston Hide, an associate professor of bioinformatics at Harvard’s School of Public Health, told Grant, “In the last five years, more scientific data has been generated than in the entire history of mankind. You can imagine what’s going to happen in the next five.” Grant goes on to point out, “This data isn’t simply linear; genetics and proteomics, to name just two fields of study, generate high-dimensional data, which is fundamentally different in scale. … In big data lies the potential for revolutionizing, well, everything.” She continues:

“The potential public health uses of big data extend well beyond genomics. Environmental scientists are capturing huge quantities of air quality data from polluted areas and attempting to match it with equally bulky health care datasets for insights into respiratory disease. Epidemiologists are gathering information on social and sexual networks to better pinpoint the spread of disease and even create early warning systems. Comparative-effectiveness researchers are combing government and clinical databases for proof of the best, most cost-effective treatments for hundreds of conditions—information that could transform health care policy. And disease researchers now have access to human genetic data and genomic databases of millions of bacteria—data they can combine to study treatment outcomes. According to McKinsey & Company, with the right tools, big data could be worth $9 billion to U.S. public health surveillance alone and $300 billion to American health care in general, the former by improving detection of and response to infectious disease outbreaks, and the latter largely through reductions in expenditures.”

Returning to the latest studies involving endometrial cancers, Kolata reports, “The cancer has long been evaluated by pathologists who examine thin slices of endometrial tumors under a microscope and put them in one of two broad categories. But the method is not ideal. In general, one category predicts a good prognosis and tumors that could be treated with surgery and radiation, while the other holds a poorer prognosis and requires chemotherapy after surgery. But pathologists often disagree about how to classify the tumors and can find it difficult to distinguish between the two types.” Thanks to big data analysis, however, things are changing. Kolata explains:

“The new genetic analysis of hundreds of tumors found patterns of genetic aberrations that more precisely classify the tumors, dividing them into four distinct groups. About 10 percent of tumors that had seemed easily treated with the old type of exam now appear to be more deadly according to the genetic analysis and would require chemotherapy. Another finding was that many endometrial cancers had a mutation in a gene that had been seen before only in colon cancers. The mutation disables a system for repairing DNA damage, resulting in 100 times more mutations than typically occur in cancer cells. … It turned out to be good news. Endometrial cancers with the mutation had better outcomes, perhaps because the accumulating DNA damage is devastating to cancer cells. Another surprise was that the worst endometrial tumors were so similar to the most lethal ovarian and breast cancers, raising the tantalizing possibility that the three deadly cancers might respond to the same drugs.”

Commenting on these findings, Jeff Boyd, executive director of the Cancer Genome Institute at Fox Chase Cancer Center, told Kolata that the fact “that cancers are more usefully classified by their gene mutations than by where they originate” is important, as is the fact that it can be “validated with real data.” Kolata goes on to report that leukemia research has also benefited greatly from big data analytics. She writes:

“While the genetics of endometrial cancer had gone largely unstudied until now, acute myeloid leukemia has been investigated for decades, in part because leukemia cells are so accessible. They are in the blood and bone marrow. Using microscopes and special staining methods, researchers had already discovered, for example, that chromosomes in these leukemia cells are often broken or hooked together in strange ways. They also knew that some chromosomal alterations were associated with a good prognosis, and others with a bad one. Patients with a good prognosis can usually be treated with chemotherapy alone while those with a worse prognosis need the expensive, difficult and risky treatment of last resort: a bone marrow transplant. It comes with a 10 percent death rate. … There was no good way to decide which treatment these patients needed. Some did well with chemotherapy; some did poorly. … The new study of 200 acute myeloid leukemias identified at least 260 genes that were mutated in at least 2 of the 200 leukemia samples, finding virtually all of the common genetic malfunctions that occur in it. Now researchers have a new foundation for assessing which cancers will be lethal unless the patient gets a risky bone marrow transplant and which can be treated with chemotherapy alone. … Knowing which genes are mutated also allows researchers to investigate drugs that target those genes. The next step will be for investigators to determine which mutations lead to good or bad outcomes.”

The challenge is not gathering data, but analyzing it. Grant notes, “Our ability to generate data far outstrips our ability to analyze it.” Researchers continue to find this frustrating. Grant explains:

“Most researchers agree that lives are lost every day that data sit in storage, untouched. The problems are vast and urgent. Consider just one example—recent news that a dozen Indian patients had contracted totally drug-resistant tuberculosis. ‘Even just a few people in Mumbai is a terrible danger sign,’ says [Sarah Fortune, Melvin J. and Geraldine L. Glimcher Assistant Professor of Immunology and Infectious Diseases], because it could portend the rapid spread of a highly transmissible and untreatable infection. To counter these trends, some scientists are venturing into crowdsourcing. Others are developing sophisticated algorithms to parse data in a keystroke. And still more are inventing ways to share massive, disparate datasets to yield surprising insights.”

With so much valuable data sitting fallow, gaining access to it so that it can be analyzed should be a priority. Systems need to be developed that can harmonize incompatible data, discover hidden relationships, and provide actionable insights to medical professionals. Grant writes that working with large data sets “may feel tedious,” but, she correctly asserts, “It’s transformative.” In fact, she calls the utilization of big data analytics “public health’s data driven revolution.” As Professor Hide told her, “It’s just the beginning. You should watch this space.”
