“When Facebook researchers manipulated how stories appear in News Feeds for a mood contagion study in 2014,” writes Sarah Zhang, “people were really pissed. … Outrage has a way of shaping ethical boundaries. We learn from mistakes.” The shocking thing, she notes, is that researchers “never anticipated public outrage.” Although I believe the outrage was warranted, researchers probably felt justified in their actions because major news outlets have always shown a bias in how they cover the news (just look at the differences in how The New York Times and The Wall Street Journal covered the 2016 presidential race). “Unlike medical research, which has been shaped by decades of clinical trials,” Zhang explains, “the risks — and rewards — of analyzing big, semi-public databases are just beginning to become clear.” She believes the Facebook study “shows just how untested the ethics of this new field of research is.”
Big Data, Artificial Intelligence, and Ethics
Analyzing today’s enormous data sets — often referred to as “Big Data” — requires the use of algorithms that power artificial intelligence (AI) systems. Big data analytics and AI are inextricably bound together. “Whether you believe the buzz about artificial intelligence is merely hype or that the technology represents the future,” writes Jonathan Vanian (@JonathanVanian), “something undeniable is happening.” He notes researchers are using big data and artificial intelligence to solve difficult problems that have challenged people for decades. Although solving problems is a good thing, he rhetorically asks, “What could possibly go wrong?” Like Zhang, Vanian believes the lack of ethics could lead to unforeseen consequences. As an example, he writes, “Consider an AI system that’s designed to make the best stock trades but has no moral code to keep it from doing something illegal.”
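Vanian's hypothetical of a trading AI with "no moral code" can be made concrete: the usual remedy is a compliance layer that vetoes any order the optimizer proposes before it reaches the market. The sketch below is purely illustrative; the rule names, limits, and order format are invented, not drawn from any real trading system.

```python
# Hypothetical sketch: a compliance gate between an AI's proposed trades
# and execution. The restricted list and value limit are invented
# placeholders for real regulatory and firm-level rules.

RESTRICTED_TICKERS = {"ACME"}   # e.g., under an insider-information embargo
MAX_ORDER_VALUE = 1_000_000     # crude position-size limit

def compliance_check(order):
    """Return (allowed, reason): the 'moral code' the optimizer lacks."""
    if order["ticker"] in RESTRICTED_TICKERS:
        return False, "ticker is on the restricted list"
    if order["price"] * order["qty"] > MAX_ORDER_VALUE:
        return False, "order exceeds value limit"
    return True, "ok"

def execute(order, audit_log):
    """Run every AI-proposed order through the gate; log the outcome."""
    allowed, reason = compliance_check(order)
    status = "FILLED" if allowed else "REJECTED"
    audit_log.append((status, order["ticker"], reason))
    return allowed

audit_log = []
execute({"ticker": "XYZ", "price": 10.0, "qty": 100}, audit_log)
execute({"ticker": "ACME", "price": 5.0, "qty": 10}, audit_log)
```

The point is architectural rather than algorithmic: the ethical constraint lives outside the model, where it can be audited, rather than being entrusted to the optimizer itself.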
Paul Daugherty (@pauldaugh), Chief Technology Officer at Accenture, asserts, “Success in the digital age requires a new kind of diligence in how companies gather and use data.” He adds, “In today’s digital age, data is the primary form of currency. Simply put: Data equals information equals insights equals power. Technology is advancing at an unprecedented rate — along with data creation and collection. But where should the line be drawn? Where do basic principles come into play to consider the potential harm from data’s use? … While digital advancements enable new opportunities for businesses to compete and thrive, they also create increased exposure to systemic risks. Digital trust … is very difficult for businesses to build with customers, but very easy to lose.” The problem is not with the data itself; data has no moral dimension. How that data is used, however, does have a moral and ethical dimension.
Nicholas Thompson (@__icarus), founder of Grit, observes, “Data is just that — data. Just as the news interprets data, so does every application to determine how to use it. The same data that allows stores to intelligently sell you what you really want may also allow a government to infringe upon your rights. The data is just data, but how it is used determines morality. ‘1984’ was written before the means were known, but human nature has remained the same.” Clearly, ensuring big data analytics are used ethically is important, but Dave da Silva, Senior Data Scientist at Capgemini, believes too many companies have abandoned ethics. He encourages “Data Scientists (and those who commission or manage their work) [to] stop and think about their latest project in ethical terms as opposed to just technical.”
Although we would like to think the analysis of big data could help us discover ground truths, da Silva asserts we are only fooling ourselves if we believe that premise. “Users need to embrace uncertainty and risk,” he writes, “using Data Science as a tool to reduce uncertainty (to aid decision-making) rather than claiming to generate absolute conclusions.” He also wants analysts to understand, “Decisions and assumptions are made at every step in your analysis. The big ones are recorded and surfaced to stakeholders but it’s impossible to ensure every assumption you make is 100% scrutinized. It gets worse; everyone who has touched the data, hardware or software has also made assumptions.” And those assumptions may have ethical consequences.
Da Silva writes, “Long-term, Data Science training should provide more focus on if you should conduct analysis as opposed to just how to conduct analysis (i.e., the ethical dimension).” Unfortunately, he leaves us wondering exactly what should be included in ethical data science training. Companies, as well as individuals, should be concerned about the ethics of their analysis. There have been, and will continue to be, legal consequences when ethics are breached. Recognizing a gap exists in the area of ethics in the computer age, the international legal firm K&L Gates LLP has donated $10 million to Carnegie Mellon University to study ethical and policy issues surrounding artificial intelligence and other computing technologies. Explaining why his firm made this gift, Peter J. Kalis, K&L Gates chairman and global managing partner, stated, “As a society, our ethical choices in this field will greatly influence what kind of world we will have. Its values. Its culture. Its laws. And, ultimately, its humanity.”
Daugherty adds, “While data ethics is a new area for most businesses, it must be a key consideration as organizations evaluate starting or continuing their digital transformation journeys.” He proffers four principles he believes will foster better big data ethics in companies:
1. Focus on data ethics throughout the supply chain. Businesses must handle data in an ethical way throughout their data supply chains — from collection, aggregation, sharing, and analysis to monetization, storage, and disposal. In doing so, organizations can create an environment of trust and accountability with every stakeholder relationship they have. …
2. Fundamentally change how data is viewed within your business. While a focus on security (is the confidentiality, integrity, and availability of data adequately protected?) and privacy (do controls on data satisfy regulatory requirements?) remains relevant, it’s critical to add lenses for ethics and trust related to data collection, manipulation, and use. To do so, organizations should change their perception of data as just information to one that recognizes data as sensitive and personal. Organizations must recognize the potential for negative use of data if clear standards of ethics and trust are not implemented throughout each business process.
3. Develop a set of best practices. With best practices, businesses can embed ethical considerations at each stage of product development, service delivery, and the data supply chain. Establishing a companywide code of ethics helps define the types of questions and concerns managers should be raising at each stage of project management and the service delivery lifecycle. …
4. Create a universal ethics language. A collectively recognized and accepted taxonomy around data ethics is needed. A common language can provide clarity to all parties involved in the exchange of data.
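Daugherty's four principles fit together naturally: the supply-chain stages he lists in the first principle can each be gated by an ethics review (his "best practices") conducted in a shared vocabulary (his "universal ethics language"). The sketch below shows one minimal way a company might encode that; the stage names come from his first principle, while the review questions are invented for illustration.

```python
# Hypothetical sketch: Daugherty's data supply chain as explicit stages,
# each requiring a recorded ethics review before work proceeds. The
# review questions are illustrative placeholders, not a real checklist.
from enum import Enum

class Stage(Enum):
    COLLECTION = 1
    AGGREGATION = 2
    SHARING = 3
    ANALYSIS = 4
    MONETIZATION = 5
    STORAGE = 6
    DISPOSAL = 7

# A shared vocabulary (principle 4): every stage answers the same questions.
REVIEW_QUESTIONS = (
    "consent obtained?",
    "potential for harm assessed?",
    "access controlled?",
)

def review(stage, answers):
    """Record an ethics review for one stage; every question must be answered."""
    if set(answers) != set(REVIEW_QUESTIONS):
        raise ValueError(f"incomplete review at {stage.name}")
    return {"stage": stage.name, "approved": all(answers.values())}

# A project passes only if every stage of its data supply chain is reviewed.
ledger = [review(s, {q: True for q in REVIEW_QUESTIONS}) for s in Stage]
```

The design choice worth noting is that the review produces a ledger entry per stage, which is what makes the "environment of trust and accountability" in principle 1 auditable rather than aspirational.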
Daugherty concludes, “In today’s digital marketplace, the value of trust is measured by the bottom line. Companies with high trust quotients will gain brand loyalty that allows them to thrive. Those that commit breaches of trust will find themselves encumbered with brand discrimination that can be all but impossible to shed. Leading companies understand the true value of trust and the work it takes to achieve it. The next step is ensuring it is retained by actively including ethical practices into every facet of the organization.”
Paulo Marques (@pjpmarques), CTO and Co-Founder at Feedzai, believes there are some built-in ethical dimensions to some artificial intelligence systems. He explains, “The concerns consumers have over data privacy and giving companies access to their personal data may be assuaged with new artificially intelligent systems. These systems can analyze big data (without sharing it) and, at last, break the deadlock between consumers’ fears about data privacy and business’ need for more data/better personalization.” Technology alone, however, will not ensure that data is used ethically — that requires ethical people and/or oversight to ensure the ethical use of data.
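Marques's claim that systems "can analyze big data (without sharing it)" is the idea behind federated or on-premise analytics: each data holder computes a local aggregate, and only those aggregates, never raw records, are combined centrally. The sketch below illustrates the principle under that assumption; a production system would add secure aggregation and differential-privacy noise, which are omitted here.

```python
# Hypothetical sketch of "analyzing data without sharing it": each party
# computes a summary locally, and the central step sees only summaries.

def local_summary(records):
    """Computed on-site; the raw records never leave the data holder."""
    return {"n": len(records), "total": sum(records)}

def combine(summaries):
    """The central server sees only counts and totals, not raw values."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n if n else 0.0

party_a = [3.0, 5.0]   # private to party A
party_b = [7.0]        # private to party B

# The global mean is computed without ever pooling the raw data.
mean = combine([local_summary(party_a), local_summary(party_b)])
```

This is exactly the deadlock-breaker Marques describes: the business gets the population-level insight (here, a mean) while each consumer's individual values stay where they were collected.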
 Sarah Zhang, “Scientists are Just as Confused about the Ethics of Big Data Research as You,” Wired, 20 May 2016.
 Jonathan Vanian, “Why Artificial Intelligence Needs Some Sort of Moral Code,” Fortune, 2 September 2016.
 Paul Daugherty, “Achieving Trust Through Data Ethics,” MIT Sloan Management Review, 21 September 2016.
 Forbes Technology Council, “What Every Consumer Should Know About Big Data,” Forbes, 2 November 2016.
 Dave da Silva, “10 reasons* why Data Science lacks ethics and how to retrofit morality,” Insights & Data Blog, 24 October 2016.
 Press Release, “Carnegie Mellon Receives $10 Million from K&L Gates to Study Ethical Issues Posed by Artificial Intelligence,” Carnegie Mellon University, 2 November 2016.
 Op. cit., Forbes Technology Council.