The “Big Data” Dialogues, Part 5: Algorithms

Stephen DeAngelis

October 27, 2011

In the prologue to their book Algorithms, Professor Sanjoy Dasgupta, of the University of California, San Diego, and Professors Christos Papadimitriou and Umesh Vazirani, of the University of California, Berkeley, write:

“Look around you. Computers and networks are everywhere, enabling an intricate web of complex human activities: education, commerce, entertainment, research, manufacturing, health management, human communication, even war. Of the two main technological underpinnings of this amazing proliferation, one is obvious: the breathtaking pace with which advances in microelectronics and chip design have been bringing us faster and faster hardware. … The other intellectual enterprise that is crucially fueling the computer revolution [is] efficient algorithms.”

You simply can’t talk about “big data” without talking about algorithms. The good professors insist that the invention of algorithms did even more to advance humankind than the invention of the printing press. They continue:

“The decimal system, invented in India around AD 600, was a revolution in quantitative reasoning: using only 10 symbols, even very large numbers could be written down compactly, and arithmetic could be done efficiently on them by following elementary steps. Nonetheless these ideas took a long time to spread, hindered by traditional barriers of language, distance, and ignorance. The most influential medium of transmission turned out to be a textbook, written in Arabic in the ninth century by a man who lived in Baghdad. Al Khwarizmi laid out the basic methods for adding, multiplying, and dividing numbers, even extracting square roots and calculating digits of [pi]. These procedures were precise, unambiguous, mechanical, efficient, correct — in short, they were algorithms, a term coined to honor the wise man after the decimal system was finally adopted in Europe, many centuries later.”

Until now, you might have thought that algorithms were named after the former Vice President, Al Gore, instead of a Muslim mathematician named Al Khwarizmi! (Just kidding, of course.) In addition to paying homage to anonymous Indian mathematicians and Al Khwarizmi, Dasgupta, Papadimitriou, and Vazirani pay tribute to one additional genius, Leonardo Fibonacci. They write:

“Al Khwarizmi’s work could not have gained a foothold in the West were it not for the efforts of one man: the 15th century Italian mathematician Leonardo Fibonacci, who saw the potential of the positional system and worked hard to develop it further and propagandize it. But today Fibonacci is most widely known for his famous sequence of numbers 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, … , each the sum of its two immediate predecessors. … No other sequence of numbers has been studied as extensively, or applied to more fields: biology, demography, art, architecture, music, to name just a few. And, together with the powers of 2, it is computer science’s favorite sequence.”

If you want to get into the math behind Fibonacci’s sequence or learn more about algorithms, go to the Khan Academy and search for the subject that interests you. You’ll be able to watch a short, easy-to-understand video on that subject. To learn more about the Khan Academy, read my post entitled Teaching Problem Solving Skills in Math and Science, Part 2. If you really want to immerse yourself in algorithms, buy the book written by Professors Dasgupta, Papadimitriou, and Vazirani.
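The sequence the professors quote is also easy to generate yourself. Here is a minimal sketch in Python (the function name is my own), using the rule that each number is the sum of its two immediate predecessors:

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers: 0, 1, 1, 2, 3, 5, ..."""
    sequence = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)  # record the current number
        a, b = b, a + b     # each new number is the sum of the previous two
    return sequence

print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Those ten numbers are exactly the ones listed in the quotation above.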


The point of this historical discussion is that algorithms are becoming increasingly important in our lives as we move deeper into the information age. Why? The simple answer is that the human mind can’t rapidly deal with the oceans of data being generated every second of every day. Algorithms are being used to help us make sense of this data. Algorithms sound mysterious and complicated (and they certainly can be complicated), but the simple definition of an algorithm is: a finite sequence of well-defined steps for solving a problem or performing a computation. Seth Freeman, a writer for television, reminds us, “Algorithms, as you probably know, are the computer programs that infer from your profile (in the case of Facebook) and from the content of your e-mails (in the case of Gmail) your interests and preferences, enabling ads to be displayed to the customers most likely to be interested in specific products. This feature is prized by advertisers and accounts for the multibillion-dollar value of the most successful Web networks.” [“Me and My Algorithm,” New York Times, 17 January 2011]
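To make that simple definition concrete, consider what is often called the oldest nontrivial algorithm: Euclid’s method for finding the greatest common divisor of two numbers. It is a short, unambiguous set of steps that is guaranteed to terminate with the right answer, exactly the qualities the professors attribute to Al Khwarizmi’s procedures. A minimal sketch in Python (the function name is my own):

```python
def gcd(a, b):
    """Euclid's algorithm: repeatedly replace the pair (a, b) with
    (b, a mod b) until the remainder is zero; the survivor is the GCD."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(252, 105))  # 21
```

Consecutive Fibonacci numbers, incidentally, such as gcd(34, 21), always yield 1 and force Euclid’s method to take the most steps for inputs of their size, a classical result known as Lamé’s theorem.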


If you know anything at all about big data, you know that companies like Google and Facebook generate billions upon billions of bytes of data, data that must be analyzed in order to be useful. Sometimes this “analysis” can prove humorous. Freeman continues:

“The algorithms are programmed, I believe, to get to know us better over time, and rather than resent the invasion of privacy I have come to feel a grudging respect for, and even a growing sense of intimacy with, my own personal algorithm. You have to admire, for example, the inventive audacity of a program that would read an e-mail someone sent me about ‘Holocaust deniers’ and think that I might be shopping for a Holistic Dentist. And when I conceded in an e-mail that something ‘was cheeky of me …’ I found it rather endearing that the algorithm tried to sell me a New Razor from Gillette®. I had a similar reaction when a reference to the fine actor Christopher Plummer produced: Get a Plumbing Quote Now. Find a local Plumber. Of course, these slightly off-base pitches have a certain logic that is easy to discern, revealing, more than anything else, the program’s digital dyslexia.”

What we, as individuals, might find amusing, companies trying to improve their bottom lines find annoying. They pay good money to get the best analytics possible and quirky nuances of language can throw a wrench into such analysis. That is one reason that Enterra Solutions uses an ontology that understands linguistic nuances and can establish proper relations when sorting through mountains of data. The fact of the matter is, however, that the occasional glitch is likely to arise anytime you are dealing with unstructured data.


Simon Dell, director of TwoCents Group, an Australian marketing, advertising and branding company, believes that as a result of increased use of algorithms, “We’re in danger of losing the spontaneity in our lives. Well, at least our digital lives.” [“Algorithms want to rule the world,” posted by Peter Roper, Marketing Magazine, 10 October 2011] Dell explains:

“Google was built on the back of ‘I’m feeling lucky’, but now we’re slaves to what our social networks and our search engines want to tell us we should be looking at and who we should be connected to. We’re gradually collapsing in on ourselves on the premise that someone somewhere is trying to save us time. … Facebook decided to roll out some significant changes, including a removal of our option to switch between all stories and top news. Instead, our news feed has been decided for us, based on a complex algorithm and delivered to us as ‘top stories’. The tool that originally allowed us to filter meaningless news out of lives, and allow us to choose who and what we followed, has now come full circle and is choosing for us. And there’s no off button.”

Dell is not alone in his concern about how algorithms are trying to “rule the world.” If you have about 15 minutes to spare, I recommend watching the presentation that Kevin Slavin delivered at TEDGlobal earlier this year. The TED site states:

“Kevin Slavin argues that we’re living in a world designed for — and increasingly controlled by — algorithms. In this riveting talk from TEDGlobal, he shows how these complex computer programs determine: espionage tactics, stock prices, movie scripts, and architecture. And he warns that we are writing code we can’t understand, with implications we can’t control.”


Returning to Simon Dell’s article, he argues that there is “a shadowy figure that now stands over us: the once-innocent algorithm.” He continues:

“Google have been developing algorithms for years – it’s the basis for their entire business model – but as Eli Pariser revealed in his TED talk, those algorithms now deliver different search results based on who is doing the search. No longer did we all get the same search feed, but our location, age, sex and previous searching and browsing habits combine to deliver a result tailored just for us. … Many of you might shrug your shoulders and ask, ‘So what?’ It’s only Facebook and Google, and they’re not the be all and end all of marketing. Well, the use of algorithms to communicate to us will eventually evolve to other platforms. Let’s think about how we anticipate the future will evolve. Digital billboards that can detect who is walking past them. Cars that log us into our iTunes accounts when we start them up. Smartphone apps integrated with artificial intelligence programs anticipating us being late for meetings.”

Eli Pariser’s TED video takes less than 10 minutes to watch. His point is that algorithms can be used to filter, isolate, and present information in ways we may not like but over which we have no control. Dell, a marketer, is concerned that algorithms can be used in a very intrusive way to manipulate our lives. Some people seem happy about all this tailoring and even willingly offer up their habits to share with others. Others, of course, are concerned about privacy issues, ethical issues, and control issues.


For manufacturers and retailers, however, data has always been important. They want to filter data to get the information they need to make smart business decisions. Obviously, algorithms play an essential role. As I’ve pointed out in previous posts, there are still a number of companies that do most of their data-crunching on Microsoft Excel spreadsheets. Although Excel is a great program, it was never intended to crunch big data — and that’s a problem for large companies. It also explains why algorithms play an increasingly important role in supply chain management. Companies, like individuals, need to know what the algorithms employed on their behalf are doing. If they don’t, they might find themselves suffering consequences like those described in Slavin’s talk.


As CEO of a company that relies on algorithms to serve its clients, I’m obviously a big proponent of their use. The upside of using algorithms far outweighs any downside. Like any technology, however, the algorithms embedded in computer systems need to be understood and refined to deliver the desired results. If the analysts and academics cited above are to be believed, we will soon leave the age of information and enter the age of algorithms.