The television show Star Trek gave us much to look forward to: teleporting, food “replicators” and that other far-fetched creation – the data scientist. The show also introduced us to Data, an android who could access every piece of information ever generated, while Spock himself wasn’t just a pointy-eared Vulcan, but the logic-loving prototype for a role that taps into the power of information in unprecedented ways.
The idea that people in leadership roles should specialize in the organization, visualization and translation of vast swathes of data is no longer limited to sci-fi buffs. Today, data scientists play a leading role in deciding which way an organization should turn at a fork in the road, says DJ Patil, Vice-President of Product at RelateIQ, who helped coin the phrase “data scientist” while at LinkedIn. “Companies need a Spock in the boardroom,” he adds.
Firms, both large and small, are heeding this suggestion, as they continue to grasp the strategic importance of “Big Data”. This is no longer just a buzzword: data, data science and data analytics are all crucial tools for everything, from understanding customers to optimizing supply chains. As such, companies, governments and other institutions are vigorously investing in data science techniques and expertise.
Within the last 20 years, the focus has shifted from data collection to data use. What can we learn from the data? What is it telling us? How do we organize, access and visualize data in a way that steers strategic action? And how will this change in the future as data and the algorithms designed to comb them become more powerful?
How Google changed the world
These questions are just a few that will be addressed by global industry and government leaders at the World Economic Forum Annual Meeting of the New Champions in Tianjin, People’s Republic of China, next month. Jeremy Howard, a Forum Young Global Leader and CEO at Enlitic, has been involved in data science for two decades and has witnessed first-hand the seismic shift in how society perceives information gathering. “Twenty years ago there were few systems to collect and get information from data,” he says. Furthermore, “there were no systems and no strategy for use of data” within organizations.
But all this changed with Google, which served as a role model for how data science research could be better conducted, and for how specific kinds of data could be more efficiently collected, understood and used to help businesses, says Howard. “In particular, the company developed a software technique for processing huge troves of data called MapReduce, which broke tasks into many smaller components to perform among multiple machines,” says Kenneth Cukier, Data Editor for The Economist. The open-source version of this novel process, which has become very widely adopted, is called Hadoop.
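The idea Cukier describes can be illustrated with a toy word count. What follows is a minimal sketch that runs on a single computer, not Google's or Hadoop's actual code: the function names are hypothetical, and the "machines" are simply slices of a list. It shows the two steps the name refers to, a map step that works on each small piece independently and a reduce step that merges the partial results.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(words, n_chunks):
    """Break one big task into smaller pieces, one per (hypothetical) machine."""
    size = max(1, -(-len(words) // n_chunks))  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def map_chunk(chunk):
    """Map step: each worker counts the words in its own chunk only."""
    return Counter(chunk)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(text, n_chunks=3):
    words = text.lower().split()
    chunks = split_into_chunks(words, n_chunks)
    # On a real cluster these map calls would run in parallel on separate machines;
    # here they run one after another for simplicity.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(merge_counts, partials, Counter())


counts = word_count("data informs judgment and data informs strategy")
print(counts["data"], counts["informs"], counts["strategy"])  # prints: 2 2 1
```

Because each chunk is counted without reference to the others, adding more machines lets the same job handle more data, which is the property that made the approach attractive for very large data sets.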
“Nowadays, hundreds of organizations have transformed their industries with data science,” says Howard. “There are more tools now to process the data and gain insight from it, and this is exciting.”
The repercussions of the data revolution are rippling out through industries and societies. “We are able to datafy things we could never render in a data format before,” adds Cukier. “We can do this because of the lower costs of collecting, storing and processing data in a way that is almost unimaginable in the past.” Self-driving cars are one striking example. “We transformed the problem from one of explicitly teaching a car how to drive, to feeding in lots of the data and having the car figure out what to do in different situations,” he says.
In another sphere, AidData, an innovation lab that aims to make development finance more transparent, provided detailed maps based on GPS data of where financial aid was being sent in regions in Africa. It revealed a distinctive “mismatch” in where authorities thought aid was going and where it actually ended up. “Jaws dropped,” says Cukier. “In Kenya it showed that for the tens of millions of dollars of international development assistance flowing into the country, there were sectors and regions that weren’t getting financial support.” The aid organizations hadn’t seen this before because the calculations and correlations just weren’t available in the past. Now, large data sets that previously didn’t exist, combined with new ways to analyse the data, have opened new channels of communication, action and capital.
One of the driving forces in this revolution is a new algorithmic approach that allows us to process information from myriad sources, including videos, images and even sounds, says Howard. “Deep learning” algorithms, which he predicts are about to transform everything, probably more than the internet, are a subgroup of algorithms that can be applied to various fields, such as facial recognition, automatic speech recognition, natural language processing, handwriting and audio/video signal recognition.
Although deep learning has been around in academic circles for 40 years, it had gone largely unnoticed until a few years ago, when a team of scientists from Stanford, Princeton and Columbia universities helped launch the ImageNet Large Scale Visual Recognition Challenge, in which competitors sought to design algorithms that could quickly identify objects in images. In 2012, a team of researchers from the University of Toronto used deep-learning algorithms to significantly improve accuracy at classifying objects in images, and the results from 2014’s competition demonstrate even more advances in accuracy. “Since then everybody’s dropping what they’re doing and working on this,” Howard says. Companies like Google, Baidu and Facebook are investing in this technology.
Cars, X-rays and the cosmos
But while deep-learning algorithms have the potential to be disruptive in the world of data science, the revolution is just beginning. In academia, it is already being introduced and used in new ways. For example, Howard recently met with Nobel Laureate astrophysicist Brian Schmidt, a professor at the Australian National University, to discuss how deep learning could be used to better understand the cosmos.
The implications beyond academia are wide-ranging. Autonomous cars, already on our radar, could recognize traffic signals and respond appropriately. Medical diagnostics could be refined as algorithms enable radiologists to better understand and interpret CT and X-ray scans. Skype is experimenting with speech recognition algorithms that could deliver real-time language translations between individuals and groups.
Although deep learning still has a way to go before it is seen in everyday applications, there has already been a cultural shift within businesses towards recognizing the growing importance of data science.
A Spock in the boardroom
“For the first time, we have role models in industries that use big data [strategically],” says Howard. Where at times in the past big data experts may have been confined to academic institutions, more and more sector leaders, like Google and Wal-Mart, are investing in data infrastructure and data science expertise. More importantly, we are beginning to see data science experts serve as the company “Spock” on the board of directors. Some big companies and government agencies already have data scientists in their board rooms, says Patil. But the revolution is slow. “People in board rooms are there because they have domain expertise not data expertise,” adds Howard. “So it is hard for them to be trusting of a data-driven approach because that’s not how they did business in the past.”
So where is the disruption taking place? Where are the data scientists who are taking leadership roles at the C-Suite level and higher? Not surprisingly, they are found in the nimble universe of start-ups. But the big guys have their eyes open, says Howard, about the value of having data scientists in positions of influence.
The news business is one sector that has already been fundamentally disrupted by data science. Journalism used to be more of an anecdotal endeavour, according to Cukier. Something would happen, and a writer would go and interview the person and/or report on a trend related to the occurrence. Data would back up the story. “We were victims of our observation, which is great if we were omnipresent,” he says.
But we’ve experienced a 180 degree change. “Now, data is the basis of the story and the anecdote provides colour and timbre,” he says. “Journalism changed to data journalism because we can now tell stories based on data.” There is more information available and more algorithms to help extract value from the data. And thanks to better imaging programmes, we can also now present stories more visually, by including charts and graphs and pictorial translations of the data.
Nate Silver, founder of FiveThirtyEight.com, has been leading the charge in this field. He went from using data to predict baseball games to forecasting the 2008 US presidential election, and now runs a team of journalists, designers, multimedia experts, quantitative analysts and database evangelists who are proving that people are hungry for news and hard data.
Great data, great responsibility
There are many reasons to be optimistic about the potential for big data to improve the state of the world. From geotags on Instagram posts that help track unfolding political or humanitarian crises, to networks of sensors that help to tend large-scale crops, to biometric identification used to make sure voting is fair, data points towards a world of greater understanding, openness and accessibility.
“When data is available to everyone, you create a more robust and intellectually honest environment,” says Patil. The advent of electronic medical records (EMRs) could revolutionize medical science, as patterns of diseases and diagnoses will be more traceable. “We have the algorithms to save millions of lives,” stresses Howard, “but only if we dismantle data silos”. Because of security, safety and privacy concerns, “no one is saying let’s make this data available to everyone,” says Howard. “But I would like to see us do a better job at communicating the benefit of data sharing.”
Indeed, as with any innovation, there is a dark side: the recent NSA scandal raised thorny questions about information gathering and privacy. With the great power that data provides comes great responsibility.
“We can’t be willy-nilly about data,” cautions Patil. But data can help us shed light on problems, and then potentially solve them. And judgement can never be entirely replaced by an algorithm: you have to know when to ignore data, such as if your GPS tells you to turn right and drive over that cliff. “Data informs our judgment, but our gut helps us too,” says Patil. And that is something even Spock would admit is only logical.
Author: Alaina G. Levine is a science writer and author.
Image: A woman presents an Anatomic Symbolic Mapper Engine (ASME) from IBM Healthcare, at the IBM exhibit at the upcoming CeBIT fair in Hanover March 3, 2008. REUTERS/Hannibal Hanschke