We live in an extraordinary time. The capacity to generate and to store data has reached dizzying proportions. What lies within that data represents the chance for this generation to solve its most pressing problems – from disease and climate change, to healthcare and customer understanding.
The magnitude of the opportunity is defined by the magnitude of the data that is created – and it is astonishing.
The world’s internet population grew by more than 750% in the past 15 years to over 3 billion and will pass the 50% penetration mark in the near future. This population shares more than 2.5 million pieces of content on Facebook, tweets more than 300,000 times and sends more than 204 million text messages – every minute.
Further, data growth will accelerate dramatically in the coming years as the Internet of Things takes hold, connecting 20 to 30 billion “things” by 2020. These “things” will transmit data on everything from your car, to your thermostat, to the health of your cattle herd.
Underpinning this explosion are extraordinary advances in data storage technology and architecture. Quality-adjusted prices for data storage equipment fell at an average annual rate of nearly 30% from 2002 to 2014. With the incremental cost of storing data effectively zero, institutions have responded by capturing everything possible – on the premise that what lies within will produce meaningful value for the enterprise.
Despite the technical advances in collection and storage, knowledge generation lags. This is a function of how organizations approach their data, how they conduct analyses and how they automate learning through machine intelligence.
At its heart, it is a mathematical problem. For any dataset the total number of possible hypotheses/queries is exponential in the size of the data. Exponential functions are difficult enough for humans to comprehend; however, to further complicate matters, the size of the data itself is growing exponentially, and is about to hit another inflection point as the Internet of Things kicks in.
What that means is that we are facing double exponential growth in the number of questions that we can ask of our data. If we stick with the approaches that have served us over time – iteratively asking questions of the data until we get the right answer – we will fail to grasp our generational opportunity.
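The arithmetic behind that claim can be made concrete with a toy count. In this sketch a “hypothesis” is treated as any subset of a dataset’s n items you might query together – an illustrative assumption, not the author’s formal model – giving 2^n hypotheses; if the data itself doubles each period (n = 2^t), the hypothesis count grows doubly exponentially, as 2^(2^t):

```python
# Toy model: a "hypothesis" = any subset of n items, so there are 2**n.
# This counting scheme is an illustrative assumption for the sketch.
def hypotheses(n: int) -> int:
    return 2 ** n

# If the data doubles each period t, the hypothesis count is 2**(2**t):
for t in range(1, 6):
    n = 2 ** t  # data size after t doubling periods
    print(f"t={t}: n={n}, hypotheses={hypotheses(n)}")

# Doubling the data squares the number of possible hypotheses:
assert hypotheses(20) == hypotheses(10) ** 2
```

The last assertion is the crux: every doubling of the data does not double the question space, it squares it – which is why adding analysts or faster query tools cannot keep pace.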
There are not, and never will be, enough data scientists in the world for this approach to succeed. Nor can we arm enough citizen data scientists with new software to make it work. Software that makes question-asking or hypothesis development more accessible or more efficient misses the central point: its users will only fall further behind as new data becomes available each millisecond.
To truly unlock the value that lies within our data we need to turn our attention to the data itself, setting aside the questions for later. This, too, turns out to be a mathematical problem. Data, it turns out, has shape. That shape has meaning. The shape of data tells you everything you need to know about your data, from its obvious features to its hidden secrets.
We understand that regression produces lines.
We know that customer segmentation produces groups.
We know that economic growth and interest rates have a cyclical nature (diseases like malaria have this shape too).
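Two of those shapes – the line and the cycle – can be told apart in a few lines of code. This is a minimal, stdlib-only sketch using plain correlation as an illustrative stand-in for shape detection, not the topological machinery the author has in mind:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

t = list(range(200))
line = [2 * x + 3 for x in t]                        # regression-style line
cycle = [math.sin(2 * math.pi * x / 20) for x in t]  # cycle with period 20

assert pearson(t, line) > 0.99          # linear shape: a fitted line explains it
assert abs(pearson(t, cycle)) < 0.1     # a straight line misses the cycle...
assert pearson(cycle[:-20], cycle[20:]) > 0.99  # ...lag-20 self-similarity finds it
```

The point of the sketch is that each shape demands a different lens: the straight-line test that perfectly captures the first series scores near zero on the second, while a lag comparison at the period immediately reveals the cycle.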
By knowing the shape of the data and where we sit within it, we vastly improve our understanding of where we are, where we have been and, perhaps more importantly, what might happen next. In understanding the shape of data we understand every feature of the dataset, immediately grasping what is important in it – dramatically reducing the number of questions to ask and accelerating the discovery process.
By changing our thinking – and starting with the shape of the data, not a series of questions (which very often come with significant biases) – we can extract knowledge from these rapidly growing, massive and complex datasets.
The knowledge that lies hidden within electronic medical records, billing records and clinical records is enough to transform how we deliver healthcare and how we treat diseases. The knowledge that lies within the massive data stores of governments, universities and other institutions will illuminate the conversation on climate change and point the way to answers on what we need to do to protect the planet for future generations. The knowledge that is obscured by web, transaction, CRM, social and other data will inform a clearer, more meaningful picture of the customer and will, in turn, define the optimal way to interact.
This is the opportunity for our generation to turn data into knowledge. To get there will require a different approach, but one with the ability to impact the entirety of humankind.
Full details on all of the Technology Pioneers 2015 can be found here.
Author: Gurjeet Singh is the Co-Founder and Chief Executive Officer of Ayasdi, a World Economic Forum Technology Pioneer.
Image: Visitors sit on a bench made in the shape of “Big Data” outside the venue of the 2015 Big Data Expo in Guiyang, Guizhou province, China, May 26, 2015. REUTERS/Paul Carsten