Solving the big problems of the world requires cooperation and collaboration. From NGOs to big business to government, each of us owns a piece of the data puzzle, and each is affected by the data-related actions of others. But how do we bring those multiple perspectives together to form the picture that allows breakthrough insights? Global problems such as climate change, financial market volatility or infectious disease won’t be solved until the puzzle pieces are joined, but the very act of moving the data into places where it can be used as valuable information carries risks – including the potential for the data to fall into the wrong hands. If data about population movement were to be stolen by a human trafficking ring, for instance, it could expose vulnerable refugees to exploitation. Such risks are real, and should not be underestimated.
However, the rewards are even greater. And this is part of my role as chair of the Global Agenda Council on Data-Driven Development – to advise decision-makers on how to balance the risks and rewards of using data for good. As we head towards the meeting of the UN General Assembly, the time is right to discuss the best way to use data for development.
Data for good
So what rewards have I seen? The power of shared data is apparent in a project involving my Global Agenda Council colleague, Nicolas de Cordes of Orange. Known as the Data for Development Challenge, the project made anonymized data extracted from Senegal’s mobile telecom network available to international research laboratories in a competition to find the best ways to aid development in Senegal.
As a follow-up to the challenge, six projects are now under way on the ground in Senegal to build models and indicators. The Ministry of Health will use them to combat epidemics such as malaria and schistosomiasis, the Ministry of Agriculture will use them for food-security alerts, and the National Statistics Agency will use them to develop proxies for poverty and literacy indexes.
But that’s not all. Two of my Council colleagues, Jake Kendall of the Gates Foundation and Cameron Kerry of the MIT Media Lab and the Brookings Institution, are extending the work. In their paper, Enabling humanitarian use of mobile phone data, Jake and Cameron discuss how to anonymize the data so that its value can be extracted without re-identifying the people behind it. Insights such as these have helped shape the global dialogue on data privacy and protection initiated by UN Global Pulse (under the leadership of Council member Robert Kirkpatrick).
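To make the anonymization idea concrete, here is a minimal sketch of one common building block: aggregating individual records into group counts and suppressing any group too small to hide in. The function name, data shape and the threshold of 10 are illustrative assumptions, not details taken from the paper.

```python
def aggregate_with_suppression(records, k=10):
    """Aggregate individual location records into per-region counts,
    suppressing any region observed for fewer than k distinct people.

    `records` is a list of (user_id, region) pairs; the k-anonymity-style
    threshold is an illustrative choice, not a published standard.
    """
    # Count distinct users per region, so one person appearing twice
    # does not inflate the count.
    users_per_region = {}
    for user_id, region in records:
        users_per_region.setdefault(region, set()).add(user_id)
    # Publish only regions where at least k distinct users were seen.
    return {region: len(users)
            for region, users in users_per_region.items()
            if len(users) >= k}

records = [(u, "Dakar") for u in range(15)] + [(u, "Thies") for u in range(3)]
print(aggregate_with_suppression(records, k=10))  # → {'Dakar': 15}
```

The small group of three is dropped entirely, illustrating the trade-off the paper grapples with: the more aggressively we suppress, the safer the release, but the less value can be extracted.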
But no matter how careful we are, moving and using the data does involve some exposure to potential liability. Another Council colleague, Scott David of the University of Washington’s Center for Information Assurance and Cybersecurity, is working on a new approach. He wants to move away from the divisive notion of assigning “liability” for data-related harms toward a shared-risk “commons” that can support scalable, distributed and accountable structures of shared risk, cost and benefit.
As Scott explains, one party’s insight is another party’s potential intrusion. Unfortunately, we’re still working under anachronistic laws and practices that were intended to curb harms of a different type and a different time. It isn’t the 1970s any more. Balancing the benefits and burdens of distributed, networked information systems will require voluntary self-binding to standard rules, but the new structures will effect much greater good than the individuals and institutions can achieve alone. In this approach, data becomes a global utility, distributed information becomes a service, and liability morphs into sharing the costs of a shared infrastructure.
Moving the algorithm
So to truly minimize the risks, perhaps the missing puzzle piece is to do things differently. Instead of moving the data, leave it at the source, and move the algorithm instead. (I’ve written on this topic before.) Algorithms are defined here as the question being asked of the data as well as instructions on how the data is used and managed. They’re infinitely safer to transport, but we aren’t yet proficient in this approach. So while our Council has made a great deal of progress on how to move the data, the next big thing we are looking at is how to move the algorithms.
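A minimal sketch may help make “moving the algorithm” concrete: the data never leaves its holder; a query function travels to the data, is executed locally, and only an aggregate answer travels back. All of the names and the data shape below are illustrative assumptions, not part of any specific system mentioned here.

```python
# Sketch of "moving the algorithm, not the data". The raw records stay
# with their holder; visiting code is run locally and only its
# aggregate result is returned. Names and data are illustrative.

class DataHolder:
    def __init__(self, records):
        self._records = records  # raw data stays here, at the source

    def run(self, algorithm):
        # Execute the visiting algorithm locally; return only its
        # (aggregate) result, never the underlying records.
        return algorithm(self._records)

def average_daily_trips(records):
    # The "algorithm" being moved: a question plus instructions on how
    # the data is used, expressed as code the holder can inspect and
    # approve before running.
    return sum(r["trips"] for r in records) / len(records)

holder = DataHolder([{"trips": 2}, {"trips": 4}, {"trips": 6}])
print(holder.run(average_daily_trips))  # → 4.0; only the average leaves
```

In a real deployment the holder would of course vet, sandbox and log the visiting code; the point of the sketch is simply that the question moves while the data stays put.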
One promising avenue is the Enigma system, the subject of exciting work at MIT by my Global Agenda Council co-chair Sandy Pentland. His jointly authored paper, Enigma: Decentralized Computation Platform with Guaranteed Privacy, explains how a peer-to-peer network can keep data completely private by removing the need for a trusted third party. Enigma operates like Bitcoin, using an external blockchain as a network controller. The public nature of the blockchain provides transparency, enabling interested parties to collaboratively store and run computations on data while maintaining data privacy. As a completely new way to work with data, Enigma has huge implications for solving humanitarian and social problems.
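Enigma’s full protocol is considerably more sophisticated, but the underlying primitive it builds on, secret sharing, can be sketched in a few lines. In the toy example below, a value is split into additive shares that individually reveal nothing; because shares can be added pointwise, parties can jointly compute a sum without any of them ever seeing an individual input. The modulus and share count are illustrative assumptions.

```python
import random

P = 2**61 - 1  # a large prime modulus; the value is illustrative

def share(secret, n=3):
    """Split `secret` into n additive shares: each share alone looks
    random, but together they sum to the secret modulo P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Each party holds one share of each input. Adding shares pointwise
# yields shares of the sum, so the total is computed while no party
# ever sees an individual value.
a_shares = share(120)
b_shares = share(80)
sum_shares = [(x + y) % P for x, y in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # → 200
```

This is only the arithmetic heart of the idea; Enigma layers multiparty computation, incentives and a blockchain-based controller on top of it to make the scheme practical at scale.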
Enigma is an example of “moving the algorithms” by means of technology. Global information systems depend on the reliability of technology and people. We make the technology reliable by moving the algorithm. We keep people and institutions reliable with the behavioral equivalent of the algorithm, i.e., standard policy and rules (and normalized rule-making processes). Moving the legal algorithm to wherever the data action occurs informs the data actions of well-meaning actors, and empowers them to engage in a form of neighborhood watch to help curb cyber bad actors.
Data is created at the edge, so if we leave the data at its source we will improve our ability to listen and learn more effectively at the edge. We can move the technical and legal algorithms to cover the data in use, achieving distribution and scale to support big data that respects individual and institutional interests.
The larger landscape
In weighing the pros and cons of each approach, it may help to consider the data landscape as a quadrant, where the x-axis represents data size and the y-axis represents analytic capability.
Today, we are operating mainly in the bottom-right quadrant; moving the data falls there. Enigma can move both the data and the algorithm, which becomes especially helpful once we enter the top-left quadrant.
The Global Agenda Council on Data-Driven Development has made solid progress on being able to share data for the public good. Now we are working on the liability and risks of moving data. By moving the algorithm and not the data, we can further increase our impact and reduce our liability.
In our view, the future of data analysis should be distributed. We don’t have to ship data across borders to work with it. Instead, let’s give the algorithms a suitcase and send them on their way. Once they’re out there, moving about the world, they have the potential to do an immense amount of good.
A final thought: as data makes its way into the human development arena, let’s not limit ourselves to the approach dictated by current technology, based on a trusted central entity. Let’s expand our thinking to consider how we can ensure the human right to data transparency. Does the right to know include only what data is being used, where it resides, and what technology is used to secure and manage it? Or is it something bigger? Perhaps the right to know includes what’s being done with the data about you. What are the questions being asked of the data, and for what purpose?
This is essential to consider as we explore how best to use data to further the cause of human development.
Author: Mikael Hagstrom is chair of the Forum’s Global Agenda Council on Data-Driven Development.
Image: Employees of IT Company CCP work at the company’s head office in Reykjavik February 13, 2013. REUTERS/Stoyan Nenov