How big data helps transport planning

Shomik Mehndiratta

Understanding people’s travel activity patterns, and ideally the motivations and choices underlying them, is at the heart of what transport planners do.

An understanding of trip origins and destinations – and of how trip-makers select routes, modes and destinations – is required to plan extensions or changes to a road or public transport network; to assess the viability of a new investment; and to assess how well the existing transport system is serving the population and businesses in a specific area.

Indeed, in our roles as transport specialists in the World Bank, much of our job is about supporting clients’ ability to develop this understanding and to use the results to evaluate and appraise investments.

At the core of this process is an Origin-Destination (OD) survey, which essentially yields a matrix of trips between the different zones of a region (referred to as an OD matrix). Traditionally, getting this information in the context of an urban area has been a difficult, expensive and time-consuming process. We are often talking about millions of dollars for trip activity surveys of thousands of households, complemented by extensive analysis of socio-economic data. We also collect count data at strategic points on major roadways and transit routes to calibrate the results.
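To make the idea of an OD matrix concrete, here is a minimal sketch in Python. The zone labels and trips are invented for illustration and are not taken from any survey; the function name `build_od_matrix` is likewise hypothetical.

```python
from collections import Counter

def build_od_matrix(trips, zones):
    """Count trips for every ordered (origin, destination) pair of zones.

    Row i, column j of the result holds the number of trips that start
    in zones[i] and end in zones[j].
    """
    counts = Counter(trips)
    return [[counts[(o, d)] for d in zones] for o in zones]

# Hypothetical zones and observed trips (illustrative values only).
zones = ["A", "B", "C"]
trips = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A"), ("A", "C")]

od_matrix = build_od_matrix(trips, zones)
for zone, row in zip(zones, od_matrix):
    print(zone, row)
```

Each row is an origin zone and each column a destination zone, so the first row lists the trips leaving zone A.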

This process can take up to a year, and many stages need very specific technical skills and a lot of quality control.  Survey design, sample design, training the surveyors, ensuring they are accurate (not making up data, not entering data erroneously), and subsequent stages of analysis all require significant technical capacity to implement, as well as an almost equal level of technical skill to supervise the work.

All of us who do this have horror stories from processes on which we have worked. The result is that the basic information needed to test alternatives and make decisions about transport investments is collected too rarely – at best no more than once every decade – and even the results of existing surveys are suspected of having deficiencies.

Today, a new and unlikely source of ‘Big Data’ may be coming to the rescue: Call Data Records (CDR). These records belong to the increasing set of ‘passive data’ information sources that are generated by daily transactions performed by ordinary people. Every time a call is made on a mobile phone, the mobile tower associated with that call – usually the nearest mobile tower – is recorded, originally for billing purposes.  These records provide the location, time and duration of every call, generating huge databases that have the potential to produce an OD matrix at a fraction of the cost of traditional methods.
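As a rough illustration of how such records could be turned into trip counts, the sketch below treats two consecutive calls by the same user at different towers as one trip. This rule is a simplifying assumption made for the example, not the protocol developed by the MIT team; the record layout `(user, minute_of_day, tower)` and all values are hypothetical.

```python
from collections import Counter, defaultdict

def trips_from_cdr(records):
    """Derive tower-to-tower trip counts from call records.

    Assumed rule (illustrative only): whenever a user's consecutive
    calls are handled by different towers, count one trip from the
    earlier tower to the later one.
    """
    calls_by_user = defaultdict(list)
    for user, minute, tower in records:
        calls_by_user[user].append((minute, tower))

    trips = Counter()
    for calls in calls_by_user.values():
        calls.sort()  # order each user's calls by time
        for (_, origin), (_, destination) in zip(calls, calls[1:]):
            if origin != destination:
                trips[(origin, destination)] += 1
    return trips

# Hypothetical anonymized records: (user id, minute of day, tower id).
records = [
    ("u1", 480, "T1"), ("u1", 540, "T2"), ("u1", 1080, "T1"),
    ("u2", 420, "T1"), ("u2", 450, "T1"), ("u2", 720, "T3"),
]

trip_counts = trips_from_cdr(records)
print(dict(trip_counts))
```

A real analysis would also need to handle calls that hop between neighbouring towers without the user moving, which this sketch ignores.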

We are working with Professor Marta Gonzalez, a researcher at the Massachusetts Institute of Technology (MIT), who has been developing protocols to analyze CDR data to generate OD matrices in Brazil’s Rio de Janeiro Metropolitan Area. The attached paper presents some preliminary results for Rio de Janeiro and Boston, Massachusetts (where the MIT team was already working) that suggest that CDRs could become a valuable tool for transport planners in the coming years.

In terms of data we find:

  • CDR data is available for three million people, about 25 percent of the population of the Rio de Janeiro metropolitan area, for a five-month period. For the OD analysis, we use a smaller sample, after eliminating users with too few or too many calls. The resulting sample of 500,000 users is still huge compared to typical household surveys. For comparison, the traditional surveys that the World Bank financed in Rio de Janeiro and Sao Paulo at the same time covered 10,000 to 30,000 households.
  • The density and distribution of cell towers closely mirror the granularity and needs of transport planning. Our data set covers 1,421 towers in Brazil’s Rio state. This compares well with the size and distribution of ‘travel analysis zones,’ the unit of analysis for traditional planning – in Rio, the traditional model uses 730 zones in the metropolitan area.
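The sample reduction described above, dropping users with too few or too many calls, can be sketched as a simple filter. The thresholds `min_calls` and `max_calls` and the sample records are assumptions chosen for illustration; the article does not state the cut-offs actually used.

```python
from collections import Counter

def filter_users_by_activity(records, min_calls, max_calls):
    """Keep only records of users whose call count lies in [min_calls, max_calls].

    The thresholds are hypothetical; the real cut-offs are not given
    in the text.
    """
    calls_per_user = Counter(user for user, _, _ in records)
    kept_users = {
        user for user, n in calls_per_user.items()
        if min_calls <= n <= max_calls
    }
    return [record for record in records if record[0] in kept_users]

# Hypothetical records: (user id, minute of day, tower id).
records = (
    [("rare", 500, "T1")]                              # 1 call: too few
    + [("typical", 480 + i, "T2") for i in range(3)]   # 3 calls: kept
    + [("heavy", 600 + i, "T3") for i in range(6)]     # 6 calls: too many
)

filtered = filter_users_by_activity(records, min_calls=2, max_calls=5)
remaining_users = sorted({user for user, _, _ in filtered})
print(remaining_users)
```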

In terms of results, the good news is that the preliminary analysis suggests a good correlation between the traditional and CDR-based results, as well as a good match in total numbers for home-based work (i.e. commute) trips, which often determine the viability of mass transit investments. However, for other kinds of trips – and the total number of trips – CDR data currently suggests significantly more trips than traditional methods. We are currently examining alternative explanations and looking to refine the analysis in the coming months.
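The two kinds of comparison mentioned here, correlation between the matrices and agreement in total trips, measure different things. The sketch below shows this with two small made-up OD matrices (not the Rio or Boston results): the hypothetical CDR matrix is exactly twice the hypothetical survey matrix, so the cell-by-cell correlation is perfect even though the totals disagree.

```python
import math

def flatten(matrix):
    """Turn an OD matrix into a flat list of cell values."""
    return [value for row in matrix for value in row]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up OD matrices for three zones (illustrative values only).
survey_od = [[0, 10, 5], [8, 0, 2], [4, 3, 0]]
cdr_od = [[0, 20, 10], [16, 0, 4], [8, 6, 0]]

correlation = pearson(flatten(survey_od), flatten(cdr_od))
total_ratio = sum(flatten(cdr_od)) / sum(flatten(survey_od))
print(f"correlation={correlation:.3f}, total trip ratio={total_ratio:.2f}")
```

A high correlation therefore says the spatial pattern of trips agrees, while the ratio of totals shows whether one source counts systematically more trips than the other.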

It is important to note that, even if these results pan out, CDRs will always need to be complemented by many elements of traditional planning. This data does not provide insight into the choices and motivations underlying individuals’ travel decisions, which are critical to designing a viable new investment.

There are also other concerns. While the sample is very large, there are some elements of systematic self-selection: segments of the population that do not use mobile technology are left out. Also, we have data from only one mobile carrier and, if different segments of the population systematically prefer different carriers, then those preferences could affect the accuracy of the final results.

Finally, there is an active ongoing debate about privacy in the use of CDR data. Though the data we are using is anonymized, and is provided for very specific research uses, commentators have raised concerns about the potential for abuse of such data. Resolving these concerns in a satisfactory manner will be critical before the use of CDRs can be mainstreamed.

That said, our experience illustrates how we are entering the “Big Data” era, with a range of possibilities for serendipitous second uses for data. Combined with the analytic power of the new generation of computing, this opportunity provides the potential to significantly disrupt long-established (and sometimes outdated) ways of doing business across the development world.

This article was first published by The World Bank’s Transport for Development blog. Publication does not imply endorsement of views by the World Economic Forum.

Author: Shomik Raj Mehndiratta is a Lead Transport Specialist working in the World Bank’s Latin America and Caribbean region based in Washington DC.

Image: The Metro de Lima electric train travels on a bridge in the San Juan de Lurigancho district of Lima, September 2, 2014. REUTERS.
