The number of miles driven has always been a critical statistic for autonomous vehicles (AVs): more miles means more learning data. Today's AV developers have an insatiable hunger for data: the more they drive, the more capable their self-driving software becomes.

This data has always been a key competitive advantage for AV developers, which is why virtually none of it has been shared beyond the walls of their (mostly) Silicon Valley offices.

That is, until now.

From closed doors to open datasets

Over the past year and a half, several leading developers of self-driving technology have started to release their road-driving datasets to the public. Before this, because data collection is so resource-intensive, only a few small datasets were available to the academic community.

Releasing open datasets is a good way to let academic researchers experiment with self-driving technology and to kick-start innovation in the field. What is being overlooked, however, is that data sharing could be fundamental to achieving the holy grail of self-driving vehicles: safety.

Promoting collaboration and data sharing among stakeholders is the basis for developing adequate safety standards, and it is one of the core focus points of the National Highway Traffic Safety Administration's voluntary guidance for autonomous vehicles. Data sharing and safety go hand in hand. Here's why, starting with two assumptions.

First, safety is society's number-one objective when it comes to self-driving cars. This technology promises to cut the 40,000 deaths that occur every year on US roads alone, ideally to zero, and safety must be guaranteed before we can see a mass rollout of AVs.

Second, what we most want to see shared at scale are driving scenarios. A driving scenario can be understood as the narrative of a situation that occurs during driving, together with all of its boundary conditions and environmental specifications: for example, a car cutting off another on a particular stretch of highway, at a certain speed and in heavy rain.

Sharing is caring

Sharing these kinds of scenarios should not be particularly controversial, because this data is not deeply tied to any company's intellectual property or core technology stack. It simply takes time and effort to collect.

Sharing this data at scale would generate positive externalities, while refusing to share would have the opposite effect. Collectively, the industry would benefit enormously from tapping into a large pool of shared safety data and driving scenarios, since each company could train and validate its proprietary software on a larger, richer and more diverse dataset. Keeping data siloed and proprietary, on the other hand, leaves each company less able to handle the unusual situations that fall into its own data gaps. That is precisely why these firms have always been so protective of their data: behind the excuse of 'trade secrecy' lies a desire to limit competitors' understanding of the world.

Being first to market is a huge advantage, of course. But ultimately, for automated driving systems (ADS) to succeed, there will need to be a set of safety standards that all manufacturers must meet before they are allowed to put their vehicles on the road. Competitive advantages will have to be found, we believe, in design, customer experience and business models, not in basic safety.

Another externality to take into account is that training algorithms on a shared pool of data could improve the ability of different ADS to work with each other. In dangerous or complex situations, systems with different baseline understandings of the world, and different languages to describe it, may interpret the same scene differently. This makes each situation harder for every manufacturer to resolve and lowers the overall safety of the technology.

A majority of US pedestrians remain unpersuaded that self-driving cars are safe
Image: Statista, YouGov

Furthermore, building a shared, comprehensive pool of driving scenarios is the first step towards developing reliable, transparent and robust safety standards for AVs, which will in turn pave the way for regulation. It is nearly impossible to define a meaningful safety threshold without a common, shared basis on which to establish a 'measurement scale'. Regulators, policy-makers and authorities need to understand what is really happening on the roads and base their decisions on real data, ideally data that is unified, organized and orchestrated by independent third-party organizations.

Last but not least, creating a shared pool of data would be a major step towards transparency, which could significantly improve public trust in and acceptance of self-driving vehicles. Given the current lack of trust in autonomous technology, this is a clear and pressing need.

The path towards a Safety Pool of shared data

Orchestrating such a data-sharing effort involves technical and operational challenges, such as agreeing on a common scenario-description language, settling on a shared taxonomy as a standard description of the world, and designing the right incentive scheme to maximize benefits for all participants.

At Deepen AI we have been working hard on a plan to tackle those issues and make this shared database of safety data and driving scenarios a reality. Through the Data for AV Safety Project, which we are co-leading with the World Economic Forum, we are launching Safety Pool, an incentive-based brokerage of driving scenarios shared among the major stakeholders in self-driving technology. We will release more details in the coming weeks; in the meantime, join us in our mission towards safe, reliable and robust autonomous driving.

Collaboration among private companies and other stakeholders is arguably more important in our industry than ever. Prominent players such as Cruise and Daimler have already pushed back their AV rollout deadlines because of the significant technical challenges that remain. Engineers and industry leaders have started to recognize that solving autonomy may require collaboration well beyond the boundaries of individual firms. Furthermore, given the accidents and deaths that have hit the self-driving industry in recent years, attention on safety is greater than ever, and there is broad agreement that safety should be prioritized above everything else.

The time has come for us, as an industry, to work together towards safer, more convenient transportation. Sharing data is the first, fundamental step towards that goal.