Biology and data science are on a collision course. Here’s what you need to know

Advances in biology and data science have transformative power, but how can we harness them? Image: National Human Genome Research Institute/Flickr

Kimmy Bettinger
Expert & Knowledge Communities Lead, C4IR
David Bray
Distinguished Fellow, The Henry L. Stimson Center

  • The collision of biological science and data science has huge implications for communities around the world.
  • Near-term decisions and actions at this intersection will lay the foundation for our biological future.
  • Collective attention must focus on access, benefit sharing and agency to maximise the long-term value of biological data.

Two scientific worlds are colliding: biological science and data science. Converging advances in these worlds are decreasing the cost and time it takes to understand how living systems function and providing the foundation for new medical treatments, sustainable energy solutions and more nutritious food. The decisions and actions that communities take today will shape the next 30 to 40 years of both fields of science and the associated benefits that emerge at their intersection.

One future path leads to shared benefits that improve the health of communities, plants, animals and the Earth’s global ecosystem. An alternative path leads to benefits being held by the few, locked behind restrictive legal and digital barriers that preclude access to life-changing insights, new therapies and opportunities for a better future. Steering efforts in industry, government and everyday life towards the first path will require leadership that spans sectors.

Why should communities care about the convergence of bio and data?

The cost of sequencing the human genome has fallen dramatically in the past few decades and continues to fall. As a result, genome sequencing has increased, but our ability to understand the information it contains has not kept pace. How do we make sense of this ever-growing volume of data?

Similarly impressive advances have been made in data science, most notably in analytic capabilities and artificial intelligence (AI). An AI system that would have taken six minutes and $1,000 to train in 2018 required only 13 seconds and $5 to train three years later. As a result, the number of patent filings related to AI innovations in 2021 was 30 times greater than the number of filings in 2015.

Taken together, these advances in biology and data science have transformative power. In 2020, an AI called AlphaFold solved a 50-year-old problem of translating the chemical formulas for proteins into the three-dimensional shapes those proteins take. By shortening the time required to reach these solutions, the model opened new paths for drug discovery and design, including research on antibiotic resistance, cancer and countering COVID-19.

The cost of sequencing the human genome has fallen dramatically in the past few decades Image: Our World in Data

What should communities care about in the convergence of bio and data science?

Over the next few years, we will need to make important decisions regarding the benefits associated with the convergence of biology and data science. Two useful frameworks exist to help us evaluate and act based on values that support and safeguard all people and the planet.

The first is that the value of data increases when it is shared, combined and reshaped. Several bad metaphors for data have taken hold in the last decade, ranging from “data is the new oil” to “data is the decisive advantage”. While these were useful in calling attention to a powerful new asset in the early 2000s, they mislead in multiple ways. Unlike oil, data is reusable and non-finite. The more that different communities use data, the more it is refined and improved. We should want communities to use data and share in its benefits, not hoard it.

The second framework recognizes that biology is interconnected. Your DNA – the blueprint for how your body operates – is the product of your parents, their parents and their parents’ parents, and ultimately of billions of years of evolution on Earth. Moreover, you share DNA with your siblings and cousins, and it will be shared with any children you or they have.

Your genes and the protein structures within your body are shared with other plant and animal life on Earth, where they serve both similar and completely different functions. You are made of building blocks that evolution has reused and repurposed in a multitude of ways ever since life first appeared on our planet an estimated 4 billion years ago. In summary, your DNA and other biological data sets are not solely your own. As a result, your decision to share or not share these data sets affects not only you but also others within your family, the entire human species and the planet. This is different from other personal information.

The way companies, governments and the public have treated earlier advances in chemistry or computing does not fit either the non-finite nature of data or the interconnected nature of biology. Biological data breaks the mould in terms of existing global norms, laws and ways of operating. As such, it is essential to focus on access, agency and benefit sharing.

How to ensure decisions result in shared benefits, access, and improvements

With these two frameworks in mind, it becomes clear that stewards of biological data hold significant power. Investment in bio-based technologies, industries and economies is growing rapidly, but activity is concentrated in a few regions.

Building a shared, global commons of biological data like genomic sequencing information will improve access and expedite innovation. It will also be critical to make the knowledge, tools and technologies to take advantage of biological data available through partnerships, databases and repositories.

Building a global commons of biological data could follow some simple steps:

1. Identify one or more use cases where the sharing of biological data is associated with community needs and benefits.

2. Make involving communities in these use cases a priority goal and ensure any technical components that use biological data do so in ways that both benefit one or more communities and protect their data sets responsibly.

3. Involve community members in regular governance activities that ensure use of biological data both protects and benefits communities.

How to build a global commons of biological data Image: World Economic Forum

This three-step process ensures community members are involved at every stage of building a global commons of biological data. Initial use cases could include using biological data to help treat and cure rare diseases. Community members can help refine each use case so that it is practical and beneficial to them. They can also ensure that biological data provides benefits and is appropriately protected, consistent with the needs of the community, and that governance processes regularly draw on their input as the commons matures.

In addition, such commons of biological data should adhere to data principles championed by indigenous people globally (including both the FAIR and CARE data principles) to increase both the equity and the positive impact of biological data.

At the 2022 UN Biodiversity Conference, nations of the world agreed to take legal, policy and capacity-building measures to ensure equitable sharing of benefits arising from digital sequence information. While this is a big step in recognizing and respecting local resources and traditional knowledge, it complicates the existing regimes of Open Science and Open Access.

These are the sparks of the collision, and they require the collaboration of all relevant stakeholders – not just world leaders and scientific experts – in both discussing the challenges and developing solutions.

Collision theory states that when the right particles of a substance meet each other in the right orientation, you get a successful change. However, to ensure a collision is successful, we need to handle inputs with attention and care. Most importantly, we need activation energy. When it comes to the collision of biology and data science, the participation of communities will shape the next 30 to 40 years of both fields and the associated benefits that emerge at their intersection.

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.

© 2024 World Economic Forum