Every year, millions of patients and families experience the emotional and financial burden of the seemingly unending “diagnostic odyssey” of a rare disease. In high-resource healthcare settings, an increasing percentage of these families will receive an accurate diagnosis thanks to genomic sequencing, but for many it will still be years of confusion, frustration and, at times, despair.

Just one other patient with the same mutation and phenotype can lead to the identification of a new disease. But what if that other patient resides on the other side of the earth? What if a patient lives in Canada and the only other person in the world with the same mutation and phenotype lives in Turkey? The only way to connect those two patients - indeed, the only way to give them both a diagnosis - is to connect the data sets in which their clinical and genomic information reside.

This is data federation and it could lead to a wealth of new rare disease diagnoses - if we get it right. A connected ecosystem of internationally distributed genomic and health related data sets would enable a virtual cohort that allows each jurisdiction to adhere to its own unique privacy, security and legal requirements. By allowing secure, streamlined access to all these repositories of data through a federated approach, we can speed up the process of finding that other patient, identifying the mutation responsible for a disease and determining a treatment.
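To make the idea concrete, here is a minimal sketch of how a federated query might behave, assuming a deliberately simplified model. The class names, the `min_shared_phenotypes` policy setting, the variant notation, the phenotype codes and the contact addresses are all hypothetical illustrations, not part of any GA4GH standard or real system. The point it shows is that each data set answers the question locally, under its own rules, and only a summary of the match ever leaves the jurisdiction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    patient_id: str        # local identifier; never leaves the node
    variant: str           # made-up variant notation, for illustration only
    phenotypes: frozenset  # placeholder phenotype codes


@dataclass
class FederatedNode:
    """One national or institutional data set that answers queries locally."""
    country: str
    contact: str
    records: list
    min_shared_phenotypes: int = 1  # hypothetical local policy setting

    def query(self, variant, phenotypes):
        """Return a match summary; raw records stay inside the node."""
        matches = [
            r for r in self.records
            if r.variant == variant
            and len(r.phenotypes & phenotypes) >= self.min_shared_phenotypes
        ]
        if not matches:
            return None
        return {
            "country": self.country,
            "match_count": len(matches),
            "contact": self.contact,
        }


def federated_search(nodes, variant, phenotypes):
    """Send the same question to every node and collect the summaries."""
    phenotypes = frozenset(phenotypes)
    hits = []
    for node in nodes:
        hit = node.query(variant, phenotypes)
        if hit is not None:
            hits.append(hit)
    return hits


canada = FederatedNode(
    country="Canada",
    contact="clinic-ca@example.org",
    records=[
        PatientRecord("CA-001", "GENE1:c.123A>G",
                      frozenset({"HP:0001250", "HP:0001263"})),
    ],
)
turkey = FederatedNode(
    country="Turkey",
    contact="clinic-tr@example.org",
    records=[
        PatientRecord("TR-042", "GENE1:c.123A>G", frozenset({"HP:0001250"})),
        PatientRecord("TR-043", "GENE2:c.5C>T", frozenset({"HP:0001250"})),
    ],
)

hits = federated_search([canada, turkey], "GENE1:c.123A>G", {"HP:0001250"})
for hit in hits:
    print(hit)
```

In this sketch the two clinicians learn that a matching patient exists and whom to contact, while the patient records themselves never move. A real system would add authentication, consent checks and audit logging on top of this basic pattern.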

Already, dozens of rare disease data sets have been established around the world, and more are being launched as new national clinical genomic data initiatives emerge every year. The Personalized Medicine Coalition found that since 2013, the governments of at least 14 countries have invested over $4 billion in establishing national genomic-medicine initiatives to address implementation barriers and transition testing from centers of excellence to mainstream medical practice.

We suspect that some rare disease patients are not so rare after all; they just need to be located, and the mutations in their DNA sequences collectively linked and understood. There is great hope that decoding important genes and identifying how variations manifest phenotypically will lead to a better understanding of how people come to have certain diseases. That understanding will ease the hunt for new treatments and allow us to deploy existing ones more accurately.

The Global Alliance for Genomics and Health (GA4GH) aims to accelerate the creation of this federated ecosystem. We believe that progress in genomic research and human health will require a common framework of standards and harmonized approaches to effective and responsible genomic and health-related data sharing.

GA4GH is a community, with representatives from a diverse cross-section of industry, academia, and healthcare all contributing to the development of technical standards and broad policy principles to ensure that this sensitive information is used in a responsible and equitable manner. One of GA4GH’s pioneering efforts in data federation was supporting the evolution of the Matchmaker Exchange, a platform for sharing rare disease data from individual research data “matchmakers”.

More broadly, through the involvement of 500 international member organizations and the direct engagement of 23 “Driver Projects” - real-world genomic data initiatives that help develop and pilot its standards and frameworks - GA4GH is ensuring its work meets the actual needs of the genomics community. With data repositories proliferating around the globe, federated data systems will allow researchers to ethically access sensitive information while simultaneously allowing institutions and governments to retain control.

Map of currently active government-funded national genomic-medicine initiatives
Image: Stark et al., 2019

In its white paper Federated Data Systems: Balancing Innovation and Trust in the Use of Sensitive Data, the World Economic Forum builds on the principles set forth by GA4GH and elucidates common governance challenges that federated data systems can help overcome. The white paper is part of a Breaking Barriers to Health Data pilot project demonstrating a four-country federated genomic data system for accelerating the research and diagnosis of rare diseases.

The Forum’s project is focused on co-designing and testing a governance framework that allows institutions in different countries to develop trust in each other’s patient consent processes, data security protocols and intellectual property provisions, among other sensitive issues. The framework is intended to scale to other groups that want to send cross-border queries between institutions in a federated system, while respecting and navigating key policies and regulations.

Genetic data are among the most sensitive types of personal health information; they also hold the potential to inform human health research and medicine in a way that no other data can. As such, they must be handled with great care while also being made available to the genomics and health community so that all patients - indeed, all humans - can benefit from the promise they hold.

A growing trend in the healthcare community, data federation will simultaneously protect and allow access to the data critical for biomedical and healthcare innovation and will lead to the discovery of new treatments for cancer, cardiovascular conditions, rare diseases and more.