How to save our data and why the future of the life sciences depends on it

18 May 2017

Advances in life science research provide critical knowledge across a wide range of biological fields, informing everything from our fundamental understanding of evolutionary biology to remarkable progress in personalised medicine.

These advances not only generate vast amounts of molecular data, but often rely heavily on existing databases that have been developed and carefully curated over decades. From nucleic acid sequence databases to community data resources on model organisms, these resources number in the thousands. While many are highly specialised to specific fields, a subset of perhaps 25 or 30 molecular data resources represents a core of knowledge critical to advancement across life science research. These core data resources also play a vital role in the reproducibility and integrity of that research.

They are more than just archives. They also provide expert curation, integration and analysis of vast amounts of molecular data. Many have developed interfaces that enable researchers to easily deposit and download data freely and without restrictions. Moreover, some also facilitate access to biological resources, like mutant strains or antibodies.

A risky situation

Significant loss of these resources, or the introduction of barriers to access, could delay or shut down numerous research programs around the world. And yet many of these core databases face a precarious funding situation that jeopardises their continuation. Regrettably, we have already seen situations where a core, globally used molecular data resource has been on the verge of shutting down due to loss or delay of funding.

Stable and equitable funding is vital to cover the numerous costs involved in maintaining high-quality access to molecular data. Skilled curators ensure quality control and appropriate structuring of the databases, and develop interfaces that are intuitive to use while enabling complex data to be shared easily and rapidly. Computing capabilities are needed, as are adequate storage, security, regional mirrors and internet connectivity. Moreover, all of this must keep pace with a volume of molecular data that doubles every 18 months.

Unreliable support

Currently, core data resources are supported by hundreds of millions of dollars from national, international and non-profit funding agencies, but this is often in the form of short-term or ad hoc research grants. Some databases are funded by a single source, while others require multiple funding sources. Both avenues come with a suite of vulnerabilities, including changes in policies or priorities at funding agencies.

Frequently, the funding structure for each resource reflects its history. Many databases started out as research projects. Although they have since grown into data infrastructures that underpin the future of global bioscience research, they must often continue to compete with single project proposals within individual countries. In many cases, such grants must be continuously resubmitted to keep a major, internationally valuable database afloat. This is unsustainable and places a great deal of global scientific research at risk. Without ongoing funding, years of research and unimaginable amounts of data could be lost.

A collaborative solution

To ensure the security and global accessibility of molecular data, a concerted effort must be made toward shared responsibility for the support and maintenance of core data resources. The development of a coordinated international strategy that enables sustainable continuation of these resources is critical for the future excellence, impact and progress of the life sciences.

To this end, my international colleagues and I have formed a working group and propose the creation of a Global Life Sciences Data Resources Coalition. As outlined in our recent Commentary in Nature, this Coalition would ideally include representatives of major life science research funders from the countries active in life science research, with a view to establishing a new funding model tailored to the support of high-volume international data infrastructure. In this way, we would endeavour to establish a fair and reasonable system wherein international contributions could sustainably support long-term, free global access to vitally important data resources.

The Coalition would also define eligibility for international support by developing a broad set of well-defined and transparent indicators to estimate the impact, costs and benefits of each molecular data resource. Such criteria would align with the FAIR principles to make data Findable, Accessible, Interoperable and Reusable.

With the support of such a global coalition of researchers and funding bodies, we could develop and maintain resources that encourage collaboration and integrity, foster open science, facilitate scientific progress, and deliver a strong return to society on public investments. The future of both Australian and international science depends on it.

Contact: Professor Mark Ragan, Co-Division Head & Group Leader, Genomics of Development and Disease Division, UQ IMB