Why sharing scientific data is the way of the future

9 January 2017

Researchers from The University of Queensland’s Institute for Molecular Bioscience and Queensland Brain Institute have revealed new insights into how genetic and environmental factors control gene expression in humans. 

Their study, published last week (6 January) in the American Journal of Human Genetics, examined the full genomic information of 2,765 healthy individuals, allowing the research team to pinpoint the DNA variants that control the expression of thousands of genes.
The significance of this research lies not just in its immediate findings for medical research, but also in the team’s commitment to open science.

Fostering a culture of data sharing

The Brisbane-based research team has launched an open-access website that shares the analysed results from this study, consisting of billions of data points, so that other researchers can explore and build on the team’s findings.
IMB group leader and senior researcher Dr Joseph Powell said making their data publicly available would open doors for future discoveries and collaborations.
“There is a global push towards making data publicly available and usable for people from many different research backgrounds,” Dr Powell said.
“Currently in scientific publishing, it is common to only share your most significant results, which are often a small fraction of your total results.
“This means if researchers want to explore a particular dataset further, they usually have to recreate it from scratch or request parts of it from the original research team, which can be hit or miss.
“It also means that many results go unshared, and while these results may not be useful to my research, they could provide vital information to people trying to solve other important scientific questions.
“Every day, researchers around the world are making new datasets that are of great value to the medical research community.
“For example, in this study, we collaborated with researchers in Europe and the United States for two years to collate, clean and quality-check our raw data collected from almost 3000 individuals as part of the Consortium for the Architecture of Gene Expression (CAGE).
“Now we have a large, high quality resource that we can use to do all kinds of exciting science with—why wouldn’t we want to share that?”

Increasing the impact of publicly funded research

But first the team needed to find a simple way to share their extensive datasets.
[Image: a screenshot of the CAGE website]
“Our data was pretty complex and contained billions of data points. Because of its size, if you wanted to open it in, say, Excel, you would first have to write code to extract the relevant data,” Dr Powell said.
“We designed a website with a large database backend and an interactive, user-friendly frontend. We tried to make it as easy and intuitive as possible for users to search, download and scale the data to suit their needs.

“Using the site, you can simply type in the gene or region of the genome you are interested in and it will provide you with small data tables detailing aspects of genetic control of gene expression. These data tables can then be used to produce graphics, or you can download the entire data table if you are programmatically inclined.
“We hope our research colleagues around the world will be able to make the most of this data now and into the future.”
You can access the full dataset from this study at http://cnsgenomics.com/shiny/CAGE.  
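Dr Powell notes that a table this large cannot simply be opened in Excel without first extracting the relevant rows. As a minimal sketch of that extraction step, the Python below streams a tab-separated table one row at a time and keeps only the rows for a single gene, so the full file never has to be held in memory. The file layout is an assumption: the tab delimiter, the column names `gene`, `variant` and `pvalue`, the function name `rows_for_gene` and the example values are all hypothetical and are not taken from the CAGE website's actual download format.

```python
import csv
import io
from typing import Iterator, TextIO


def rows_for_gene(handle: TextIO, gene: str, column: str = "gene") -> Iterator[dict]:
    """Yield each row whose `column` value equals `gene`.

    csv.DictReader reads one line at a time, so memory use stays small
    even when the file holds millions of rows.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Reading .fieldnames consumes the header line. If the (assumed) gene
    # column is missing, raise as soon as iteration starts instead of
    # silently yielding nothing.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in header")
    for row in reader:
        if row[column] == gene:
            yield row


# Hypothetical example data standing in for a downloaded table.
sample = io.StringIO(
    "gene\tvariant\tpvalue\n"
    "GENE_A\tvar1\t0.01\n"
    "GENE_B\tvar2\t0.5\n"
    "GENE_A\tvar3\t0.2\n"
)
for row in rows_for_gene(sample, "GENE_A"):
    print(row["variant"], row["pvalue"])
```

For a real downloaded file you would pass in a handle from `open(path, newline="")` instead of the `io.StringIO` sample, and set `column` to whatever the actual header uses for the gene identifier.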
Contact: Gemma Ward, IMB Communications, 07 3346 2155, communications@imb.uq.edu.au
