Building tools to predict how the genome controls cells

This project develops disease-agnostic, high-throughput, scalable functional genomics methods that integrate computational prediction with disease modelling to study the mechanisms controlling cell differentiation and the genetic causes of disease.

Genome sequencing is a powerful tool for studying the biological basis of disease, yet finding the underlying cause of disease among millions of data points can be difficult. Current protocols for classifying variants from patient DNA data rely largely on prior knowledge about normal and abnormal gene variation held in large public databases, on panels of known disease-causing genes, or on identifying variants that cause amino acid changes in proteins (protein-coding sequence comprises only 2% of the genome). Despite these powerful approaches, studies indicate that variants are classified as pathogenic in only a minority of cases. Among variants reported in ClinVar, a public archive of relationships between human variation and phenotype, a large proportion (37%) are classified as variants of unknown significance (VUS). New approaches are needed to improve variant prioritisation and classification from genetic data.

My research group is developing unsupervised, genome-wide computational analysis methods to reveal genetic mechanisms of development and disease. For example, our recent work developed TRIAGE, which uses epigenetic modification of DNA-binding histone proteins to identify regions of the genome that are critical determinants of cell decisions and functions. Using data from >800 cell types, we identified genomic “hot-spots” that, when mutated, are associated with diseases including neurological and cardiovascular diseases, multi-organ syndromes, and cancer. Our data show that TRIAGE regions of the genome are enriched for pathological variants (especially those causing congenital diseases), are intolerant to mutations, have significantly increased effects on complex trait phenotypes, and encode genes that are key determinants of cell differentiation and morphogenesis.

This area of my program focuses on four design criteria for developing and implementing computational tools that facilitate novel discovery in cells.

Simplicity: We are building methods that help organise genomic information in an unsupervised manner across the human genome. These methods can be used to analyse orthogonal data (e.g. patient genetic data) to identify genetic causes of disease or development and/or reveal relationships between gene groups that inform programs controlling cell decisions and functions.
Versatility: We aim to develop methods that can be used with any genomic data that map to genes or a chromosomal address, including patient genetic data and other genomic data types (GWAS, SNPs, RNA-seq, etc.). Furthermore, these methods are well suited to weighting regions of the genome in genetic analysis tools such as polygenic risk scores or machine learning algorithms.
Disease-agnostic: Using a systems-level approach, these methods can be implemented broadly in data analysis pipelines for any data sample from any cell, tissue, disease, or individual. They enrich for the factors most likely to cause disease in any complex genome-wide data.
Efficient functional screening: These prediction methods link with wet-lab cell biology so that novel hypotheses derived from computational predictions can be tested in functional genomics studies.
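
To illustrate the versatility criterion above, the following is a minimal sketch of how a per-gene priority score might be used to weight variants in a polygenic risk score. It is not the TRIAGE implementation: the gene names, score values, effect sizes, and the function name `weighted_risk_score` are all hypothetical, and the weighting scheme (a simple product of weight, allele dosage, and effect size) is an assumption chosen for clarity.

```python
from typing import Dict, List, Tuple

# Hypothetical per-gene priority scores in [0, 1]; illustrative values only,
# not real output from any published method.
GENE_PRIORITY: Dict[str, float] = {"GENE_A": 0.9, "GENE_B": 0.2}


def weighted_risk_score(
    variants: List[Tuple[str, int, float]],
    gene_priority: Dict[str, float],
    default_weight: float = 0.0,
) -> float:
    """Return the sum of weight * dosage * effect size over all variants.

    Each variant is a tuple of (gene, allele dosage 0/1/2, effect size).
    Genes without a priority score receive `default_weight`.
    """
    total = 0.0
    for gene, dosage, effect in variants:
        weight = gene_priority.get(gene, default_weight)
        total += weight * dosage * effect
    return total


# Hypothetical patient genotype: GENE_C has no priority score,
# so it contributes nothing under the default weight of 0.0.
patient = [("GENE_A", 2, 0.5), ("GENE_B", 1, 1.0), ("GENE_C", 1, 0.3)]
score = weighted_risk_score(patient, GENE_PRIORITY)
print(f"weighted risk score: {score:.3f}")
```

In this sketch, variants in high-priority genes dominate the score, while variants in unscored regions are ignored. Whether unscored regions should be dropped or given a small non-zero weight is a design choice that depends on the analysis.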

Relevant lab publications of interest:

1.   Shen S, Sun Y, Matsumoto M, Sinniah E, Wilson SB, Little MH, Powell JE, Nguyen Q, Palpant NJ. Integrating single-cell genomics pipelines to discover mechanisms of stem cell differentiation. Trends in Molecular Medicine. 2021, 27, 1135-1158.
2.   Shim WJ, Sinniah E, Xu J, Vitrinel B, Alexanian M, Andreoletti G, Shen S, Sun Y, Balderson B, Boix C, Peng G, Jing N, Wang Y, Kellis M, Tam P, Smith A, Piper M, Christiaen L, Nguyen Q, Boden M**, Palpant NJ**. Conserved epigenetic regulatory logic infers genes governing cell identity. Cell Systems. 2020, 11, 625-639 e613.
3.   Thompson M, Matsumoto M, Ma T, Senabouth A, Palpant NJ, Powell JE, Nguyen Q. scGPS: Determining Cell States and Global Fate Potential of Subpopulations. Frontiers in Genetics. 2021 Jul 19;12:666771.
4.   Xu J, Falconer C, Nguyen Q, Crawford J, McKinnon BD, Mortlock S, Senabouth A, Andersen S, Chiu HS, Jiang L, Palpant NJ, Yang J, Mueller MD, Hewitt AW, Pébay A, Montgomery GW, Powell JE, Coin L. Genotype-free demultiplexing of pooled single-cell RNA-seq. Genome Biology. 2019, 20(1), 290.