Background
Tumor tissues are mixtures of different cell types, including cancer cells and cells of the tumor microenvironment such as immune and stromal cells. Several sequencing initiatives have produced expression data for hundreds of tumor samples; for each gene, these data represent an average expression across the cell types present. Comparative analyses of such samples, such as differential expression analysis, are therefore likely to be confounded by differences in cell type composition and proportions. Although recently developed single-cell sequencing techniques aim to address these challenges by simultaneously measuring gene expression in thousands of individual cells, they are not yet applicable to large cohort studies, mainly because of high costs and complex laboratory handling. Therefore, computational gene expression "deconvolution" methods have been developed to infer cell type proportions and to impute cell type-specific expression profiles from bulk expression data of heterogeneous tissues. A range of bulk deconvolution methods exists, including methods based on linear least squares regression and non-negative matrix factorization, among others. Even though these methods have been applied successfully to impute cell type-specific expression profiles from bulk RNA-seq data, their potential to estimate cell type-specific gene regulatory networks from networks modeled on bulk transcriptomic data has not been tested.
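To make the idea of least-squares deconvolution concrete, the following is a minimal sketch for the simplest possible case: a bulk sample assumed to be a mixture of only two cell types with known reference profiles. The function name `estimate_proportion` and all expression values are hypothetical and chosen for illustration; this is not how scType, DeMixT, or CIBERSORTx work internally, which use considerably richer models.

```python
def estimate_proportion(bulk, profile_a, profile_b):
    """Least-squares estimate of the fraction of cell type A in a two-type mixture.

    Assumed model: bulk[g] = p * profile_a[g] + (1 - p) * profile_b[g] + noise.
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical; p is not identifiable")
    # Unconstrained least-squares solution for p.
    p = sum((x - b) * d for x, b, d in zip(bulk, profile_b, diff)) / denom
    # Proportions must lie in [0, 1]; in one dimension, clipping gives the
    # constrained least-squares solution.
    return min(1.0, max(0.0, p))


# Toy reference profiles (hypothetical values) for four genes.
tumor = [10.0, 2.0, 8.0, 1.0]
stroma = [1.0, 9.0, 2.0, 7.0]

# Simulate a noise-free bulk sample made of 70% tumor and 30% stroma.
bulk = [0.7 * t + 0.3 * s for t, s in zip(tumor, stroma)]

estimated = estimate_proportion(bulk, tumor, stroma)
print(f"estimated tumor fraction: {estimated:.3f}")
```

With more than two cell types, the same idea becomes a non-negative least squares problem over a signature matrix, which is usually solved with a dedicated numerical routine.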
Aim of the Master project
In this project, the student will benchmark existing tools, such as scType [1], DeMixT [2], and CIBERSORTx [3], for deconvoluting gene regulatory networks built from bulk RNA-seq data of prostate cancer tumors (from the TCGA/ICGC and PCAWG consortia) into cell type-specific gene regulatory networks. In short, we will use gene regulatory networks computed from prostate cancer single-cell RNA-seq data [4] with SCORPION [5], a tool developed in our lab, to obtain cell type-specific network signatures. Such signatures can be defined by comparing regulatory networks across cell types. To validate our approach, we plan to create several pseudo-bulk mixtures with known cell type proportions from the scRNA-seq dataset and to construct gene regulatory networks from these pseudo-bulks. These networks, together with the identified signatures, will be used as input for DeMixT, scType, and CIBERSORTx to estimate cell type-specific gene regulatory networks from the pseudo-bulk expression profiles. We will evaluate the performance of this approach by comparing edge weights between the cell type-specific gene regulatory networks constructed from single-cell RNA-seq data and those estimated from bulk RNA-seq data. If successful, the approach will be applied to an assembled cohort of patients for whom bulk RNA-seq data are available from benign prostate tissue, primary prostate cancer tissue, and castrate-resistant prostate cancer tissue (locally recurrent disease post treatment). This will help to characterize the diversity of gene regulation mechanisms associated with prostate cancer initiation and progression to castrate-resistant disease.
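The validation idea described above can be sketched in a few lines: build a pseudo-bulk profile by averaging single cells sampled at known cell type proportions, and score an estimated network against a reference network by correlating the weights of their shared edges. Everything in this sketch is an assumption made for illustration: the helper names (`make_pseudo_bulk`, `pearson`, `edge_weight_agreement`), the toy expression values, the edge labels, and the choice of Pearson correlation as the agreement measure. The project itself may use different sampling schemes and metrics.

```python
import random


def make_pseudo_bulk(cells_by_type, proportions, n_cells, rng):
    """Average the expression of cells sampled at the given type proportions.

    cells_by_type maps a cell type to a list of per-cell expression vectors.
    Cells are drawn with replacement; counts per type are rounded, so the
    number of cells actually drawn can differ slightly from n_cells.
    """
    n_genes = len(next(iter(cells_by_type.values()))[0])
    totals = [0.0] * n_genes
    drawn = 0
    for cell_type, fraction in proportions.items():
        k = round(fraction * n_cells)
        for cell in rng.choices(cells_by_type[cell_type], k=k):
            for g, value in enumerate(cell):
                totals[g] += value
        drawn += k
    if drawn == 0:
        raise ValueError("no cells were sampled")
    return [t / drawn for t in totals]


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def edge_weight_agreement(reference_net, estimated_net):
    """Correlate edge weights over the edges present in both networks."""
    shared = sorted(set(reference_net) & set(estimated_net))
    return pearson([reference_net[e] for e in shared],
                   [estimated_net[e] for e in shared])


# Toy single-cell data (hypothetical): two cell types, three genes.
cells = {
    "epithelial": [[9.0, 1.0, 4.0], [11.0, 0.0, 5.0]],
    "stromal": [[1.0, 8.0, 2.0], [0.0, 10.0, 3.0]],
}
pseudo_bulk = make_pseudo_bulk(
    cells, {"epithelial": 0.6, "stromal": 0.4}, n_cells=100,
    rng=random.Random(1),
)

# Toy networks (hypothetical): edges keyed by (regulator, target).
reference = {("TF1", "geneA"): 0.9, ("TF1", "geneB"): 0.1, ("TF2", "geneA"): 0.4}
estimated = {("TF1", "geneA"): 0.8, ("TF1", "geneB"): 0.2, ("TF2", "geneA"): 0.5}
print(pseudo_bulk, edge_weight_agreement(reference, estimated))
```

Fixing the random seed keeps each pseudo-bulk reproducible, which matters when the same mixtures are passed to several deconvolution tools for comparison.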
Host environment
The candidate will be co-supervised by Drs. Alfonso Urbanucci and Marieke Kuijjer, along with Dr. Tatiana Belova, Researcher in the Kuijjer group, and collaborators Dr. Daniel Osorio (Postdoctoral Fellow at The University of Texas at Austin and former Kuijjer group member) and Matti Nykter (Professor at the University of Tampere). Dr. Alfonso Urbanucci is Project Group Leader at Oslo University Hospital and the Institute for Cancer Research, Norwegian Radium Hospital. Dr. Urbanucci’s work focuses on new clinical precision medicine-based approaches to treating cancer, with a main focus on prostate cancer. Dr. Kuijjer is Group Leader at the Center for Molecular Medicine Norway (NCMM), University of Oslo. Dr. Kuijjer's research focuses on developing computational tools to model gene regulatory networks in cancer and complex diseases. Her work has led to the implementation of several computational models to construct and study regulatory networks in cancers.
Prerequisites
We seek a highly motivated individual with programming skills and an interest in developing computational tools for the analysis of high-throughput sequencing data. The selected candidate will be excited about combining life sciences and computation to analyze cancer data. The successful candidate will be collaborative and independent, with strong enthusiasm for research, and should be proficient in at least one of the following programming languages: Python, R, or bash. Familiarity with gene expression regulation in general is an advantage.
References
[1] Ianevski, A. et al., Nature Communications 13(1):1246 (2022)
[2] Cao, S. et al., Nature Biotechnology, 10.1038/s41587-022-01342-x (2022)
[3] Steen, C.B. et al., Methods in Molecular Biology, vol. 2117 (2020)
[4] Chen, S. et al., Nature Cell Biology 23, 87-98 (2021)
[5] SCORPION, https://github.com/kuijjerlab/SCORPION