While only recently developed, the ability to profile expression data in single cells (scRNA-Seq) has already led to several important studies and findings. However, this technology has also raised several new computational challenges. These include questions about the best methods for clustering scRNA-Seq data, how to identify unique group of cells in such experiments, and how to determine the state or function of specific cells based on their expression profile. To address these issues we develop and test a method based on neural networks (NN) for the analysis and retrieval of single cell RNA-Seq data. We tested various NN architectures, some of which incorporate prior biological knowledge, and used these to obtain a reduced dimension representation of the single cell expression data. We show that the NN method improves upon prior methods in both, the ability to correctly group cells in experiments not used in the training and the ability to correctly infer cell type or state by querying a database of tens of thousands of single cell profiles. Such database queries (which can be performed using our web server) will enable researchers to better characterize cells when analyzing heterogeneous scRNA-Seq samples. Read more about it here.
A stimulant (like a virus) to a cell can interact with proteins in the cell and cause a cascade of signaling pathways that activate/repress the expression of genes downstream. The differential expression of such genes can cause further signaling cascades that cause the differential expression of another set of genes. While the gene expression at different time points can be observed, it is not possible to directly observe the signaling pathways responsible for their differential expression. We developed a tool, TimePath, to learn the progression of such signaling pathways across time based on time series gene expression data and protein-protein interaction data. Read more about it here. We applied the tool to the analysis of HIV-1 dementia which you can read about here.
We developed a tool, MT-SDREM, to jointly learn signaling pathways and regulatory networks using time series gene expression data from multiple related conditions, TF-gene interaction data, and a protein-protein signaling network. Our tool built on an existing tool called SDREM which learnt the signaling pathways and regulatory network for just a single condition. For joint learning, we shared information the following information :- (1) we ensured that the condition-specific networks learnt were consistent – i.e. the edge directions were the same for all networks (2) If a TF was predicted to regulate more than one condition, it’s prior for regulating any condition was increased. Validation with respect to RNAi screen hits, GO analysis and manual examination of the predicted signaling proteins demonstrated the advantage of joint inference. Read more about it here.
We programmed a Mixed Integer programming solver from scratch (using existing LP solvers) and experimented with various node selection and branching heuristics with application to the Warehouse Location problem.