Welcome to the Data Integration and Knowledge Discovery Lab

Our lab's research goal is to create novel machine learning and statistical algorithms to understand gene and protein function and regulation, complex biological processes, and how organisms respond at the genetic level to changes in their external environment. Our research and development center on the following areas:

  • Machine Learning Algorithms: Rapidly advancing biotechnology is generating large-scale, high-throughput data at an unprecedented rate. To use these data efficiently to address important biological problems, we develop statistical and computational methods for data integration, pattern mining, and knowledge discovery from massive genomic datasets and biological networks.

  • Gene Regulation and Non-coding RNAs: Transcriptional regulation refers to any process by which a cell controls the expression of its genes. Properly regulated spatial and temporal gene expression is crucial to the precise execution of biological processes such as development, proliferation, apoptosis, aging, and differentiation. Transcription factors (TFs) are one class of gene regulatory proteins. A TF binds DNA at specific binding sites to regulate its target genes. Bound at these sites, TFs interact cooperatively with their cofactors, RNA polymerase, and chromatin to regulate gene expression. We develop computational methods to predict TF binding by identifying binding sites across the whole genome. Various types of non-coding RNAs have also been shown to play important roles in gene regulation, and we contribute methods for identifying non-coding RNAs and annotating their functions.

  • Epigenetics: For decades, it has been recognized that, beyond the DNA sequence of the genome, a second layer of information called the epigenome also contributes to gene regulation. The epigenome is characterized by a cell’s overall chromatin state, which is defined by chemical modifications such as histone modifications and DNA methylation that can change in response to intrinsic or environmental signals. We develop computational methods that integrate high-throughput sequencing data to understand genome-epigenome interactions and epigenetic gene regulation.

  • Computational Systems Biology: All components of a cell work together to perform specific functions. Understanding the molecular mechanisms underlying a cell's behavior requires understanding the interplay among genes, proteins, and small molecules. We develop methods that integrate different types of biological data to understand biological pathways and networks in complex systems under specific normal or disease conditions.