Welcome to the Data Integration and Knowledge Discovery Lab

Our lab aims to understand complex biological processes through data integration and knowledge discovery, to determine how organisms respond at the genetic level to changes in their external environment, and to predict gene and protein function and regulation. Our research and development centers on the following areas:

  • Computational Systems Biology: All components in a cell work together to perform specific functions. Understanding the molecular mechanisms of a cell therefore requires understanding the interplay among its genes, proteins, and small molecules. We develop methods that integrate different types of biological data to characterize biological pathways and networks in complex systems under specific normal or disease conditions.
  • Gene Regulation: Transcriptional regulation refers to any process by which a cell controls the expression of its genes. Properly regulated spatial and temporal gene expression is crucial to the precise execution of biological processes such as development, proliferation, apoptosis, aging, and differentiation. Transcription factors (TFs) are one class of gene regulatory proteins. A TF binds DNA at specific binding sites to regulate a target gene. By binding to these sites, TFs interact cooperatively with their cofactors, RNA polymerase, and chromatin to regulate gene expression. We develop computational methods to predict TF binding interactions by identifying binding sites at the whole-genome scale.

  • Epigenetics: It has been recognized for decades that, besides the DNA sequence of the genome, there is a second "genome", the epigenome, that also contributes to gene regulation. The epigenome is characterized by a cell’s overall chromatin state, which is defined by chemical modifications, such as histone modifications and DNA methylation, that can change in response to intrinsic or environmental signals. We develop computational methods that integrate high-throughput sequencing data to understand genome-epigenome interactions and epigenetic gene regulation.

  • Data Mining and Machine Learning Algorithms: Rapidly advancing biotechnology is generating large-scale, high-throughput data in unprecedented volumes. To use these data efficiently in addressing important biological problems, we develop statistical and computational methods for data integration, pattern mining, and knowledge discovery from massive genomic datasets and biological networks.