Welcome to the Data Integration and Knowledge Discovery Lab

The research goal of our lab is to develop novel artificial intelligence (AI), machine learning, and statistical algorithms to better understand gene and protein function and regulation. We aim to uncover the molecular mechanisms driving complex biological processes and to predict how organisms respond at the genetic level to environmental changes and disease conditions. By integrating AI with large-scale, multi-omics data, we seek to construct interpretable models that reveal key pathways and regulatory networks involved in both normal physiology and disease states. Our research has broad applications in understanding the genetic basis of diseases such as cancer, neurodegenerative disorders, and immune-related conditions. Our ongoing work is centered around the following areas:

  • Machine Learning Algorithms and AI: Machine learning (ML) and AI are transforming the landscape of biological research by enabling the analysis of complex, high-dimensional data. These approaches are particularly powerful in uncovering patterns, predicting outcomes, and generating testable hypotheses from large-scale omics datasets and biological networks. Our research leverages a wide range of ML techniques, including supervised learning, unsupervised clustering, dimensionality reduction, and deep learning, to model intricate biological systems. We apply these methods to tasks such as gene function prediction, disease classification, regulatory network inference, and biomarker discovery. By integrating AI-driven models with domain knowledge, we aim to enhance the interpretability and biological relevance of computational predictions, ultimately advancing our understanding of health and disease.

  • Computational Systems Biology and Genetic Diseases: Genes cooperate to carry out specific functions within a cell, and cells in turn work together to form tissues and organs. Understanding the underlying molecular mechanisms of a cell requires insight into the interplay between genes, proteins, and small molecules. Our work focuses on developing methods to integrate diverse types of biological data in order to elucidate biological pathways and networks in complex systems, under both normal and disease conditions.

  • Coding and Non-coding Gene Regulation: Gene transcriptional regulation refers to the processes by which a cell controls the expression of its genes. Precisely regulated spatial and temporal gene expression is essential for the accurate execution of key biological processes, including development, proliferation, apoptosis, aging, and differentiation. Our research focuses on developing computational methods to predict gene regulatory interactions by integrating large-scale omics datasets.

  • Epigenetics: It has been recognized for decades that, beyond the DNA sequence of the genome, a second layer of information, the epigenome, also contributes to gene regulation. The epigenome is characterized by a cell’s overall chromatin state, which is defined by chemical modifications such as histone modifications and DNA methylation that can change in response to intrinsic or environmental signals. We develop computational methods that integrate high-throughput sequencing data to understand genome-epigenome interactions and epigenetic gene regulation.