Research Interests

Data mining and machine learning foundations: model selection, approximate inference, Bayesian nonparametrics, PAC learning, representation learning
Big Data Analytics: big graph analytics, safe data science
Biomedical and Health Informatics: computational chemical biology, pharmacogenomics, mobile health

Research Overview

My research studies and applies computational and theoretical principles to accelerate knowledge discovery from data and to enable action on important societal problems. We work extensively on predictive analytics, aiming to generate actionable and testable hypotheses from data and prior experience. Recent algorithmic work from my group includes multi-view learning, transfer learning, multi-task learning, boosting with structural sparsity, and learning with big graph data.

The current application focus of my group is translational science. We aim to advance data analytics toward a better understanding of the connections among biological systems, disease physiology, intervention, and therapeutics, and to evaluate the clinical and social impact of that understanding at multiple levels.

Though the problem set is diverse, the common threads of our work are geometric and probabilistic representations of data, effective feature generation, multimodal data integration, sparse model selection and averaging, and learning generative and discriminative models on (Riemannian) manifolds. Much of our work addresses three core problems in machine learning and data mining: stable pattern identification with structured input and output, information fusion with multiple data sources, and system support for big data analytics.

Software

FFSM: Fast Frequent Subgraph Mining. The source code is available through SourceForge under the GNU General Public License (GPL). A Romanian translation of the software description is also available, thanks to Aleksandra Seremina and courtesy of azoft.
Joint Space PCA: JSPCA is software we developed for anomaly detection and localization with wireless sensor network data. The algorithm is described in [Jiang et al. KDD'11].

Data Sets

KUChemBio: a data repository for computational chemical biology.
Anomaly detection in wireless sensor networks: a data set we collected for anomaly detection in wireless sensor networks.
Big Data Analysis of Swimming Athletes' Performance Records: a database that Dr. Luo Bo and I constructed for athlete performance profiling and related analysis.

Tutorials & Short Courses


Data Science Seminar is a weekly event at ITTC @ KU where people discuss the latest progress in broadly defined data-driven sciences, including big data analysis, machine learning, data mining and databases, bioinformatics, information retrieval, vision and image analysis, high-performance computing, and computer systems research on large-scale data processing. You are welcome to join our discussions!

Honors & Awards

  • Miller Scholar, University of Kansas School of Engineering, 2014
  • Best Student Paper Award, IEEE International Conference on Data Mining (ICDM), Vancouver, Canada, December 2011, with Ph.D. student Hongliang Fei
  • Bellows Scholar, University of Kansas School of Engineering, 2011
  • Miller Scholar, University of Kansas School of Engineering, 2010
  • Best Paper Award Runner-Up, the 18th ACM Conference on Information and Knowledge Management (CIKM'09), Hong Kong, China, December 2009
  • National Science Foundation CAREER Award (IIS 0845951, 2009-2014, Project Website)