CAREER: Mining Genome-wide Chemical-Structure Activity Relationships in Emergent
Chemical Genomics Databases
Project Award Date: 07-01-2009
ITTC will develop an integrated research and education program to advance the theoretical and computational principles underlying data mining in emergent chemical genomics databases. The core technical innovations are advances in (i) developing effective kernel-based representations and structure pattern extraction and selection methods that capture the intrinsic characteristics of irregular and discrete spaces such as chemical space; (ii) designing methods for adaptive and scalable similarity search in large databases of complex data, and for accurate classification model construction with imbalanced and out-of-domain data; and (iii) deriving application-oriented validation.
A key strength of this work is the application of these theoretical and computational advances to real-world problems, namely chemical toxicity prediction based on microarray gene expression profiles and high-throughput chemical screening. By developing innovative tools for graphs and geometric structures, ITTC will enable improved techniques for searching, mining, and analyzing complex data domains. This timely effort integrates and advances knowledge in three communities: cheminformatics, data mining, and machine learning.
Faculty Investigator(s): Jun Huan (PI)
Student Investigator(s): Brian Quanz, Jintao "Leo" Zhang, Fengmei Wu, Yao He, Meenakshi Mishra, Abenezer Letta, Peter Adany, Yi Jia, Hongliang Fei, Avindra Fernando, Lauren Beesley, Peng Hao, Parker Roth, Chao Lan, Kaige Yan, Vanessa Tran, Kendal Harland
Primary Sponsor(s): National Science Foundation