My research focuses on the development and application of machine learning and statistical methods for interpreting complex biological data sets. In selecting research areas, I am drawn to problems whose solutions address fundamental questions in biology while also pushing the state of the art in machine learning. Currently, my research can be roughly divided into three areas, as follows:
- Predicting protein properties. My lab has done extensive work using methods such as hidden Markov models, dynamic Bayesian networks and support vector machine classifiers to identify remote protein homologs, to assign functional annotations to proteins, and to predict protein secondary structure from sequence. We continue to develop novel algorithms for these and related problems.
- Chromatin and gene regulation. We use motif-based hidden Markov models to characterize collections of transcription factor binding sites in genomic DNA. We also develop models that predict properties of chromatin from genomic DNA.
- Analysis of mass spectrometry data. In collaboration with Michael MacCoss's lab, we have developed a series of machine learning and statistical methods for the analysis of shotgun proteomics data. In this field, we continue to work on protein identification and quantification, targeted proteomics, and biomarker discovery.
Updated September, 2008