Noble Research Lab

Department of Genome Sciences
University of Washington

Our research group develops and applies computational techniques for modeling and understanding biological processes at the molecular level. Our research emphasizes the application of statistical and machine learning techniques, such as hidden Markov models and support vector machines. We apply these techniques to various types of biological data, including DNA and protein sequence data, as well as gene expression data from microarray experiments. We are currently developing methods for analyzing shotgun proteomics data, for characterizing protein function, structure and interactions, and for understanding the structure and regulatory influence of chromatin.


Front row: Jian Qiu, Bob Thurman, Charles Grant, Sheila Reynolds, Bill Pentney, Oliver Serang. Middle row: Merja Oja, Barbara Frewen. Back row: Lukas Käll, Nutti, Alex Brown, Mirela Andronescu, Bill Noble, Aaron Klammer.


Lab members

  • William Stafford Noble, Associate Professor, Genome Sciences
  • Zafer Aydin, Postdoctoral fellow, Genome Sciences
  • Michael Hoffman, Postdoctoral fellow, Genome Sciences
  • Merja Oja, Postdoctoral fellow, Genome Sciences

  • Oliver Serang, Ph.D. student, Genome Sciences
    Imagine you built some cool things with Legos, and then a friend took them apart, leaving small chunks still stuck together. Later, you try to remember what you built. Browsing through the Lego manuals, you find several chunks that could only have come from the X-Wing fighter, plus one chunk that could have come from either Robin Hood's castle or the X-Wing, but no other pieces that could have come from the castle. So you guess that you built the X-Wing. This is essentially how mass spectrometry-based proteomics works: the chunks are peptides, and the finished models are the proteins they came from.
    I'm making algorithms that intelligently decide what proteins are in a sample by looking at pieces of these proteins and putting them back together.

  • Sheila Reynolds, graduate student, Department of Electrical Engineering
  • Xiaoyu Chen, graduate student, Department of Computer Science and Engineering

  • Charles Grant, senior programmer, Genome Sciences
    I work on MEME and Meta-MEME, software packages for discovering motifs and searching for them in sequences. Motifs are short, distinctive patterns in protein or DNA sequences that play critical roles in protein structure and in the regulation of DNA transcription.
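Oliver Serang's Lego analogy describes protein inference: deciding which proteins best explain the peptides observed in a sample. As a minimal sketch of the idea, the Python code here uses a simple greedy parsimony heuristic (choose few proteins that together account for every observed peptide). This is an illustration only, not the lab's algorithm; the function name `greedy_protein_inference` and the toy "xwing"/"castle" data are hypothetical.

```python
from typing import Dict, List, Set


def greedy_protein_inference(protein_to_peptides: Dict[str, Set[str]],
                             observed_peptides: Set[str]) -> List[str]:
    """Hypothetical helper: return a small list of proteins that together
    explain the observed peptides.

    Greedy set cover (a parsimony heuristic): repeatedly pick the protein that
    accounts for the most peptides not yet explained. Observed peptides that no
    candidate protein contains are ignored.
    """
    # Peptides that at least one candidate protein could have produced.
    explainable = set().union(*protein_to_peptides.values()) & observed_peptides
    unexplained = set(explainable)
    chosen: List[str] = []
    while unexplained:
        # Sorting first makes ties resolve to the alphabetically first protein,
        # because max() returns the first maximal element it sees.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        chosen.append(best)
        # Every unexplained peptide is explainable, so best removes at least one.
        unexplained -= protein_to_peptides[best]
    return chosen


# Toy "Lego" example: one chunk is shared, the rest point to the X-Wing.
catalog = {
    "xwing": {"a", "b", "c", "shared"},
    "castle": {"shared", "d"},
}
print(greedy_protein_inference(catalog, {"a", "b", "shared"}))  # ['xwing']
```

The shared chunk alone is not enough to justify adding the castle, which mirrors the reasoning in the analogy. Greedy parsimony is only the simplest approach; probabilistic models can instead weigh how likely each protein is given all the evidence.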

Publications

Software

All of the software listed below is available with source code. Where indicated, a free web server is also available. Dates are release years; a range of years indicates that multiple versions were released over that period.

  1. Meta-MEME is a motif-based hidden Markov model toolkit for modeling DNA and protein sequences. The Meta-MEME tools have recently been incorporated into the MEME Suite. 1998-2008.
  2. Family Pairwise Search is a protein homology detection algorithm that combines the similarity scores between a query sequence and each member of a protein family, as computed by a pairwise alignment algorithm such as Smith-Waterman or BLAST. Source code and a web server are available. 1999-2000.
  3. Gist implements the support vector machine learning algorithm for classification, as well as kernel principal components analysis. A web server based upon Gist is available. 1999-2006.
  4. matrix2png is a visualization tool for the display of matrix data. It is available for download or interactive web use. 2002-2006.
  5. Prism is a web interface to matrix2png that includes features specifically for visualizing microarray data. 2003.
  6. Rankprop uses diffusion across a network of protein similarities to identify remote homology relationships. Source code and a web server for searching the non-redundant protein database are available. 2004-2008.
  7. SVM-fold uses the support vector machine learning algorithm to predict the superfamily- and fold-level classification of proteins within the Structural Classification of Proteins (SCOP) hierarchy. A web server is available. 2004-2007.
  8. ChargeCzar uses a support vector machine to discriminate between +2- and +3-charged tandem mass spectra, with the goal of reducing database search time by eliminating the need to search each spectrum twice, once per candidate charge state. 2005.
  9. BiblioSpec enables the identification of peptides from tandem mass spectra by searching against a database of previously identified spectra. 2006.
  10. HyFi identifies primer and microarray probe binding sites in genomic DNA. 2006.
  11. Percolator post-processes the results of a shotgun proteomics database search program, re-ranking peptide-spectrum matches so that the top of the list is enriched for correct matches. 2007-2008.
  12. HMMSeg performs wavelet smoothing and unsupervised HMM segmentation on genomic data sets. 2007.
  13. svmvia implements the full regularization path optimization algorithm for training a support vector machine. 2007.
  14. Ishtar designs PCR primers that target multiple loci. 2007.
  15. Pythia designs PCR primers from a thermodynamic point of view. 2007.
  16. Crux analyzes shotgun proteomics tandem mass spectra, associating peptides with observed spectra. 2008.
  17. qvality performs nonparametric estimation of posterior error probabilities. 2008.
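Rankprop (item 6) is described as using diffusion across a network of protein similarities. The Python sketch here illustrates that general idea with a damped iteration, scores ← q + α·W·scores, where q holds the query's similarity to each database protein and W is a column-normalized matrix of similarities among database proteins. The function name `diffuse_scores`, the normalization, and the default α are assumptions for illustration; this is not the released Rankprop code.

```python
import numpy as np


def diffuse_scores(query_sims, db_sims, alpha=0.95, n_iter=20):
    """Hypothetical network-diffusion ranking in the spirit of Rankprop.

    query_sims: length-n array, similarity of the query to each database protein.
    db_sims:    n x n nonnegative array of similarities among database proteins.
    alpha:      damping factor in [0, 1); smaller values trust direct hits more.
    Returns a length-n array of scores; higher means more likely homologous.
    """
    q = np.asarray(query_sims, dtype=float)
    S = np.asarray(db_sims, dtype=float)
    # Normalize each column to sum to 1 (all-zero columns stay zero), so the
    # iteration below converges whenever alpha < 1.
    col_sums = S.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    W = S / col_sums
    scores = q.copy()
    for _ in range(n_iter):
        # Direct evidence from the query plus damped evidence relayed by neighbors.
        scores = q + alpha * (W @ scores)
    return scores


# Toy network: the query resembles protein 0, protein 1 resembles protein 0,
# and protein 2 is unrelated to everything.
sims = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
print(diffuse_scores([1, 0, 0], sims))
```

In the toy network, protein 1 receives a positive score and is ranked above protein 2 even though the query has no direct similarity to it; that indirect evidence is what diffusion adds over ranking by pairwise scores alone.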

Former lab members

  • Asa Ben-Hur, Assistant Professor, Department of Computer Science, Colorado State University, Fort Collins
  • Eleazar Eskin, Assistant Professor, Department of Computer Science, Department of Human Genetics, University of California, Los Angeles
  • Lukas Käll, Researcher, Center for Biomembrane Research, Department of Biochemistry & Biophysics, Stockholm University
  • Li Liao, Assistant Professor, Department of Computer and Information Sciences, University of Delaware
  • Paul Pavlidis, Assistant Professor of Psychiatry, University of British Columbia

Hike to Heather Lake, May 2006

A party, July 2006

Annual picnic, August 2006

Hike to Lake 22, June 2007

Annual picnic, August 2007

Hike to Wallace Falls, May 2008

The lab is located in Foege, room S220.