Noble Research Lab

Department of Genome Sciences
University of Washington

Our research group develops and applies computational techniques for modeling and understanding biological processes at the molecular level. Our research emphasizes the application of statistical and machine learning techniques, such as hidden Markov models and support vector machines. We apply these techniques to various types of biological data, including protein and DNA sequences, data from high-throughput genomic assays such as ChIP-seq and Hi-C, and tandem mass spectrometry. We are currently developing methods for analyzing shotgun proteomics data, for characterizing protein function, structure and interactions, and for understanding the structure and regulatory influence of chromatin.

Postdoctoral fellowships are available in machine learning methods, genomics, and proteomics.

    Back row: Damon May, Will Fondrie, Jacob Schreiber, Wout Bittremieux, Lindsay Pino, Gurkan Yardimci, Maizie, Elena May, Enzo Bonora, Isla Bonora, Aubrey Bonora, Giancarlo Bonora. Front row: Bill Noble, Kaishu Mason, Yang Lu, Charles Grant.

    Lab members

    • William Stafford Noble, Professor, Genome Sciences

    • Wout Bittremieux, Postdoctoral fellow, Genome Sciences

    • Giancarlo Bonora, Postdoctoral fellow, Genome Sciences

    • Will Fondrie, Postdoctoral fellow, Genome Sciences

    • Dejun Lin, Postdoctoral fellow, Genome Sciences

    • Yang Lu, Postdoctoral fellow, Genome Sciences

    • Ritambhara Singh, Postdoctoral fellow, Genome Sciences

    • Gurkan Yardimci, Postdoctoral fellow, Genome Sciences

    • Timothy Durham, Ph.D. student, Department of Genome Sciences

    • Gesine Cauer, Ph.D. student, Department of Genome Sciences

    • Andy Lin, Ph.D. student, Department of Genome Sciences

    • Lindsay Pino, Ph.D. student, Department of Genome Sciences

    • Jacob Schreiber, Ph.D. student, Department of Computer Science and Engineering

    • Deepthi Hegde, Master's student, Data Science Program

    • Charles Grant, Senior programmer, Department of Genome Sciences

    • Rita Chupalov, Software Engineer, Department of Genome Sciences

    • Kaipo Tamura, Software Engineer, Department of Genome Sciences

    • Frederick Huyan, Undergraduate, School of Computer Science and Engineering

    • Joy Ji, Undergraduate, School of Computer Science and Engineering

    • Rohan Guliani, Undergraduate, School of Computer Science and Engineering

    • Yuchong Xiang, Undergraduate, School of Computer Science and Engineering



    Software

    All of the software listed below is available with source code. Where indicated, a free web server is also available. Dates indicate when the software was released; a range of years indicates multiple released versions.

    1. Meta-MEME is a motif-based hidden Markov model toolkit for modeling DNA and protein sequences. The Meta-MEME tools have been incorporated into the MEME Suite. 1998-2008.
    2. Family Pairwise Search is a protein homology detection algorithm that combines sequence similarity scores from a pairwise alignment algorithm such as Smith-Waterman or BLAST. Source code and a web server are available. 1999-2000.
    3. Gist implements the support vector machine learning algorithm for classification, as well as kernel principal components analysis. A web server based upon Gist is available. 1999-2006.
    4. matrix2png is a visualization tool for the display of matrix data. It is available for download or interactive web use. 2002-2006.
    5. Prism is a web interface to matrix2png that includes features specifically for visualizing microarray data. 2003.
    6. Rankprop uses diffusion across a network of protein similarities to identify remote homology relationships. Source code and a web server for searching the non-redundant protein database are available. 2004-2008.
    7. SVM-fold makes predictions of superfamily and fold level classifications of proteins based on the Structural Classification of Proteins hierarchy using the support vector machine learning algorithm. A web server is available. 2004-2007.
    8. ChargeCzar uses a support vector machine to discriminate between +2- and +3-charged tandem mass spectra, with the goal of reducing database search time by eliminating the need to search twice with each spectrum. 2005.
    9. BiblioSpec enables the identification of peptides from tandem mass spectra by searching against a database of previously identified spectra. 2006.
    10. HyFi identifies primer and microarray probe binding sites in genomic DNA. 2006.
    11. Percolator post-processes the results of a shotgun proteomics database search program, re-ranking peptide-spectrum matches so that the top of the list is enriched for correct matches. 2007-2008.
    12. HMMSeg performs wavelet smoothing and unsupervised HMM segmentation on genomic data sets. 2007.
    13. svmvia implements the full regularization path optimization algorithm for training a support vector machine. 2007.
    14. Ishtar designs PCR primers that target multiple loci. 2007.
    15. Pythia designs PCR primers from a thermodynamic point of view. 2007.
    16. Philius predicts protein transmembrane topology and signal peptides. 2008.
    17. Crux analyzes shotgun proteomics tandem mass spectra, associating peptides with observed spectra. 2008-2012.
    18. qvality performs nonparametric estimation of posterior error probabilities. 2008.
    19. Genomedata provides efficient storage of multiple tracks of numeric data anchored to a genome. 2010.
    20. Segway performs simultaneous segmentation and clustering of genomic signal data such as those from ChIP-seq and DNase-seq, finding recurring patterns in these data. 2010-2012.
    21. Segtools provides exploratory data analysis on genomic segmentations. 2010-2011.
    22. Fido uses a probability model to rank proteins according to the posterior probability of their presence in a complex mixture, based on evidence derived from a shotgun proteomics experiment. 2010.
    23. Tide is an ultra-fast implementation of the SEQUEST algorithm for identifying fragmentation mass spectra. 2011.
    24. Fit-Hi-C is a tool for assigning statistical confidence estimates to intra-chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C. 2014.
    25. Pastis infers the three-dimensional structure of the genome on the basis of Hi-C data. 2015.
    26. DRIP Toolkit is a tandem mass spectrometry search engine that uses a dynamic Bayesian network model. 2016.

    Former lab members

    • Ferhat Ay, Institute Leadership Assistant Professor of Computational Biology, La Jolla Institute for Allergy and Immunology
    • Zafer Aydin, Assistant Professor, Computer Engineering Department, Abdullah Gul University, Kayseri, Turkey
    • Asa Ben-Hur, Professor, Department of Computer Science, Colorado State University, Fort Collins
    • Xiaoyu Chen, Illumina
    • Eleazar Eskin, Professor, Department of Computer Science, Department of Human Genetics, University of California, Los Angeles
    • Michael Hoffman, Scientist, Princess Margaret Cancer Centre, Toronto, Canada; Assistant Professor, Department of Medical Biophysics, University of Toronto
    • Victoria Haghighi, Associate Professor, Department of Psychiatry, Columbia University
    • Lukas Käll, Professor, Applied Systems Biology, KTH - Royal Institute of Technology, Sweden
    • Attila Kertesz-Farkas, Assistant Professor, School of Data Analysis and Artificial Intelligence, Faculty of Informatics, National Research University Higher School of Economics, Moscow, Russian Federation
    • Aaron Klammer, Pacific Biosciences
    • Darrin Lewis, Postdoctoral fellow, Cold Spring Harbor Laboratory
    • Jie Liu, Assistant Professor, Department of Computational Medicine and Bioinformatics, University of Michigan
    • Li Liao, Associate Professor, Department of Computer and Information Sciences, University of Delaware
    • Max Libbrecht, Assistant Professor, Department of Computing Science, Simon Fraser University
    • Wenxiu Ma, Assistant Professor, Department of Statistics, UC Riverside
    • Damon May, Computational Immunologist, Adaptive Biotechnologies Corporation
    • Tobias Mann, Director of Software Engineering, Bioinformatics, Adaptive Biotechnologies Corporation
    • Sean McIlwain, Assistant Scientist, Department of Biostatistics & Medical Informatics, University of Wisconsin
    • Merja Oja, VTT Technical Research Centre of Finland
    • Paul Pavlidis, Professor of Psychiatry, University of British Columbia
    • Sheila Reynolds, Senior Research Scientist, Institute for Systems Biology
    • Oliver Serang, Assistant Professor, Department of Computer Science, University of Montana
    • Ilan Wapinski, Systems Biology Fellow, Department of Systems Biology, Harvard University
    • Habil Zare, Assistant Professor, Department of Cell Systems & Anatomy, University of Texas Health Science Center at San Antonio


Hike to Heather Lake, May 2006. A party, July 2006. Annual picnic, August 2006. Hike to Lake 22, June 2007. Annual picnic, August 2007. Hike to Wallace Falls, May 2008. Annual picnic, October 2008. Hike to Heather Lake, June 2009. Hike to Gold Creek, June 2010. Annual picnic, August 2010. Goodbye party for Michael Mathews, December 2010. Hike to Boulder River, May 2011. Hike to Talapus Lake, June 2012. Hike to Bridal Veil Falls, July 2013. Goodbye party for Habil Zare, June 2014. Hike to Annette Lake, June 2014. Hike to Snow Lake, July 2015. Hike to Denny Creek, August 2016. Hike to Rattlesnake Ledge, July 2017. Hike to Heather Lake, July 2018.

The lab is located in Foege, room S340.
