Introduction to Computational Molecular Biology: Molecular Evolution
GENOME 541
Department of Genome Sciences
University of Washington
Spring Quarter, 2009
Course description
This is the second quarter of a two-quarter introduction to protein and DNA sequence analysis and molecular evolution, including probabilistic models of sequences and of sequence evolution, computational gene identification, pairwise sequence comparison and alignment (algorithms and statistical issues), multiple sequence alignment and evolutionary tree construction, comparative genomics, and protein sequence/structure relationships. These are the central computational methods required to determine the "periodic table of biology," i.e., the list of proteins and their evolutionary relationships, which can be regarded as the first stage in the growth of molecular biology into a quantitative science. Moreover, the statistical and algorithmic methods used (which include maximum likelihood estimation, hidden Markov models and dynamic programming) have wide applicability in other areas of computational and mathematical biology.
Instructional staff
Instructor: Joe Felsenstein
Email: joe@gs.washington.edu
Office: Foege S420B
Instructor: William Stafford Noble
Email: noble@gs.washington.edu
Office: Foege S220B
Instructor: Larry Ruzzo
Email: ruzzo@cs.washington.edu
Office: Paul Allen Center 554
Instructor: Martin Tompa
Email: tompa@cs.washington.edu
Office: Paul Allen Center 538
Instructor: Phil Bradley
Email: pbradley@fhcrc.org
Meeting times and locations
Tuesday and Thursday, 10:30 - 11:50 am, Foege Building S110.
Prerequisites
GENOME 540 or permission of instructor.
Students must be able to write computer programs for data analysis. Some prior exposure to probability, statistics and molecular biology is highly desirable.
Course materials
Required: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, S. Eddy, A. Krogh, G. Mitchison; Cambridge University Press, 1998. ISBN: 0521629713.
Required: Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health) by Warren J. Ewens, Gregory R. Grant; Springer, 2005. ISBN: 0387400826.
Course requirements
- The entire course grade is based on the homework assignments, which are due weekly (more or less). No tests or exams.
- The homework assignments involve writing programs for data analysis and running them on a computer that you have access to (we cannot provide computers). We don't require a specific language, since it is not practical to grade your code; we grade only the output from running your programs.
- Homework is due by 11:59 pm on the indicated date. After that it will be accepted, but penalized. Specifically, each assignment is worth 100 points, from which 10 points will be deducted for each day (or fraction thereof) that you turn it in late. The maximum deduction for being late is 60 points (even if you are more than 6 days late). If you get less than 40 points on an assignment, you are allowed to redo it and take the new score (which will be 40, i.e. 100 - 60, if there are no mistakes).
- It is OK to run your program on someone else's input data file and compare outputs to see if you get the same results. However, it is not OK to share programs or to get someone else to debug your program. A key part of the course is being able to write and debug your own programs for data analysis.
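The late-penalty arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not an official grading script: the function and constant names are hypothetical, and flooring the final score at zero is an assumption the policy does not spell out.

```python
import math

POINTS_PER_ASSIGNMENT = 100  # each assignment is worth 100 points
PENALTY_PER_DAY = 10         # deducted per day (or fraction thereof) late
MAX_PENALTY = 60             # cap on the late deduction


def late_penalty(days_late: float) -> int:
    """Points deducted for work handed in `days_late` days after the deadline."""
    if days_late <= 0:
        return 0
    # Any fraction of a day counts as a full day, hence the ceiling.
    return min(PENALTY_PER_DAY * math.ceil(days_late), MAX_PENALTY)


def final_score(raw_score: int, days_late: float) -> int:
    """Score after the late deduction (assumption: never below zero)."""
    return max(raw_score - late_penalty(days_late), 0)
```

For example, `final_score(100, 1.5)` gives 80 (two days' penalty), and `final_score(100, 10)` gives 40, which matches the "100 - 60" figure quoted for a mistake-free redo.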
Examinations
None.
Course grade
10% for each homework assignment.
Home page
The course home page can be found at http://noble.gs.washington.edu/~noble/genome541 .
Class schedule
| Date | Instructor | Topic | Reading | Homework |
| --- | --- | --- | --- | --- |
| Tue Mar 31 | Felsenstein | Trees, parsimony, compatibility | Ewens 497-499, 511-512, 517-521; Durbin 160-163, 173-176, 188-189 | |
| Thu Apr 2 | Felsenstein | Counting trees, searching tree space | Ewens 511-512; Durbin 163-165, 176-179 | HW1 (tree) |
| Tue Apr 7 | Felsenstein | Distances and distance matrix methods | Ewens 499-511; Durbin 165-173, 189-191 | |
| Thu Apr 9 | | Class cancelled due to illness | | |
| Tue Apr 14 | Felsenstein | Models of DNA and protein change | Ewens 475-496; Durbin 193-197 | HW2 (primates.dna) |
| Thu Apr 16 | Felsenstein | Likelihood and Bayesian methods | Ewens 512-516, 409-416; Durbin 197-210, 215-217 | HW3 |
| Tue Apr 21 | Felsenstein | Testing and bootstraps | Ewens 295-300, 308-309, 313-318, 522-535; Durbin 179-180, 212-215 | |
| Thu Apr 23 | Felsenstein | Coalescents | Durbin 211-212 | HW4 |
| Tue Apr 28 | Felsenstein | Inference with coalescents | Ewens 392-398; Durbin 206-207, 211-212 | |
| Thu Apr 30 | | Class re-scheduled due to Genome Sciences Symposium | | |
| Fri May 1 | Noble | Microarray analysis | (Pavlidis 2003); (Storey and Tibshirani 2003); (Brown et al. 2000); (Ramaswamy et al. 2001) | |
| Tue May 5 | Noble | Predicting protein function from heterogeneous data | (Lee et al. 2004); (Troyanskaya et al. 2003); (Lanckriet et al. 2004); (Noble 2006 or long version) | HW5 |
| Thu May 7 | Noble | Protein identification from tandem mass spectra | (Sadygov et al. 2004) | |
| Fri May 8 | Noble | Motif discovery | | HW6 |
| Tue May 12 | Ruzzo | Modeling and searching for non-coding RNA | Durbin, ch. 10 | |
| Thu May 14 | Ruzzo | Modeling and searching for non-coding RNA | | HW7 |
| Tue May 19 | Tompa | Comparative sequence analysis and phylogenetic footprinting | (Xie et al.); (Bejerano et al.); (Siepel et al.); (Blanchette and Tompa); (Neph and Tompa); (Prakash and Tompa) | |
| Thu May 21 | Tompa | Comparative sequence analysis and phylogenetic footprinting | | HW8 |
| Tue May 26 | Bradley | Computational structural biology | | |
| Thu May 28 | Bradley | Computational structural biology | | HW9 |
| Tue Jun 2 | Bradley | Computational structural biology | | |
| Thu Jun 4 | Bradley | Computational structural biology | | HW10 |