Introduction to Computational Molecular Biology: Molecular Evolution
GENOME 541
Department of Genome Sciences
University of Washington
Spring Quarter, 2008
Course description
This is the second quarter of a two-quarter introduction to protein and DNA sequence analysis and molecular evolution, including probabilistic models of sequences and of sequence evolution, computational gene identification, pairwise sequence comparison and alignment (algorithms and statistical issues), multiple sequence alignment and evolutionary tree construction, comparative genomics, and protein sequence/structure relationships. These are the central computational methods required to determine the "periodic table of biology," i.e., the list of proteins and their evolutionary relationships, which can be regarded as the first stage in the growth of molecular biology into a quantitative science. Moreover, the statistical and algorithmic methods used (which include maximum likelihood estimation, hidden Markov models and dynamic programming) have wide applicability in other areas of computational and mathematical biology.
Instructional staff
Instructor: Joe Felsenstein
Email:
Office: Foege S420B
Instructor: Larry Ruzzo
Email:
Office: Paul Allen Center 554
Instructor: David Baker
Email:
Office: Health Sciences J556
Instructor: William Stafford Noble
Email:
Office: Foege S220B
Instructor: Bruce Weir
Email: bsweir@u.washington.edu
Office: Health Sciences F665
Meeting times and locations
Tuesday and Thursday, 10:30 - 11:50 am, Foege Building S110.
Prerequisites
GENOME 540 or permission of instructor.
Students must be able to write computer programs for data analysis. Some prior exposure to probability, statistics and molecular biology is highly desirable.
Course materials
Required: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, S. Eddy, A. Krogh, G. Mitchison; Cambridge University Press, 1998. ISBN: 0521629713. Paperback, ~$35.
Required: Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health) by Warren J. Ewens, Gregory R. Grant; Springer, 2005. ISBN: 0387400826. Hardbound, ~$90. Note that this is the (new) second edition of the text used in previous years. Make sure you get this edition.
Course requirements
- The entire course grade is based on the homework assignments, which are due weekly (more or less). No tests or exams.
- The homework assignments involve writing programs for data analysis and running them on a computer that you have access to (we cannot provide computers). We don't require a specific language: it is not practical to grade your code, so we grade only the output from running your programs.
- Homework is due by 11:59 pm on the indicated date. Late homework is accepted but penalized: each assignment is worth 100 points, and 10 points are deducted for each day (or fraction thereof) that it is late. The maximum late deduction is 60 points, even if you are more than 6 days late. If you score fewer than 40 points on an assignment, you may redo it and take the new score (which will be 40, i.e. 100 - 60, if there are no mistakes).
- It is OK to run your program on someone else's input data file and compare outputs to see if you get the same results. However, it is not OK to share programs or to get someone else to debug your program. A key part of the course is being able to write and debug your own programs for data analysis.
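The late-penalty arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the graders' actual procedure: the function and constant names are hypothetical, and flooring the final score at zero is an assumption that the policy does not spell out.

```python
import math

# Values taken from the late policy; the names are illustrative.
PENALTY_PER_DAY = 10  # points deducted per day (or fraction of a day) late
MAX_PENALTY = 60      # cap on the total late deduction


def late_penalty(days_late: float) -> int:
    """Return the points deducted for lateness.

    Any fraction of a day counts as a full day, so 0.1 days late
    costs the same as 1 full day late.
    """
    if days_late <= 0:
        return 0
    return min(PENALTY_PER_DAY * math.ceil(days_late), MAX_PENALTY)


def final_score(raw_score: int, days_late: float = 0.0) -> int:
    """Return the raw score minus the late penalty.

    Assumption: the result never drops below zero.
    """
    return max(raw_score - late_penalty(days_late), 0)
```

For example, `final_score(100, days_late=9)` gives 40, which matches the "100 - 60" case mentioned in the policy for a mistake-free assignment turned in more than 6 days late.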
Examinations
None.
Course grade
12.5% for each homework assignment.
Home page
The course home page can be found at http://noble.gs.washington.edu/~noble/genome541 .
Class schedule
| Date | Instructor | Topic | Reading | Homework |
| --- | --- | --- | --- | --- |
| Tue Apr 1 | Felsenstein | Trees, parsimony, compatibility | Ewens 497-499, 511-512, 517-521; Durbin 160-163, 173-176, 188-189 | |
| Thu Apr 3 | Felsenstein | Counting trees, searching tree space | Ewens 511-512; Durbin 163-165, 176-179 | HW1 |
| Tue Apr 8 | Felsenstein | Distances and distance matrix methods | Ewens 499-511; Durbin 165-173, 189-191 | |
| Thu Apr 10 | Felsenstein | Models of DNA and protein change | Ewens 475-496; Durbin 193-197 | HW2 (data) |
| Tue Apr 15 | Felsenstein | Likelihood and Bayesian methods | Ewens 512-516, 409-416; Durbin 197-210, 215-217 | |
| Thu Apr 17 | Felsenstein | Testing and bootstraps | Ewens 295-300, 308-309, 313-318, 522-535; Durbin 179-180, 212-215 | HW3 |
| Tue Apr 22 | Felsenstein | Coalescents | Durbin 211-212 | |
| Thu Apr 24 | Felsenstein | Inference with coalescents | Ewens 392-398; Durbin 206-207, 211-212 | HW4 |
| Tue Apr 29 | Ruzzo | Modeling and searching for non-coding RNA | Durbin, ch. 10 | |
| Thu May 1 | Ruzzo | Modeling and searching for non-coding RNA | | HW5 |
| Tue May 6 | Baker | Computational structural biology | Section on protein structure from any biochemistry textbook | |
| Thu May 8 | Baker | Computational structural biology | | |
| Tue May 13 | Noble | Microarray analysis | (Pavlidis 2003), (Storey and Tibshirani 2003), (Brown et al. 2000), (Ramaswamy et al. 2001) | |
| Thu May 15 | Noble | Predicting protein function from heterogeneous data | (Lee et al. 2004), (Troyanskaya et al. 2003), (Lanckriet et al. 2004), (Noble 2006 or long version) | HW6 |
| Tue May 20 | Noble | Protein identification from tandem mass spectra | (Sadygov et al. 2004) | |
| Thu May 22 | Noble | Motif discovery | | HW7 |
| Tue May 27 | Weir | | | |
| Thu May 29 | Weir | | | HW8 |
| Tue Jun 3 | Weir | | | |
| Thu Jun 5 | Weir | | | |