Introduction to Statistical and Computational Genomics

GENOME 559
Department of Genome Sciences
University of Washington School of Medicine


Course description

Rudiments of statistical and computational genomics. Emphasis on basic probability and statistics, and an introduction to computer programming. This course is intended to introduce students with non-computer science backgrounds to the major concepts of programming and statistics.

Learning objectives

After taking this course, students will be able to describe and perform basic analysis tasks relating to biological sequence analysis, phylogenetics, pedigree analysis, genetic association studies, population genetics and microarray analysis. Students will be able to demonstrate an understanding of fundamental statistical concepts, such as p-values, t-tests, chi-squared tests and multiple testing correction. Finally, students will be able to write computer programs to perform statistical and bioinformatics analyses.

Instructional staff

Instructor: William Stafford Noble
Email: noble@gs.washington.edu

Instructor: Mary Kuhner
Email: mkkuhner@gs.washington.edu

Instructor: Larry Ruzzo
Email: ruzzo@cs.washington.edu

Meeting times and locations

Tue/Thu 3:30-4:50 pm in Hitchcock 220

The class meets in a computer lab and will involve writing computer programs during class time.

Prerequisites

Substantial background in molecular and cellular biology, genetics, biochemistry or related disciplines.

Course materials

Bioinformatics: Sequence and Genome Analysis by Mount. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2004. Second edition.
Learning Python by Lutz. O'Reilly, 2007. Third edition.

Course requirements

Students will complete eight homework assignments during the course. Assignments will typically involve some written questions and some programming problems.

Examinations

The final exam will be open book, and will cover the entire quarter. The final exam is scheduled for Thursday, March 20, 4:30-6:20 pm in Hitchcock 220.

Course grade

Each of the eight homework assignments counts for 10% of the course grade, and the final exam counts for the remaining 20%.
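As a rough illustration of this weighting, the short Python sketch below combines eight homework scores and a final exam score into a course percentage. The function name `course_grade` and the 0-100 scale for each score are assumptions made for the example; the syllabus does not specify how individual scores are scaled.

```python
# Weights taken from the syllabus, expressed as whole percentages.
HOMEWORK_WEIGHT = 10      # percent per homework assignment
FINAL_EXAM_WEIGHT = 20    # percent for the final exam
NUM_HOMEWORKS = 8


def course_grade(homework_scores, final_exam_score):
    """Return the weighted course percentage.

    Assumption: every score is given on a 0-100 scale.
    """
    if len(homework_scores) != NUM_HOMEWORKS:
        raise ValueError("expected %d homework scores" % NUM_HOMEWORKS)
    weighted_total = (HOMEWORK_WEIGHT * sum(homework_scores)
                      + FINAL_EXAM_WEIGHT * final_exam_score)
    return weighted_total / 100.0


if __name__ == "__main__":
    # Eight homework scores of 90 and a final exam score of 80.
    print(course_grade([90] * 8, 80))
```

Under these assumptions, the eight homework weights and the exam weight sum to 100%, so a student who scores the same value on every item receives that value as the course percentage.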

Class schedule

Lecture | Instructor | Lecture topic | Concepts | Programming topic | Reading | Homework
Tue Jan 6 | Noble | Sequence comparison: Introduction and motivation | Substitution matrices, gap penalties | Introduction to Python | |
Thu Jan 8 | Noble | Sequence comparison: Dynamic programming | Dynamic programming, Needleman-Wunsch | Strings | Mount: ch. 3; Lutz: ch. 1-4, 7 | HW1 assigned
Tue Jan 13 | Noble | Sequence comparison: More dynamic programming | | Numbers, lists and tuples | Mount: ch. 6; Lutz: ch. 5, 8 |
Thu Jan 15 | Noble | Sequence comparison: Local alignment | Smith-Waterman | File I/O, if-then-else | Lutz: ch. 9-12 | HW1 due; HW2 assigned
Tue Jan 20 | Noble | Sequence comparison: Significance of similarity scores | distribution, p-value, extreme value distribution | for loops | Mount: ch. 4; Lutz: ch. 13 |
Thu Jan 22 | Kuhner | Phylogeny: Parsimony | heuristic search, "assumption-free" methods | while loops, modules | | HW2 due
Tue Jan 27 | Kuhner | Phylogeny: Models of the mutational process | least squares | Dictionaries | Lutz: pp. 103-107 | HW3 assigned
Thu Jan 29 | Kuhner | Phylogeny: Distance and likelihood | maximum likelihood | Defining functions | Lutz: ch. 11 |
Tue Feb 3 | Kuhner | Phylogeny: Bayesian methods and MCMC | Bayes' Theorem, Markov chains | Printing and sorting | Lutz: pp. 87-90, 455-456 | HW3 due; HW4 assigned
Thu Feb 5 | Kuhner | Phylogeny: Validating phylogenies | likelihood ratio test, bootstrap | Regular expressions | Lutz: pp. 447-450 |
Tue Feb 10 | Ruzzo | Pedigree analysis: Probabilities of genes on pedigrees | LOD score | Objects and classes (part 1) | Lutz: ch. 19-20 | HW4 due; HW5 assigned
Thu Feb 12 | Ruzzo | Pedigree analysis: Modes of inheritance | ascertainment bias | Objects and classes (part 2) | Lutz: ch. 21 |
Tue Feb 17 | Ruzzo | Pedigree analysis: Fitting models to pedigrees | nuisance parameters | Biopython | | HW5 due; HW6 assigned
Thu Feb 19 | Ruzzo | Association studies: Detecting association between a trait and a gene | chi-square, Bonferroni correction, relative risk | | |
Tue Feb 24 | Ruzzo | Association studies: Avoiding pitfalls in association studies | bias, correlation vs. causation | | |
Thu Feb 26 | Ruzzo | Population genetics: Categorical data analysis | chi-square and multinomial distributions, testing in population genetics | | | HW6 due; HW7 assigned
Tue Mar 2 | Ruzzo | Population genetics: Genetic diversity | Statistical approaches for quantifying DNA sequence variation; expectation of a random variable | | |
Thu Mar 4 | Ruzzo | Whole genome association | Linear regression | | | HW7 due; HW8 assigned
Tue Mar 9 | Ruzzo | Whole genome association | t-test, p-value | | |
Thu Mar 11 | Ruzzo | Whole genome association | family-wise error rate, false discovery rate | | | HW8 due