GENOME 373: Genomic Informatics Homework 3
Due Wednesday, April 22, at the beginning of class. Homework turned in more than five minutes after the start of class will be marked as late and penalized 10% per day thereafter.
- (15 points) For the following alignment, use the BLOSUM62 matrix (ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62) to compute the score for every set of 3 contiguous columns that do not contain an indel. How many 3-mers receive a score greater than 6?
YECNERSKA-SCPSHLQ--KRRQIG QECNQCGKAFAQHSSLKCHYRTHIG- (15 points) Fill in the following matrix such that the score in row i, column j is the score for aligning the length-three segment starting at position i in the first sequence (HKRT) with the length-three segment starting at position j in the second sequence (QYHERTHTG). Use the BLOSUM62 matrix. I have filled in the first entry for you already. Note that some of the matrix cells along the edges should be left blank.
Q Y H E R T H T G H -2 K R T - (10 points) Following is a list of p-values, created by searching a small database of proteins with the Smith-Waterman algorithm. The list includes one p-value for each target protein in the database. Using a Bonferroni correction, how many p-values are deemed significant if we use a threshold of 0.01?
0.00012, 0.0021, 0.0050, 0.0089, 0.0044, 0.0032, 0.0021, 0.033, 0.13, 0.15, 0.18, 0.29, 0.30, 0.33, 0.33, 0.35, 0.45, 0.47, 0.49, 0.55, 0.56, 0.57, 0.62, 0.63, 0.63, 0.67, 0.78, 0.83, 0.84, 0.95, 0.99
- (10 points) Convert the list of p-values from the previous question into a list of E-values.
Optional programming practice problems
- Write a program to search a given file for occurrences of a specified sequence. Print the line number and location of each occurrence of the file. For example, say that the file
myfile.txtcontains this text:The quick brown fox The quick and the dead Pretty darn quick Molasses in JanuaryThen you could run your program as follows:> search-for-string.py quick myfile.txt Line 1, position 5 Line 2, position 5 Line 3, position 13 > search-for-string.py x myfile.txt Line 1, position 18- Write a program that prints the BLOSUM62 score for a given pair of amino acids. You can include the matrix itself in your program code, rather than reading it from a file.
> python print-blosum62.py K L -2- Write a program that reads a two-line DNA alignment from a file and scores it using +10 for identical bases, 0 for transitions, -5 for transversions and -10 for gaps (linear gap penalty). For example, say that
myfile.txtcontains the following two lines:AACGTGA AAG-TACThen you could score this alignment as follows:> python score-dna-alignment.py myfile.txt 10 + 10 + -5 + -10 + 10 + 0 + -5 = 10