GENOME 373: Genomic Informatics

Homework 2

Due Wednesday, April 15, at the beginning of class. Homework turned in more than five minutes after the start of class will be marked late and penalized 10% for each day it is late.

  1. (2 points) In what year was the Needleman-Wunsch algorithm published?
  2. (15 points) Draw and fill in the dynamic programming matrix to align these two sequences: TTGAC and TGATT. Use this substitution matrix:

        A   C   G   T
    A   2  -7  -3  -7
    C  -7   2  -7  -3
    G  -3  -7   2  -7
    T  -7  -3  -7   2

    and use a fixed gap penalty of -5. What is the optimal global alignment and its corresponding score?

  3. (15 points) Compute the optimal local alignment between EAMPK and ISCCE using the BLOSUM80 substitution matrix (available at ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM80) and a linear gap penalty of -4. Show your work.
  4. (7 points) The following histogram shows the probability of observing a given score from a sequence comparison algorithm. The numbers above each bar give the height of that bar. Compute the p-value associated with a score of 34.
  5. (6 points) What is the p-value associated with observing a score of 17 or higher when you roll three six-sided dice? Assume that the dice are fair.
  6. (5 points) Compute the p-value associated with a sequence alignment score of 43, using an EVD with mu=35 and lambda=0.333.
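
For problem 6, the calculation can be sketched in Python as follows. This is a minimal sketch, assuming the extreme value distribution is parameterized so that P(S >= x) = 1 - exp(-exp(-lambda * (x - mu))). The function name evd_p_value is hypothetical, and the sample values are the ones from the compute-evd-p-value.py example in the optional problems, not the ones in problem 6.

```python
import math

def evd_p_value(score, mu, lam):
    """Return P(S >= score) for an extreme value distribution.

    Assumes the Gumbel form P(S >= x) = 1 - exp(-exp(-lam * (x - mu))).
    """
    # expm1(t) computes exp(t) - 1 accurately for tiny t, so -expm1(-y)
    # equals 1 - exp(-y) without losing precision for small p-values.
    return -math.expm1(-math.exp(-lam * (score - mu)))

# Values from the compute-evd-p-value.py example: mu=25, lambda=0.693, score=45.
print("%.3e" % evd_p_value(45, 25, 0.693))  # prints 9.565e-07
```

When the score equals mu, the same function returns 1 - exp(-1), which is a quick sanity check on the parameterization.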

Optional programming practice problems

  1. Write a program compute-evd-p-value.py that computes the p-value associated with a given score. The user should provide mu, lambda, and the score on the command line, in that order, and the program should print the corresponding p-value, like this:

    > python compute-evd-p-value.py 25 0.693 45
    9.565e-07
    
  2. Write a program copy-file.py that copies a given file. For example, if you have a file called hello.txt that contains one line ("Hello, world!"), then you could create a copy of this file called world.txt as follows:

    > python copy-file.py hello.txt world.txt
    > cat world.txt
    Hello, world!
    

    Make sure your program works even if the input file contains more than one line.

  3. Write a program reverse-lines.py that reads in the contents of a file, and prints out the lines in reverse order. For example, say that your file is called three-lines.txt and consists of these three lines:

    This is the first line.
    This is the second line.
    This is the third line.
    

    Your program should do this:

    > python reverse-lines.py three-lines.txt
    This is the third line.
    This is the second line.
    This is the first line.
    

    Make sure your program works even if the input file does not contain exactly three lines.

  4. Write a program split-number.py that reads a real number from the command line and prints its integer part on one line, followed by its fractional part (i.e., the number minus its integer part) on a second line. For the fractional part, print no more than 6 digits after the decimal point, and do not print trailing zeroes.

    > python split-number.py 1.234567
    1
    0.234567
    > python split-number.py 1.23456711
    1
    0.234567
    > python split-number.py 1.23
    1
    0.23
    
  5. Write a program format-number.py that takes two arguments as input: a number and a format, where the format is one of integer, real, or scientific. Print the given number in the requested format, and print an error message if an invalid format string is provided.

    > python format-number.py 3.14159 integer
    3
    > python format-number.py 3.14159 real
    3.14159
    > python format-number.py 3.14159 scientific
    3.141590e+00
    > python format-number.py 3.14159 foo
    Invalid format: foo
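
For problem 5, one possible approach is sketched below. It is only a sketch, not the required solution: the function name format_number is hypothetical, and it assumes that "integer" means truncating toward zero and that "real" means Python's default string form of a float.

```python
import sys

def format_number(text, style):
    """Format the number in `text` as an integer, real, or scientific string."""
    value = float(text)
    if style == "integer":
        return "%d" % int(value)  # int() truncates toward zero: 3.14159 -> 3
    if style == "real":
        return str(value)         # default float formatting
    if style == "scientific":
        return "%e" % value       # six digits after the decimal point
    raise ValueError("Invalid format: %s" % style)

# Only run when exactly two arguments (number, format) are supplied.
if __name__ == "__main__" and len(sys.argv) == 3:
    try:
        print(format_number(sys.argv[1], sys.argv[2]))
    except ValueError as err:
        # Covers both an unknown format and a non-numeric first argument.
        print(err)
```

Keeping the formatting logic in a function separate from the sys.argv handling makes it easy to check each format against the example session above without running the script from the shell.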