GENOME 373: Genomic Informatics Homework 5
Due Wednesday, May 6, at the beginning of class. Homework turned in more than five minutes after the start of class will be marked as late and penalized 10% per day thereafter.
- (2 points) Consider the following yeast sequences:
acctgcgggccgtcTAAAAATTaaggaaaagcagcaaaggtgcatttttaaaatatgaaa tgaagataccgcagtaccaattaTTTTCGCAgtacaaataatgcgcggccggtgcatttt tcGAAAGAACgcgagacaaacaggacaattaaagttagtttttcgagttagcgtgtttga atactgcaagatacaagataaatagagtagttgAAACTAGAtatcaattgcacacaagat cggcgctaagcATGCCACAatttggtatattatgtaaaacaccacctaaggtgcttgttc gtcagtttgtggaaaggtttgaaagACCTTCAGgtgagaaaatagcattatgtgctgctg aactaacctatttatgttggatgattacacataacggaacagcaatcAAGAGAGCcacat tcatgagctataaTACTATCAtaagcaattcgctgagtttcgatattgtcaataaatcacThe uppercase letters are the subsequences selected by the Gibbs sampler for its initial guess at the motif occurrences. How were these sequences selected?
- (3 points) Continuing the example from the previous question, say that during the first round of Gibbs sampling, the subsequence from the first sequence is removed. Write down the corresponding counts matrix. Do not include pseudocounts.
- (5 points) A Gibbs sampler will scan a candidate motif, with one subsequence removed, against the corresponding sequence. Say that the integerized scores assigned to each position in the sequence are as follows: 10, 45, 47, 13, 14, 92, 42, 66, 10, 08, 18, 25. What is the probability that the Gibbs sampler will select the first position in the sequence as the new motif occurrence?
- (5 points) One of the two steps in the inner loop of the EM algorithm is "Re-estimate the motif occurrences from the matrix." Explain how this step works.
- (5 points) What is the average degree of the following network?
![]()
- (5 points) What is the maximum in-degree of the following network? What is the maximum out-degree? Indicate which nodes correspond to both maxima.
![]()
- (5 points) Make a histogram showing the degree distribution of this network.
![]()
- (5 points) Re-draw the previous histogram using logarithmic x- and y-axes. Is the network scale-free?
- (5 points) What is the clustering coefficient of node A in this network?
![]()
- (10 points) This network contains a motif involving three nodes. What is the motif? How many times does it occur?
![]()
Optional programming practice
- Write a program to read an adjacency matrix of 1s and 0s, storing the corresponding directed network in a list of lists. After reading the network, print out a list of edges, represented as pairs of nodes.
> cat my-matrix.txt 0 1 1 0 0 0 1 0 0 > python print-edge-list.py my-matrix.txt Node 1 -> Node 2 Node 1 -> Node 3 Node 3 -> Node 1- Write a program to read an adjacency matrix of 1s and 0s, storing the corresponding directed network in a list of lists. Print a symmetrized version of the matrix to an output file specified by the user. I.e., if the original matrix has an edge from node 2 to node 3, then the output matrix should also include an edge from node 3 to node 2.
> python symmetrize-matrix.py my-matrix.txt symmetric-matrix.txt 0 1 1 1 0 0 1 0 0- Write a program to read an adjacency matrix of 1s and 0s, storing the corresponding directed network in a list of lists. Find all directed triangles in the matrix, and print their corresponding node indices.
> cat my-matrix2.txt 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 > python find-directed-triangles.py my-matrix.txt Node 1 -> Node 3 -> Node 4 -> Node 1