GENOME 373: Genomic Informatics Homework 7
Due Wednesday, May 27, at the beginning of class. Homework turned in more than five minutes after the start of class will be marked as late and penalized 10% per day thereafter.
- (10 points) Compute the t statistic associated with the following two sets of gene expression measurements: {2.24, 3.74, 2.93, 4.27} and {2.13, 1.97, 3.05}. Show your work.
- (5 points) Explain in words how you would convert the t statistic from the previous question into a p-value, using a two-tailed test.
- (5 points) How should you decide whether to correct for multiple testing with a Bonferroni correction or with false discovery rate (FDR) estimation?
- (8 points) You are doing a microarray experiment to study differences in gene expression associated with neural development in mice. You have collected samples from a total of 9 mice: three mice at day 5 after birth, three at day 10, and three at day 20. For each mouse, you dissect the brain and extract two samples, one from the dorsal region and one from the ventral region of the nuclear laminus. You run one microarray on each sample, and then you want to do an ANOVA to interpret your results. How many factors are in this design, what are the factors, and how many levels does each factor have?
- (10 points) One problem with microarray data is that some measurements in each array are missing, usually because of flaws in the array or the hybridization. A common way to address this problem is to fill in each missing value with the average value for that gene across a set of experiments. Write a Python program that takes as input a tab-delimited text file in which rows correspond to genes and columns correspond to experiments. The program should replace each missing value with the average of the non-missing values in that gene's row, and then print the resulting matrix in the same tab-delimited format. Turn in your program, as well as a print-out of the results of running it on this file.
- (12 points) Modify your program to use column averages rather than row averages. In other words, replace each missing value with the average value across all genes in the same experiment. Turn in your program plus the print-out from running it on the same file as in the previous question.
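For checking a hand calculation of the t statistic for the two sets of expression measurements, here is a minimal Python sketch. The question does not say which form of the two-sample t statistic to use, so the sketch computes both the unequal-variance (Welch) form and the pooled-variance form; the function names `welch_t` and `pooled_t` are illustrative, not part of any course code.

```python
import math
import statistics

def welch_t(a, b):
    """Unequal-variance (Welch) t statistic for mean(a) - mean(b)."""
    # statistics.variance uses the n - 1 (sample) denominator
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def pooled_t(a, b):
    """Equal-variance (pooled) t statistic for mean(a) - mean(b)."""
    n1, n2 = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se

group1 = [2.24, 3.74, 2.93, 4.27]
group2 = [2.13, 1.97, 3.05]
print(f"Welch t  = {welch_t(group1, group2):.4f}")
print(f"pooled t = {pooled_t(group1, group2):.4f}")
```

The two forms differ only in the standard error in the denominator; with unequal group sizes they give slightly different values, so a written answer should state which one it uses.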
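Converting a t statistic into a two-tailed p-value means finding the probability, under the null hypothesis, of a t value at least as extreme as the observed one in either direction, P(|T| >= |t|), for a Student's t distribution with the appropriate degrees of freedom (n1 + n2 - 2 for the pooled test; the Welch test uses the Welch-Satterthwaite approximation instead). In practice this is read from a t table or computed by a statistics library. The sketch below assumes only the Python standard library and approximates the tail area by integrating the t density numerically with Simpson's rule; `t_pdf` and `two_tailed_p` are hypothetical helper names.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), computed as 1 - 2 * (integral of the density from 0 to |t|)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    # The density is symmetric, so both tails together are 1 - 2 * area;
    # clamp to [0, 1] to absorb small numerical error
    return min(1.0, max(0.0, 1.0 - 2.0 * area))

# A pooled test on samples of size 4 and 3 has 4 + 3 - 2 = 5 degrees of freedom
print(two_tailed_p(1.52, df=5))
```

Doubling one tail is what makes the test two-tailed; a one-tailed p-value would use only the area beyond t on the side of the alternative hypothesis.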
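The two multiple-testing approaches control different quantities: Bonferroni bounds the family-wise error rate (the chance of even one false positive among all tests), while false discovery rate methods bound the expected fraction of false positives among the tests called significant. As an illustration, here is a sketch of Bonferroni alongside the Benjamini-Hochberg step-up procedure, one common FDR method; the p-values are made up for demonstration and the function names are illustrative.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices of tests rejected while controlling the family-wise error rate at alpha."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # test indices by increasing p
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff_rank = rank  # keep the largest rank that passes its threshold
    # Reject every test ranked at or below the cutoff
    return sorted(order[:cutoff_rank])

pvals = [0.042, 0.001, 0.060, 0.008, 0.039, 0.205, 0.041, 0.074]
print("Bonferroni:", bonferroni(pvals))
print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

On the same p-values, the FDR procedure typically rejects at least as many tests as Bonferroni, because it tolerates a controlled proportion of false discoveries instead of guarding against any single one.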
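A minimal sketch of a row-average program is below. The assignment does not specify how missing values are written in the file, so the sketch assumes they appear as empty fields or as `NA`/`NaN`; it also assumes the file contains only numbers, with no header line or gene-name column (if yours has them, they would need to be passed through unchanged). A gene with no observed values at all is left as `NA`. The file name `fill_rows.py` and the helper names are hypothetical.

```python
import sys

# Assumption: a missing value is an empty field or one of these markers
MISSING = {"", "NA", "NaN", "nan"}

def parse_matrix(lines):
    """Turn tab-delimited lines into rows of floats, with None for missing cells."""
    matrix = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip truly empty lines; a line of only tabs is a row of missing values
        matrix.append([None if f.strip() in MISSING else float(f) for f in line.split("\t")])
    return matrix

def fill_with_row_means(matrix):
    """Replace each None with the mean of the non-missing values in its row."""
    filled = []
    for row in matrix:
        present = [v for v in row if v is not None]
        # A row with no observed values has no mean; leave its cells missing
        mean = sum(present) / len(present) if present else None
        filled.append([mean if v is None else v for v in row])
    return filled

def format_matrix(matrix):
    """Render rows as tab-delimited text, writing NA for any value still missing."""
    return "\n".join("\t".join("NA" if v is None else str(v) for v in row) for row in matrix)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            print(format_matrix(fill_with_row_means(parse_matrix(handle))))
    else:
        print("usage: python fill_rows.py MATRIX.txt", file=sys.stderr)
```

Usage: `python fill_rows.py matrix.txt > filled.txt`, where `matrix.txt` stands in for the provided data file. Keeping parsing, filling, and printing in separate functions means only the filling step has to change for a different imputation rule.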
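For the column-average version, only the filling step changes: means are computed per experiment (column) instead of per gene (row). The sketch below includes its own parsing and printing helpers so it runs on its own. It assumes that missing values are empty fields or `NA`/`NaN` and that the file has no header line or gene-name column; the file name `fill_columns.py` and the helper names are hypothetical.

```python
import sys

# Assumption: a missing value is an empty field or one of these markers
MISSING = {"", "NA", "NaN", "nan"}

def parse_matrix(lines):
    """Turn tab-delimited lines into rows of floats, with None for missing cells."""
    matrix = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip truly empty lines
        matrix.append([None if f.strip() in MISSING else float(f) for f in line.split("\t")])
    return matrix

def fill_with_column_means(matrix):
    """Replace each None with the mean of the non-missing values in its column."""
    if not matrix:
        return []
    n_cols = max(len(row) for row in matrix)
    means = []
    for j in range(n_cols):
        # Tolerate ragged rows by skipping rows that are too short for column j
        present = [row[j] for row in matrix if j < len(row) and row[j] is not None]
        means.append(sum(present) / len(present) if present else None)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in matrix]

def format_matrix(matrix):
    """Render rows as tab-delimited text, writing NA for any value still missing."""
    return "\n".join("\t".join("NA" if v is None else str(v) for v in row) for row in matrix)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            print(format_matrix(fill_with_column_means(parse_matrix(handle))))
    else:
        print("usage: python fill_columns.py MATRIX.txt", file=sys.stderr)
```

Computing every column mean once, before any cell is filled, avoids recomputing it for each missing value and guarantees that filled-in values never feed back into the means.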