Periodic genes of the yeast Saccharomyces cerevisiae: combined analysis of 5 cell cycle data sets.
The scripts on this page are no longer functional. Please visit the Breeden lab website for a functional version.
The 3 yeast cell cycle microarray data sets currently available, the alpha factor, cdc15 and cdc28 synchronizations (Ref.1), have been extensively analyzed, and the number of cell cycle regulated genes in the yeast Saccharomyces cerevisiae has been estimated at anywhere from 400 to 1088 (Ref.1-6). These analyses are, however, limited by the availability of only 3 data sets and by the lack of replicates. We have generated dye swap technical replicates of microarrays across the cell cycle of alpha factor synchronized W303 cells. These replicates, called 30 and 38, have a sampling interval of 5 min. Additionally, RNA from 6 randomly picked cell cycle time points was labelled with both the Cy3 and the Cy5 dyes and hybridized to itself to generate same vs same hybridizations for error estimation. The microarray data were processed using an error model in the Rosetta Resolver v3.2 Expression Data Analysis System.
We have used the Periodic Normal Mixture (PNM) model (Ref.5) to perform a combined analysis of the 5 data sets: 30, 38, alpha, cdc15 and cdc28 (Ref.1). The analysis identified 1031 genes as periodic. We have also estimated the activation time of each gene, defined as the time point at which the expression level reaches the average of the peak and trough levels, using the PNM fitted profile.
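The activation time definition above can be illustrated with a minimal sketch. It is not the authors' code: it assumes the fitted profile is available as a list of values at known time points, takes the first upward crossing of the peak/trough midpoint, and interpolates linearly between samples. The function name `activation_time` and the example values are hypothetical.

```python
def activation_time(times, fitted):
    """Return the first time at which the fitted profile rises through the
    midpoint between its trough and its peak, or None if it never does.

    Assumptions (not stated in the source): the crossing of interest is the
    first upward one, and values between samples are linearly interpolated.
    """
    if len(times) != len(fitted) or len(times) < 2:
        raise ValueError("need two or more matching time points")
    half = (max(fitted) + min(fitted)) / 2.0  # average of peak and trough
    for i in range(len(fitted) - 1):
        y0, y1 = fitted[i], fitted[i + 1]
        if y0 < half <= y1:  # profile rises through the midpoint here
            frac = (half - y0) / (y1 - y0)
            return times[i] + frac * (times[i + 1] - times[i])
    return None


# Made-up fitted profile sampled every 5 minutes.
times = [0, 5, 10, 15, 20, 25]
fitted = [1.0, 1.0, 2.0, 3.0, 2.0, 1.0]
print(activation_time(times, fitted))  # 10.0
```

A profile that never rises through the midpoint (for example, one that only decreases) yields `None` rather than a guessed value.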
These data sets are displayed with a variant of the Prism program.
Prism Visualization of three alpha-factor synchronized yeast cell cycle microarray data sets:
The row-normalized version uses the -z option of Matrix2png to row-normalize the data to mean 0 and variance 1.
Data set 30 (5 min interval): row-normalized, original
Data set 38 (5 min interval): row-normalized, original
Data set 26 (10 min interval): row-normalized, original
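As a rough illustration of what row-normalizing to mean 0 and variance 1 does, the sketch below standardizes each gene's row independently. It does not reproduce Matrix2png itself: the use of the population variance, the treatment of missing values (`None`), and the handling of constant rows are all assumptions made for this example.

```python
from statistics import mean, pstdev


def row_normalize(matrix):
    """Scale each row to mean 0 and variance 1, ignoring missing values.

    Assumptions: population variance is used, missing entries are None and
    stay None, and a row with no spread is mapped to zeros.
    """
    normalized = []
    for row in matrix:
        present = [v for v in row if v is not None]
        m = mean(present) if present else 0.0
        s = pstdev(present) if len(present) > 1 else 0.0
        normalized.append(
            [None if v is None else ((v - m) / s if s > 0 else 0.0) for v in row]
        )
    return normalized


# Made-up expression values: one row per gene, one column per time point.
example = [[1.0, 2.0, 3.0], [1.0, None, 3.0]]
for row in row_normalize(example):
    print(row)
```

Because each row is scaled separately, the normalized heat map shows the shape of each gene's profile rather than its absolute expression level.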
1: Data Sets
All three data sets are alpha-factor synchronized microarray time series spanning two cell cycles. Data set 26 has a sampling interval of 10 minutes, while data sets 30 and 38 have a sampling interval of 5 minutes. Data sets 30 and 38 are dye swap technical replicates.
Yeast strain: W303a: ade2-loc, trp1-1, can1-100oc, leu2-3, -112, his3-11, -15, ura3
Growth media: YEP glucose
Data Set    Labeling convention
26          t0/SS, t10/SS ... t120/SS: Cy5/Cy3
Dye Swap
30          t0/SS, t5/SS ... t120/SS: Cy3/Cy5
38          t0/SS, t5/SS ... t120/SS: Cy5/Cy3
SS: steady state, t: cell cycle time points
If you are visiting this page for the first time, you can click "Click here" to view the microarray data sets with default display options.
Alternatively, you may enter a session identifier, which is assigned the first time you view the page, into the text field and press the button labeled "Retrieve data." The data will then be displayed with the options you previously specified (See 4: Selecting output options).
2: Primary output
The primary output page displays two heat maps on the left side. The left one represents the expression matrix, and the right one represents the data matrix fitted by PNM. Columns in the matrices correspond to "data" or "fitted data" columns in the data set. Each row in the matrix corresponds to a single gene, and the corresponding gene ID, gene-specific scores from various computational methods (described below), and annotation appear to the right. '-' indicates that no data were available. The gene ID is linked to a relevant genome database. Clicking on the matrix itself zooms in on a particular gene (See 3: Zooming in on a gene).
Scores from the following methods have been estimated:
- PNM5 posterior: Periodic Normal Mixture (PNM) model integrating the three public domain data sets plus data sets 30 and 38. All genes with a posterior probability above 0.95 are considered periodic and displayed in red.
- PNM3 posterior: Periodic Normal Mixture (PNM) model integrating the public domain alpha, cdc15 and cdc28 data sets. All genes with a posterior probability above 0.95 are considered periodic and displayed in red.
- Spellman CDC: Aggregate Fourier analysis score developed by Spellman et al.
- shape-invariant: Shape-invariant model. Periodic genes are indicated with a list of the data sets from which they were identified.
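The 0.95 cutoff applied to the PNM posterior scores amounts to a simple threshold. The sketch below shows that rule only; it does not compute posteriors, and the gene labels, the values, and the use of `None` for a missing score ('-' on the page) are hypothetical.

```python
PERIODIC_CUTOFF = 0.95  # threshold stated on this page


def periodic_genes(posteriors, cutoff=PERIODIC_CUTOFF):
    """Return the IDs of genes whose posterior probability is above the cutoff.

    `posteriors` maps gene ID to posterior probability; None marks a gene
    with no score and is never called periodic.
    """
    return sorted(
        gene for gene, p in posteriors.items() if p is not None and p > cutoff
    )


# Made-up scores for illustration only.
scores = {"GENE_A": 0.99, "GENE_B": 0.95, "GENE_C": 0.40, "GENE_D": None}
print(periodic_genes(scores))  # ['GENE_A']
```

The comparison is strict, following the wording "above 0.95", so a gene scoring exactly 0.95 is not flagged.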
Note that the top of the page lists a numeric session identifier the first time you view this page. You should keep track of this number, because you may change the display options, and later you can use the session number to view the data set according to your own customized display options.
A button labeled "Change Display Options" appears at the top of the heat maps. Click it to go to the page where you can specify your own display options (See 4: Selecting output options).
3: Zooming in on a gene
Clicking on either of the heat map matrices will take you to a gene-specific page. This page plots the expression level (red) and the expression profile fitted by SPM (green) of this gene across two cell cycles. Flagged data are marked with blue boxes.
4: Selecting output options
At the top of the right frame, there are several options that control the format of the output. Once you have selected these options (or left them at their default values), press the button labeled "Go." The options include the following:
- Gene ID column allows you to specify which column in your input file contains the gene IDs. This column will be used to search the catalog for corresponding annotations.
- Filter allows you to remove all genes that do not meet a specified criterion. By default, all genes are included in the output.
- Sort allows you to sort the primary Prism output on one of the columns from the data matrix.
- Secondary Sort allows you to sort the primary Prism output on another column of the data matrix to break ties resulting from the primary sorting.
- Color allows you to select the colors representing high and low expression, respectively.
- Flag value allows you to specify what value is used to indicate corrupt or valid data.
- Aliases allows you to specify whether the alternate names of a given gene are included as part of its annotation.
- Sampling interval allows you to specify the time (in minutes) between experiments, if this is time series data. This value is only used to make gene-specific plots of expression as a function of time. By default, this value is zero, and the plots are indexed by experiment number rather than time.
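The Filter, Sort, Secondary Sort and Sampling interval options can be pictured with the short sketch below. It is not the Prism implementation: rows are assumed to be dictionaries, sorting is assumed to be ascending, and time is assumed to start at 0 with experiment numbers starting at 1. All names and values are hypothetical.

```python
def filter_and_sort(rows, keep, primary, secondary):
    """Drop rows that fail `keep`, then sort by `primary`, breaking ties
    with `secondary` (ascending order is an assumption)."""
    kept = [r for r in rows if keep(r)]
    return sorted(kept, key=lambda r: (r[primary], r[secondary]))


def time_axis(n_columns, interval=0):
    """Return plot x-values: times in minutes when an interval is given,
    otherwise experiment numbers (the default interval of zero)."""
    if interval > 0:
        return [i * interval for i in range(n_columns)]
    return list(range(1, n_columns + 1))


# Made-up rows: gene ID, posterior score, and activation time in minutes.
rows = [
    {"gene": "GENE_C", "posterior": 0.99, "activation": 20},
    {"gene": "GENE_A", "posterior": 0.97, "activation": 20},
    {"gene": "GENE_B", "posterior": 0.50, "activation": 5},
]
result = filter_and_sort(rows, lambda r: r["posterior"] > 0.95, "activation", "gene")
print([r["gene"] for r in result])  # ['GENE_A', 'GENE_C']
print(time_axis(3, 5))              # [0, 5, 10]
print(time_axis(3))                 # [1, 2, 3]
```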
References
1. Paul T. Spellman, Gavin Sherlock, Michael Q. Zhang, Vishwanath R. Iyer, Kirk Anders, Michael B. Eisen, Patrick O. Brown, David Botstein, and Bruce Futcher. Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Mol. Biol. Cell 1998 9: 3273-3297.
2. Lue Ping Zhao, Ross Prentice, and Linda Breeden. Statistical modeling of large microarray data sets to identify stimulus-response profiles. PNAS 2001 98: 5631-5636.
3. Daniel Johansson, Petter Lindgren, and Anders Berglund. A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription. Bioinformatics 2003 19: 467-473.
4. Y. Luan and H. Li. Model-based methods for identifying periodically expressed genes based on time course microarray gene expression data. Bioinformatics 2004 20: 332-339.
5. Xin Lu, Wen Zhang, Zhaohui S. Qin, Kurt E. Kwast, and Jun S. Liu. Statistical resynchronization and Bayesian detection of periodically expressed genes. Nucl. Acids. Res. 2004 32: 447-455.
6. Ulrik de Lichtenberg, Thomas S. Jensen, Lars J. Jensen, and Søren Brunak. Protein Feature Based Identification of Cell Cycle Regulated Proteins in Yeast. J. Mol. Biol. 2003 329(4): 663-674.