Step 1: Data input
Prism requires two inputs: a file of genomic data, and a catalog of gene annotations. Your first step is to provide the data by uploading it to the server.
The input data must be in tab-delimited text format. The first row of the file contains column titles, and each subsequent row contains the data for a single gene. Within a row, each column may contain the gene identifier, a gene expression or other data value, a Boolean flag indicating the presence of corrupted data, or arbitrary text (described below). Rows that begin with a pound sign (#) are treated as comments and skipped. Prism's sample data set is an example of a gene expression data set formatted in this way.
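If you want to check a file before uploading it, the sketch below reads the format described above: tab-separated fields, one title row, then one row per gene, with "#" lines skipped. It is not Prism's own parser. The function name `parse_prism_input` is hypothetical. The following are assumptions: comment lines are skipped wherever they appear, including before the title row, and blank lines are skipped as well.

```python
def parse_prism_input(text):
    """Split tab-delimited text into column titles and data rows.

    Lines starting with '#' are skipped as comments, wherever they
    appear. Blank lines are also skipped (an assumption).
    """
    titles = None
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if titles is None:
            titles = fields  # first non-comment row holds the titles
        else:
            rows.append(fields)  # one row per gene
    return titles, rows


# Made-up example input, not real data.
example = "ORF\texp1\texp2\n# a comment\nGENE1\t0.5\t-1.2\n"
titles, rows = parse_prism_input(example)
print(titles)  # ['ORF', 'exp1', 'exp2']
print(rows)    # [['GENE1', '0.5', '-1.2']]
```

Before uploading, you can compare the length of each row with the number of titles. This is a quick way to catch missing tabs.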
Once you have typed in the name of your input file, press the button labeled "Upload file."
Alternatively, you may enter a data set identifier into the box labeled "Experiment lookup" and press the button labeled "Retrieve data." After you have entered a data set into Prism, you can view it again for up to seven days by using the data set ID that Prism assigns to it.
To try Prism without uploading a file of your own, retrieve the sample data set mentioned above by entering the data set identifier "sample."
Step 2: Catalog selection
Next, select an annotation catalog from the list of catalogs stored on the server. If you would like to use your own annotations, you can upload them instead. If you are retrieving a previously uploaded data set, skip this step.
You may select a gene annotation catalog from the pull-down menu. For the annotation to work properly, the IDs in your data file must match the IDs in the catalog. To check whether this is the case, view the annotation catalogs via the list at the bottom of the catalog upload page. If your gene IDs differ from those in the catalogs, you may upload your own catalog. The format is simple: each row corresponds to a single gene and contains the gene's ID and its annotation, separated by a tab character. For a gene with more than one name, the ID field can be a comma-separated list of names. If you do not have an appropriate annotation catalog, you may choose not to include annotations in the Prism output.
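The sketch below reads a catalog in the format just described and makes every comma-separated name point to the same annotation. It also estimates how many of your data file's gene IDs the catalog would match. The function names `parse_catalog` and `fraction_annotated` are hypothetical and do not come from Prism. Skipping blank lines and trimming spaces around names are assumptions.

```python
def parse_catalog(text):
    """Map every gene name in a catalog to its annotation.

    Each line is '<ids>\t<annotation>', where <ids> may be a
    comma-separated list of names for the same gene.
    """
    catalog = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines (assumption)
        gene_ids, _, annotation = line.partition("\t")
        for gene_id in gene_ids.split(","):
            gene_id = gene_id.strip()
            if gene_id:
                catalog[gene_id] = annotation
    return catalog


def fraction_annotated(data_ids, catalog):
    """Fraction of data-file gene IDs that have a catalog entry."""
    data_ids = list(data_ids)
    if not data_ids:
        return 0.0
    hits = sum(1 for gene_id in data_ids if gene_id in catalog)
    return hits / len(data_ids)


# Made-up catalog for illustration.
catalog = parse_catalog("GENE1,ALIAS1\tprotein kinase\nGENE2\tunknown function\n")
print(catalog["ALIAS1"])                               # protein kinase
print(fraction_annotated(["GENE1", "GENE3"], catalog))  # 0.5
```

A low fraction usually means that your IDs use a different naming scheme from the catalog. In that case, uploading your own catalog is the better option.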
Step 3: Selecting column formats
By default, Prism assumes that the input data file consists entirely of gene expression data, with gene IDs in the first column. However, as mentioned above, some columns in the file may have different semantics. Therefore, after uploading a data file, Prism asks you to define the type of each column in your input file. There are four types of columns, as follows:
- Data. Each entry in this column is a real number, usually corresponding to a gene expression measurement. This value will be converted to a color and plotted as one box in Prism's heat map matrix representation.
- Flag. Each entry in this column contains a Boolean flag (value 0 or 1) indicating whether or not the corresponding data is trustworthy. Such flags may be generated by hand, based on observed problems with the microarray, or (more typically) automatically by the scanning software. Flagged values are displayed as gray boxes on the primary output page, and when you zoom in to examine a particular gene, the flagged values have a small gray box on top of them.
- Print. This column contains an arbitrary text string that will be displayed next to the data. This string may contain alternate gene IDs, functional annotation, a relevant statistic, etc.
- Ignore. This column will be ignored by Prism.
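As an illustration of how the four column types divide up a single row, here is a minimal sketch. The name `split_row` and the lowercase type strings are hypothetical. The gene ID column is handled separately through `id_col`, which mirrors the "Gene ID column" option. The following are assumptions: an empty data field is treated as a missing value (`None`), and the string `"1"` marks a flagged value unless you say otherwise.

```python
def split_row(fields, types, id_col=0, flagged_value="1"):
    """Split one gene's fields by column type.

    types holds one of 'data', 'flag', 'print', 'ignore' per column;
    the entry at id_col is not consulted because that column is the ID.
    """
    gene_id = fields[id_col]
    data, flags, text = [], [], []
    for index, (value, col_type) in enumerate(zip(fields, types)):
        if index == id_col:
            continue
        if col_type == "data":
            # Empty fields become None (missing), an assumption.
            data.append(float(value) if value.strip() else None)
        elif col_type == "flag":
            flags.append(value.strip() == flagged_value)
        elif col_type == "print":
            text.append(value)
        # 'ignore' columns are dropped.
    return gene_id, data, flags, text


row = ["GENE1", "0.5", "", "1", "see notes", "x"]
types = ["ignore", "data", "data", "flag", "print", "ignore"]
print(split_row(row, types))  # ('GENE1', [0.5, None], [True], ['see notes'])
```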
There are two ways to specify the type of each column in the input file: select the correct radio button next to the column name, or type a range of column indices into the left-hand frame and press "Update." By default, Prism assumes that the first column contains the gene ID and that all subsequent columns contain data. This is the case for the sample data file.
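The sketch below models the default column assignment and the effect of entering a range. This guide does not specify the exact range syntax that the left-hand frame accepts. The sketch therefore assumes 1-based, inclusive ranges written as "3-5" or as a single index such as "4". `default_types`, `apply_range`, and the `"id"` label are hypothetical names.

```python
import re

COLUMN_TYPES = {"data", "flag", "print", "ignore"}


def default_types(n_columns):
    """Documented default: column 1 holds gene IDs, all others are data."""
    return ["id"] + ["data"] * (n_columns - 1)


def apply_range(types, spec, col_type):
    """Return a copy of types with col_type set on a 1-based inclusive range.

    spec is assumed to look like '3-5' or '4'.
    """
    if col_type not in COLUMN_TYPES:
        raise ValueError(f"unknown column type: {col_type}")
    match = re.fullmatch(r"\s*(\d+)\s*(?:-\s*(\d+)\s*)?", spec)
    if match is None:
        raise ValueError(f"bad range: {spec!r}")
    start = int(match.group(1))
    end = int(match.group(2) or start)
    if not 1 <= start <= end <= len(types):
        raise ValueError(f"range out of bounds: {spec!r}")
    updated = list(types)
    for index in range(start - 1, end):
        updated[index] = col_type
    return updated


types = default_types(5)
print(types)                             # ['id', 'data', 'data', 'data', 'data']
print(apply_range(types, "4-5", "flag"))  # ['id', 'data', 'data', 'flag', 'flag']
```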
Step 4: Selecting output options
At the top of the page for selecting column types, there are several options that control the format of the output produced by Prism. These include the following:
- Gene ID column allows you to specify which column in your input file contains the gene IDs. This column will be used to search the catalog for corresponding annotations. The default behavior is to use the first column in the file.
- Filter allows you to remove all genes that do not meet a specified criterion. By default, all genes are included in the output.
- Sort allows you to sort the primary Prism output by one of the columns of the data matrix.
- Color allows you to select the colors representing high and low expression, respectively.
- Flag value allows you to specify which flag value indicates corrupt data (and therefore which indicates valid data).
- Aliases allows you to specify whether the alternate names of a given gene are included as part of its annotation.
- Sampling interval allows you to specify the time (in minutes) between experiments, if this is time series data. This value is only used to make gene-specific plots of expression as a function of time. By default, this value is zero, and the plots are indexed by experiment number rather than time.
Once you have selected these options (or left them with their default values), press the button labeled "Go."
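To illustrate what the Filter and Sort options do to the list of genes, here is a small sketch. This guide does not say which filter criteria Prism offers or which direction it sorts in. The example criterion `max_abs_at_least` ("keep a gene if any value has an absolute value of at least a threshold") is therefore hypothetical, and so are the descending default and the choice to place missing values last. Genes are represented as `(gene_id, values)` pairs.

```python
def filter_and_sort(genes, keep=None, sort_col=None, descending=True):
    """Filter (gene_id, values) pairs, then optionally sort by one column.

    values holds floats or None (missing). With keep=None every gene is
    kept, which matches the documented default of no filtering.
    """
    selected = [g for g in genes if keep is None or keep(g[1])]
    if sort_col is not None:
        # Missing values go last regardless of direction (assumption).
        present = [g for g in selected if g[1][sort_col] is not None]
        missing = [g for g in selected if g[1][sort_col] is None]
        present.sort(key=lambda g: g[1][sort_col], reverse=descending)
        selected = present + missing
    return selected


def max_abs_at_least(threshold):
    """Hypothetical criterion: keep genes with any |value| >= threshold."""
    def keep(values):
        return any(v is not None and abs(v) >= threshold for v in values)
    return keep


genes = [("A", [0.1, 0.2]), ("B", [2.0, None]), ("C", [-1.5, 0.0]), ("D", [None, 3.0])]
result = filter_and_sort(genes, keep=max_abs_at_least(1.0), sort_col=0)
print([gene_id for gene_id, _ in result])  # ['B', 'C', 'D']
```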
Step 5: Primary Prism output
The left side of the primary Prism output page displays a heat map representation of the expression matrix. Columns in the matrix correspond to "data" columns in your input file. Each row in the matrix corresponds to a single gene, and the corresponding gene ID and annotation appear to the right. The gene ID is linked to a relevant gene expression database. Clicking on the matrix itself zooms in on a particular gene, as described below.
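One common way to turn a data value into a heat map color is to blend linearly between the "low" and "high" colors chosen with the Color option. The sketch below follows that approach. It is not necessarily the scale that Prism uses. The clamping range `vmin`/`vmax` (default -2 to 2) and the helper names are hypothetical. Following the description of flagged data above, flagged values are drawn gray. Drawing missing values gray is an additional assumption.

```python
GRAY = (128, 128, 128)


def value_to_color(value, low_color, high_color, vmin=-2.0, vmax=2.0, flagged=False):
    """Linearly blend low_color -> high_color after clamping value to [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    if flagged or value is None:
        return GRAY
    clamped = min(max(value, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)  # 0.0 at vmin, 1.0 at vmax
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low_color, high_color))


def to_hex(rgb):
    """Format an (r, g, b) tuple as an HTML color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


green, red = (0, 255, 0), (255, 0, 0)
print(to_hex(value_to_color(2.5, green, red)))   # #ff0000 (clamped to vmax)
print(to_hex(value_to_color(-3.0, green, red)))  # #00ff00 (clamped to vmin)
```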
Note that the top of the page lists a numeric experiment identifier. You should keep track of this number, because you can use it later to return to this data set. You can also send this identifier to other users so that they can view your data.
Step 6: Zooming in on a gene
Clicking on the heat map matrix takes you to a gene-specific page. This page plots the expression level of the selected gene across the given set of experiments. Flagged data are marked with blue boxes. Columns that you marked as "Print" appear below the plot.
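The gene-specific plot uses the Sampling interval option. When the interval is zero (the default), points are indexed by experiment number. Otherwise, they are placed on a time axis in minutes. The sketch below computes such plot points. The name `plot_points` is hypothetical. The following are assumptions: the first experiment is at time 0, experiment numbers start at 1, each data column has one flag, and missing values are skipped.

```python
def plot_points(values, flags=None, interval_minutes=0):
    """Return (x, y, flagged) triples for one gene's expression profile.

    x is the time in minutes when interval_minutes > 0, otherwise the
    1-based experiment number.
    """
    if flags is None:
        flags = [False] * len(values)
    points = []
    for index, (value, flagged) in enumerate(zip(values, flags)):
        if value is None:
            continue  # nothing to plot for a missing measurement
        x = index * interval_minutes if interval_minutes > 0 else index + 1
        points.append((x, value, flagged))
    return points


print(plot_points([0.5, None, -1.0], [False, False, True], interval_minutes=10))
# [(0, 0.5, False), (20, -1.0, True)]
print(plot_points([1.0, 2.0]))
# [(1, 1.0, False), (2, 2.0, False)]
```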
Prism was developed by Wei Wu and William Stafford Noble. Questions, comments and requests should be directed to cegrant@u.washington.edu. Prism is copyright 2003 by the University of Washington. It is published under the GNU General Public License. Prism was developed with support from the National Science Foundation award ISI-0431725.