crux compute-q-values
Usage:
crux compute-q-values [options] <protein input>
Description:
Given a collection of scored PSMs, estimate q-values. The q-value is a statistical confidence measure that is analogous to a p-value but that corrects for multiple testing. The q-value associated with a score threshold T is defined as the minimal false discovery rate at which a score of T is deemed significant. In this setting, the q-value accounts for the fact that we are analyzing a large collection of spectra.
Compute-q-values requires that search-for-matches produced decoy PSMs or that it computed p-values. Up to three different types of q-values may be computed, depending on how search-for-matches was executed. Each of the three types of q-value calculation is based upon a different method for calculating the false discovery rate, as follows:
- If the collection of PSMs has p-values assigned by search-for-matches (using the --compute-p-values option), then the false discovery rate associated with a given score is computed using the standard Benjamini-Hochberg algorithm (Journal of the Royal Statistical Society B, 57:289-300, 1995). Briefly, the algorithm is as follows: (1) rank all m PSMs by p-value, from smallest to largest; (2) for a p-value p_j that appears at position j in the ranked list, the FDR is estimated as p_j * (m/j).
- If the collection of PSMs contains both targets and decoys, then the false discovery rate associated with a given xcorr score is estimated as the number of decoy scores above the threshold divided by the number of target scores above the threshold, multiplied by the ratio of the total number of targets to the total number of decoys.
- If the collection has decoys and p-values, then a third false discovery rate estimate is computed using the decoys, but the ranking is based on p-values instead of xcorr.
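A minimal Python sketch of the first two FDR estimates above follows. It is an illustration, not Crux's own implementation: the function names are hypothetical, higher xcorr scores are assumed to be better, and "above the threshold" is assumed to include ties (score >= T). The third estimate could reuse decoy_fdr by passing scores in which smaller p-values rank higher, for example the negated p-values.

```python
from typing import List, Sequence


def bh_fdr(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg estimate p_j * (m / j) for each PSM (hypothetical helper).

    Estimates are returned in the same order as the input p-values.
    """
    m = len(pvalues)
    # Rank PSMs by p-value, smallest (most significant) first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    fdr = [0.0] * m
    for rank, i in enumerate(order, start=1):
        fdr[i] = pvalues[i] * m / rank
    return fdr


def decoy_fdr(target_scores: Sequence[float],
              decoy_scores: Sequence[float]) -> List[float]:
    """Target-decoy FDR estimate at each target score (hypothetical helper).

    FDR(T) = (#decoys >= T / #targets >= T) * (total targets / total decoys),
    assuming higher scores are better and ties count as "above" T.
    """
    n_targets, n_decoys = len(target_scores), len(decoy_scores)
    if n_decoys == 0:
        raise ValueError("decoy-based FDR needs at least one decoy PSM")
    fdr = []
    # Quadratic scan for clarity; a sorted sweep would scale better.
    for t in target_scores:
        decoys_above = sum(1 for d in decoy_scores if d >= t)
        targets_above = sum(1 for x in target_scores if x >= t)  # >= 1, includes t itself
        fdr.append((decoys_above / targets_above) * (n_targets / n_decoys))
    return fdr
```

Note that neither estimate is guaranteed to be monotone in the score, which is why a separate conversion step is needed to obtain q-values.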
In each case, the estimated FDRs are converted to q-values by ranking the PSMs by score and then taking, for each PSM, the minimum of its own FDR and the FDRs of all PSMs ranked below it. The three types of q-values are reported, respectively, in columns with headers "Weibull est. q-value", "decoy q-value (xcorr)" and "decoy q-value (p-value)".
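The FDR-to-q-value conversion can be sketched in Python as follows. This is a hypothetical helper rather than Crux's actual code; it assumes higher scores are better, so to apply it to a p-value ranking, pass the negated p-values as scores.

```python
from typing import List, Sequence


def fdr_to_qvalues(scores: Sequence[float], fdrs: Sequence[float]) -> List[float]:
    """Convert per-PSM FDR estimates to q-values (hypothetical helper).

    PSMs are ranked by score, best (highest) first; each PSM's q-value is the
    minimum of its own FDR and the FDRs of all PSMs ranked below it.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    # Walk from the worst-ranked PSM upward, carrying the smallest FDR seen so far.
    for i in reversed(order):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return qvalues
```

For example, FDRs of 0, 0, 2/3 and 0.5 at scores 5, 4, 3 and 2 become q-values of 0, 0, 0.5 and 0.5: the PSM scoring 3 inherits the smaller FDR of the PSM ranked below it, so q-values never decrease as the score threshold is relaxed.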
Additional background information on q-values and statistical confidence measures can be found in this article:
Lukas Käll, John D. Storey, Michael J. MacCoss and William Stafford Noble. "Assigning significance to peptides identified by tandem mass spectrometry using decoy databases." Journal of Proteome Research. 7(1):29-34, 2008.
Note that compute-q-values does not (yet) estimate the percentage of incorrect targets, as described in the above article. Hence, the method implemented here as "decoy q-values" is analogous to the "Simple FDR" procedure shown in Figure 4A of the above article.
Input:
- <psm folder> – the folder in which all the PSM result files are located. This is assumed to be crux-output, but it can be set by the user using the --output-dir option. The program looks for files ending in .csm, which are produced by crux search-for-matches. All such files in the given directory are analyzed jointly.
- <protein input> – the name of a file in FASTA format, or of the directory containing the protein index, from which to retrieve proteins and peptides.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- qvalues.target.txt: a tab-delimited text file containing the PSMs. See txt file format for a list of the fields.
- qvalues.log.txt: a log file containing a copy of all messages that were printed to stderr.
- qvalues.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
Options:
- --fileroot <string> – The fileroot string will be added as a prefix to all output file names. Default = none.
- --output-dir <filename> – The name of the directory where output files will be created. Default = crux-output.
- --overwrite <T|F> – Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
- --parameter-file <filename> – A file containing command-line or additional parameters. See the parameter documentation page for details.
- --verbosity <0-100> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
- --version T – Print the version number and quit. Please note that you must include the 'T' after --version.
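Downstream filtering of the qvalues.target.txt output might look like the following Python sketch. It assumes, without confirmation here, that the file has a single header row and that its q-value columns use exactly the headers listed in the description, such as "decoy q-value (xcorr)". The function name and the 0.01 threshold are illustrative only.

```python
import csv
from typing import Dict, Iterable, List


def filter_psms(lines: Iterable[str],
                column: str = "decoy q-value (xcorr)",
                threshold: float = 0.01) -> List[Dict[str, str]]:
    """Keep PSM rows whose q-value in `column` is at or below `threshold`.

    Hypothetical helper; assumes tab-delimited input with one header row.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if float(row[column]) <= threshold]
```

To apply it to a real run, open the file with open(path, newline="") and pass the file handle as lines; the path depends on --output-dir and on any prefix added by --fileroot.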