crux compute-q-values

Usage:

crux compute-q-values [options] <protein input>

Description:

Given a collection of scored PSMs, estimate q-values. The q-value is a statistical confidence measure that is analogous to a p-value but that incorporates multiple testing correction. The q-value associated with a score threshold T is defined as the minimal false discovery rate (FDR) at which a score of T is deemed significant. In this setting, the q-value accounts for the fact that we are analyzing a large collection of spectra.

Compute-q-values requires that search-for-matches produced decoy PSMs or that it computed p-values. Up to three different types of q-values may be computed, depending on how search-for-matches was executed. Each of the three types of q-value calculation is based upon a different method for calculating the false discovery rate, as follows:

  1. If the collection of PSMs has p-values assigned by search-for-matches via the --compute-p-values option, then the false discovery rate associated with a given score is computed using the standard Benjamini-Hochberg algorithm (Journal of the Royal Statistical Society B, 57:289-300, 1995). Briefly, the algorithm is as follows: (1) rank all m PSMs by p-value, from smallest to largest; (2) for a p-value p_j that appears at position j in the ranked list, the FDR is estimated as p_j * (m/j).
  2. If the collection of PSMs contains both targets and decoys, then the false discovery rate associated with a given xcorr score is estimated as the number of decoy scores above the threshold divided by the number of target scores above the threshold, multiplied by the ratio of the total number of targets to total number of decoys.
  3. If the collection has decoys and p-values, then a third false discovery rate estimate is computed using the decoys, but the ranking is based on p-values instead of xcorr.
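The first two FDR estimates above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated in the text, not the code that Crux runs: the function names bh_fdr and decoy_fdr are hypothetical, and treating "above the threshold" as "greater than or equal to the threshold" is an assumption.

```python
def bh_fdr(p_values):
    """Estimate an FDR for each p-value as p_j * (m / j).

    j is the 1-based rank of the p-value when all m p-values are
    sorted from smallest to largest (Benjamini-Hochberg estimate).
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    fdr = [0.0] * m
    for rank, idx in enumerate(order, start=1):
        fdr[idx] = p_values[idx] * m / rank
    return fdr


def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR at a score threshold from targets and decoys.

    (decoys at or above threshold) / (targets at or above threshold),
    multiplied by (total targets) / (total decoys).
    Assumption: a score equal to the threshold counts as "above" it.
    """
    n_targets_above = sum(1 for s in target_scores if s >= threshold)
    n_decoys_above = sum(1 for s in decoy_scores if s >= threshold)
    if n_targets_above == 0:
        # No targets accepted, so there is nothing to be falsely discovered.
        return 0.0
    ratio = len(target_scores) / len(decoy_scores)
    return (n_decoys_above / n_targets_above) * ratio
```

Neither function caps its result at 1, so raw estimates larger than 1 are possible. The third estimate in the list would apply the same decoy calculation with PSMs ranked by p-value (where smaller is better) rather than by xcorr.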

In each case, the estimated FDRs are converted to q-values by ranking the PSMs by score and then taking, for each PSM, the minimum of the current FDR and all of the FDRs below it in the ranked list. The three types of q-values are reported, respectively, in columns with headers "Weibull est. q-value", "decoy q-value (xcorr)" and "decoy q-value (p-value)".
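The FDR-to-q-value conversion described above amounts to a running minimum taken from the bottom of the ranked list upward. A minimal sketch follows, assuming a hypothetical function name fdr_to_q_values and a hypothetical higher_is_better flag to cover both xcorr (higher is better) and p-value (lower is better) rankings; it is not taken from the Crux source.

```python
def fdr_to_q_values(scores, fdrs, higher_is_better=True):
    """Convert per-PSM FDR estimates to q-values.

    PSMs are ranked from best to worst score. Each q-value is the
    minimum of that PSM's FDR and the FDRs of all PSMs ranked below it.
    """
    m = len(scores)
    # Indices ordered from best score to worst score.
    order = sorted(range(m), key=lambda i: scores[i], reverse=higher_is_better)
    q_values = [0.0] * m
    running_min = float("inf")
    # Walk from the worst-ranked PSM up to the best-ranked one,
    # carrying along the smallest FDR seen so far.
    for idx in reversed(order):
        running_min = min(running_min, fdrs[idx])
        q_values[idx] = running_min
    return q_values
```

For example, FDR estimates of 0.1, 0.05, 0.2 and 0.15 for PSMs ranked first through fourth become q-values of 0.05, 0.05, 0.15 and 0.15, so q-values never increase as the score improves.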

Additional background information on q-values and statistical confidence measures can be found in this article:

Lukas Käll, John D. Storey, Michael J. MacCoss and William Stafford Noble. "Assigning significance to peptides identified by tandem mass spectrometry using decoy databases." Journal of Proteome Research. 7(1):29-34, 2008.

Note that compute-q-values does not (yet) estimate the percentage of incorrect targets, as described in the above article. Hence, the method implemented here as "decoy q-values" is analogous to the "Simple FDR" procedure shown in Figure 4A of the above article.

Input:

Output:

Options:
