Release notes for Crux
Version 1.22
September 16, 2009
Major enhancements
- Crux is now distributed in two versions, a full version that is covered by the same type of license as before (free to non-profit users, and via a licensing fee to commercial users), as well as a stripped-down version that is released under an open source license. The stripped-down version does not include the database search functionality but does include all of the post-processing tools. We are unable to release the entire Crux package under an open source license due to intellectual property issues. Both versions of Crux are available via the Crux web page: http://noble.gs.washington.edu/proj/crux/.
- A new tool, q-ranker, is available for estimating peptide-spectrum match q-values. This tool was described in the following article:
Marina Spivak, Jason Weston, Leon Bottou, Lukas Käll and William Stafford Noble. "Improvements to the Percolator algorithm for peptide identification from shotgun proteomics data sets." Journal of Proteome Research.
- Version 1.05 of percolator has now been integrated into the Crux source tree. A separate installation of percolator is no longer needed for basic percolator functionality. Note, however, that percolator remains under active development. You may therefore wish to install the current, stand-alone version of percolator and run it separately to take advantage of new features.
Minor changes
- The internal normalization of the observed spectra has been modified to drop those peaks whose intensity is less than 1/20 of the maximum intensity in the spectrum. This brings the xcorr score for crux into closer agreement with the xcorr score for sequest.
- compute-q-values now generates three different q-values: (1) from p-values using an analytical null model, (2) from decoys and xcorr using an empirical null model, and (3) from decoys and p-values using an empirical null model. All three types of q-values are computed when p-values and decoys are present in the search results.
- A copy of the parameter file is now automatically written to the output directory.
- A log file recording messages sent to stderr has been added for search-for-matches, compute-q-values, and percolator.
- The --use-mz-window parameter is now available for search-for-matches. When enabled, peptides must be within +/- 'm/z-window' of the spectrum m/z. The m/z-window value is taken from mass-window.
- A numerical bug in the Weibull p-value calculation was fixed, which had previously caused occasional erroneous NaNs to be output.
- The Weibull estimated p-values generated by search-for-matches are now returned as p-values instead of as -log(p-value). The corresponding q-values returned from compute-q-values are also now returned without the -log transform.
- The --precision option has been changed to control the total number of significant digits printed instead of the number of digits after the decimal point. The default precision has changed from 6 to 8.
- The parameters estimated for the Weibull distribution (used for computing p-values) now use the xcorrs from all PSMs for a spectrum instead of a random selection of 500.
- The estimation of Weibull distribution parameters requires a minimum number of scored PSMs. In the previous version, spectra with fewer PSMs than the minimum were not given a p-value. Crux will now generate extra decoys until there are enough scores.
- The p-values for decoy PSMs are now generated from the same Weibull distribution parameters as are used for the targets of the same spectrum.
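The peak-dropping rule from the normalization change in this release can be sketched as follows. This is a minimal illustration of the 1/20 threshold only, not the Crux implementation: the function name and the (m/z, intensity) tuple layout are assumptions, and any other normalization steps Crux performs are omitted.

```python
def drop_low_intensity_peaks(peaks, divisor=20):
    """Keep peaks whose intensity is at least max_intensity / divisor.

    `peaks` is a list of (mz, intensity) tuples. The name and data layout
    are hypothetical and are not taken from the Crux source.
    """
    if not peaks:
        return []
    threshold = max(intensity for _, intensity in peaks) / divisor
    # Peaks strictly below the threshold are dropped.
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


spectrum = [(200.1, 4.0), (350.2, 100.0), (410.7, 6.0), (512.3, 60.0)]
filtered = drop_low_intensity_peaks(spectrum)
```

Here the maximum intensity is 100.0, so the threshold is 5.0 and only the peak at m/z 200.1 is removed.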
Version 1.21
May 14, 2009
- The output for search-for-matches, compute-q-values, and percolator has been revised extensively. crux will now create a directory, and all output files will be created in that directory. By default the directory will be named crux-output, but this can be changed using the new output-dir option.
The output files for search-for-matches will be:
  - search.target.csm
  - search.decoy-?.csm
  - search.target.sqt
  - search.target.txt
The output files for compute-q-values will be:
  - qvalues.target.sqt
  - qvalues.target.txt
The output files for percolator will be:
  - percolator.target.sqt
  - percolator.target.txt
- The fileroot option has been added. This option is used to specify a string which will be added as a prefix to all output files.
- The option cleavages was replaced with two options: enzyme, which specifies the name of an enzyme (e.g. trypsin), and digestion, which indicates the degree of specificity (partial or full digest). The full list of available enzymes is in the html docs and in the usage statement. See also custom-enzyme below.
- The option custom-enzyme allows users to define arbitrary digestion rules. This overrides the enzyme option. Syntax for the custom digestion rule is the same as the syntax used by X!Tandem and is described in the html docs.
- The number of PSMs per spectrum printed to the output files is now controlled by one option, top-match. This makes max-sqt-result obsolete.
- It is now possible to control how many decoy sequences are generated and in which file(s) they are returned. There is a new option, num-decoys-per-target, which can be used to generate more than one shuffled peptide per spectrum. This replaces number-decoy-set.
- A new option, decoy-location, has been introduced. The three possible values are 'target-file', where all PSMs (target and decoys) are sorted together for each spectrum and returned in one file; 'one-decoy-file', where target PSMs are printed to one file and all decoys are printed to another; and 'separate-decoy-files', where there are as many decoy files as there are decoys per target.
- Protein names for decoy matches are now prepended with 'rand_' in the SQT files as in 'L rand_Y45678'.
- The option unique-peptides only applies to crux-generate-matches. Each peptide is stored in the index exactly once with references to all protein sources. Searches with fasta files print each peptide only once.
- The precision of the masses and scores printed to the sqt and text files can now be specified by the user. The default precision changed from 2 to 6.
- Search progress is now reported by printing every 10th spectrum that is searched. The verbosity can be adjusted with the parameter print-search-progress.
- Decoy (shuffled) sequences now keep the first and last residue the same as the target sequence that was shuffled to produce it. This is a reversion to previous behavior.
- It is now possible to skip the Sp score and score all PSMs with xcorr. The default procedure is still to score all peptides for one spectrum with Sp, rank by Sp, and eliminate all but the best-ranking PSMs (by default, the top 500). The remaining PSMs are scored by xcorr, re-ranked by xcorr and the top results returned. By setting max-rank-preliminary=0, the Sp scoring is skipped and xcorr is computed for all PSMs.
- A new parameter reverse-sequence can be used to generate decoy peptides by reversing them rather than shuffling. The first and last residues are left unmoved. If the sequence is a palindrome, then a decoy will be generated by shuffling and a note to that effect will be printed at the DETAILED INFO level of output (verbosity = 40).
- P-values are now computed for decoy peptides.
- The algorithm used to calculate the xcorr score has been modified so that the xcorr score will be in better agreement with scores generated by SEQUEST.
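The decoy rules described in this release (fixed first and last residues, optional reversal, and a shuffle fallback for palindromes) can be illustrated with a short sketch. The function and argument names are hypothetical, and the palindrome test is applied to the interior residues, which is an assumption about how the rule plays out once the termini are held fixed. It is not the Crux implementation.

```python
import random


def make_decoy(peptide, reverse=False, rng=None):
    """Return a decoy that keeps the first and last residue in place.

    Illustrative only: names and behavior details are assumptions.
    """
    rng = rng or random.Random()
    if len(peptide) < 4:
        # Fewer than two interior residues: nothing can be rearranged.
        return peptide
    first, middle, last = peptide[0], list(peptide[1:-1]), peptide[-1]
    if reverse:
        reversed_middle = middle[::-1]
        if reversed_middle != middle:
            return first + "".join(reversed_middle) + last
        # Palindromic interior: reversing changes nothing, so shuffle instead.
    # A shuffle can by chance reproduce the original order; no retry is done here.
    rng.shuffle(middle)
    return first + "".join(middle) + last


reversed_decoy = make_decoy("ACDEFGK", reverse=True)
shuffled_decoy = make_decoy("ACDEFGK", rng=random.Random(0))
```

Passing a seeded `random.Random` makes the shuffled decoy reproducible, which is convenient when comparing runs.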
Version 1.20
January 6, 2009
- Generating peptides and searching with up to eleven different dynamic modifications is now possible. New options associated with this feature are mod, cmod, nmod, max-mods, max-aas-modified.
- The format of the .csm files has changed and files written by older versions of crux are not readable with crux version 1.2.
- When the option cleavages is set to all, peptide generation ignores all tryptic cleavage sites, effectively setting the missed-cleavages option to TRUE regardless of user settings.
- When one spectrum has identical xcorr scores for different sequences, the rank of all those matches will be the same. Matches with the next highest score will rank one below.
- The options for setting the preliminary and primary score type have been removed and are fixed as Sp and xcorr, respectively. A new option, compute-p-values=<T | F>, was added to control p-value computation.
- The SQT file contains the spectrum calculated mass instead of observed mass/charge on the S line.
- There is now a test for confirming that the file downloaded from the crux website was not corrupted. See installation instructions for details.
- Calculating a p-value requires a minimum of 40 matches. Spectra with fewer than 40 matches will have p-value scores returned as NaN and a warning will be printed at the DETAILED_INFO level (40) of verbosity.
- Fixed error in generating neutral-loss peaks created as part of the theoretical spectrum.
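The tie-handling rule for ranks in this release reads like what is usually called dense ranking: equal scores share a rank, and the next distinct score gets the next rank with no gap. The sketch below shows that interpretation. The function name is hypothetical, and whether Crux leaves a gap after a tie is not confirmed by these notes.

```python
def rank_matches(xcorr_scores):
    """Assign dense ranks: equal scores share a rank, the next score is one below.

    Hypothetical helper, shown only to illustrate the tie rule.
    """
    distinct = sorted(set(xcorr_scores), reverse=True)
    rank_of = {score: rank for rank, score in enumerate(distinct, start=1)}
    return [rank_of[score] for score in xcorr_scores]


scores = [3.2, 2.9, 3.2, 1.5]
ranks = rank_matches(scores)
```

With these scores, the two matches at 3.2 both receive rank 1, the match at 2.9 receives rank 2, and the match at 1.5 receives rank 3.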
Version 1.02
December 1, 2008
- Three programs, crux-create-index, crux-search-for-matches, and crux-analyze-matches, were merged into one program named crux.
- Percolator is now truly optional as all Crux programs will build without it.
- Fragment masses can now be calculated as average or mono-isotopic. This is controlled by the fragment-mass option in the parameter file.
- The name of the score-type option that calculates p-values was changed from xcorr-logp to xcorr-pvalue.
- SQT files have two new lines in the header which describe the arrangement of values in the results.
- HTML documentation was updated to reflect the above changes.
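To show what the choice between average and mono-isotopic masses changes in practice, the sketch below sums residue masses from two small lookup tables. It covers three amino acids only, the `mass_type` argument and its values are hypothetical stand-ins for the fragment-mass option, and ion-type offsets (water, protons) are left out.

```python
# Residue masses in daltons for three amino acids only (illustrative subset).
MONOISOTOPIC = {"G": 57.02146, "A": 71.03711, "S": 87.03203}
AVERAGE = {"G": 57.0519, "A": 71.0788, "S": 87.0782}


def residue_mass_sum(sequence, mass_type="mono"):
    """Sum residue masses using the chosen table.

    `mass_type` and its accepted values are assumptions made for this sketch.
    """
    table = {"mono": MONOISOTOPIC, "average": AVERAGE}[mass_type]
    return sum(table[residue] for residue in sequence)


mono = residue_mass_sum("GAS", "mono")
avg = residue_mass_sum("GAS", "average")
```

For the tripeptide GAS the two sums differ by roughly 0.12 Da, and the gap grows with peptide length, so the setting should match the mass accuracy of the instrument data being searched.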
Version 1.01
October 15, 2008
- A bug limiting the length of the name of an index file was fixed.
- Modifications were made so that Crux will build with version 1.05 of Percolator. This is the only supported version of Percolator.
- Memory leaks in crux-search-for-matches were patched.
- The --version option was added.
Version 1.0
March 4, 2008
Initial release