crux search-for-matches
Usage:
crux search-for-matches [options] <ms2 input filename> <protein input>
Description:
This command searches a protein database with a set of spectra. For each spectrum, the precursor mass is computed from the measured precursor m/z and an assumed charge. Candidate peptides whose mass lies within a specified range of the precursor mass are identified. These candidate peptides are first ranked with the SEQUEST Sp score. The top 500 matches (configurable with max-rank-preliminary) are then re-scored using XCorr. The input protein database may either be in FASTA format or it may be a binary index created by crux create-index.
Modifications: Crux handles two types of modifications: static and variable. Static modifications are a change of mass applied to a given amino acid in every peptide in which it occurs. By default, a static modification of +57 Da to cysteine (C) is applied. Variable modifications allow peptides to be generated with and without a mass change to a given amino acid. Crux handles variable modifications as follows. The user specifies an allowed set of amino acid modifications, using the options mod, cmod and nmod, which are described below. Before any search is performed, Crux generates an exhaustive list of all possible combinations of amino acid modifications that could be applied to a peptide. For example, if the user specifies a modification of +79 which can be present twice on any peptide and a modification of +30 which can be present only once, then the list will contain modifications corresponding to +79 on one residue, +79 on two different residues, +30 on one residue, +79 on one and +30 on one, and +79 on two and +30 on one.
Subsequently, for each spectrum, Crux performs one search for each possible combination of modifications. For example, if the precursor m/z for a spectrum is 800, the charge state is +2, and Crux is considering a modification of +79, then Crux will retrieve from the database all candidate peptides whose unmodified mass is close to 1519 Da, that is, the neutral precursor mass of about (800 - 1.007) x 2 = 1598 Da minus the 79 Da modification. These candidates are scored as usual, with a preliminary score and a final score, and the top n candidate peptides are added to a composite, sorted list of peptides. Finally, after all modifications have been searched, Crux reports for the current spectrum the top m peptides from the composite list.
Input:
- <ms2 input filename> – The name of the file (in MS2 format) from which to parse the spectra.
- <protein input> – The name of the file in FASTA format or the directory containing a protein index from which to retrieve proteins and peptides.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- search.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
- search.target.csm: a binary file that contains the PSMs and additional features. This file can be used as input to crux compute-q-values or crux percolator.
- search.target.sqt: an SQT file containing the PSMs.
- search.target.txt: a tab-delimited text file containing the PSMs. See txt file format for a list of the fields.
- search.log.txt: a log file containing a copy of all messages that were printed to stderr.
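Because search.target.txt is tab-delimited, it can be post-processed with standard tools. The following is a minimal Python sketch, not part of Crux, that reads such a table and keeps the best-scoring PSM per spectrum. The column names used here ("scan", "sequence", "xcorr score") are placeholders chosen for illustration; consult the txt file format page for the actual field names. The helper functions `read_psms` and `best_psm_per_spectrum` are likewise hypothetical.

```python
import csv
import io


def read_psms(handle):
    """Parse a tab-delimited PSM table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def best_psm_per_spectrum(psms, spectrum_col, score_col):
    """Keep the highest-scoring PSM for each spectrum identifier."""
    best = {}
    for psm in psms:
        key = psm[spectrum_col]
        if key not in best or float(psm[score_col]) > float(best[key][score_col]):
            best[key] = psm
    return best


# Tiny in-memory stand-in for search.target.txt; the headers are placeholders,
# not the documented Crux column names.
example = io.StringIO(
    "scan\tsequence\txcorr score\n"
    "1\tPEPTIDEK\t2.10\n"
    "1\tPEPTLDEK\t1.40\n"
    "2\tSAMPLER\t3.05\n"
)
best = best_psm_per_spectrum(read_psms(example), "scan", "xcorr score")
for scan, psm in sorted(best.items()):
    print(scan, psm["sequence"], psm["xcorr score"])
```

To use it on a real result, open the file (for example crux-output/search.target.txt) and pass the file handle to `read_psms`, substituting the real column names.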
If decoys are enabled using --num-decoys-per-target, then three files called search.decoy.csm, search.decoy.sqt and search.decoy.txt are also produced.
Options:
- --fileroot <string> – The fileroot string will be added as a prefix to all output file names. Default = none.
- --output-dir <filename> – The name of the directory where output files will be created. Default = crux-output.
- --overwrite <T|F> – Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
- --num-decoys-per-target <n> – Specify the number of decoy peptides to search for every target peptide searched. Control where the decoys are returned (to what files) with --decoy-location. At least one decoy set (in its own file) is required to run the algorithm 'percolator' in a subsequent crux run. Default = 2.
- --decoy-location <target-file|one-decoy-file|separate-decoy-files> – File(s) in which decoy results are returned. Only applies when num-decoys-per-target is not zero. Use 'target-file' to mix target and decoy PSMs in one file. Use 'one-decoy-file' to print target PSMs to one file and all decoys to a separate file. Use 'separate-decoy-files' to print one .csm file for each decoy set. (crux percolator accepts up to two decoy.csm files.) Note that at most one search.decoy.sqt and one search.decoy.txt file is produced. Default = separate-decoy-files.
- --compute-p-values <T|F> – Estimate the parameters of the score distribution for each spectrum and compute a p-value for each PSM. The score distribution parameters are estimated only from target PSM scores. The same parameters will be used to compute p-values for the decoy PSMs. This option can be used in conjunction with crux compute-q-values.
- --spectrum-min-mass <float> – The lowest spectrum m/z to search in the ms2 file. Default = 0.0.
- --spectrum-max-mass <float> – The highest spectrum m/z to search in the ms2 file. Default = no maximum.
- --spectrum-charge <1|2|3|all> – The spectrum charges to search. With 'all', every spectrum will be searched, and spectra with multiple charge states will be searched once at each charge state. With 1, 2, or 3, only spectra with that charge will be searched. Default = all.
- --parameter-file <filename> – A file containing command-line or additional parameters. See the parameter documentation page for details.
- --verbosity <0-100> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
- --version T – Print the version number and quit. Please note that you must include the 'T' after --version.
Parameter file options:
- fragment-mass <average|mono> – Which isotopes to use in calculating fragment ion mass (average, mono). Default = mono.
- mass-window – Tolerance used for matching peptides to spectra. Peptides must be within +/- 'mass-window' of the spectrum mass. Default = 3.0.
- use-mz-window – Use mass-to-charge instead of a mass window. Peptides must be within +/- 'm/z-window' of the spectrum m/z. The m/z-window value is taken from mass-window. Default = F.
- ion-tolerance <float> – Tolerance used for matching observed peaks to predicted fragment ions. Default = 0.5.
- max-rank-preliminary <int> – Number of PSMs per spectrum to score after preliminary scoring. Default = 500.
- top-match <int> – The number of PSMs per spectrum written to the output files. Default = 5. NOTE: crux percolator requires that search-for-matches be run with top-match=1.
- mod <mass change>:<aa list>:<max per peptide> – Consider modifications on any amino acid in aa list with at most max-per-peptide in one peptide. This parameter may be included with different values multiple times so long as the total number of mod, cmod, and nmod parameters does not exceed 11. The same modifications must be given for any post-search process (compute-q-values, q-ranker).
- cmod <mass change>:<max distance from protein C-terminus> – Consider modifications on the C-terminus of any peptide whose C-terminus is no more than max-distance residues from the protein C-terminus. Use -1 to consider the C-terminus of all peptides regardless of position in the protein. This parameter may be included with different values multiple times so long as the total number of mod, cmod, and nmod parameters does not exceed 11. The same modifications must be given for any post-search process (compute-q-values, q-ranker).
- nmod <mass change>:<max distance from protein N-terminus> – Consider modifications on the N-terminus of any peptide whose N-terminus is no more than max-distance residues from the protein N-terminus. Use -1 to consider the N-terminus of all peptides regardless of position in the protein. This parameter may be included with different values multiple times so long as the total number of mod, cmod, and nmod parameters does not exceed 11. The same modifications must be given for any post-search process (compute-q-values, q-ranker).
- max-mods <n> – The maximum number of modifications that can be applied to a single peptide. Default = no limit.
- max-aas-modified <n> – The maximum number of modified amino acids that can appear in one peptide. Each amino acid can be modified multiple times. Default = no limit.
- precision <n> – Set the precision (number of significant digits) for masses and scores written to sqt and text files. Default = 8. Available from parameter file for crux search-for-matches, percolator, and compute-q-values.
- print-search-progress <n> – Show search progress by printing every n spectra searched. Set to 0 to show no search progress. Available for crux search-for-matches from parameter file. Default = 10.
- reverse-sequence <T|F> – Generate decoy sequences by reversing the peptide rather than by shuffling. The first and last residues of the sequence are not changed. If the target sequence is a palindrome (the same when reversed), then the decoy will be generated by shuffling and a note to that effect will be printed at verbosity level 40 (DETAILED INFO).
NOTE: the following parameters are also used when creating an index and must be compatible with any index used.
- min-mass <float> – The minimum neutral mass of the peptides to place in the index. Default = 200.
- max-mass <float> – The maximum neutral mass of the peptides to place in the index. Default = 7200.
- min-length <int> – The minimum length of the peptides to place in the index. Default = 6.
- max-length <int> – The maximum length of the peptides to place in the index. Default = 50.
- enzyme <trypsin|chymotrypsin|elastase|clostripain|cyanogen-bromide|iodosobenzoate|proline-endopeptidase|staph-protease|modified-chymotrypsin|elastase-trypsin-chymotrypsin|no-enzyme> – Enzyme to use for in silico digestion of protein sequences. Used in conjunction with the options digestion and missed-cleavages. Use 'no-enzyme' for non-specific digestion. Digestion rules are written as: enzyme name [cuts after one of these residues]|{but not before one of these residues}. The rules are: trypsin [RK]|{P}, elastase [ALIV]|{P}, chymotrypsin [FWY]|{P}, clostripain [R]|[], cyanogen-bromide [M]|[], iodosobenzoate [W]|[], proline-endopeptidase [P]|[], staph-protease [E]|[], modified-chymotrypsin [FWYL]|{P}, elastase-trypsin-chymotrypsin [ALIVKRWFY]|{P}, aspn []|[D] (cuts before D). Default = trypsin.
- custom-enzyme <residues before cleavage | residues after cleavage> – Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given, each enclosed in square brackets or curly braces, and separated by a |. The first list contains the residues required or prohibited before the cleavage site, and the second list contains the residues required or prohibited after the cleavage site. Residues that are required for digestion are enclosed in square brackets, '[' and ']'. Residues that prevent digestion are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D].
- digestion <full-digest|partial-digest> – Degree of digestion used to generate peptides (full-digest, partial-digest). Either both ends or one end of a peptide must conform to enzyme specificity rules. Used in conjunction with the enzyme option when enzyme is not set to 'no-enzyme'. Default = full-digest.
- missed-cleavages <T|F> – Allow missed cleavage sites within a peptide. When an enzyme is specified, includes peptides containing one or more potential cleavage sites. Default = F.
- unique-peptides <T|F> – For peptides appearing in multiple proteins, store a reference to only one of those proteins. Default = F.
- isotopic-mass <average|mono> – Specify the type of isotopic masses to use when calculating the peptide mass. Default = average.
- <A-Z> <float> – Specify static modifications. This is a mass change applied to the given amino acid (in single-letter code, A through Z) for every peptide in which it occurs. Use the mod option for generating peptides both with and without the mass change. Default C=57.
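As a rough illustration of the variable-modification search described under Modifications, the Python sketch below enumerates the combinations produced by a set of mod-style entries and computes the unmodified peptide mass that would be looked up for each combination. It is not Crux's implementation: the function names are hypothetical, the proton mass constant and the way the neutral mass is derived are standard assumptions rather than values taken from Crux, and the sketch ignores static modifications, the cmod and nmod position rules, and the max-mods and max-aas-modified limits.

```python
from itertools import product

PROTON_MASS = 1.00727646688  # mass of a proton in Da (assumed constant)


def modification_combinations(mods):
    """Enumerate every non-empty combination of variable modifications.

    mods is a list of (mass_change, max_per_peptide) pairs. Each combination
    is returned as a tuple of counts, one count per modification.
    """
    count_ranges = [range(max_count + 1) for _, max_count in mods]
    return [counts for counts in product(*count_ranges) if any(counts)]


def neutral_mass(precursor_mz, charge):
    """Neutral precursor mass from a measured m/z and an assumed charge."""
    return (precursor_mz - PROTON_MASS) * charge


def unmodified_target_mass(precursor_mz, charge, mods, counts):
    """Mass an unmodified candidate peptide must have for this combination."""
    delta = sum(mass * n for (mass, _), n in zip(mods, counts))
    return neutral_mass(precursor_mz, charge) - delta


# The example from the text: +79 at most twice, +30 at most once.
mods = [(79.0, 2), (30.0, 1)]
combos = modification_combinations(mods)
for counts in combos:
    print(counts, round(unmodified_target_mass(800.0, 2, mods, counts), 2))
```

The five tuples correspond to the five combinations listed in the Modifications paragraph. In an actual search, each target mass would be the centre of a window of width mass-window rather than an exact value.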