crux search-for-xlinks
Usage:
crux search-for-xlinks [options] <ms2 input filename> <protein database> <link sites> <link mass>
Description:
This command searches a protein database with a set of spectra. For each spectrum, the precursor mass is computed from either the precursor singly charged mass (m+h) or the mass-to-charge ratio (m/z) and an assumed charge. Candidate molecules are linear peptides, dead-end products, self-loop products or cross-linked products whose mass lies within a specified range of the precursor mass. These candidates are ranked using XCorr. The input protein database is in FASTA format.
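The paragraph above describes how a neutral precursor mass is obtained and how candidates are then filtered. The sketch below shows that bookkeeping: it derives the neutral mass from either an m+h value or an m/z value with an assumed charge, then keeps candidates within ± a window in Da (the `mass` window type). It is not taken from Crux. The proton mass constant, the function names and the example candidate masses are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: proton mass in Da; Crux's exact constant may differ.
PROTON_MASS = 1.00727646688


def precursor_neutral_mass(m_plus_h: Optional[float] = None,
                           mz: Optional[float] = None,
                           charge: Optional[int] = None) -> float:
    """Neutral precursor mass from m+h, or from m/z plus an assumed charge."""
    if m_plus_h is not None:
        # a singly charged (m+h) ion carries one extra proton
        return m_plus_h - PROTON_MASS
    if mz is None or charge is None:
        raise ValueError("need either m_plus_h or both mz and charge")
    # an ion with `charge` protons has m/z = (M + charge * proton) / charge
    return (mz - PROTON_MASS) * charge


def candidates_in_window(precursor_mass: float,
                         candidates: List[Tuple[str, float]],
                         window: float = 3.0) -> List[Tuple[str, float]]:
    """Keep (label, neutral mass) pairs within +/- window Da of the precursor."""
    return [(label, mass) for label, mass in candidates
            if abs(mass - precursor_mass) <= window]
```

For example, `precursor_neutral_mass(mz=501.0, charge=2)` and `precursor_neutral_mass(m_plus_h=1001.0)` both give a neutral mass close to 1000 Da. The candidate masses passed to `candidates_in_window` would in practice come from linear, dead-end, self-loop and cross-linked products.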
The algorithm is described in more detail in the following article:
Sean McIlwain, Paul Draghicescu, Pragya Singh, David R. Goodlett and William Stafford Noble. "Detecting cross-linked peptides by searching against a database of cross-linked peptide pairs." Journal of Proteome Research. 2010.
Modifications: Currently, crux search-for-xlinks supports static modifications (a change of mass applied to a given amino acid in every peptide in which it occurs) but not variable modifications (allowing peptides to be generated with and without a mass change to a given amino acid). By default, a static modification of +57 Da to cysteine (C) is applied. Static modifications can be specified in the parameter file, as described below.
Input:
- <ms2 input filename> – The name of the file from which to parse the MS/MS spectra. The file can be in any format supported by ProteoWizard, except for the vendor formats.
- <protein database> – The name of the file in FASTA format from which to retrieve proteins and peptides.
- <link sites> – A comma-delimited list of amino acid pairs that may be cross-linked. For example, A:K,A:D means that the cross-linker can attach A to K or A to D. The N-terminus of a protein can also be specified as a link site by using nterm. For example, nterm:K means that a cross-linker can attach between a protein's N-terminus and lysine.
- <link mass> – The mass modification of the linker when attached to a peptide.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- search-for-xlinks.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
- search.target.txt: a tab-delimited text file containing the PSMs. See xlink txt file format for a list of the fields.
- search.decoy.txt: a tab-delimited text file containing the decoy PSMs. See xlink txt file format for a list of the fields.
- qvalues.target.txt: a tab-delimited text file containing the top-ranked PSMs with calculated q-values. See xlink txt file format for a list of the fields.
- search-for-xlinks.log.txt: a log file containing a copy of all messages that were printed to stderr.
Options:
--output-dir <filename>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite T|F
– Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
--spectrum-parser pwiz|mstoolkit|crux
– Parser to use for reading in MS/MS spectra. Default = crux.
--spectrum-min-mz <float>
– The lowest spectrum m/z to search in the ms2 file. Default = 0.0.
--spectrum-max-mz <float>
– The highest spectrum m/z to search in the ms2 file. Default = no maximum.
--spectrum-charge 1|2|3|all
– The spectrum charges to search. With 'all', every spectrum will be searched, and spectra with multiple charge states will be searched once at each charge state. With 1, 2, or 3, only spectra with that charge will be searched. Default = all.
--compute-sp T|F
– Compute the preliminary Sp score for all candidate peptides. This is recommended if results are to be analyzed by percolator or q-ranker. Default = F.
--parameter-file <filename>
– A file containing command-line or additional parameters. See the parameter documentation page for details.
--verbosity <0-100>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
--precursor-window <float>
– Tolerance used for matching peptides to spectra. Peptides must be within +/- 'precursor-window' of the spectrum mass. The definition of the precursor window depends upon precursor-window-type. Default = 3.0.
--precursor-window-type mass|mz|ppm
– Specify the units for the window that is used to select peptides around the precursor mass location (mass, mz, ppm). The magnitude of the window is defined by the precursor-window option, and candidate peptides must fall within this window. For the mass window-type, the spectrum precursor m+h value is converted to mass, and the window is defined as that mass ± precursor-window. If the m+h value is not available, then the mass is calculated from the precursor m/z and provided charge. The peptide mass is computed as the sum of the average amino acid masses plus 18 Da for the terminal H and OH groups (one water molecule). The mz window-type calculates the window as spectrum precursor m/z ± precursor-window and then converts the resulting m/z range to the peptide mass range using the precursor charge. For the parts-per-million (ppm) window-type, the spectrum mass is calculated as in the mass type. The lower bound of the mass window is then defined as spectrum mass / (1.0 + (precursor-window / 1000000)) and the upper bound as spectrum mass / (1.0 - (precursor-window / 1000000)). Default = mass.
--precursor-window-weibull <0-1e6>
– Score decoy peptides within +/- precursor-window-weibull of the precursor mass. The resulting scores are used only for fitting the Weibull distribution. Default = 20.
--precursor-window-type-weibull mass|mz|ppm
– Window type to use in conjunction with the precursor-window-weibull parameter. Default = mass.
--min-weibull-points <int>
– Keep reshuffling and collecting XCorr scores until the minimum number of points for Weibull fitting (using targets and decoys) is achieved. Default = 4000.
--max-ion-charge <int>
– Predict ions for the theoretical spectra up to max charge state (1, 2, ..., 6) or up to the charge state of the peptide (peptide). If the max-ion-charge is greater than the charge state of the peptide, then the max is the peptide charge. Default = peptide.
--scan-number <int>|<int>-<int>
– A single scan number or a range of numbers to be searched. The range should be specified as 'first-last', which will include scans 'first' and 'last'. Default = search all spectra.
--mz-bin-width <float>
– Before calculation of the XCorr score, the m/z axes of the observed and theoretical spectra are discretized. This parameter specifies the size of each bin. The exact formula is floor((x / mz-bin-width) + 1.0 - mz-bin-offset), where x is the observed m/z value. By default, the mz-bin-width is 1.0005079 Da when searching using monoisotopic mass and 1.0011413 Da with average mass.
--mz-bin-offset <float>
– In the discretization of the m/z axes of the observed and theoretical spectra, this parameter specifies the location of the left edge of the first bin, relative to mass = 0 (i.e., mz-bin-offset = 0.xx means the left edge of the first bin will be located at +0.xx Da). The parameter must lie in the range 0 ≤ mz-bin-offset ≤ 1. Default = 0.68.
--mod-mass-format mod-only|total|separate
– Specify how sequence modifications are reported in various output files. Each modification is reported as a number enclosed in square brackets following the modified residue; however, the number may correspond to one of three different masses: (1) 'mod-only' reports the value of the mass shift induced by the modification; (2) 'total' reports the mass of the residue with the modification (residue mass plus modification mass); (3) 'separate' is the same as 'mod-only', but multiple modifications to a single amino acid are reported as a comma-separated list of values. For example, suppose amino acid D has an unmodified mass of 115 as well as two modifications of masses +14 and +2. In this case, the amino acid would be reported as D[16] with 'mod-only', D[131] with 'total', and D[14,2] with 'separate'.
--xlink-include-linears T|F
– Include linear peptides in the search. Default = T.
--xlink-include-deadends T|F
– Include dead-end products in the search. Default = T.
--xlink-include-selfloops T|F
– Include self-loop products in the search. Default = T.
--xlink-prevents-cleavage <string>
– List of amino acids for which the cross-linker can prevent cleavage.
--use-flanking-peaks T|F
– Turn on or off the peaks flanking the b/y ions. For crux search-for-matches, default = F; for crux search-for-xlinks, default = T.
--xcorr-use-flanks T|F
– Use flanking ions in the theoretical spectrum. These are placed +/- 1 Da around the b/y ions, with an intensity of 25.0. Default = T.
--use-mgf T|F
– Use MGF file format for parsing spectra. Default = F.
--top-match <int>
– The number of PSMs per spectrum written to the output files. Default = 5.
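The options above involve three small pieces of arithmetic: the window bounds for each --precursor-window-type, the m/z discretization formula for --mz-bin-width and --mz-bin-offset, and the three --mod-mass-format styles. The Python sketch below illustrates each one. It is not Crux's implementation. The proton mass used to convert m/z edges to neutral masses is an assumption, and so are all function names.

```python
import math

# Assumption: proton mass (Da) for the m/z -> neutral mass conversion used by
# the mz window type; Crux's exact constant and conversion may differ.
PROTON_MASS = 1.00727646688


def precursor_window(spectrum_mass, window, window_type="mass", mz=None, charge=None):
    """Return (low, high) neutral-mass bounds for candidate peptides."""
    if window_type == "mass":
        # spectrum mass +/- precursor-window, in Da
        return spectrum_mass - window, spectrum_mass + window
    if window_type == "ppm":
        # bounds exactly as written for the ppm window-type
        return (spectrum_mass / (1.0 + window / 1e6),
                spectrum_mass / (1.0 - window / 1e6))
    if window_type == "mz":
        # widen in m/z space, then convert each edge to a neutral mass;
        # spectrum_mass is not used for this window type
        if mz is None or charge is None:
            raise ValueError("the mz window-type needs the precursor m/z and charge")
        return ((mz - window - PROTON_MASS) * charge,
                (mz + window - PROTON_MASS) * charge)
    raise ValueError(f"unknown window type: {window_type!r}")


def xcorr_bin(x, bin_width=1.0005079, bin_offset=0.68):
    """Bin index floor((x / mz-bin-width) + 1.0 - mz-bin-offset) for an m/z value x."""
    return math.floor(x / bin_width + 1.0 - bin_offset)


def format_residue(aa, residue_mass, mods, fmt="mod-only"):
    """Render a residue and its modification masses in a mod-mass-format style."""
    if not mods:
        return aa
    if fmt == "mod-only":
        return f"{aa}[{sum(mods):g}]"
    if fmt == "total":
        return f"{aa}[{residue_mass + sum(mods):g}]"
    if fmt == "separate":
        values = ",".join(format(m, "g") for m in mods)
        return f"{aa}[{values}]"
    raise ValueError(f"unknown mod-mass-format: {fmt!r}")
```

As a check against the --mod-mass-format example, `format_residue("D", 115, [14, 2], fmt)` returns D[16], D[131] and D[14,2] for 'mod-only', 'total' and 'separate', respectively.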
Parameter file options:
fragment-mass average|mono
– Which isotopes to use in calculating fragment ion mass (average, mono). Default = average.
min-mass <float>
– The minimum neutral mass of the peptides to place in the index. Default = 200.
max-mass <float>
– The maximum neutral mass of the peptides to place in the index. Default = 7200.
min-length <int>
– The minimum length of the peptides to place in the index. Default = 4.
max-length <int>
– The maximum length of the peptides to place in the index. Default = 50.
--enzyme trypsin|trypsin/p|chymotrypsin|elastase|clostripain|cyanogen-bromide|iodosobenzoate|proline-endopeptidase|staph-protease|asp-n|lys-c|lys-n|arg-c|glu-c|pepsin-a|elastase-trypsin-chymotrypsin|no-enzyme
– Enzyme to use for in silico digestion of protein sequences. Used in conjunction with the options digestion and missed-cleavages. Use 'no-enzyme' for non-specific digestion. Digestion rules are written as: enzyme name [cuts after one of these residues]|{but not before one of these residues}. trypsin [RK]|{P}, trypsin/p [RK]|[], elastase [ALIV]|{P}, chymotrypsin [FWYL]|{P}, clostripain [R]|[], cyanogen-bromide [M]|[], iodosobenzoate [W]|[], proline-endopeptidase [P]|[], staph-protease [E]|[], elastase-trypsin-chymotrypsin [ALIVKRWFY]|{P}, asp-n []|[D], lys-c [K]|{P}, lys-n []|[K], arg-c [R]|{P}, glu-c [DE]|{P}, pepsin-a [FL]|{P}. Default = trypsin.
custom-enzyme <residues before cleavage>|<residues after cleavage>
– Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given, each enclosed in square brackets or curly braces, separated by a |. The first list contains residues required or prohibited before the cleavage site, and the second list contains residues required or prohibited after the cleavage site. Residues required for digestion are enclosed in square brackets, '[' and ']'. Residues that prevent digestion are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D].
digestion full-digest|partial-digest
– Degree of digestion used to generate peptides (full-digest, partial-digest). Either both ends or one end of a peptide must conform to enzyme specificity rules. Used in conjunction with the enzyme option when enzyme is not set to 'no-enzyme'. Default = full-digest.
missed-cleavages <int>
– Number of missed cleavage sites allowed within a peptide. When an enzyme is specified, peptides containing up to this many internal potential cleavage sites are included. Default = 0.
unique-peptides T|F
– For peptides appearing in multiple proteins, store a reference to only one of those proteins. Default = F.
isotopic-mass average|mono
– Specify the type of isotopic masses to use when calculating the peptide mass. Default = average.
<A-Z> <float>
– Specify static modifications. This is a mass change applied to the given amino acid (in single-letter code, A through Z) for every peptide in which it occurs. Use the mod option for generating peptides both with and without the mass change. Default C=57.0214637206.
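Several options above work together: enzyme/custom-enzyme, missed-cleavages, min-length/max-length and static modifications. The small in silico digestion below illustrates them. It parses a rule in the [before]|{after} notation (for example [RK]|{P} for trypsin or [X]|[D] for AspN), enumerates fully digested peptides, and adds the default C+57.0214637206 static modification when computing a neutral mass. This is a sketch under stated assumptions, not Crux's digestion code:

- It uses monoisotopic residue masses, although isotopic-mass defaults to average.
- It ignores partial-digest.
- It treats an empty list such as [] as placing no constraint.

All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Monoisotopic residue masses (Da). Assumption: monoisotopic rather than the
# average masses that isotopic-mass uses by default.
MONO_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # terminal H + OH
STATIC_MODS = {"C": 57.0214637206}  # the documented default static modification


def _side(spec: str) -> Callable[[str], bool]:
    """Turn '[RK]' (required) or '{P}' (prohibited) into a residue predicate."""
    residues = set(spec[1:-1])

    def hit(aa: str) -> bool:
        return "X" in residues or aa in residues  # X stands for any residue

    if spec[0] == "[" and spec[-1] == "]":
        # Assumption: an empty required list such as '[]' places no constraint.
        return hit if residues else (lambda aa: True)
    if spec[0] == "{" and spec[-1] == "}":
        return lambda aa: not hit(aa)
    raise ValueError(f"malformed rule side: {spec!r}")


def parse_rule(rule: str) -> Tuple[Callable[[str], bool], Callable[[str], bool]]:
    before, after = rule.split("|")
    return _side(before.strip()), _side(after.strip())


def digest(protein: str, rule: str = "[RK]|{P}", missed_cleavages: int = 0,
           min_length: int = 4, max_length: int = 50) -> List[str]:
    """Full digestion: both peptide ends are cleavage sites or protein termini."""
    before_ok, after_ok = parse_rule(rule)
    cuts = [i + 1 for i in range(len(protein) - 1)
            if before_ok(protein[i]) and after_ok(protein[i + 1])]
    bounds = [0] + cuts + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of skipped (missed) cleavage sites
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            pep = protein[bounds[i]:bounds[j]]
            if min_length <= len(pep) <= max_length:
                peptides.append(pep)
    return peptides


def peptide_mass(peptide: str, static_mods: Dict[str, float] = STATIC_MODS) -> float:
    """Neutral monoisotopic mass with static modifications on every occurrence."""
    return sum(MONO_MASS[aa] + static_mods.get(aa, 0.0) for aa in peptide) + WATER
```

For example, `digest("PEPTIDEKAAAAKPGGGGRCCCC")` with the default trypsin rule yields PEPTIDEK, AAAAKPGGGGR and CCCC. The K followed by P is not cleaved, and each C in CCCC carries the +57.0214637206 static modification in `peptide_mass`.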