comet
Usage:
crux comet [options] <input spectra>+ <database_name>
Description:
This command searches a protein database with a set of spectra, assigning peptide sequences to the observed spectra. This search engine was developed by Jimmy Eng at the University of Washington Proteomics Resource.
Although its history goes back two decades, the Comet search engine was first made publicly available in August 2012 on SourceForge. Comet is multithreaded and supports multiple input and output formats.
"Comet: an open source tandem mass spectrometry sequence database search tool." Eng JK, Jahan TA, Hoopmann MR. Proteomics. 2012 Nov 12. doi: 10.1002/pmic.201200439
Input:
input spectra+
– The name of one or more files from which to parse the spectra. Valid formats include mzXML, mzML, mz5, raw, ms2, and cms2. Files in mzML or mzXML format may be compressed with gzip. RAW files can be parsed only under Windows, and only if the appropriate libraries were included at compile time.
database_name
– A full or relative path to the sequence database, in FASTA format, to search. Example databases include RefSeq and UniProt. The database can contain amino acid sequences or nucleic acid sequences. If the sequences are amino acid sequences, set the parameter "nucleotide_reading_frame = 0". If the sequences are nucleic acid sequences, you must instruct Comet to translate them into amino acid sequences by setting "nucleotide_reading_frame" to a value between 1 and 9.
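The reading-frame settings can be illustrated with a minimal Python sketch of translating a nucleotide sequence. It assumes the standard genetic code, that frames 1-3 start at offsets 0-2 of the forward strand and frames 4-6 at offsets 0-2 of the reverse complement, and that the values 7, 8 and 9 expand as listed for nucleotide_reading_frame under Miscellaneous parameters. All function names are hypothetical; this is not Comet's translation code.

```python
# Hypothetical helpers illustrating nucleotide_reading_frame; not Comet's code.
# Assumes the standard genetic code and the frame numbering described above.
BASES = "TCAG"
# Standard code with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate codon by codon; codons containing non-ACGT bases become 'X'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(base in BASES for base in codon):
            index = (16 * BASES.index(codon[0])
                     + 4 * BASES.index(codon[1])
                     + BASES.index(codon[2]))
            protein.append(CODE[index])
        else:
            protein.append("X")
    return "".join(protein)


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def frames_for(setting):
    """Map a nucleotide_reading_frame value to the frames it covers (assumed)."""
    if setting == 0:
        return []  # protein database: no translation needed
    if 1 <= setting <= 6:
        return [setting]
    expansions = {7: [1, 2, 3], 8: [4, 5, 6], 9: [1, 2, 3, 4, 5, 6]}
    if setting in expansions:
        return expansions[setting]
    raise ValueError("nucleotide_reading_frame must be between 0 and 9")


def translate_frame(dna, frame):
    """Frames 1-3 read the forward strand, 4-6 the reverse complement."""
    dna = dna.upper()
    strand = dna if frame <= 3 else reverse_complement(dna)
    return translate(strand[(frame - 1) % 3:])


# Usage: all six frames of a short sequence.
print([translate_frame("ATGGCC", f) for f in frames_for(9)])
# prints ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Incomplete trailing codons are simply dropped in this sketch; how Comet handles stop codons and partial codons is not specified here.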
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
comet.target.txt
– a tab-delimited text file containing the target PSMs. See txt file format for a list of the fields.
comet.params.txt
– a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
comet.log.txt
– a log file containing a copy of all messages that were printed to standard error.
Options:
Database
--decoy_search <integer>
– 0=no, 1=concatenated search, 2=separate search. Default = 0.
CPU threads
--num_threads <integer>
– 0=poll the CPU to set the number of threads; otherwise, specify the number of threads directly. Default = 0.
Masses
--peptide_mass_tolerance <float>
– Controls the mass tolerance value. The mass tolerance is set at +/- the specified number, i.e., an entered value of "1.0" applies a -1.0 to +1.0 tolerance. The units of the mass tolerance are controlled by the parameter "peptide_mass_units". Default = 3.
--auto_peptide_mass_tolerance false|warn|fail
– Automatically estimate the optimal value for the peptide_mass_tolerance parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
--peptide_mass_units <integer>
– 0=amu, 1=mmu, 2=ppm. Default = 0.
--mass_type_parent <integer>
– 0=average masses, 1=monoisotopic masses. Default = 1.
--mass_type_fragment <integer>
– 0=average masses, 1=monoisotopic masses. Default = 1.
--precursor_tolerance_type <integer>
– 0=singly charged peptide mass, 1=precursor m/z. Default = 0.
--isotope_error <integer>
– 0=off, 1=on -1/0/1/2/3 (standard C13 error), 2=-8/-4/0/4/8 (for +4/+8 labeling). Default = 0.
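How the tolerance options interact can be shown with a small Python sketch. It assumes that a ppm tolerance is computed relative to the mass being tested and that isotope_error=1 shifts the precursor mass by multiples of the 13C-12C mass difference; isotope_error=2 and precursor_tolerance_type are not modeled. The helper names are hypothetical.

```python
# Hypothetical sketch of how peptide_mass_tolerance, peptide_mass_units and
# isotope_error (modes 0 and 1 only) could combine; not Comet's own code.
C13_DELTA = 1.00335  # approximate 13C minus 12C mass difference, in Da


def tolerance_window(mass, tol, units=0):
    """Return (low, high) for a +/- tol window; units: 0=amu, 1=mmu, 2=ppm."""
    if units == 0:
        delta = tol
    elif units == 1:
        delta = tol / 1000.0  # 1 mmu = 0.001 Da
    elif units == 2:
        delta = mass * tol / 1e6  # assumed relative to the mass being tested
    else:
        raise ValueError("peptide_mass_units must be 0, 1 or 2")
    return mass - delta, mass + delta


def isotope_corrected_masses(precursor_mass, isotope_error=0):
    """Masses to test in case the instrument picked a 13C isotope peak."""
    if isotope_error == 0:
        shifts = [0]
    elif isotope_error == 1:
        shifts = [-1, 0, 1, 2, 3]
    else:
        raise NotImplementedError("isotope_error=2 is not sketched here")
    return [precursor_mass - n * C13_DELTA for n in shifts]


def precursor_matches(peptide_mass, precursor_mass, tol, units=0, isotope_error=0):
    """True if the peptide mass falls in the window around any corrected mass."""
    for mass in isotope_corrected_masses(precursor_mass, isotope_error):
        low, high = tolerance_window(mass, tol, units)
        if low <= peptide_mass <= high:
            return True
    return False
```

For example, a precursor measured one 13C spacing too heavy only matches its peptide at a tight tolerance when isotope_error is 1.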
Search enzyme
--search_enzyme_number <integer>
– Specify a search enzyme from the end of the parameter file. Default = 1.
--num_enzyme_termini <integer>
– Valid values are 1 (semi-digested), 2 (fully digested), 8 (N-terminal), and 9 (C-terminal). Default = 2.
--allowed_missed_cleavage <integer>
– Maximum number of missed cleavages allowed in an enzyme search; the maximum value is 5. Default = 2.
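A minimal Python sketch of what num_enzyme_termini = 2 and allowed_missed_cleavage mean for an in-silico digest, assuming the common trypsin rule (cleave after K or R, but not before P). The function names are hypothetical, and Comet's own enzyme handling, which reads the enzyme definitions from the parameter file, may differ in details.

```python
# Hypothetical sketch: fully tryptic peptides (num_enzyme_termini = 2) with up
# to allowed_missed_cleavage missed cleavages. Assumes "cleave after K or R,
# but not before P". Not Comet's enzyme code.

def cleavage_sites(protein, cut="KR", no_cut="P"):
    """Positions i where the protein is cut between residues i-1 and i, plus both ends."""
    sites = [0]
    for i in range(1, len(protein)):
        if protein[i - 1] in cut and protein[i] not in no_cut:
            sites.append(i)
    sites.append(len(protein))
    return sites


def digest(protein, allowed_missed_cleavage=2):
    """Enumerate peptides whose both ends are cleavage sites."""
    sites = cleavage_sites(protein)
    peptides = []
    for a in range(len(sites) - 1):
        # Spanning sites[a]..sites[b] skips b - a - 1 internal cleavage sites.
        last = min(a + 1 + allowed_missed_cleavage, len(sites) - 1)
        for b in range(a + 1, last + 1):
            peptides.append(protein[sites[a]:sites[b]])
    return peptides
```

Semi-specific searches (num_enzyme_termini = 1, 8 or 9) would additionally allow one peptide end at a non-cleavage position, which this sketch does not enumerate.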
Fragment ions
--fragment_bin_tol <float>
– Bin size to use for fragment ions. Default = 1.000507.
--fragment_bin_offset <float>
– Offset position at which to start the binning (0.0 to 1.0). Default = 0.4.
--auto_fragment_bin_tol false|warn|fail
– Automatically estimate the optimal value for the fragment_bin_tol parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
--theoretical_fragment_ions <integer>
– 0=default peak shape, 1=M peak only. Default = 1.
--use_A_ions <integer>
– Controls whether or not A-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_B_ions <integer>
– Controls whether or not B-ions are considered in the search (0 - no, 1 - yes). Default = 1.
--use_C_ions <integer>
– Controls whether or not C-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_X_ions <integer>
– Controls whether or not X-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_Y_ions <integer>
– Controls whether or not Y-ions are considered in the search (0 - no, 1 - yes). Default = 1.
--use_Z_ions <integer>
– Controls whether or not Z-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_NL_ions <integer>
– 0=no, 1=yes to consider NH3/H2O neutral loss peaks. Default = 1.
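The effect of fragment_bin_tol and fragment_bin_offset can be sketched in Python as follows. We assume a bin index of the form int(mz / fragment_bin_tol + (1 - fragment_bin_offset)) and that each bin keeps its most intense peak; both are assumptions for illustration, not a statement of Comet's exact code.

```python
# Hypothetical sketch of fragment binning; the exact rounding and the
# "keep the most intense peak per bin" rule are assumptions.

def bin_index(mz, bin_tol=1.000507, bin_offset=0.4):
    # For mz >= 0, bin k covers [(k - 1 + bin_offset) * bin_tol, (k + bin_offset) * bin_tol).
    return int(mz / bin_tol + (1.0 - bin_offset))


def bin_spectrum(peaks, bin_tol=1.000507, bin_offset=0.4):
    """Collapse (mz, intensity) peaks into bins, keeping the most intense peak per bin."""
    bins = {}
    for mz, intensity in peaks:
        k = bin_index(mz, bin_tol, bin_offset)
        bins[k] = max(bins.get(k, 0.0), intensity)
    return bins
```

With the defaults, bins are roughly 1 Da wide and their lower edges sit 0.6 bin widths below each multiple of fragment_bin_tol, so peaks at 500.0 and 500.2 land in the same bin.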
mzXML/mzML parameters
--scan_range <string>
– Start and end scans of the range to search; 0 as the first entry ignores this parameter. Default = 0 0.
--precursor_charge <string>
– Precursor charge range to analyze; does not override the charge states in mzXML files; 0 as the first entry ignores this parameter. Default = 0 0.
--override_charge <integer>
– Specifies whether to override existing precursor charge state information, when present in the files, with the charge range specified by the "precursor_charge" parameter. Default = 0.
--ms_level <integer>
– MS level to analyze; valid levels are 2 or 3. Default = 2.
--activation_method ALL|CID|ECD|ETD|PQD|HCD|IRMPD
– Specifies which scan types are searched. Default = ALL.
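The "0 as the first entry ignores this parameter" convention shared by scan_range and precursor_charge can be sketched as follows. The helpers are hypothetical and treat both ends of the range as inclusive, which is an assumption.

```python
# Hypothetical helpers for two-number range parameters such as scan_range
# and precursor_charge; inclusive ends are an assumption. Not Comet's code.

def parse_range(value, cast=int):
    """Parse "start end"; return None when the first entry is 0 (parameter ignored)."""
    start, end = (cast(field) for field in value.split())
    if start == 0:
        return None
    return start, end


def select_scans(scan_numbers, scan_range="0 0"):
    """Keep the scans inside scan_range, or all scans when it is ignored."""
    bounds = parse_range(scan_range)
    if bounds is None:
        return list(scan_numbers)
    start, end = bounds
    return [scan for scan in scan_numbers if start <= scan <= end]
```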
Miscellaneous parameters
--digest_mass_range <string>
– MH+ peptide mass range to analyze. Default = 600.0 5000.0.
--num_results <integer>
– Number of search hits to store internally. Default = 50.
--skip_researching <integer>
– For .out file output only: 0=search everything again, 1=don't search if the .out file exists. Default = 1.
--max_fragment_charge <integer>
– Set the maximum fragment charge state to analyze (allowed max 5). Default = 3.
--max_precursor_charge <integer>
– Set the maximum precursor charge state to analyze (allowed max 9). Default = 6.
--nucleotide_reading_frame <integer>
– 0=protein database, 1-6=a single reading frame, 7=the three forward frames, 8=the three reverse frames, 9=all six frames. Default = 0.
--clip_nterm_methionine <integer>
– 0=leave sequences as-is; 1=also consider sequences without the N-terminal methionine. Default = 0.
--spectrum_batch_size <integer>
– Maximum number of spectra to search at a time; 0 to search the entire scan range in one loop. Default = 0.
--decoy_prefix <string>
– Specifies the prefix of the protein names that indicates a decoy. Default = decoy_.
--output_suffix <string>
– Specifies the suffix string that is appended to the base output name for the pep.xml, pin.xml, txt and sqt output files. Default = <empty>.
--mass_offsets <string>
– Specifies one or more mass offsets to apply. These values are effectively subtracted from each precursor mass, so that peptides that are smaller than the precursor mass by an offset value can still be matched to the respective spectrum. Default = <empty>.
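To show what an MH+ value in digest_mass_range refers to, here is a Python sketch that sums standard monoisotopic residue masses, one water and one proton, plus any static modification (defaulting to the +57.021464 cysteine modification that add_C_cysteine applies by default). Terminal static modifications and variable modifications are left out, and the helper names are hypothetical.

```python
# Hypothetical sketch: monoisotopic MH+ of a peptide, checked against
# digest_mass_range. Not Comet's code.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the two termini
PROTON = 1.007276  # the "H+" in MH+


def mh_plus(peptide, static_mods=None):
    """Monoisotopic MH+; static_mods maps residue -> added mass."""
    if static_mods is None:
        static_mods = {"C": 57.021464}  # mirrors the add_C_cysteine default
    mass = WATER + PROTON
    for residue in peptide:
        mass += MONO[residue] + static_mods.get(residue, 0.0)
    return mass


def in_digest_range(mass, digest_mass_range="600.0 5000.0"):
    """True if the MH+ mass lies within the "low high" range string."""
    low, high = (float(x) for x in digest_mass_range.split())
    return low <= mass <= high


print(round(mh_plus("PEPTIDE"), 4), in_digest_range(mh_plus("PEPTIDE")))
# prints 800.3672 True
```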
Spectral processing
--minimum_peaks <integer>
– Minimum number of peaks a spectrum must contain to be searched. Default = 10.
--minimum_intensity <float>
– Minimum intensity value to read in. Default = 0.
--remove_precursor_peak <integer>
– 0=no, 1=yes, 2=all charge-reduced precursor peaks (for ETD). Default = 0.
--remove_precursor_tolerance <float>
– +/- tolerance, in Da, for precursor peak removal. Default = 1.5.
--clear_mz_range <string>
– For iTRAQ/TMT-type data; clears out all peaks in the specified m/z range. Default = 0.0 0.0.
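The spectral-processing filters can be pictured as a single pass over the peak list. This Python sketch is an illustration under assumptions: it applies minimum_peaks after the other filters, treats remove_precursor_peak=1 as removing peaks within remove_precursor_tolerance of the precursor m/z, and does not model the ETD option 2. The function name is hypothetical.

```python
# Hypothetical sketch of the spectral-processing filters; not Comet's code.

def preprocess(peaks, precursor_mz, minimum_peaks=10, minimum_intensity=0.0,
               remove_precursor_peak=0, remove_precursor_tolerance=1.5,
               clear_mz_range=(0.0, 0.0)):
    """Return the filtered (mz, intensity) list, or None if too few peaks remain."""
    if remove_precursor_peak not in (0, 1):
        raise NotImplementedError("only remove_precursor_peak 0 and 1 are sketched")
    clear_low, clear_high = clear_mz_range
    kept = []
    for mz, intensity in peaks:
        if intensity < minimum_intensity:
            continue
        # The default "0.0 0.0" range is empty, so nothing is cleared.
        if clear_low < clear_high and clear_low <= mz <= clear_high:
            continue
        if (remove_precursor_peak == 1
                and abs(mz - precursor_mz) <= remove_precursor_tolerance):
            continue
        kept.append((mz, intensity))
    return kept if len(kept) >= minimum_peaks else None
```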
Variable modifications
--variable_mod01 <string>
– Up to 9 variable modifications are supported. Each modification is specified using seven entries: "<mass> <residues> <type> <max> <distance> <terminus> <force>". Type is 0 for an ordinary variable modification and non-zero for a binary modification. If you set the same non-zero type value on multiple modification entries, Comet will treat those variable modifications as a binary set: all modifiable residues in the binary set must be either unmodified or modified. Multiple binary sets can be specified by giving each set a different type value. Max is an integer specifying the maximum number of modified residues possible in a peptide for this modification entry. Distance specifies how far from the respective terminus the modification can be applied: -1 = no distance constraint; 0 = only applies to the terminal residue; N = only applies to the terminal residue through the next N residues. Terminus specifies which terminus the distance constraint is applied to: 0 = protein N-terminus; 1 = protein C-terminus; 2 = peptide N-terminus; 3 = peptide C-terminus. Force specifies whether peptides must contain this modification: 0 = not forced to be present; 1 = modification is required. Default = 0.0 null 0 4 -1 0 0.
--variable_mod02 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod03 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod04 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod05 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod06 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod07 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod08 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod09 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--max_variable_mods_in_peptide <integer>
– Specifies the maximum total number of residues that can be modified in a peptide. Default = 5.
--require_variable_mod <integer>
– Controls whether the analyzed peptides must contain at least one variable modification. Default = 0.
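A Python parser for the seven-field variable_modNN syntax makes the field order explicit; for example, "15.9949 M 0 3 -1 0 0" describes methionine oxidation allowed on up to three residues per peptide. The field names in the dataclass are our own labels, not Comet identifiers, and treating the default "0.0 null ..." entry as unused is an assumption.

```python
from dataclasses import dataclass

# Hypothetical parser for the seven-field variable_modNN syntax; not Comet's code.
TERMINI = {0: "protein N-terminus", 1: "protein C-terminus",
           2: "peptide N-terminus", 3: "peptide C-terminus"}


@dataclass
class VariableMod:
    mass: float
    residues: str
    binary_group: int     # 0 = ordinary variable mod, non-zero = binary set id
    max_per_peptide: int
    distance: int         # -1 = no distance constraint
    terminus: int         # key into TERMINI
    required: bool        # force field: 1 = peptide must carry this mod

    @property
    def active(self):
        # The default "0.0 null ..." entry is treated as unused.
        return self.mass != 0.0 and self.residues != "null"


def parse_variable_mod(value):
    fields = value.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {value!r}")
    mass, residues, group, max_mods, distance, terminus, force = fields
    mod = VariableMod(float(mass), residues, int(group), int(max_mods),
                      int(distance), int(terminus), int(force) == 1)
    if mod.terminus not in TERMINI:
        raise ValueError("terminus must be 0, 1, 2 or 3")
    if mod.distance < -1:
        raise ValueError("distance must be -1 or a non-negative integer")
    return mod
```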
Static modifications
--add_Cterm_peptide <float>
– Specify a static modification to the C-terminus of all peptides. Default = 0.
--add_Nterm_peptide <float>
– Specify a static modification to the N-terminus of all peptides. Default = 0.
--add_Cterm_protein <float>
– Specify a static modification to the C-terminal peptide of each protein. Default = 0.
--add_Nterm_protein <float>
– Specify a static modification to the N-terminal peptide of each protein. Default = 0.
--add_A_alanine <float>
– Specify a static modification to the residue A. Default = 0.
--add_B_user_amino_acid <float>
– Specify a static modification to the residue B. Default = 0.
--add_C_cysteine <float>
– Specify a static modification to the residue C. Default = 57.021464.
--add_D_aspartic_acid <float>
– Specify a static modification to the residue D. Default = 0.
--add_E_glutamic_acid <float>
– Specify a static modification to the residue E. Default = 0.
--add_F_phenylalanine <float>
– Specify a static modification to the residue F. Default = 0.
--add_G_glycine <float>
– Specify a static modification to the residue G. Default = 0.
--add_H_histidine <float>
– Specify a static modification to the residue H. Default = 0.
--add_I_isoleucine <float>
– Specify a static modification to the residue I. Default = 0.
--add_J_user_amino_acid <float>
– Specify a static modification to the residue J. Default = 0.
--add_K_lysine <float>
– Specify a static modification to the residue K. Default = 0.
--add_L_leucine <float>
– Specify a static modification to the residue L. Default = 0.
--add_M_methionine <float>
– Specify a static modification to the residue M. Default = 0.
--add_N_asparagine <float>
– Specify a static modification to the residue N. Default = 0.
--add_O_ornithine <float>
– Specify a static modification to the residue O. Default = 0.
--add_P_proline <float>
– Specify a static modification to the residue P. Default = 0.
--add_Q_glutamine <float>
– Specify a static modification to the residue Q. Default = 0.
--add_R_arginine <float>
– Specify a static modification to the residue R. Default = 0.
--add_S_serine <float>
– Specify a static modification to the residue S. Default = 0.
--add_T_threonine <float>
– Specify a static modification to the residue T. Default = 0.
--add_U_selenocysteine <float>
– Specify a static modification to the residue U. Default = 0.
--add_V_valine <float>
– Specify a static modification to the residue V. Default = 0.
--add_W_tryptophan <float>
– Specify a static modification to the residue W. Default = 0.
--add_X_user_amino_acid <float>
– Specify a static modification to the residue X. Default = 0.
--add_Y_tyrosine <float>
– Specify a static modification to the residue Y. Default = 0.
--add_Z_user_amino_acid <float>
– Specify a static modification to the residue Z. Default = 0.
param-medic options
--pm-min-precursor-mz <float>
– Minimum precursor m/z value to use in measurement error estimation. Default = 400.
--pm-max-precursor-mz <float>
– Maximum precursor m/z value to use in measurement error estimation. Default = 1800.
--pm-min-frag-mz <float>
– Minimum fragment m/z value to use in measurement error estimation. Default = 150.
--pm-max-frag-mz <float>
– Maximum fragment m/z value to use in measurement error estimation. Default = 1800.
--pm-min-scan-frag-peaks <integer>
– Minimum number of fragment peaks an MS/MS scan must contain to be used in measurement error estimation. Default = 40.
--pm-max-precursor-delta-ppm <float>
– Maximum ppm distance between precursor m/z values to consider two scans potentially generated by the same peptide for measurement error estimation. Default = 50.
--pm-charge <integer>
– Precursor charge state to consider MS/MS spectra from, in measurement error estimation. Ideally, this should be the most frequently occurring charge state in the given data. Default = 2.
--pm-top-n-frag-peaks <integer>
– Number of most-intense fragment peaks to consider for measurement error estimation, per MS/MS spectrum. Default = 30.
--pm-pair-top-n-frag-peaks <integer>
– Number of fragment peaks per spectrum pair to be used in fragment error estimation. Default = 5.
--pm-min-common-frag-peaks <integer>
– Number of the most-intense peaks that two spectra must share in order to potentially be generated by the same peptide, for measurement error estimation. Default = 20.
--pm-max-scan-separation <integer>
– Maximum number of scans two spectra can be separated by in order to be considered potentially generated by the same peptide, for measurement error estimation. Default = 1000.
--pm-min-peak-pairs <integer>
– Minimum number of peak pairs (for precursor or fragment) that must be successfully paired in order to attempt to estimate the measurement error distribution. Default = 100.
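The pairing criteria behind several pm-* options can be written as a predicate over two scans. This is a hypothetical sketch: it measures the ppm distance relative to the mean of the two precursor m/z values (an assumption), and it omits the fragment-peak checks (pm-min-common-frag-peaks and related options), so it is not param-medic's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical scan-pairing predicate mirroring some pm-* options.


@dataclass
class Scan:
    scan_number: int
    precursor_mz: float
    charge: int


def ppm_distance(mz1, mz2):
    """Distance in ppm, relative to the mean of the two values (assumed)."""
    return abs(mz1 - mz2) / ((mz1 + mz2) / 2.0) * 1e6


def candidate_pair(s1, s2, min_precursor_mz=400.0, max_precursor_mz=1800.0,
                   charge=2, max_precursor_delta_ppm=50.0, max_scan_separation=1000):
    """True if two scans could plausibly come from the same peptide."""
    for scan in (s1, s2):
        if scan.charge != charge:
            return False
        if not min_precursor_mz <= scan.precursor_mz <= max_precursor_mz:
            return False
    if abs(s1.scan_number - s2.scan_number) > max_scan_separation:
        return False
    return ppm_distance(s1.precursor_mz, s2.precursor_mz) <= max_precursor_delta_ppm
```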
Input and output
--fileroot <string>
– The fileroot string will be added as a prefix to all output file names. Default = <empty>.
--output-dir <string>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite T|F
– Replace existing files if true, or fail when trying to overwrite a file if false. Default = false.
--parameter-file <string>
– A file containing parameters. See the parameter documentation page for details. Default = <empty>.
--verbosity <integer>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
--output_sqtfile <integer>
– 0=no, 1=yes write SQT file. Default = 0.
--output_txtfile <integer>
– 0=no, 1=yes write tab-delimited text file. Default = 1.
--output_pepxmlfile <integer>
– 0=no, 1=yes write pep.xml file. Default = 1.
--output_percolatorfile <integer>
– 0=no, 1=yes write Percolator file. Default = 0.
--output_outfiles <integer>
– 0=no, 1=yes write .out files. Default = 0.
--print_expect_score <integer>
– 0=no, 1=yes to replace Sp with the expect score in .out and SQT output. Default = 1.
--num_output_lines <integer>
– Number of peptide results to show. Default = 5.
--show_fragment_ions <integer>
– 0=no, 1=yes; applies to .out files only. Default = 0.
--sample_enzyme_number <integer>
– Sample enzyme, which may differ from the one applied to the search. Used to calculate NTT and NMC in pepXML output. Default = 1.
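The verbosity levels and output naming can be illustrated with two small Python helpers. They are hypothetical; in particular, the "." separator between the fileroot and the file name is an assumption rather than documented behavior.

```python
import os

# Hypothetical helpers mirroring --verbosity, --output-dir and --fileroot.


def should_print(message_level, verbosity=30):
    """A message is shown when its level is at or below the chosen verbosity."""
    return message_level <= verbosity


def output_path(name, output_dir="crux-output", fileroot=""):
    """Prefix the file name with fileroot (if any) and place it in output_dir."""
    filename = f"{fileroot}.{name}" if fileroot else name
    return os.path.join(output_dir, filename)
```

With the default verbosity of 30, warnings (20) and progress messages (30) are printed while debug messages (50) are not.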