Crux is a mass spectrometry analysis toolkit developed by the Noble and MacCoss labs. Kaipo Tamura developed crux-pipeline to package the functions of Crux in a way that is easy for members of the Department of Genome Sciences to use. The steps for running it are as follows:
If you have questions about crux-pipeline, you can contact Kaipo ( General questions about Crux can be directed to
For now, users must convert their .raw files to mzML, mzXML, ms2, or cms2. This can be done using, for example, ProteoWizard. We are working on a way to get the conversion done as part of crux-pipeline.
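As a rough sketch of the conversion step, assuming ProteoWizard's msconvert command-line tool is installed on a machine where it can read your vendor .raw files: the helper below only prints one msconvert command per .raw file, so you can inspect the commands before running anything. The function name print_convert_commands and the output directory name "converted" are made up for illustration.

```shell
#!/bin/sh
# Sketch: print (not run) an msconvert command for each .raw file in a directory.
# --mzML selects mzML output and -o sets the output directory.
print_convert_commands() {
    raw_dir=$1   # directory containing the .raw files
    out_dir=$2   # where the converted files should be written
    for raw in "$raw_dir"/*.raw; do
        # Skip the unexpanded pattern when no .raw files are present.
        [ -e "$raw" ] || continue
        echo msconvert "$raw" --mzML -o "$out_dir"
    done
}
```

Calling "print_convert_commands . converted" lists the commands for the current directory; dropping the "echo" would run them instead.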
Connect to via ssh.
The pipeline script can then be called by running "crux-pipeline". Running "crux-pipeline --help" prints a usage statement with a list of options; the full list is reproduced at the bottom of this file.
Basic usage is:
crux-pipeline [options] <spectrum file>+ <FASTA file> <parameter file>
where spectrum files are in mzML, mzXML, ms2, or cms2 format. For example:
crux-pipeline --msdapl-id 1000 --output-dir my_search example1.ms2 example2.ms2 example.fasta example.params
NOTE: You can use a wildcard such as *.cms2 instead of listing every spectrum file individually. Also, if you are running Hardklor+Bullseye, Crux knows to look for the cms1 files in the same location as the cms2 files.
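To illustrate the wildcard note: the shell expands *.cms2 into the individual file names before crux-pipeline starts, so the wildcard form and the fully listed form are equivalent. The sketch below only prints the command line it would produce and runs nothing. The function name preview_glob_run is hypothetical, and example.fasta, example.params and my_search are the placeholder names from the example above.

```shell
#!/bin/sh
# Sketch: show what the shell hands to crux-pipeline when a wildcard is used.
# Nothing is executed; the command line is only printed.
preview_glob_run() {
    dir=$1   # directory holding the .cms2 files
    # Replace the positional parameters with the wildcard matches (sorted).
    set -- "$dir"/*.cms2
    # If nothing matched, the pattern is left unexpanded; report and stop.
    if [ ! -e "$1" ]; then
        echo "no .cms2 files found in $dir" >&2
        return 1
    fi
    echo crux-pipeline --output-dir my_search "$@" example.fasta example.params
}
```

Running "preview_glob_run ." in a directory of cms2 files shows the spectrum files in the order the shell would pass them.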
You can also use predefined FASTA and/or parameter files. To see a list of available ones, run "crux-pipeline --list-fasta" or "crux-pipeline --list-param":
$ crux-pipeline --list-param
"high-low" -> /net/maccoss/vol2/software/bin/crux-pipeline-files/param/jarrett_high_low_params.txt
"low-low" -> /net/maccoss/vol2/software/bin/crux-pipeline-files/param/jarrett_low_low_params.txt
"high-high" -> /net/maccoss/vol2/software/bin/crux-pipeline-files/param/jarrett_high_high.params.txt
This shows that you may enter "high-low", "low-low", or "high-high" instead of the path to a parameter file. These premade parameter files can also be useful if you want to copy them from their listed paths as a starting point and edit them for yourself.
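One way to follow the copy-and-edit suggestion is sketched below as a small shell helper. The function name copy_preset_params and the destination file name are hypothetical; the source path should be whatever "crux-pipeline --list-param" prints on your system.

```shell
#!/bin/sh
# Sketch: copy a premade parameter file to a private, editable copy,
# refusing to overwrite a file you may already have edited.
copy_preset_params() {
    src=$1    # path printed by "crux-pipeline --list-param"
    dest=$2   # your own copy, e.g. my_params.txt (hypothetical name)
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists; not overwriting" >&2
        return 1
    fi
    cp "$src" "$dest"
}
```

After copying, edit the new file and pass its path as the <parameter file> argument in place of a keyword such as "high-low".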
NOTE: Default memory is set to 4.0G for each bullseye/comet and 8G for percolator. You may need to request more memory for percolator if you have many files (e.g., "--percolator-mem 12.0G").
Once you have started a run, the output files will go into the specified output directory (output_<timestamp> by default).
You can check the status of the run with: crux-pipeline -s <output directory>
Or cancel all jobs in a run with: crux-pipeline -c <output directory>
Once the run is complete, there will be a subdirectory in the output directory called "crux-output" containing the search results in sqt format, and (if it was run) the Percolator results in a file called combined-results.perc.xml. The results will also be uploaded into MSDaPl if the "msdapl-id" option was specified.
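A quick way to confirm that a finished run produced the files described above is sketched here. The function name summarize_run is hypothetical, and the ".sqt" file extension is an assumption based on the results being in sqt format; only the "crux-output" directory and the combined-results.perc.xml file name come from the description above.

```shell
#!/bin/sh
# Sketch: summarize what a run left in <output directory>/crux-output.
summarize_run() {
    results_dir=$1/crux-output
    if [ ! -d "$results_dir" ]; then
        echo "no crux-output directory in $1 (run not finished?)"
        return 1
    fi
    # Count search result files; the .sqt extension is an assumption.
    sqt_count=0
    for f in "$results_dir"/*.sqt; do
        if [ -e "$f" ]; then
            sqt_count=$((sqt_count + 1))
        fi
    done
    echo "sqt files: $sqt_count"
    # Percolator output is only expected if Percolator was run.
    if [ -e "$results_dir/combined-results.perc.xml" ]; then
        echo "percolator results: present"
    else
        echo "percolator results: absent"
    fi
}
```

For the earlier example, "summarize_run my_search" would report on my_search/crux-output.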
Usage: crux-pipeline [options] <spectrum file>+ <FASTA file> <parameter file>

Commands:
  --help (-h) - Displays this message.
  --version (-v) - Displays the version number of this script.
  --status (-s) <dir> - Displays the status of jobs for a directory.
  --cancel (-c) <dir> - Cancels jobs for a directory.
  --list-fasta - Displays a list of FASTA keywords available for this script.
  --list-param - Displays a list of parameter file keywords available for this script.

Options:
  --crux-path <path> - Specify the path to the Crux executable to be used.
  --bullseye (-b) <T|F> - Run Hardklor and Bullseye. Default T
  --percolator (-p) <T|F> - Run Percolator. Must be true for loading results into MSDaPl. Default T
  --search-engine (-e) <comet|tide> - The search engine to run. Default comet
  --msdapl-id (-m) <id> - The number of the MSDaPl project to load final results into. Default none
  --msdapl-name (-n) <name> - The submitter's username for MSDaPl. Default none
  --msdapl-species (-x) <id> - The taxonomy ID of the target species for MSDaPl. Default none
  --msdapl-instrument (-i) <instrument> - The name of the instrument used to acquire data for MSDaPl. Default none
  --msdapl-comment (-t) <comment> - Comments to be used when uploading data to MSDaPl. Default none
  --output-dir (-o) <dir> - The directory where results files will be outputted. Default output_<timestamp>
  --bullseye-mem <value>, --comet-mem <value>, --tide-index-mem <value>, --tide-search-mem <value>, --percolator-mem <value>, --msdapl-mem <value> - How much memory to request for a command's job. Default 2.0G

Command expected runtime:
  --bullseye-rt=<value>
  --comet-rt=<value>
  --tide-index-rt=<value>
  --tide-search-rt=<value>
  --percolator-rt=<value>
  --msdapl-rt=<value>