Crux is a mass spectrometry analysis toolkit developed by the Noble and MacCoss labs. Kaipo Tamura developed crux-pipeline to package the functions of Crux in a way that is easy to use for members of the Department of Genome Sciences. The steps for running it are described below.
If you have questions about crux-pipeline, you can contact Kaipo (kaipot@uw.edu). General questions about Crux can be directed to crux-users@googlegroups.com.
For now, users must convert their .raw files to mzML, mzXML, ms2, or cms2. This can be done using, for example, ProteoWizard. We are working on a way to get the conversion done as part of crux-pipeline.
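As a rough sketch of the conversion step, the helper below prints one ProteoWizard msconvert command per .raw file rather than running anything. It assumes msconvert is installed on the machine where the printed commands are eventually run (vendor .raw conversion typically needs a Windows installation of ProteoWizard); convert_cmds and the file names are hypothetical and only for illustration.

```shell
# Dry run: print an msconvert command for each .raw file given.
# convert_cmds is a hypothetical helper, not part of crux-pipeline or ProteoWizard.
# Note: the printed commands do not quote file names that contain spaces.
convert_cmds() {
    outdir=$1
    shift
    for raw in "$@"; do
        # --mzML selects mzML output; -o sets the output directory.
        printf 'msconvert %s --mzML -o %s\n' "$raw" "$outdir"
    done
}

convert_cmds converted sample1.raw sample2.raw
```

Review the printed commands, then run them where ProteoWizard is available and copy the resulting files to the grid.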
Connect to grid.gs.washington.edu via ssh.
The pipeline script can then be called by running "crux-pipeline". Running "crux-pipeline --help" prints a usage statement with a list of options; the full usage statement is reproduced at the bottom of this file.
Basic usage is:
crux-pipeline [options] <spectrum file>+ <FASTA file> <parameter file>
where spectrum files are in mzML, mzXML, ms2, or cms2 format. For example:
crux-pipeline --msdapl-id 1000 --output-dir my_search example1.ms2 example2.ms2 example.fasta example.params
NOTE: You can use a shell wildcard such as *.cms2 instead of listing every spectrum file. Also, if you are running Hardklor+Bullseye, Crux looks for the cms1 files in the same location as the cms2 files.
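The wildcard form can be checked before submitting anything. The sketch below expands *.cms2 in a directory, stops if nothing matches, and prints the crux-pipeline command that would be run. build_search_cmd is a hypothetical helper, and the output directory name my_search is taken from the example above.

```shell
# Dry run: print the crux-pipeline command for every .cms2 file in a directory.
# build_search_cmd is a hypothetical helper, not part of crux-pipeline.
build_search_cmd() {
    dir=$1 fasta=$2 params=$3
    # Expand the wildcard into the positional parameters.
    set -- "$dir"/*.cms2
    # If nothing matched, the pattern is left unexpanded and no such file exists.
    if [ ! -e "$1" ]; then
        echo "no .cms2 files in $dir" >&2
        return 1
    fi
    echo crux-pipeline --output-dir my_search "$@" "$fasta" "$params"
}

# Example: build_search_cmd . example.fasta example.params
```

Removing the echo would run the command instead of printing it.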
You can also use predefined FASTA and/or parameter files. To see a list of available ones, run "crux-pipeline --list-fasta" or "crux-pipeline --list-param":
$ crux-pipeline --list-param
"high-low" -> /net/maccoss/vol2/software/bin/crux-pipeline-files/param/jarrett_high_low_params.txt
"low-low" -> /net/maccoss/vol2/software/bin/crux-pipeline-files/param/jarrett_low_low_params.txt
"high-high" -> /net/maccoss/vol2/software/bin/crux-pipeline-files/param/jarrett_high_high.params.txt
This shows that you may enter "high-low", "low-low", or "high-high" instead of the path to a parameter file. The premade parameter files can also serve as a starting point: copy one from its listed path and edit the copy for your own search.
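Copying a premade parameter file could look like the sketch below. copy_param_template is a hypothetical helper that refuses to overwrite an existing file; the source path in the usage comment is the "high-low" entry from the listing above, and my_params.txt is an arbitrary name.

```shell
# Copy a premade parameter file to a new name without clobbering an existing file.
# copy_param_template is a hypothetical helper, not part of crux-pipeline.
copy_param_template() {
    src=$1 dest=$2
    if [ ! -f "$src" ]; then
        echo "template not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    cp "$src" "$dest"
}

# Example:
# copy_param_template /net/maccoss/vol2/software/bin/crux-pipeline-files/param/jarrett_high_low_params.txt my_params.txt
```

The edited copy can then be passed as the <parameter file> argument in place of a keyword.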
NOTE: Default memory is set to 4.0G for each bullseye/comet and 8G for percolator. You may need to request more memory for percolator if you have many files (e.g., "--percolator-mem 12.0G").
Once you have started a run, the output files will go into the specified output directory (output_<timestamp> by default).
You can check the status of the run with: crux-pipeline -s <output directory>
Or cancel all jobs in a run with: crux-pipeline -c <output directory>
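To avoid retyping the status command, it can be wrapped in a small polling loop. This is only a sketch: poll_status is a hypothetical helper, and the CRUX_PIPELINE variable is an assumption added here so that a different executable can be substituted when testing. The format of the status output is not described in this document, so the loop just prints it and makes no attempt to detect completion.

```shell
# Print the run status a fixed number of times, pausing between checks.
# poll_status and CRUX_PIPELINE are hypothetical; only --status comes from crux-pipeline.
poll_status() {
    dir=$1
    interval=${2:-60}   # seconds between checks
    count=${3:-10}      # number of checks
    i=0
    while [ "$i" -lt "$count" ]; do
        "${CRUX_PIPELINE:-crux-pipeline}" --status "$dir"
        i=$((i + 1))
        # Sleep only if another check follows.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: poll_status my_search 300 12
```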
Once the run is complete, there will be a subdirectory in the output directory called "crux-output" containing the search results in sqt format, and (if it was run) the Percolator results in a file called combined-results.perc.xml. The results will also be uploaded into MSDaPl if the "--msdapl-id" option was specified.
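A quick way to confirm that a finished run produced the files described above is sketched below. check_results is a hypothetical helper. It assumes the sqt files end in .sqt and that combined-results.perc.xml sits directly inside crux-output.

```shell
# Summarize the expected result files for a completed run.
# check_results is a hypothetical helper, not part of crux-pipeline.
check_results() {
    out=$1/crux-output
    if [ ! -d "$out" ]; then
        echo "no crux-output directory in $1" >&2
        return 1
    fi
    # Count the search result files (sqt format).
    n=0
    for f in "$out"/*.sqt; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "sqt files: $n"
    # Percolator output is only present if Percolator was run.
    if [ -f "$out/combined-results.perc.xml" ]; then
        echo "Percolator results: present"
    else
        echo "Percolator results: absent"
    fi
}

# Example: check_results my_search
```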
Usage: crux-pipeline [options] <spectrum file>+ <FASTA file> <parameter file>

Commands:
  --help (-h) - Displays this message.
  --version (-v) - Displays the version number of this script.
  --status (-s) <dir> - Displays the status of jobs for a directory.
  --cancel (-c) <dir> - Cancels jobs for a directory.
  --list-fasta - Displays a list of FASTA keywords available for this script.
  --list-param - Displays a list of parameter file keywords available for this script.

Options:
  --crux-path <path> - Specify the path to the Crux executable to be used.
  --bullseye (-b) <T|F> - Run Hardklor and Bullseye. Default T
  --percolator (-p) <T|F> - Run Percolator. Must be true for loading results into MSDaPl. Default T
  --search-engine (-e) <comet|tide> - The search engine to run. Default comet
  --msdapl-id (-m) <id> - The number of the MSDaPl project to load final results into. Default none
  --msdapl-name (-n) <name> - The submitter's username for MSDaPl. Default none
  --msdapl-species (-x) <id> - The taxonomy ID of the target species for MSDaPl. Default none
  --msdapl-instrument (-i) <instrument> - The name of the instrument used to acquire data for MSDaPl. Default none
  --msdapl-comment (-t) <comment> - Comments to be used when uploading data to MSDaPl. Default none
  --output-dir (-o) <dir> - The directory where results files will be outputted. Default output_<timestamp>
  --bullseye-mem <value>, --comet-mem <value>, --tide-index-mem <value>,
  --tide-search-mem <value>, --percolator-mem <value>, --msdapl-mem <value>
      - How much memory to request for a command's job. Default 2.0G

Command expected runtime:
  --bullseye-rt=<value>
  --comet-rt=<value>
  --tide-index-rt=<value>
  --tide-search-rt=<value>
  --percolator-rt=<value>
  --msdapl-rt=<value>