Customization and Search Options

Crux lets you change many of its search and analysis parameters. Options control attributes such as the output format and which peptides are selected from the protein database. This page starts with some general information about options and then describes the use of some key crux options.

Introduction to options

A crux command is made up of four parts: executable name, sub-command, required arguments, and options. Let's use a crux search-for-matches command as an example. Here is the general form.

$ crux search-for-matches [options] <ms2 filename> <protein filename>

In this case, the sub-command is search-for-matches. The required arguments are given inside angle brackets, <>, and are the name of the input ms2 file and the name of the input protein fasta file. Options are optional instructions that change the default behavior. They are always placed between the sub-command and the required arguments. Any number of options can be included, separated by spaces.

All of the available options are described for each executable on the documentation pages. You can also get a list of available options by running a command with just the name and no arguments.

$ crux search-for-matches

You should see output that looks like this.

Error in command line. Error # 5
the required argument <ms2 file> is missing.
Usage: crux search-for-matches [options]  
  [--verbosity <int>] Set level of output to stderr (0-100).  Default 30.
  [--version <string>] Print version number and quit
  [--parameter-file <string>] Set additional options with values in the given file. Default to use only command line options and default values. 
  [--overwrite <string>] Replace existing files (T) or exit if attempting to overwrite (F). Default F.
...

The first three lines tell you that you forgot the required arguments and remind you what they are. The following lines list all the options (only four of which are shown above). Crux options all begin with two dashes, --, followed by the name (with no space in between). The name is followed by a space and the appropriate argument. This example increases the verbosity to 40:

$ crux search-for-matches --verbosity 40 sample.ms2 yeast.fasta
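
If you launch crux from a script, the four-part structure described above can be assembled programmatically. The Python sketch below is illustrative only: build_crux_command is a hypothetical helper, not part of Crux. It places each option as "--name value" between the sub-command and the required arguments and returns the argument list without running anything.

```python
def build_crux_command(subcommand, options, arguments):
    """Return an argv list: executable, sub-command, options, required arguments.

    build_crux_command is a hypothetical helper, not part of Crux.
    """
    argv = ["crux", subcommand]
    for name, value in options.items():
        # Each option is two dashes plus the name, followed by its value.
        argv.extend(["--" + name, str(value)])
    # Required arguments always come last.
    argv.extend(arguments)
    return argv


command = build_crux_command(
    "search-for-matches",
    {"verbosity": 40},
    ["sample.ms2", "yeast.fasta"],
)
print(" ".join(command))
```

The printed line reproduces the example command shown above.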

Parameter files

The third option listed above, --parameter-file, is available for all crux programs. It allows options to be specified in a file. All of the command line options, as well as some extras, can be put in a parameter file. The format is slightly different: the two dashes are removed from the option name, and the name and value are separated by an equal sign instead of a space.

<option name>=<option argument (value)>

The above example, where we changed the verbosity, would look like this in a parameter file:

verbosity=40

The parameter file takes one option per line and allows comments on lines beginning with '#'. More information and a sample parameter file can be found here. Command line and parameter file options may be used separately or together. If an option is specified in both, the value on the command line will be used.
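
The format and precedence rules can be made concrete with a short Python sketch. It is not Crux's own parser: the function names are hypothetical, and trimming whitespace around the equal sign is a lenient assumption that Crux may not share. It reads name=value lines, skips blank lines and '#' comments, and lets a command-line value override the value from the file.

```python
def parse_parameter_file(text):
    """Parse '<option name>=<value>' lines, skipping blanks and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, _, value = line.partition("=")
        options[name.strip()] = value.strip()
    return options


def effective_options(file_options, command_line_options):
    """Merge both sources; a command-line value wins over the file value."""
    merged = dict(file_options)
    merged.update(command_line_options)
    return merged


sample = """# search settings
verbosity=40
overwrite=T
"""
from_file = parse_parameter_file(sample)
final = effective_options(from_file, {"verbosity": "50"})
print(final)
```

Here verbosity ends up as 50 because it was also given on the command line, while overwrite keeps the value from the file.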

A file containing the name and value of all parameters/options for the current operation will automatically be saved in the output directory. Note that not all parameters in the file may have been used in the operation. The parameter file will be named <tag>.params.txt, where <tag> is search, qvalues or percolator, depending on which Crux command was used. The resulting file can be used with the --parameter-file option for other crux programs.

Now that we've covered the general form of options, let's discuss some specific options for crux, beginning with output.

Controlling the output

The option --overwrite determines whether output files can be replaced if they already exist (value T) or whether the program should refuse to overwrite an existing file (value F). In the latter case, the search will fail if a file of the same name already exists.
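
The two behaviors correspond to a familiar file-opening pattern. The Python sketch below only illustrates the idea and is not how Crux is implemented: open_output and the file name results.txt are hypothetical.

```python
import os
import tempfile


def open_output(path, overwrite):
    """Open path for writing; refuse to replace an existing file unless overwrite is True."""
    # Mode "x" raises FileExistsError if the file already exists.
    mode = "w" if overwrite else "x"
    return open(path, mode)


with tempfile.TemporaryDirectory() as out_dir:
    target = os.path.join(out_dir, "results.txt")

    # First run: the file does not exist yet, so writing succeeds.
    with open_output(target, overwrite=False) as handle:
        handle.write("first run\n")

    # Second run without overwrite: the existing file is protected.
    try:
        open_output(target, overwrite=False)
        second_run_failed = False
    except FileExistsError:
        second_run_failed = True

    # Third run with overwrite: the file is replaced.
    with open_output(target, overwrite=True) as handle:
        handle.write("replaced\n")
    with open(target) as handle:
        final_contents = handle.read()
```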

Changing the peptide parameters

Several parameters control which peptides in the database are used in the search. By setting, for example, --min-length 6 and --max-length 50, you confine the search to only those peptides with at least 6 amino acids and no more than 50. Similarly, --min-mass and --max-mass set limits on peptides based on their calculated masses. The masses can be calculated as mono-isotopic or average with the --isotopic-mass option, which takes the value mono or average.
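
To see what these limits do, here is a minimal Python sketch of a length and mass filter. It is an illustration under stated assumptions, not Crux code: the function names are hypothetical, only mono-isotopic masses are handled, the residue masses are the standard mono-isotopic values, and the mass limits in the example are arbitrary rather than Crux defaults.

```python
# Standard mono-isotopic residue masses in daltons.
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added to the summed residue masses


def peptide_mass(sequence):
    """Mono-isotopic mass of an unmodified, uncharged peptide."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def keep_peptide(sequence, min_length, max_length, min_mass, max_mass):
    """Apply the length limits first, then the mass limits."""
    if not min_length <= len(sequence) <= max_length:
        return False
    return min_mass <= peptide_mass(sequence) <= max_mass


candidates = ["GAK", "PEPTIDEK", "SAMPLER"]
kept = [
    peptide for peptide in candidates
    if keep_peptide(peptide, min_length=6, max_length=50,
                    min_mass=900.0, max_mass=5000.0)
]
print(kept)
```

In this example GAK is rejected for being shorter than 6 residues and SAMPLER for falling below the illustrative lower mass limit, so only PEPTIDEK remains.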

NOTE: Options for enzyme specificity and peptide length and mass limits must correspond with those used with crux-create-index, if an index is being used with the search.

Many protein samples are digested with trypsin, and the peptides searched can reflect this. Use the option --cleavages with the value tryptic to search only peptides with trypsin-specific cleavage sites at both the C and N termini, partial to search peptides with at least one tryptic site, or all to use no enzyme specificity. Use the --missed-cleavages T option to allow tryptic cleavage sites within the peptide sequence (F to require complete digestion).
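
The effect of the missed-cleavages setting on fully tryptic peptides can be sketched in Python. This is an illustration, not Crux's digestion code: the function names are hypothetical, and the cleavage rule used here (cut after K or R unless the next residue is P) is the commonly quoted trypsin rule, assumed rather than taken from Crux.

```python
def cleavage_sites(protein):
    """Positions where trypsin is assumed to cut, plus both protein ends."""
    sites = [0]
    for i, aa in enumerate(protein[:-1]):
        # Assumed rule: cut after K or R, but not when followed by P.
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    return sites


def tryptic_peptides(protein, missed_cleavages=False):
    """List peptides whose two ends both fall on cleavage sites."""
    sites = cleavage_sites(protein)
    peptides = []
    for i, start in enumerate(sites[:-1]):
        # Without missed cleavages, only the next site is a valid end.
        ends = sites[i + 1:] if missed_cleavages else sites[i + 1:i + 2]
        for end in ends:
            peptides.append(protein[start:end])
    return peptides


protein = "MAGKTPEPRSLK"
complete = tryptic_peptides(protein, missed_cleavages=False)
with_missed = tryptic_peptides(protein, missed_cleavages=True)
print(complete)
print(with_missed)
```

With complete digestion the toy protein yields three peptides. Allowing missed cleavages adds the longer peptides that span one or more internal K or R sites.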

Changing the search parameters

The database search involves two scoring steps: a preliminary score to find the best candidates and a main scoring function to further narrow those results. By default, the top-scoring 500 matches are scored by the main scoring function, but this number can be adjusted with the --max-rank-preliminary option.
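
The two-step scheme can be sketched generically in Python. The scoring functions below are toy placeholders, not the scores Crux computes, and two_stage_search is a hypothetical name. The point is only the control flow: every candidate gets the cheap preliminary score, and only the top-ranked ones are passed to the main score.

```python
def two_stage_search(candidates, preliminary_score, main_score,
                     max_rank_preliminary=500):
    """Rank all candidates by a cheap score, then rescore only the best ones."""
    ranked = sorted(candidates, key=preliminary_score, reverse=True)
    # Only the top-ranked candidates reach the main scoring function.
    shortlist = ranked[:max_rank_preliminary]
    return sorted(shortlist, key=main_score, reverse=True)


def toy_preliminary(peptide):
    """Placeholder preliminary score: longer peptides rank higher."""
    return len(peptide)


def toy_main(peptide):
    """Placeholder main score: fraction of residues that are K."""
    return peptide.count("K") / len(peptide)


candidates = ["AAAA", "KK", "AKAKA", "K"]
results = two_stage_search(candidates, toy_preliminary, toy_main,
                           max_rank_preliminary=3)
print(results)
```

Note that "K" would have received the highest toy main score but never reaches that step, because it falls outside the preliminary cutoff. Raising the cutoff trades running time for a lower chance of discarding such candidates.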

There are two methods for estimating p-values for the search results. With the option --compute-p-values T, they are calculated by estimating the parameters of the score distributions for each spectrum (see this paper as an example). Alternatively, the statistical significance of the search scores can be calculated in a follow-up step by using the scores from searches against a decoy database. search-for-matches will search up to 2 shuffled versions of the database, as set with the option --number-decoy-set. The default number is 2.
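
As an illustration of what a shuffled decoy database is, the Python sketch below produces a given number of decoy sets by permuting the residues of each sequence. It is a sketch under assumptions, not Crux's procedure: this page does not say how Crux shuffles or whether it shuffles proteins or peptides, and shuffled_decoys and its seed parameter are hypothetical.

```python
import random


def shuffled_decoys(sequences, number_decoy_sets=2, seed=0):
    """Return number_decoy_sets lists, each a residue-shuffled copy of sequences."""
    rng = random.Random(seed)  # fixed seed so the decoys are reproducible
    decoy_sets = []
    for _ in range(number_decoy_sets):
        decoys = []
        for sequence in sequences:
            residues = list(sequence)
            rng.shuffle(residues)
            # A decoy keeps the length and composition of its target.
            decoys.append("".join(residues))
        decoy_sets.append(decoys)
    return decoy_sets


targets = ["MAGKTPEPRSLK", "PEPTIDEK"]
decoy_sets = shuffled_decoys(targets, number_decoy_sets=2)
for decoys in decoy_sets:
    print(decoys)
```

Because each decoy has the same amino acid composition as its target, scores against the decoys give a sense of how well a spectrum matches sequences that are not expected to be in the sample.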

