This is a guide to downloading and using Fido, a method for protein identification in MS/MS proteomics. Think of it as a protein delivery dog: you bring it the scored matches between peptides and spectra, and it fetches a list of proteins ranked by posterior probability by doing clever tricks...
Fido: Protein Delivery Dog
- Running Fido
- Input Format
- Output Format
- Reformatting PepXML (PeptideProphet output)
- Downloading Fido
- How to Cite Fido
- Is Fido Free?
Please note that the software is not yet of release quality; some of the organization was just pieced together for demonstration, and may make you think, "Why... Why?!"
Running Fido (from the command line)
To run Fido, type
bin/Fido psm_graph_file prior_probability alpha_probability beta_probability
or
bin/Fido psm_graph_file prior_probability alpha_probability beta_probability log2_maximum_number_of_states
where psm_graph_file is the path to your peptide-spectrum match (PSM) graph file, and prior_probability, alpha_probability, and beta_probability are values for gamma (the protein prior probability), alpha (the peptide emission probability), and beta (the spurious peptide identification probability), respectively. Reasonable choices for these parameters (in terms of both accuracy and calibration) are gamma = 0.5, alpha = 0.1, and beta = 0.01. For now, if you want the absolute best parameters for your data, look for the parameters that give the best target-decoy performance; this requires running Fido once for each parameter set you want to check.
The optional parameter log2_maximum_number_of_states will be used to split up problems that are intractable to solve exactly; the smaller it is set, the faster it will be, but at the cost of accuracy. If 4 is used, then it will ensure that every connected subgraph only takes 16 or fewer steps to marginalize; if 10 is used, then it will ensure every connected subgraph takes 1024 or fewer steps to marginalize.
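The parameter sweep suggested above can be scripted. The following is a minimal Python sketch, not part of Fido itself: the helper names fido_commands and run_sweep are hypothetical, the grid values are placeholders, and the sketch assumes that Fido writes its results to standard output, which this guide does not state explicitly.

```python
import itertools
import subprocess


def fido_commands(psm_graph_file, gammas, alphas, betas,
                  log2_max_states=None, fido_bin="bin/Fido"):
    """Yield one argument list per (gamma, alpha, beta) combination.

    Argument order follows the usage line in this guide:
    psm_graph_file, prior (gamma), alpha, beta, then the optional
    log2_maximum_number_of_states.
    """
    for gamma, alpha, beta in itertools.product(gammas, alphas, betas):
        cmd = [fido_bin, psm_graph_file, str(gamma), str(alpha), str(beta)]
        if log2_max_states is not None:
            # A value k caps each connected subgraph at 2**k marginalization steps.
            cmd.append(str(log2_max_states))
        yield cmd


def run_sweep(psm_graph_file, gammas, alphas, betas, log2_max_states=None):
    """Run Fido once per parameter set and collect the raw text output.

    Assumption: the ranked protein list is printed to standard output.
    """
    results = {}
    for cmd in fido_commands(psm_graph_file, gammas, alphas, betas,
                             log2_max_states):
        completed = subprocess.run(cmd, capture_output=True, text=True,
                                   check=True)
        gamma, alpha, beta = cmd[2:5]
        results[(gamma, alpha, beta)] = completed.stdout
    return results
```

Calling run_sweep("my_graph.txt", [0.1, 0.5, 0.9], [0.01, 0.1], [0.0, 0.01]) would launch twelve runs; scoring each run against your target-decoy labels is left to you, since that evaluation depends on how decoys are named in your database.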
Input File Format
The format for the PSM graph file is this:
e peptide_string
r protein that would create this peptide using theoretical digest
r second protein
...
r final protein
p probability of the peptide match to the spectrum (given by PeptideProphet or comparable).
e peptide_string
r protein that would create this peptide using theoretical digest
r second protein
...
r final protein
p probability of the peptide match to the spectrum (given by PeptideProphet or comparable).
...
For example, this graph

would correspond to this PSM graph file:
e EEEMPEPK
r SW:TRP6_HUMAN
r GP:AJ271067_1
r GP:AJ271068_1
p 0.9849
e LLEIIQVR
r SW:TRP6_HUMAN
r GP:AJ271067_1
r GP:AJ271068_1
p 0.0
e FAFNNKPNLEWNWK
r gi|1574458|gb|AAC23247.1|
p 0.9750
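If your peptide scores come from a tool other than PeptideProphet, you may need to write the PSM graph file yourself. The sketch below is one way to do that in Python and to read a file back for a sanity check. The function names are hypothetical, and the code assumes one "e", "r", or "p" entry per line, as in the format description above.

```python
def format_psm_graph(records):
    """Render (peptide, proteins, probability) tuples as PSM graph text."""
    lines = []
    for peptide, proteins, probability in records:
        lines.append(f"e {peptide}")
        lines.extend(f"r {protein}" for protein in proteins)
        lines.append(f"p {probability}")
    return "\n".join(lines) + "\n"


def parse_psm_graph(text):
    """Parse PSM graph text back into (peptide, proteins, probability) tuples."""
    records = []
    peptide, proteins = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "e":
            # A new peptide entry starts a fresh record.
            peptide, proteins = value, []
        elif tag in ("r", "p") and peptide is None:
            raise ValueError(f"'{tag}' line before any 'e' line: {raw!r}")
        elif tag == "r":
            proteins.append(value)
        elif tag == "p":
            # The probability line closes the current record.
            records.append((peptide, proteins, float(value)))
            peptide, proteins = None, []
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return records


example = [
    ("EEEMPEPK", ["SW:TRP6_HUMAN", "GP:AJ271067_1", "GP:AJ271068_1"], 0.9849),
    ("LLEIIQVR", ["SW:TRP6_HUMAN", "GP:AJ271067_1", "GP:AJ271068_1"], 0.0),
    ("FAFNNKPNLEWNWK", ["gi|1574458|gb|AAC23247.1|"], 0.975),
]
psm_graph_text = format_psm_graph(example)
```

Writing psm_graph_text to a file gives something you can pass to bin/Fido as psm_graph_file; parsing it back with parse_psm_graph is a cheap way to catch a record that is missing its "p" line.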
Output File Format
The output of Fido is a column of posteriors (sorted in descending order) and the proteins that they correspond to:
0.9988 { gi|1574458|gb|AAC23247 }
0.6788 { SW:TRP6_HUMAN , GP:AJ271067_1 , GP:AJ271068_1 }
where all proteins on a line receive the same score. The output above is
equivalent to this:
0.9988 { gi|1574458|gb|AAC23247 }
0.6788 { SW:TRP6_HUMAN }
0.6788 { GP:AJ271067_1 }
0.6788 { GP:AJ271068_1 }
Note that proteins receiving the same score will not necessarily be put on the same line. This is extremely important for computing ROC curves correctly when some proteins receive the same posterior.
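For downstream analysis it can help to read the output into a flat list and to collect ties explicitly. The following Python sketch parses lines in the format shown above; the function names are hypothetical, and it assumes that protein names within braces are separated by " , " exactly as in the example output.

```python
from collections import defaultdict


def parse_fido_output(text):
    """Return a list of (posterior, [protein, ...]) pairs, one per output line."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        score_text, _, group_text = line.partition("{")
        names = group_text.rstrip("}").split(" , ")
        proteins = [name.strip() for name in names if name.strip()]
        rows.append((float(score_text), proteins))
    return rows


def expand_groups(rows):
    """Flatten grouped rows into one (posterior, protein) pair per protein."""
    return [(score, protein) for score, proteins in rows for protein in proteins]


def ties_by_score(rows):
    """Collect all proteins sharing a posterior, even across separate lines."""
    tied = defaultdict(list)
    for score, proteins in rows:
        tied[score].extend(proteins)
    return dict(tied)


fido_text = (
    "0.9988 { gi|1574458|gb|AAC23247 }\n"
    "0.6788 { SW:TRP6_HUMAN , GP:AJ271067_1 , GP:AJ271068_1 }\n"
)
parsed = parse_fido_output(fido_text)
flat = expand_groups(parsed)
```

Grouping by score with ties_by_score, rather than by line, is one way to make sure proteins with equal posteriors are handled together when you step through thresholds for an ROC curve.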
Reformatting PepXML (PeptideProphet output)
To produce the input file format from a pepXML file (such as interact.xml), run the following, substituting the path to your file for input.xml:
xsltproc src/xsl/pepProph2Pivdo.xsl input.xml
The output will be the properly formatted PSM graph file. Problems encountered with this XSL code have (thus far) been namespace related.
Downloading and Building Fido
Fido has been compiled under g++ 4.x, but with older versions of g++, it may need a minimal amount of tweaking.
Use this URL to download Fido (C++):
noble.gs.washington.edu/proj/fido/
