Metapeptides: Databases for metaproteomics from six-frame translations of shotgun metagenomics
This page describes resources for building databases of "metapeptides": protein fragments derived from shotgun metagenomics reads by six-frame translation. These databases can be used for searching LC-MS/MS spectra from acquisitions on metaproteomics samples.
This website accompanies the manuscript An Alignment-Free “Metapeptide” Strategy for Metaproteomic Characterization of Microbiome Samples Using Shotgun Metagenomic Sequencing, published in the Journal of Proteome Research in July 2016. That manuscript describes the creation of metapeptide databases for two samples (Bering Strait [BSt] and Chukchi Sea [CS]) and the use of those databases to search LC-MS/MS spectra from the same samples. This site provides pointers to the Sixgill software that generates metapeptide databases, along with the shotgun metagenomic reads, LC-MS/MS spectra, and metapeptide databases described in the manuscript.
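For readers unfamiliar with six-frame translation, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique under simple assumptions (standard genetic code, stop codons written as `*`, unrecognized codons written as `X`). It is not Sixgill's implementation, and the function names are hypothetical.

```python
# Illustrative six-frame translation of a single nucleotide read.
# Assumptions: standard genetic code; "*" marks a stop codon;
# any codon containing a non-ACGT character is translated as "X".

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Build the codon table in TCAG order, matching AMINO_ACIDS.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq, offset):
    """Translate seq starting at offset, dropping any trailing partial codon."""
    residues = []
    for i in range(offset, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


def six_frame_translations(read):
    """Return translations of the three forward and three reverse frames."""
    forward = read.upper()
    reverse = reverse_complement(forward)
    return [translate_frame(strand, offset)
            for strand in (forward, reverse)
            for offset in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(six_frame_translations("ATGGCCTAA"), start=1):
        print(frame, protein)
```

A metapeptide database is built from fragments of such translations. How the fragments are filtered and trimmed is described in the manuscript and the Sixgill documentation.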
The Sixgill software package
The Sixgill software package provides tools for creating and managing metapeptide databases. The Sixgill GitHub repository has all necessary software and documentation for running Sixgill to generate databases of metapeptides. The databases described in our manuscript were created from the shotgun metagenomic sequencing reads available below, using default parameters.
Sixgill is also available through a Galaxy interface.
Shotgun metagenomic reads
The shotgun metagenomic reads described in the manuscript and used to build the BSt and CS metapeptide databases have been deposited in the NCBI Sequence Read Archive (SRA) with accession SRP071900. Direct links to the BioSample database:
- BSt: SAMN04562491
- CS: SAMN04562492
Here is a small shotgun sequencing file (25,000 reads extracted from the BSt sequencing file) that can be used for building test metapeptide databases using the Sixgill software.
LC-MS/MS spectra
The LC-MS/MS spectra from triplicate acquisitions of peptides from the BSt and CS samples are available from Chorus, with project ID 587. The BSt replicates are acquisitions 51-53, and the CS replicates are acquisitions 45-47. The files are also available directly from this website, in this directory.
FASTA databases
- env_nr: (~1.9GB) The latest version of the NCBI env_nr database of proteins from large environmental sequencing projects
- metagenome: (143MB) The database of predicted genes from the metagenome assembled from the BSt and CS samples, described in the manuscript
- metapeptides: The metapeptide databases constructed from shotgun metagenomic sequencing of the BSt and CS samples are here:
Comet search results
Concatenated Comet search results (comet.params parameter file) from all three replicates of each sample, formatted for use by Percolator, are available below. File sizes are 50-70MB.

| Database | BSt | CS | 
|---|---|---|
| environmental | comet_BSt_environmental.pin | comet_CS_environmental.pin | 
| metagenome | comet_BSt_metagenome.pin | comet_CS_metagenome.pin | 
| metapeptides | comet_BSt_metapeptides.pin | comet_CS_metapeptides.pin | 
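
As a quick sanity check on a downloaded .pin file, the target and decoy rows can be counted with a short Python sketch. It assumes the usual Percolator input layout: a tab-delimited file with a header row and a `Label` column where `1` marks a target match and `-1` a decoy match. The function name is hypothetical, and the file name in the usage note is just one of the files listed in the table above.

```python
# Count target and decoy rows in a Percolator input (.pin) file.
# Assumptions: tab-delimited text, one header row, and a "Label" column
# holding "1" for target matches and "-1" for decoy matches.
import csv


def count_pin_labels(lines):
    """Return {'target': n, 'decoy': m} for an iterable of .pin lines."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    label_index = header.index("Label")
    counts = {"target": 0, "decoy": 0}
    for row in reader:
        # Skip blank or truncated lines rather than failing on them.
        if len(row) <= label_index:
            continue
        if row[label_index] == "1":
            counts["target"] += 1
        elif row[label_index] == "-1":
            counts["decoy"] += 1
    return counts
```

For example, `count_pin_labels(open("comet_BSt_metapeptides.pin", newline=""))` would stream the file row by row. Rows may have more fields than the header because several protein identifiers can follow the peptide column, so the sketch indexes only the `Label` column by position.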
