Metapeptides: Databases for metaproteomics from six-frame translations of shotgun metagenomics

This page describes resources for building databases of "metapeptides": protein fragments derived from shotgun metagenomics reads by six-frame translation. These databases can be used for searching LC-MS/MS spectra from acquisitions on metaproteomics samples.
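To make "six-frame translation" concrete, here is a minimal Python sketch: a read is translated in three reading-frame offsets on the forward strand and three on its reverse complement. This only illustrates the general technique and is not Sixgill's implementation. The function names are hypothetical, the standard genetic code is assumed, and any codon containing an ambiguous base is rendered as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Return six translations: offsets 0-2 forward, then offsets 0-2 reverse."""
    read = read.upper()
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


for frame in six_frame_translations("ATGGCCTAA"):
    print(frame)
```

Stop codons appear as "*" in the output. A real pipeline would typically split each translation at stop codons and apply length and quality filters before keeping any fragment; those steps are omitted here.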

This website accompanies the manuscript An Alignment-Free “Metapeptide” Strategy for Metaproteomic Characterization of Microbiome Samples Using Shotgun Metagenomic Sequencing, published in the Journal of Proteome Research in July 2016. That manuscript describes the creation of metapeptide databases for two samples (Bering Strait [BSt] and Chukchi Sea [CS]) and the use of those databases to search LC-MS/MS spectra from the same samples. This page points to the Sixgill software that generates metapeptide databases and provides access to the shotgun metagenomic reads, LC-MS/MS spectra and metapeptide databases described in the manuscript.

The Sixgill software package

The Sixgill software package provides tools for creating and managing metapeptide databases. The Sixgill GitHub repository has all necessary software and documentation for running Sixgill to generate databases of metapeptides. The databases described in our manuscript were created from the shotgun metagenomic sequencing reads available below, using default parameters.

Sixgill is also available through a Galaxy interface.

Shotgun metagenomic reads

The shotgun metagenomic reads described in the manuscript and used to build the BSt and CS metapeptide databases have been deposited in the NCBI Sequence Read Archive (SRA) with accession SRP071900. Direct links to BioSample database:

Here is a small shotgun sequencing file (25,000 reads extracted from the BSt sequencing file) that can be used for building test metapeptide databases using the Sixgill software.

LC-MS/MS spectra

The LC-MS/MS spectra from triplicate acquisitions of peptides from the BSt and CS samples are available from Chorus, with project ID 587. The BSt replicates are acquisitions 51-53, and the CS replicates are acquisitions 45-47. The files are also available directly from this website, in this directory.

FASTA databases

Comet search results

Concatenated Comet search results (comet.params parameter file) from all three replicates of each sample, formatted for use by Percolator, are available below. File sizes are 50-70MB.
Database        BSt                            CS
environmental   comet_BSt_environmental.pin    comet_CS_environmental.pin
metagenome      comet_BSt_metagenome.pin       comet_CS_metagenome.pin
metapeptides    comet_BSt_metapeptides.pin     comet_CS_metapeptides.pin
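As a quick sanity check on a downloaded .pin file, the following Python sketch counts target and decoy rows. It assumes the usual Percolator tab-delimited input layout: a header row containing a column named Label, with 1 marking target matches and -1 marking decoys. The function name is hypothetical. Rows are read with the csv module rather than a fixed-width table parser because the final protein field may span a variable number of tab-separated values.

```python
import csv
from collections import Counter


def count_pin_labels(pin_path):
    """Count target (Label == 1) and decoy (Label == -1) rows in a .pin file."""
    counts = Counter()
    with open(pin_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = [name.strip().lower() for name in next(reader)]
        label_idx = header.index("label")  # assumed column name
        for row in reader:
            if len(row) <= label_idx:
                continue  # skip blank or truncated lines
            label = row[label_idx].strip()
            if label == "1":
                counts["target"] += 1
            elif label == "-1":
                counts["decoy"] += 1
            # any other value (e.g. an optional direction row) is ignored
    return counts


# Example usage, with one of the files listed above saved locally:
# print(count_pin_labels("comet_BSt_metapeptides.pin"))
```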