Segway semi-automated genomic annotation

Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes J, Noble WS. 2012. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods 9:473–476. doi:10.1038/nmeth.1937. PubMed Central (free version): PMC3340533 (BibTeX)
Hoffman MM*, Ernst J*, Wilder SP, Kundaje A, Harris RS, Libbrecht M, Giardine B, Ellenbogen PM, Bilmes JA, Birney E, Hardison RC, Dunham I, Kellis M, Noble WS. 2012. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 41:827–841. doi: (BibTeX)

The free Segway software package contains a novel method for analyzing multiple tracks of functional genomics data. Our method uses a dynamic Bayesian network (DBN) model, which enables it to analyze the entire genome at 1-bp resolution even in the face of heterogeneous patterns of missing data. This method is the first application of DBN techniques to genome-scale data and the first genomic segmentation method designed for use with the maximum-resolution data available from ChIP-seq experiments, without downsampling. Our software has extensive documentation and was designed from the outset with external users in mind. Researchers at other universities and institutes have already installed and used Segway for their own projects.
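To give a feel for how a probabilistic segmentation can cope with missing data, here is a minimal sketch in Python. It is a deliberately simplified hidden Markov model with Gaussian emissions, not Segway's DBN: a missing value (None) simply contributes nothing to a position's score, which is one common way of handling gaps. The function names, the stay_prob parameter, and the toy numbers are all hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(observations, means, sds, stay_prob=0.9):
    """Return the most likely state label for each position.

    observations: list of per-position lists, one value per track; None marks
        a missing value, which is skipped when scoring that position.
    means, sds: per-state lists of per-track Gaussian parameters.
    Requires at least two states.
    """
    n_states = len(means)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1 - stay_prob) / (n_states - 1))

    def emission(state, obs):
        # Sum log densities over observed tracks only.
        return sum(gaussian_logpdf(x, means[state][t], sds[state][t])
                   for t, x in enumerate(obs) if x is not None)

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform prior over the starting state.
    scores = [emission(s, observations[0]) - math.log(n_states)
              for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: scores[p] + transition(p, s))
            new_scores.append(scores[best_prev] + transition(best_prev, s)
                              + emission(s, obs))
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy usage: two tracks, two states, with gaps in both tracks.
toy = [[0.1, None], [None, 0.2], [None, None], [5.1, None], [None, 4.9]]
labels = viterbi_segment(toy, means=[[0, 0], [5, 5]], sds=[[1, 1], [1, 1]])
```

A position where every track is missing still receives a label, because the transition model carries information across the gap from neighboring positions.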

Segmentations

Human chromatin structure

View the segmentation from our Nature Methods paper, "Unsupervised pattern discovery in human chromatin structure through genomic segmentation," in the UCSC Genome Browser: NCBI36 (hg18) or GRCh37 (hg19). Here is a brief description of the various classes of segment labels.

Download the segmentation for further analysis. NCBI36 (hg18). GRCh37 (hg19). (~165 MB, gzipped BED). Here are the mnemonic assignments (tab-delimited).
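For a first look at a downloaded segmentation, a short script can tally how many bases each label covers. The following is a minimal sketch, assuming the file is a gzipped BED whose fourth column holds the segment label. The function name bases_per_label and the file name in the usage comment are hypothetical.

```python
import gzip
from collections import Counter


def bases_per_label(path):
    """Return a Counter mapping each segment label to its total length in bp.

    Assumes BED columns: chrom, start, end, label (extra columns are ignored).
    """
    totals = Counter()
    with gzip.open(path, "rt") as bed:
        for line in bed:
            # Skip blank lines and UCSC header lines.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            fields = line.split()
            start, end, label = int(fields[1]), int(fields[2]), fields[3]
            # BED intervals are 0-based and half-open, so length is end - start.
            totals[label] += end - start
    return totals


# Usage (hypothetical file name):
#   for label, n_bases in bases_per_label("segmentation.bed.gz").most_common():
#       print(label, n_bases, sep="\t")
```

If the labels in the file are bare numbers, the tab-delimited mnemonic assignments can be used to translate them into readable names before reporting.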

Integrative annotation of chromatin elements

View the segmentation from our Nucleic Acids Research paper, "Integrative annotation of chromatin elements from ENCODE data," in the UCSC Genome Browser: hg19 only. These segmentations are already relabeled, so it is not necessary to use a mnemonic assignment file.

Segmentation downloads (hg19)

Documentation

Read the documentation, which begins with a quick start. The documentation is also available as a PDF.

Installation

The easiest way to install Segway and its prerequisites, and to set up your environment to use them, is our interactive install script. Run this command from bash on your UNIX system:

python <(wget -O - http://noble.gs.washington.edu/proj/segway/install.py)

Segway requires the use of a cluster management system. Currently, we support Sun Grid Engine/Oracle Grid Engine/Open Grid Scheduler and Platform LSF. If you would like to use Segway on another system, please open a ticket in the issue tracker. You can also run Segway on SGE via the Amazon EC2 compute cloud.

Segway is supported only on Linux. Other operating systems, such as Mac OS X, are not supported.

Support

For support, please write to the segway-users mailing list rather than to the authors directly. Using the mailing list will get your question answered more quickly. It also lets us pool knowledge and avoid answering the same questions repeatedly. Questions sent to the mailing list receive higher priority than those sent to us individually.

To report a bug or request a feature, please use the Segway issue tracker. We welcome all comments on the package, including on the ease of installation and the quality of the documentation.

If you do not want to read discussions about other people's use of Segway, but would like to hear about new releases and other important information, please subscribe to the segway-announce mailing list. Announcements of this nature are sent to both segway-users and segway-announce.

Useful links

Running Segway in the Amazon Compute Cloud by Jay Hesselberth, University of Colorado Denver

Source code

Version 1.1.0

Notes on the segmentation

The underlying signal data for the segmentation presented above are available in bedGraph and bigWig formats (NCBI36/hg18). Use this browser file to load all the bigWigs. We produced these signal files using Wiggler from original data available from the ENCODE DCC.

We produced the original segmentations for NCBI36. We used liftOver (minMatch=0.99) to convert segmentations to GRCh37, and then filtered out any overlapping regions.
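The text does not spell out how overlapping regions were removed after liftOver, so the following is only one plausible reading: drop every interval that overlaps any other interval on the same chromosome. This is a minimal sketch under that assumption, not the actual pipeline used here, and the function name drop_overlapping is hypothetical.

```python
from collections import defaultdict


def drop_overlapping(intervals):
    """Remove every interval that overlaps another on the same chromosome.

    intervals: iterable of (chrom, start, end, label) tuples using 0-based,
        half-open coordinates as in BED.
    Returns the surviving intervals sorted by chromosome and start.
    """
    by_chrom = defaultdict(list)
    for interval in intervals:
        by_chrom[interval[0]].append(interval)

    kept = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda iv: (iv[1], iv[2]))
        max_end_before = 0  # largest end coordinate among earlier intervals
        for i, iv in enumerate(ordered):
            # An earlier interval overlaps if it ends after this one starts.
            overlaps_prev = iv[1] < max_end_before
            # Intervals are sorted by start, so checking the next one suffices.
            overlaps_next = i + 1 < len(ordered) and ordered[i + 1][1] < iv[2]
            if not (overlaps_prev or overlaps_next):
                kept.append(iv)
            max_end_before = max(max_end_before, iv[2])
    return kept
```

Adjacent intervals that merely touch (one ends where the next begins) are not treated as overlapping, because BED coordinates are half-open.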

  Michael Hoffman < >