Buske OJ, Hoffman MM, Ponts N, Le Roch KG, Noble WS. Oct 2011.
Exploratory analysis of genomic segmentations with Segtools.
BMC Bioinformatics, 12:415; doi:10.1186/1471-2105-12-415
Segtools is a Python package for analyzing genomic segmentations. The software efficiently calculates a variety of summary statistics and produces corresponding publication-quality visualizations. The overall goal of Segtools is to provide a bird’s-eye view of complex genomic data sets, allowing researchers to easily generate and confirm hypotheses.
Segmentations should be in BED4+ or GFF format, with the 'name' field of each line specifying that line's segment label. The Segtools commands allow you to compare the properties of the segment labels with one another.
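To make the input format concrete, here is a minimal sketch of how the 'name' field of a BED4+ file acts as the segment label. It is not part of Segtools: the function name summarize_bed4 and the sample lines are hypothetical, and the sketch assumes tab-delimited input with the standard BED convention of 0-based, half-open coordinates.

```python
from collections import defaultdict


def summarize_bed4(lines):
    """Return {label: (segment_count, total_bases)} for BED4+ lines.

    Illustrative helper only; not a Segtools function.
    """
    counts = defaultdict(int)
    bases = defaultdict(int)
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError("expected at least 4 BED fields: %r" % line)
        chrom, start, end, label = fields[:4]
        counts[label] += 1
        # BED intervals are half-open, so the length is end - start.
        bases[label] += int(end) - int(start)
    return {label: (counts[label], bases[label]) for label in counts}


# Made-up segmentation with two labels, "A" and "B".
example = [
    "chr1\t0\t200\tA",
    "chr1\t200\t1000\tB",
    "chr1\t1000\t1400\tA",
]
summary = summarize_bed4(example)
for label in sorted(summary):
    print(label, summary[label])
```

Every line that shares a 'name' value is treated as the same segment label, which is the grouping the Segtools commands compare.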
Segtools requires the following prerequisites:
Python 2.5.1-2.7, with packages: Numpy 1.3+, RPy2 2.1.3+, and Genomedata (optional)
R 2.10.0+, with packages: latticeExtra and reshape
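When checking an existing setup against these minimums, compare versions numerically rather than as strings (as text, "2.9.2" sorts after "2.10.0"). The following is a rough sketch of such a check, not something shipped with Segtools: the helper names parse_version and meets_minimum are hypothetical, and only the minimum version numbers come from the list above.

```python
import re

# Minimum versions taken from the prerequisite list.
MINIMUMS = {"numpy": "1.3", "rpy2": "2.1.3", "R": "2.10.0"}


def parse_version(text, width=3):
    """Turn '2.10.0' into (2, 10, 0), padding short versions with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", text)][:width]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("2.9.2", MINIMUMS["R"]))    # R 2.9.2 is too old
print(meets_minimum("2.10.0", MINIMUMS["R"]))   # exactly the minimum
print(meets_minimum("1.3", MINIMUMS["numpy"]))  # short version strings work
```

The installed version strings would have to be obtained separately, for example from each package's reported version.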
Once these prerequisites are properly installed, install Segtools with:
easy_install segtools
To upgrade an existing Segtools installation to the latest version, type the following command at the shell prompt:
easy_install -U segtools
To help install these prerequisites and Segtools on Linux/UNIX* systems, we developed a simple, interactive script. To download and run the script:
wget http://noble.gs.washington.edu/proj/segtools/install.py
python install.py
We are constantly trying to improve the installation script, so please let us know if you run into any trouble using it.
* We have only tested this software on the following platforms. We would love to extend our support to other systems in the future, and we would gladly accept any contributions toward this end.
As a last resort, or for situations in which you want to try Segtools without dealing with the hassles of heterogeneous system configurations, we have created a VirtualBox virtual machine loaded with Segtools, some sample data, and all necessary prerequisites.
Warning, this is a large (551 MB) file: VM for Segtools 1.1.6
The application's documentation is available in two formats:
To stay informed of new releases and other important information, please subscribe to the segtools-announce mailing list.
There is also a segtools-users mailing list for general discussion and questions about the use of Segtools.
If you want to report a bug or request a feature, please do so using the Segtools issue tracker.
For other support with Segtools, or to provide feedback, please e-mail Michael. We are interested in all comments regarding the package and the ease of use of installation and documentation.
* segtools-signal-distribution: cleaned up code and arguments to only support accurate computation of statistics (and not histogram approximation)
* segtools-signal-distribution: rewrote calculation loop to improve speed (now takes ~2hrs for chr1 with 93 tracks)
* progress bars now include ETA.

* segtools-nucleotide-transition: significant (several-hundred-fold) speedup
* segtools-transition: added R transcript
* segtools-aggregation: added R transcript
* segtools-overlap: Fixed bug in argument parsing that caused R plotting to fail
* docs: Added high-level structured summary of the output of each command
* requirements: Genomedata package now only required to use segtools-nucleotide-transition and segtools-signal-distribution, not for unrelated commands
* docs: Unified usage terminology to use "annotation" and "feature" (instead of "annotations" and "entries", for example)
* segtools-*: Added -R option to allow command-line specification of R options to segtools commands that plot using R.

* docs: automatically add --help output to every command
* __init__.py: add gzipped pickles
* __init__.py: _from_pickle: fix UnpickleError message
* segtools-overlap: add R transcript
* segtools-overlap: add --max-contrast option
* segtools-length-distribution: allows more generic ANNOTATIONS as input
* segtools-length-distribution: added --no-segments and --no-bases flags to control display on size summary plot
* segtools-nucleotide-frequency: improved speed by caching whole chromosome sequence
* segtools-relabel: added command to relabel a segmentation
* segtools-feature-distance: added histogram visualization output
* segtools-preprocess: if OUTFILE is specified, the .pkl.gz extension is still added
* common.R: fix a comment character-related bug

* aggregation.R: fixed syntax error that caused segtools-aggregation to fail

* common.R: print.image: create the filepath's parent directory, if it doesn't already exist
* fix bug related to: allow comment character of # in mnemonics files, and automatically add a comment to generated mnemonic files
* segtools-gmtk-parameters: fix bug related to: doesn't generate hierarchical mnemonics when Segway subseg is used but has cardinality 1

* allow comment character of # in mnemonics files, and automatically add a comment to generated mnemonic files
* segtools-html-report: fix some problem with os.path.samefile() (maybe related to Python 2.7+)
* segtools-gmtk-parameters: doesn't generate hierarchical mnemonics when Segway subseg is used but has cardinality 1
* add requirement of numpy>=1.3 (because histogram semantics change in that version)

* segtools-signal-distribution: added --order-tracks and --order-labels options
* install-script: now sets R_HOME environment variable which fixes some issues
* docs: fixed dead mnemonic file reference
* bugfix: removed 'new' argument to 'histogram' for compatibility with newer versions of numpy
* bugfix: segtools-html-report no longer crashes when the mnemonic file is already in the place where it would be copied
* bugfix: fixed segtools-flatten unpacking error (when --filter option was not specified)

* Docs: added workflow flowchart
* segtools-flatten: added --filter option
* Bug fix: segtools-flatten now works when some files specify strand information and others don't.
* Install script: now searches for R a little harder, program versions are printed when found, and more errors are caught.
* eliminated exclamation marks
* new autogen stuff from Sphinx to allow man page generation

* Plotting: No longer plots to screen. This makes plotting cleaner, less error-prone, but slightly slower.
* Documentation: Updated to include command syntax

* Filled in large holes in documentation
* Improved robustness of installation script
* Renamed many of the Segtools commands for simplicity
* Made BED/GFF files interchangeable for most arguments
* Added ability to pre-process segmentations with segtools-preprocess
* Made aggregation significance non-default (since it is not yet mathematically sound).
* Cleaned up command-line interfaces
* Sped up aggregation