The Genomedata format for storing large-scale functional genomics data

Michael M. Hoffman, Orion J. Buske and William Stafford Noble

Bioinformatics. 26(11):1458-1459, 2010.


We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files.
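The contrast between a naive wiggle scan and random access can be illustrated with a toy sketch. It is not the Genomedata on-disk layout or API. The names `parse_fixedstep_wiggle`, `naive_query` and `DenseTracks` are hypothetical, the input is assumed to be a single-chromosome fixedStep wiggle track, and the dense store is held in memory purely for illustration.

```python
from array import array


def parse_fixedstep_wiggle(lines):
    """Scan every line of a fixedStep wiggle track into {0-based position: value}."""
    values = {}
    pos = step = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("track"):
            continue
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chr1 start=3 step=1"
            fields = dict(f.split("=") for f in line.split()[1:])
            pos = int(fields["start"]) - 1  # wiggle coordinates are 1-based
            step = int(fields.get("step", 1))
            continue
        values[pos] = float(line)
        pos += step
    return values


def naive_query(lines, start, end):
    """Naive approach: re-parse the whole text track for each region query."""
    values = parse_fixedstep_wiggle(lines)
    return [values.get(i, float("nan")) for i in range(start, end)]


class DenseTracks:
    """Hypothetical dense store: one position-indexed array per track."""

    def __init__(self, length, track_names):
        # Positions without data are marked NaN.
        self.tracks = {
            name: array("d", [float("nan")]) * length for name in track_names
        }

    def load(self, name, lines):
        # Parsing cost is paid once, at load time.
        for pos, value in parse_fixedstep_wiggle(lines).items():
            self.tracks[name][pos] = value

    def query(self, name, start, end):
        # A region query is a slice; no text is scanned.
        return list(self.tracks[name][start:end])


wiggle = ["fixedStep chrom=chr1 start=3 step=1", "1.5", "2.0", "4.25"]
store = DenseTracks(10, ["signal"])
store.load("signal", wiggle)
```

In this sketch `naive_query(wiggle, 2, 5)` and `store.query("signal", 2, 5)` return the same values, but only the first re-reads the text on every call. The sketch makes no claim about the size of the speed-up.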

A reference implementation, written in Python with C components, is available under the GNU General Public License.