Multi-scale correlations in continuous genomic data

Robert E. Thurman, William Stafford Noble and John A. Stamatoyannopoulos

Proceedings of the Pacific Symposium on Biocomputing, 2008. pp. 201-215.


Functional genomic quantities such as histone modifications, chromatin accessibility, and evolutionary constraint can now be measured in a nearly continuous fashion across the genome. The genome is highly heterogeneous, and the relationships between different functional annotations may be fluid. Here we present an approach for visualizing, quantifying, and determining the statistical significance of local and regional correlations between high-density continuous genomic datasets. We use wavelets to generate a multi-scale view of each component dataset and calculate correlations between data types as a function of genome position over a continuous range of scales in a sliding-window fashion. We determine the statistical significance of correlations using a non-parametric sampling approach. We apply the wavelet correlation method to histone modification and chromatin accessibility (DNaseI sensitivity) data from the NHGRI ENCODE project. We show that DNaseI sensitivity is broadly correlated (though to differing degrees) with a number of different activating histone modifications. We examine the continuous relationship between the repressive histone modification H3K27me3 and the activating mark H3K4me2, and find these modifications to display significant duality, with both significantly positively and significantly negatively correlated genomic territories. While the former appear to recapitulate in definitive cells the so-called "bi-valent" pattern originally proposed as a signature of pluripotency, the presence of negatively correlated regions suggests that the regulatory events that underlie the observed modification patterns are complex and highly regionalized in the genome.
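The three steps named in the abstract (multi-scale smoothing, sliding-window correlation, non-parametric significance) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the wavelet step is stood in for by Haar-style running means over 2^level samples, the correlation is Pearson's r, and the null distribution is built by pairing one window of the first track with randomly located windows of the second. The function names `haar_smooth`, `sliding_correlation`, and `window_pvalue`, and all parameter values, are hypothetical.

```python
import numpy as np


def haar_smooth(x, level):
    """Running mean over 2**level samples, with circular boundary handling.

    Assumption: this Haar-style smooth stands in for the wavelet
    decomposition; the wavelet used in the paper is not stated here.
    """
    x = np.asarray(x, dtype=float)
    width = 2 ** level
    if width == 1:
        return x.copy()
    kernel = np.ones(width) / width
    # Prepend the last (width - 1) samples so the output keeps the input length.
    padded = np.concatenate([x[-(width - 1):], x])
    return np.convolve(padded, kernel, mode="valid")


def sliding_correlation(a, b, window):
    """Pearson correlation of a and b in every window of the given length."""
    n_windows = len(a) - window + 1
    out = np.empty(n_windows)
    for i in range(n_windows):
        out[i] = np.corrcoef(a[i:i + window], b[i:i + window])[0, 1]
    return out


def window_pvalue(a, b, window, start, n_samples=200, seed=None):
    """Empirical two-sided p-value for the correlation in one window.

    Assumption: the null pairs the window of `a` with randomly located
    windows of `b`; the paper's sampling scheme may differ.
    """
    rng = np.random.default_rng(seed)
    seg_a = a[start:start + window]
    observed = np.corrcoef(seg_a, b[start:start + window])[0, 1]
    null = np.empty(n_samples)
    for k in range(n_samples):
        j = rng.integers(0, len(b) - window + 1)
        null[k] = np.corrcoef(seg_a, b[j:j + window])[0, 1]
    # Add-one correction keeps the estimate strictly positive.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_samples + 1)
    return observed, p


if __name__ == "__main__":
    # Synthetic tracks sharing a common signal (illustrative data only).
    rng = np.random.default_rng(0)
    shared = rng.standard_normal(512)
    track_a = shared + 0.1 * rng.standard_normal(512)
    track_b = shared + 0.1 * rng.standard_normal(512)
    for level in (1, 3, 5):
        r = sliding_correlation(haar_smooth(track_a, level),
                                haar_smooth(track_b, level), window=64)
        print(level, r.shape, float(r.mean()))
```

Repeating the correlation step over a range of levels and stacking the results gives a position-by-scale matrix, which is the kind of view the abstract describes. A real analysis would also need to correct for testing many windows and scales, which this sketch omits.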