Higher-order functional domains in the human ENCODE regions

This webpage is intended as a supplement to the manuscript, "Higher-order functional domains in the human ENCODE regions," by R.E. Thurman, N. Day, W.S. Noble, and J.A. Stamatoyannopoulos, Genome Research, 2007, 17, 991-994. Below are the data, links to data, and analysis results referenced in that paper. Some results shown here also appear in the ENCODE Nature paper, "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project", Nature, 2007, 447, 799-816 (see the 6-track segmentation below).
As part of the ENCODE project, we have access to a number of data types representing a variety of functional variables (e.g., DNaseI sensitivity, transcription, histone modifications, and conservation), each sampled nearly continuously across the ENCODE regions. We would like to understand to what degree such data reveal coherent higher-order features that may in turn illuminate the underlying functional architecture of the genome. To address this, we aim to develop wavelet-based approaches for discovering "domain-level" behavior in fine-scale data and for correlating these apparently disparate functional data types.
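This page does not spell out the wavelet details. As a minimal sketch, assuming the Haar filter and a maximal-overlap (non-decimated) transform with periodic boundaries, the smooth at level j is simply a running mean over 2^j consecutive probes. Turning a smoothing scale in kb (such as the 64kb used in the comparisons below) into a level requires a probe spacing, which is a hypothetical parameter here, as is the helper `level_for_scale`.

```python
from typing import List, Sequence


def haar_smooth(signal: Sequence[float], level: int) -> List[float]:
    """Level-`level` Haar MODWT scaling (smooth) coefficients.

    For the Haar filter these are running means over 2**level consecutive
    samples. Periodic (wrap-around) boundaries are assumed, and the output
    is not phase-shifted to center each window.
    """
    width = 2 ** level
    n = len(signal)
    if width > n:
        raise ValueError("smoothing window is longer than the signal")
    smoothed = []
    for t in range(n):
        # mean of sample t and the width - 1 samples before it, wrapping at the start
        window = [signal[(t - k) % n] for k in range(width)]
        smoothed.append(sum(window) / width)
    return smoothed


def level_for_scale(scale_bp: int, probe_spacing_bp: int) -> int:
    """Hypothetical helper: smallest level whose window spans at least scale_bp."""
    level = 0
    while (2 ** level) * probe_spacing_bp < scale_bp:
        level += 1
    return level
```

For example, with probes assumed to be 1kb apart, `level_for_scale(64_000, 1_000)` returns 6, i.e. a 64-probe window. Larger levels average over wider windows, which suppresses probe-level noise and leaves the broad, domain-scale trends.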

The methods used in the paper are outlined roughly as follows:

[Diagram: outline of the methods (a PostScript version of this diagram was provided)]


Annotation results


Segmentation comparisons

Concordance figures give the percentage of bases whose state assignments agree between the two segmentations being compared.
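The concordance definition above can be computed directly from two segmentations without expanding them base by base. A minimal sketch, assuming each segmentation is a sorted list of non-overlapping half-open `(start, end, state)` intervals (this representation and the function name `concordance` are illustrative assumptions, not the format used for the files on this page):

```python
from typing import Hashable, Sequence, Tuple

Segment = Tuple[int, int, Hashable]  # (start, end, state), half-open [start, end)


def concordance(segs_a: Sequence[Segment], segs_b: Sequence[Segment]) -> float:
    """Percentage of bases covered by both segmentations whose states agree.

    Both inputs must be sorted by start and non-overlapping.
    """
    i = j = 0
    agree = total = 0
    while i < len(segs_a) and j < len(segs_b):
        a_start, a_end, a_state = segs_a[i]
        b_start, b_end, b_state = segs_b[j]
        overlap = min(a_end, b_end) - max(a_start, b_start)
        if overlap > 0:
            total += overlap
            if a_state == b_state:
                agree += overlap
        # advance whichever segment ends first
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    if total == 0:
        raise ValueError("segmentations share no bases")
    return 100.0 * agree / total
```

For instance, `[(0, 10, "active"), (10, 20, "repressed")]` against `[(0, 5, "active"), (5, 20, "repressed")]` disagrees only on bases 5-9, giving 75.0. The two-pointer walk visits each interval once, so the cost grows with the number of segments rather than the number of bases.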
  1. Single-track comparisons (% concordance)

                        Affy RNA Signal     UCSD ChIP H3K27me3  MSA non-exonic      UVa TR50
    Sanger H3K4me1      67%                 62%                 56%                 74%
    Sanger H3K4me2      72%                 55%                 58%                 69%
    Sanger H3K4me3      68%                 52%                 53%                 60%
    Sanger H3ac         75%                 59%                 63%                 70%
    Sanger H4ac         74%                 63%                 61%                 75%
    Affy RNA Signal     -                   49%                 57%                 61%
    UCSD ChIP H3K27me3  -                   -                   54%                 70%
    MSA non-exonic      -                   -                   -                   57%

    ("-" marks self-comparisons or pairs already listed elsewhere in the table.)
  2. Sanger histone modifications vs. one another (% concordance)

                H3ac        H4ac        H3K4me1     H3K4me2
    H4ac        85%
    H3K4me1     73%         80%
    H3K4me2     86%         80%         77%
    H3K4me3     80%         73%         68%         87%
  3. Tissue differences: HeLa vs. GM

    1. Sanger H3K4me2, 64kb wavelet smoothing: 76% concordance
    2. Sanger H3K4me1, 64kb wavelet smoothing: 69% concordance
    3. Sanger H3K4me3, 64kb wavelet smoothing: 78% concordance
    4. Sanger H3ac, 64kb wavelet smoothing: 72% concordance
    5. Sanger H4ac, 64kb wavelet smoothing: 76% concordance
    6. Affy RNA Signal, 51.2kb wavelet smoothing: 81% concordance
  4. Wavelet scale differences

    1. Sanger H3K4me2, GM, 2kb vs. 64kb smoothing: 70% concordance
    2. Sanger H3K4me2, segment size as a function of scale (PostScript figure)
    3. Sensitivity of segment boundaries to scale, based on 4-track (Affy zero-threshold, H3ac, H3K27me3, and TR50) segmentations at 32kb, 64kb, and 128kb scales
  5. 4-track wavelet segmentation (Affy/H3ac/H3K27me3/TR50) vs. each of its single-track components

    1. Sanger H3ac, 64kb: 89% concordance
    2. Affy RNA, 51.2kb: 80% concordance
    3. UCSD ChIP H3K27me3, 64kb: 62% concordance
    4. TR50: 76% concordance
  6. 5-track wavelet segmentation (Affy/H3ac/H3K27me3/MSA non-exonic/TR50) vs. each of its single-track components

    1. Sanger H3ac, 64kb: 90% concordance
    2. Affy RNA, 51.2kb: 79% concordance
    3. UCSD ChIP H3K27me3, 64kb: 62% concordance
    4. TR50: 76% concordance
    5. MSA non-exonic: 62% concordance
  7. 3-track vs. 4-track

    1. H3ac/H3K27me3/TR50 vs. Affy/H3ac/H3K27me3/MSA/TR50: 78% concordance
  8. 4-track vs. 4-track

    1. Affy/H3ac/H3K27me3/TR50 vs. H3ac/H3K27me3/MSA/TR50: 78% concordance
  9. 4-track vs. 5-track

    1. Affy/H3ac/H3K27me3/TR50 vs. Affy/H3ac/H3K27me3/MSA/TR50 (wavelet-smoothed, zero Affy threshold): 98% concordance
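The list above refers to segment size as a function of scale and to how sensitive segment boundaries are to scale, but this page does not define either summary. As a hedged illustration only, the sketch below uses two plausible, hypothetical definitions: the median segment length at a given scale, and the fraction of boundaries at one scale that have a boundary at another scale within a tolerance (the tolerance is an assumed parameter). Segments are assumed to be sorted, adjacent `(start, end, state)` intervals.

```python
from bisect import bisect_left
from statistics import median
from typing import Hashable, List, Sequence, Tuple

Segment = Tuple[int, int, Hashable]  # (start, end, state), half-open [start, end)


def segment_boundaries(segs: Sequence[Segment]) -> List[int]:
    """Positions where one segment ends and the next begins (segments sorted and adjacent)."""
    return [segs[k][1] for k in range(len(segs) - 1)]


def median_segment_length(segs: Sequence[Segment]) -> float:
    """Median length in bp of the segments at one smoothing scale."""
    return median([end - start for start, end, _ in segs])


def boundary_recovery(ref: Sequence[int], other: Sequence[int], tolerance: int) -> float:
    """Fraction of reference boundaries with some boundary in `other` within +/- tolerance bp."""
    if not ref:
        raise ValueError("no reference boundaries")
    other_sorted = sorted(other)
    hits = 0
    for b in ref:
        k = bisect_left(other_sorted, b - tolerance)  # first boundary >= b - tolerance
        if k < len(other_sorted) and other_sorted[k] <= b + tolerance:
            hits += 1
    return hits / len(ref)
```

With a toy 32kb-scale segmentation `[(0, 100, "A"), (100, 250, "R"), (250, 400, "A")]` and a coarser one `[(0, 110, "A"), (110, 400, "R")]`, a 20bp tolerance recovers one of the two fine-scale boundaries (0.5). Coarser scales typically merge short segments, so the median length rises and some fine-scale boundaries disappear.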

High-confidence segments

  1. 4-track: Affy (zero threshold), H3ac, H3K27me3, TR50 (wavelet)
  2. 5-track: Affy (zero threshold), H3ac, H3K27me3, MSA non-exonic, TR50 (wavelet)
  3. Affy, H3ac, H3K27me3, TR50: single-track segmentations only
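The 4-track and 5-track segmentations combine several smoothed tracks into a single state assignment per position. This page does not state the segmentation model, so the sketch below is an illustrative stand-in rather than the authors' pipeline: a two-state hidden Markov model with independent (diagonal) Gaussian emissions per track and fixed, hypothetical parameters, decoded with the Viterbi algorithm. The state labels, means, standard deviations, and `stay_prob` are all assumptions.

```python
import math
from typing import List, Sequence


def log_gauss(x: float, mean: float, sd: float) -> float:
    """Log density of a univariate normal distribution."""
    var = sd * sd
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def viterbi_segment(
    obs: Sequence[Sequence[float]],
    means: Sequence[Sequence[float]],
    sds: Sequence[Sequence[float]],
    stay_prob: float = 0.99,
) -> List[int]:
    """Most likely state path for multi-track data under a Gaussian HMM.

    obs[t][d]   -- smoothed value of track d at position t
    means[s][d] -- emission mean of track d in state s (fixed, hypothetical)
    sds[s][d]   -- emission standard deviation (diagonal covariance assumed)
    stay_prob   -- probability of remaining in the same state between positions
    """
    n_states = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n_states - 1))

    def emit(s: int, x: Sequence[float]) -> float:
        # tracks are treated as independent given the state
        return sum(log_gauss(v, means[s][d], sds[s][d]) for d, v in enumerate(x))

    def trans(p: int, s: int) -> float:
        return log_stay if p == s else log_move

    # uniform initial state distribution
    score = [-math.log(n_states) + emit(s, obs[0]) for s in range(n_states)]
    back: List[List[int]] = []
    for x in obs[1:]:
        ptr, new_score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: score[p] + trans(p, s))
            ptr.append(best)
            new_score.append(score[best] + trans(best, s) + emit(s, x))
        back.append(ptr)
        score = new_score

    # trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

A high `stay_prob` makes each state switch expensive, so a single weakly negative position inside a run of strongly positive positions stays in the same state, while a sustained shift across all tracks produces one boundary. This persistence is what lets a multi-track model report broad domains rather than probe-level fluctuations.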