Predicting the in vivo signature of human gene regulatory sequences

William Stafford Noble, Scott Kuehn, Robert Thurman, Richard Humbert, James C. Wallace, Mann Yu, Michael Hawrylycz and John Stamatoyannopoulos

Bioinformatics (Proceedings of the Intelligent Systems for Molecular Biology Conference). 21(Suppl 1):i338-i343, 2005.


In the living cell nucleus, genomic DNA is packaged into chromatin. DNA sequences that regulate transcription and other chromosomal processes are associated with local disruptions, or "openings," in chromatin structure caused by the cooperative action of regulatory proteins. Such perturbations are extremely specific for cis-regulatory elements and occur over short stretches of DNA (typically about 250 bp). They can be detected experimentally in vivo as DNaseI hypersensitive sites (HSs), though the process is extremely laborious and costly. The ability to discriminate DNaseI HSs computationally would have a major impact on the annotation and utilization of the human genome.

We found that a supervised pattern recognition algorithm, trained using a set of 280 DNaseI HS and 737 non-HS control sequences from erythroid cells, was capable of de novo prediction of HSs across the human genome with surprisingly high accuracy, as determined by prospective in vivo validation. Systematic application of this computational approach will greatly facilitate discovery and analysis of functional non-coding elements in the human and other complex genomes.

Supplementary data