Supplementary data for "Predicting human nucleosome occupancy from primary sequence"
Shobhit Gupta, Jonathan Dennis, Robert E. Thurman, Robert Kingston, John A. Stamatoyannopoulos and William Stafford Noble
PLoS Computational Biology. 4(8):e1000134, 2008.
- Microarray data from Dennis et al. 
- Raw data: Experiments were performed on three arrays (Array 3, Array 4 and Array 6). Each locus is represented six times on each array: three times in the forward orientation and three times in the backward orientation. Two files are provided for each array, one containing the nucleosomal intensities and the other containing the genomic (background) intensities.
- Weakly smoothed data: For each array, the smoothed data file contains both nucleosomal and genomic intensities.
- Strongly smoothed data: Similar to the weakly smoothed data, each file contains both nucleosomal and genomic intensities.
- SVM training sets. Each file is a FASTA file containing the sequences of the probes used to train the corresponding SVM. Chromosomal coordinates in NCBI Build 35 (May 2004 assembly) are also included.
- Ozsolak raw: positive, negative
- Ozsolak A375: positive, negative
- Ozsolak MEC: positive, negative
- Ozsolak IMR90: positive, negative
- Ozsolak MALME: positive, negative
- Ozsolak PM: positive, negative
- Ozsolak MCF7: positive, negative
- Ozsolak T47D: positive, negative
Predicted nucleosome occupancy in the ENCODE regions:
Note: Predicted occupancy across the March 2006 assembly of the entire human genome is now available in the UCSC Genome Browser. Select the "Nucleosome Occupancy" track.
Top- and bottom-scoring probes:
A Python program, fasta2matrix.py, computes feature vectors from DNA sequences. The vectors used in this study were computed with a command line of the form
python fasta2matrix.py -upto -revcomp -normalize frequency 6 foo.fa.
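
For readers who want a feel for what such a feature vector contains, the following is a minimal sketch, not the fasta2matrix.py source. It assumes that -upto means "use every k-mer length from 1 to k", that -revcomp means "count a k-mer and its reverse complement as one feature", and that -normalize frequency means "divide counts by the total". Normalizing separately within each k-mer length is a further assumption, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the alphabetically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def canonical_kmers(k):
    """Sorted list of all k-mers after collapsing reverse complements."""
    return sorted({canonical("".join(p)) for p in product(ALPHABET, repeat=k)})


def feature_names(max_k):
    """Feature labels for all k-mer lengths 1..max_k (assumed meaning of -upto)."""
    names = []
    for k in range(1, max_k + 1):
        names.extend(canonical_kmers(k))
    return names


def kmer_features(seq, max_k):
    """Reverse-complement-collapsed k-mer frequencies for one sequence.

    Assumption: counts are normalized to frequencies separately for each k.
    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    vector = []
    for k in range(1, max_k + 1):
        counts = Counter()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set(ALPHABET):
                counts[canonical(kmer)] += 1
        total = sum(counts.values())
        vector.extend(
            counts[name] / total if total else 0.0 for name in canonical_kmers(k)
        )
    return vector


if __name__ == "__main__":
    names = feature_names(6)
    vec = kmer_features("ACGTTGCAACGTAGCTAGCTAGGATCC", 6)
    print(len(names), len(vec))
```

Under these assumptions, k = 6 gives 2 + 10 + 32 + 136 + 512 + 2080 = 2772 features per sequence, since each palindromic k-mer is its own reverse complement and all other k-mers pair up.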