Supplementary data for "Epigenetic priors for identifying active transcription factor binding sites"
Gabriel Cuellar-Partida, Fabian A. Buske, Robert C. McLeay, Tom Whitington, William Stafford Noble and Timothy L. Bailey
Experiment 1 (Section 3.1)
- TF ChIP-seq data sets from Mouse ESC:
- ChIP-seq_mouse.tgz 1.3M
ChIP-seq_mouse/$TF.bed
BED files from ChIP-seq assays for the described transcription factors.
- Gold standard and results:
- Mouse_experiments.tgz 2.6G
Mouse_experiments/$TF.Prior.Fil.chr#.txt
Tab-separated columns:
- Chr
- Start position
- Strand
- PWM score (FIMO)
- P-value (FIMO)
- 1/-1 (Positive/Negative) (Gold standard)
- Log-posterior odds score described in this paper, using the H3K4me3 mouse ESC prior
- Filtering number (PWM score if the site passed a threshold of 1; PWM score - 100 otherwise)
- Filtering number (PWM score if the site passed a threshold of 2; PWM score - 100 otherwise)
- Filtering number (PWM score if the site passed a threshold of 4; PWM score - 100 otherwise)
- Filtering number (PWM score if the site passed a threshold of 8; PWM score - 100 otherwise)
- Filtering number (PWM score if the site passed a threshold of 16; PWM score - 100 otherwise)
- Actual H3K4me3 density (-1 if it could not be determined because the site lies in a repetitive region)
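The five filtering columns encode a pass/fail decision inside a single number: the PWM score if the site passed the threshold, or the PWM score minus 100 if it did not. The minimal Python sketch below reads one Mouse_experiments/$TF.Prior.Fil.chr#.txt file and decodes that flag. The column names, the helper names (`read_mouse_sites`, `passed_threshold`, `count_passing`), and the "within 50 of the PWM score" test are illustrative assumptions. They are not part of the released files.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical names for the 13 tab-separated columns listed above.
MOUSE_COLUMNS = [
    "chr", "start", "strand", "pwm_score", "pwm_pvalue", "label",
    "log_posterior_odds",
    "filter_1", "filter_2", "filter_4", "filter_8", "filter_16",
    "h3k4me3_density",
]
NUMERIC_COLUMNS = MOUSE_COLUMNS[3:]  # everything after the strand column


def read_mouse_sites(lines: Iterable[str]) -> List[Dict]:
    """Parse rows of a $TF.Prior.Fil.chr#.txt file into dictionaries."""
    rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        row: Dict = dict(zip(MOUSE_COLUMNS, fields))
        row["start"] = int(row["start"])
        for key in NUMERIC_COLUMNS:
            row[key] = float(row[key])
        rows.append(row)
    return rows


def passed_threshold(row: Dict, threshold: int) -> bool:
    """A filtering column holds the PWM score (pass) or PWM score - 100 (fail).

    Assumption: comparing against the midpoint (a distance of 50) separates
    the two cases robustly, even if the file rounds the printed values.
    """
    value = row[f"filter_{threshold}"]
    return abs(value - row["pwm_score"]) < 50


def count_passing(rows: List[Dict], threshold: int) -> Dict[int, int]:
    """Count passing sites among gold-standard positives (1) and negatives (-1)."""
    counts = {1: 0, -1: 0}
    for row in rows:
        if passed_threshold(row, threshold):
            counts[int(row["label"])] += 1
    return counts
```

For example, `count_passing(read_mouse_sites(open(path)), 4)` reports how many positives and negatives survive the threshold-4 filter. Here `path` stands for any one of the per-chromosome files. Because a failing site is shifted down by 100, ranking sites by a filtering column places every passing site above every failing one, provided PWM scores span less than 100.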
Experiment 2 (Section 3.2)
- TF ChIP-seq data sets from K562 cell line:
- ChIP-seq.tgz 2.5M
ChIP-seq/$TF.bed
BED files from ChIP-seq assays for the described transcription factors.
- The following files contain the data used to create the priors (file names without the "Prior" suffix) and the priors themselves (file names with the "Prior" suffix). All files are in wiggle format.
- DNase.tgz 27M
- DNasePrior.tgz 47M
- H3K4me1.tgz 49M
- H3k4me1Prior.tgz 75M
- H3K4me3.tgz 44M
- H3K4me3Prior.tgz 69M
- H3K9ac.tgz 50M
- H3K9acPrior.tgz 78M
- H3K27ac.tgz 39M
- H3K27acPrior.tgz 63M
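The raw tracks and the priors are distributed as wiggle files. The Python sketch below shows one way to read them. It assumes the files use UCSC's `variableStep`/`fixedStep` declaration lines, which this README does not state, and it does not handle bedGraph-style data lines. The function name `parse_wiggle` and its record layout are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# (chromosome, 1-based start, span in bases, value)
Record = Tuple[str, int, int, float]


def _params(tokens: List[str]) -> Dict[str, str]:
    """Turn ['chrom=chr1', 'start=11'] into {'chrom': 'chr1', 'start': '11'}."""
    return dict(token.split("=", 1) for token in tokens)


def parse_wiggle(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record per data line of a variableStep/fixedStep wiggle file."""
    mode = None
    chrom = ""
    pos = step = span = 0
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if fields[0] == "variableStep":
            params = _params(fields[1:])
            mode, chrom = "variable", params["chrom"]
            span = int(params.get("span", 1))
        elif fields[0] == "fixedStep":
            params = _params(fields[1:])
            mode, chrom = "fixed", params["chrom"]
            pos, step = int(params["start"]), int(params["step"])
            span = int(params.get("span", 1))
        elif mode == "variable":
            # variableStep data lines: "<1-based start> <value>"
            yield chrom, int(fields[0]), span, float(fields[1])
        elif mode == "fixed":
            # fixedStep data lines: "<value>", positions advance by `step`
            yield chrom, pos, span, float(fields[0])
            pos += step
        else:
            raise ValueError(f"data line before any declaration: {line!r}")
```

Wiggle coordinates are 1-based. Subtract 1 from the start before comparing against the 0-based BED starts used elsewhere in this release.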
- Gold standard and results
- Human_experiments.tgz 7.5G
Human_experiments/$TF_27ac9ac.me3.me1.DNase.Combined.chr#.txt
Tab-separated columns:
- Chr#
- Start position
- Strand
- PWM score (FIMO)
- P-value (FIMO)
- 1/-1 (Positive/Negative) (Gold standard)
- Log-posterior odds score, H3K27ac K562 prior
- Log-posterior odds score, H3K9ac K562 prior
- Log-posterior odds score, H3K4me3 K562 prior
- Log-posterior odds score, H3K4me1 K562 prior
- Log-posterior odds score, DNase HS K562 prior
- Log-posterior odds score, H3K4me3 + DNase HS K562 prior
- Log-posterior odds score, H3K27ac + DNase HS K562 prior
- Log-posterior odds score, H3K4me1 + H3K4me3 K562 prior
- Log-posterior odds score, H3K9ac + H3K27ac K562 prior
- Log-posterior odds score, H3K4me3 + H3K9ac + H3K27ac K562 prior
- Log-posterior odds score, H3K4me1 + H3K4me3 + H3K9ac + H3K27ac K562 prior
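Each Human_experiments file places the PWM score and 11 prior-based log-posterior odds scores side by side with the gold-standard label. That layout makes it straightforward to compare the priors. The Python sketch below ranks them by ROC AUC, computed with the rank-based (Mann-Whitney) formula. AUC is only one possible metric, chosen here for illustration, and it is not necessarily the measure reported in the paper. The short prior names and the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Hypothetical short names for the 11 log-posterior odds columns (7-17) listed above.
PRIOR_NAMES = [
    "H3K27ac", "H3K9ac", "H3K4me3", "H3K4me1", "DNase",
    "H3K4me3+DNase", "H3K27ac+DNase", "H3K4me1+H3K4me3",
    "H3K9ac+H3K27ac", "H3K4me3+H3K9ac+H3K27ac",
    "H3K4me1+H3K4me3+H3K9ac+H3K27ac",
]


def read_human_sites(lines: Iterable[str]) -> Tuple[List[int], Dict[str, List[float]]]:
    """Return gold-standard labels and one score list per column (PWM + priors)."""
    labels: List[int] = []
    scores: Dict[str, List[float]] = {name: [] for name in ["PWM"] + PRIOR_NAMES}
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields:
            continue
        labels.append(int(fields[5]))  # column 6: 1 / -1
        scores["PWM"].append(float(fields[3]))  # column 4: FIMO PWM score
        for name, value in zip(PRIOR_NAMES, fields[6:17]):
            scores[name].append(float(value))
    return labels, scores


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based ROC AUC; tied scores receive their average rank."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, label in pairs if label > 0)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative sites")
    pos_rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        pos_rank_sum += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] > 0)
        i = j
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def rank_priors(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """AUC of every score column, best first."""
    labels, scores = read_human_sites(lines)
    aucs = {name: roc_auc(values, labels) for name, values in scores.items()}
    return sorted(aucs.items(), key=lambda item: item[1], reverse=True)
```

Because the files are split by chromosome, a genome-wide comparison would first concatenate the lines of all $TF_27ac9ac.me3.me1.DNase.Combined.chr#.txt files for one factor.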
Experiment 3 (Section 3.3)
- Gold standard and results
- centipede.tar.gz 3.3M
centipede/$TF.bed
The files are in a format similar to the UCSC BED format, with the columns listed below.
Note that columns 1-11 are taken from the CENTIPEDE paper:
Pique-Regi R*, Degner JF*, Pai AA, Gaffney DJ, Gilad Y, Pritchard JK. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Research. 2011. Jan 20.
- Chromosome
- Starting position of the motif (0-based) hg18
- End position of the motif (1-based) hg18
- Identifier for the PWM (TRANSFAC or JASPAR ID)
- Score for the PWM match: log2(likelihood ratio of the PWM model to a simple 0.25 background model), as described in the supplement
- Strand of the match ("+" Forward, "-" Reverse)
- Posterior log odds of TF binding from the Centipede model
- Posterior probability of TF binding from the Centipede model
- Number of ChIP-seq reads surrounding the motif match
- Number of control ChIP-seq reads surrounding the motif match
- Whether the site is inside a ChIP-seq peak called by MACS or ENCODE (see supplement for details):
1 -- Inside a high confidence ChIP-seq peak (POSITIVE)
0 -- Defined as ChIP-seq peak NEGATIVE
- DNase prior score
- DNase score
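Because the motif start is 0-based and the end is 1-based, each row uses the same half-open convention as BED, so the motif length is simply end - start. The Python sketch below loads a centipede/$TF.bed file and computes the precision among the top-k sites for any score column, using the 0/1 ChIP-seq label as ground truth. The column names and the choice of precision-at-k are illustrative assumptions, and they are not the evaluation described in the paper.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical names for the 13 columns listed above.
CENTIPEDE_COLUMNS = [
    "chrom", "start", "end", "pwm_id", "pwm_score", "strand",
    "centipede_log_odds", "centipede_posterior",
    "chip_reads", "control_reads", "chip_label",
    "dnase_prior", "dnase_score",
]
INT_COLUMNS = ("start", "end", "chip_label")
FLOAT_COLUMNS = (
    "pwm_score", "centipede_log_odds", "centipede_posterior",
    "chip_reads", "control_reads", "dnase_prior", "dnase_score",
)


def read_centipede_sites(lines: Iterable[str]) -> List[Dict]:
    """Parse rows of a centipede/$TF.bed file into dictionaries."""
    rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields:
            continue
        row: Dict = dict(zip(CENTIPEDE_COLUMNS, fields))
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        for key in FLOAT_COLUMNS:
            row[key] = float(row[key])  # read counts parsed as float to be safe
        rows.append(row)
    return rows


def motif_length(row: Dict) -> int:
    """0-based start and 1-based end form a half-open interval, as in BED."""
    return row["end"] - row["start"]


def precision_at_k(rows: List[Dict], score_key: str, k: int) -> float:
    """Fraction of ChIP-seq positives (label 1) among the k highest-scoring sites."""
    if k <= 0 or not rows:
        raise ValueError("k must be positive and rows non-empty")
    top = sorted(rows, key=lambda row: row[score_key], reverse=True)[:k]
    return sum(1 for row in top if row["chip_label"] == 1) / len(top)
```

Calling `precision_at_k` once with `"centipede_log_odds"` and once with `"dnase_prior"` puts the CENTIPEDE score and the DNase prior score on the same footing for a given k.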
Please direct questions to Gabriel Cuellar-Partida (gcuellar@lcg.unam.mx), William Noble (william-noble@uw.edu) or Timothy Bailey (t.bailey@imb.uq.edu.au).