Author: Kris Alavattam
Dates: 2025-10-29, 2025-11-06
This bundle includes genome-wide insulation scores, boundaries, and TAD annotations from pooled endothelial and cardiomyocyte differentiation Hi-C datasets analyzed in the Stem Cell Rep 2023 study. (Cardiomyocyte data are from Bertero & Fields et al., Nat Commun 2019.)
Analyses were run at 40-kb resolution on autosomes and chromosome X (as the RUES2 stem cell line is female) using the insulation-score method of Crane et al., Nature 2015 as implemented in the cworld-dekker toolkit. We also generated 10-kb TAD calls; these weren’t used in the paper but are provided for reference.
{endoPooled_D{0,2,6,14},cardioPooled_D{0,2,5,14}}.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation--mbs0--mts0.tads.bed.gz
{endoPooled_D{0,2,6,14},cardioPooled_D{0,2,5,14}}.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation.boundaries.bed.gz
Each timepoint (endoPooled_D{0,2,6,14}, cardioPooled_D{0,2,5,14}) includes four gzipped, genome-wide files:
1. TAD domains: *.tads.bed.gz
BED intervals marking each called TAD (chrom, start, end, name, score). This is the primary file for TAD overlap/annotation work.
2. TAD boundaries: *.insulation.boundaries.bed.gz
BED intervals at local minima of the normalized insulation profile (i.e., boundary calls). This file was also used for TAD overlap/annotation work.
3. Insulation score track: *.insulation.bedGraph.gz
Genome-wide normalized insulation profile used to call boundaries. It enables visualization of bin-wise insulation values and recomputation of TAD boundaries. Not used in the paper.
4. TAD domain scores: *.tads.bedGraph.gz
bedGraph intervals identical to those in (1), with column 4 giving a “TAD strength score” that summarizes boundary-adjacent insulation behavior. Not used in the paper.
Note: in these files, chromosome names contain a “chr” prefix, whereas loop files do not contain the prefix. If you plan to integrate TAD and loop coordinates, you’ll need to normalize chromosome naming conventions.
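One way to do this is to strip the prefix from column 1 of the TAD files so they match the loop files. Below is a minimal sketch that assumes tab-delimited BED input. The function name `strip_chr_prefix` and the output file name are hypothetical. Lines without a leading "chr" (e.g., track lines) pass through unchanged.

```shell
# strip_chr_prefix (hypothetical helper): read tab-delimited BED text on stdin and
# remove a leading "chr" from column 1; all other columns are left untouched.
strip_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" } { sub(/^chr/, "", $1); print }'
}
# Example (output name is hypothetical):
#   gzip -dc <TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation--mbs0--mts0.tads.bed.gz \
#     | strip_chr_prefix | gzip -c > <TIMEPOINT>.tads.nochr.bed.gz
```

Using `gzip -dc` rather than `zcat` keeps the pipeline portable to systems where `zcat` expects `.Z` files.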
Insulation scores quantify the level of local contacts (i.e., enrichment or depletion) within a window around each bin. Lower (more negative) values indicate stronger insulation (i.e., stronger separation between domains). These are log₂-transformed and mean-centered (--im mean).
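As a rough illustration of the log₂ normalization, the sketch below rescales one chromosome's raw insulation values as $\log_2(\text{raw}_i / \overline{\text{raw}})$. This is the log2-ratio-to-chromosome-mean form described by Crane et al. (2015). Whether it matches cworld's exact implementation (e.g., its NA handling) is an assumption. The function name `log2_mean_normalize` is hypothetical, and NaN or non-positive inputs are written out as NaN.

```shell
# log2_mean_normalize (hypothetical helper): one raw insulation value per line on stdin
# (a single chromosome); prints log2(value / mean of the non-NaN values), 4 decimals.
# NaN inputs and non-positive values are written out as NaN.
log2_mean_normalize() {
  awk '
    { v[NR] = $1; if (toupper($1) != "NAN") { sum += $1; n++ } }
    END {
      if (n == 0) exit
      mean = sum / n
      for (i = 1; i <= NR; i++) {
        if (toupper(v[i]) == "NAN" || v[i] + 0 <= 0 || mean <= 0) print "NaN"
        else printf "%.4f\n", log(v[i] / mean) / log(2)
      }
    }'
}
```

For example, raw values 2, 2, 4, 8 have mean 4 and normalize to -1, -1, 0, 1.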
TAD strength scores summarize the relative drop in insulation from the interior of a TAD to its flanking boundaries. They are derived with insulation2tads.pl from the normalized insulation values across the bins inside each TAD:
- Let $I[0], \dots, I[n-1]$ be the normalized insulation values ($\log_2$-mean-centered) across the bins from the left boundary to the right boundary of a TAD.
- Let $I_{\max} = \max_i I[i]$ (maximum interior normalized insulation).
- Let $I_{\text{left}} = I[0]$ (value at the left edge bin) and $I_{\text{right}} = I[n-1]$ (value at the right edge bin).
- The TAD score is $\text{score} = \min\left(I_{\max} - I_{\text{left}},\ I_{\max} - I_{\text{right}}\right)$.
Essentially, this captures the weaker of the two “drops” from the interior peak to each flanking edge, i.e., a boundary-strength-type summary. TADs with too many NA bins (>25% of bins) are suppressed by the script and not output. NA values are encoded as NaN in the bedGraph.
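To make the definition concrete, here is a minimal re-implementation sketch in AWK, called from the shell. It is not the insulation2tads.pl code. Its assumptions are:

- One TAD's normalized insulation values arrive one per line, from the left edge bin to the right edge bin.
- NaN bins are ignored when finding $I_{\max}$.
- The ">25% NA" suppression rule above applies.
- As an added assumption, a TAD whose edge bin is NaN is also suppressed.

The function name `tad_score` is hypothetical.

```shell
# tad_score (hypothetical helper): one TAD's normalized insulation values on stdin,
# one per line, ordered from the left edge bin to the right edge bin.
# Prints min(I_max - I_left, I_max - I_right) with 4 decimals, or nothing when more
# than 25% of bins are NaN (or, as an added assumption, when an edge bin is NaN).
tad_score() {
  awk '
    { v[NR] = $1 }
    END {
      n = NR
      if (n == 0) exit
      na = 0
      have_max = 0
      for (i = 1; i <= n; i++) {
        if (toupper(v[i]) == "NAN") { na++; continue }
        if (!have_max || v[i] + 0 > imax) { imax = v[i] + 0; have_max = 1 }
      }
      if (na / n > 0.25 || !have_max) exit
      if (toupper(v[1]) == "NAN" || toupper(v[n]) == "NAN") exit
      d_left = imax - v[1]
      d_right = imax - v[n]
      printf "%.4f\n", (d_left < d_right ? d_left : d_right)
    }'
}
```

For the values 0.5, 0.9, 0.3, 0.1, the interior peak is 0.9. The drops are 0.4 (left) and 0.8 (right), so the score is the weaker drop, 0.4.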
All tracks and calls were produced from per-chromosome 40-kb Hi-C matrices (pooled replicates); the per-chromosome outputs were then concatenated into genome-wide files. Insulation was computed with the following parameters:
- Insulation square (--is): 520,001 bp
- Delta span (--ids): 320,001 bp
- Smoothing (--ss): 160,001 bp
- Insulation mode (--im): mean
- Noise threshold (--nt): 0.01
- Boundary margin of error (default 0; not expanded in the final outputs)
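For orientation, the bp-valued window sizes can be expressed in 40-kb bins. The sketch below uses plain integer division, so it makes no claim about how cworld rounds internally. The helper name `window_bins` is hypothetical.

```shell
# window_bins (hypothetical helper): print each cworld window size in bp next to its
# width in whole bins (default bin size 40,000 bp), using integer division.
window_bins() {
  bin_size=${1:-40000}
  for param in is:520001 ids:320001 ss:160001; do
    name=${param%%:*}
    bp=${param#*:}
    printf '%s\t%s bp\t%s bins\n' "--$name" "$bp" "$((bp / bin_size))"
  done
}
```

At 40 kb, this gives roughly 13 bins for --is, 8 for --ids, and 4 for --ss.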
Representative command (per chromosome and sample):
matrix2insulation.pl \
-i <SAMPLE>_40000_iced_<CHR>_dense_craned.matrix \
--is 520001 --ids 320001 --ss 160001 --nt 0.01 --im mean \
-o <OUTDIR>/chr<CHR>/<SAMPLE>.chr<CHR>.40000/<SAMPLE>.chr<CHR>.40000
TAD assembly from insulation and boundaries:
insulation2tads.pl \
-i <OUT>.insulation.txt \
-b <OUT>.insulation.boundaries.txt \
-o <OUT>.insulation \
--mbs 0 --mts 0
Genome-wide output files per timepoint:
<TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation.bedGraph.gz
<TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation.boundaries.bed.gz
<TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation--mbs0--mts0.tads.bed.gz
<TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation--mbs0--mts0.tads.bedGraph.gz
where <TIMEPOINT> $\in$ endoPooled_D{0,2,6,14} or cardioPooled_D{0,2,5,14}.
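The bedGraph tracks encode missing bins as NaN. If a genome browser or downstream tool rejects those rows, one option is to drop them before loading. Below is a minimal sketch that assumes tab-delimited bedGraph input with the value in column 4; track/header lines are kept. The function name `drop_nan_rows` is hypothetical.

```shell
# drop_nan_rows (hypothetical helper): read tab-delimited bedGraph on stdin and drop
# rows whose column 4 is NaN (case-insensitive); all other lines are printed as-is.
drop_nan_rows() {
  awk -F'\t' 'toupper($4) != "NAN"'
}
# Example (output name is hypothetical):
#   gzip -dc <TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation.bedGraph.gz \
#     | drop_nan_rows | gzip -c > <TIMEPOINT>.insulation.noNaN.bedGraph.gz
```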
NaN for missing values, and IGV/UCSC handle these (if I’m not mistaken; you’ll want to double check and, if not, strip such rows from the files).
--im setting; you should use the same parameters above for replication.
${HICPRO_PATH}/utils/sparseToDense.py \
-b <SAMPLE>/raw/40000/<SAMPLE>_40000_abs.bed \
--perchr <SAMPLE>/iced/40000/<SAMPLE>_40000_iced.matrix
(AWK call by Giancarlo Bonora.)
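awk -v sample=<SAMPLE> -v rez=40000 -v chr=<CHR> '
# First input line only: emit a header row (leading tab, then one bin label per column)
NR == 1 {
for(i = 1; i <= NF; i++) {
printf("\tbin%s|%s|chr%s:%d-%d", i, sample, chr, ((i - 1) * rez) + 1, (i * rez) + 1)
}
printf("\n")
}
# Every input line: prepend the label for this row bin, then print the original matrix values
{
printf("bin%s|%s|chr%s:%d-%d\t", NR, sample, chr, ((NR - 1) * rez) + 1, (NR * rez) + 1)
print $0
}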
' \
<SAMPLE>_40000_iced_<CHR>_dense.matrix \
> <SAMPLE>_40000_iced_<CHR>_dense_craned.matrix
matrix2insulation.pl (as above):
matrix2insulation.pl \
-i <SAMPLE>_40000_iced_<CHR>_dense_craned.matrix \
--is 520001 --ids 320001 --ss 160001 --nt 0.01 --im mean \
-o <OUTDIR>/chr<CHR>/<SAMPLE>.chr<CHR>.40000/<SAMPLE>.chr<CHR>.40000
insulation2tads.pl (as above):
insulation2tads.pl \
-i <OUT>.insulation.txt \
-b <OUT>.insulation.boundaries.txt \
-o <OUT>.insulation \
--mbs 0 --mts 0
Per-chromosome outputs were concatenated into genome-wide tracks and gzip-compressed.
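A minimal sketch of that concatenation step. It assumes the per-chromosome files are passed in the desired chromosome order (e.g., 1 to 22, then X) and that any header lines start with "track" or "#". The function name and the example paths are hypothetical, not the actual cworld output names.

```shell
# concat_per_chrom (hypothetical helper): concat_per_chrom OUT.gz FILE1 [FILE2 ...]
# Files are concatenated in the order given; header lines ("track ..." or "#...")
# are kept from the first file only. The result is gzip-compressed to OUT.gz.
concat_per_chrom() {
  out=$1
  shift
  awk 'NR != FNR && /^(track|#)/ { next } { print }' "$@" | gzip -c > "$out"
}
# Example (hypothetical per-chromosome file names):
#   files=""
#   for c in $(seq 1 22) X; do files="$files per_chrom/<SAMPLE>.chr$c.insulation.bedGraph"; done
#   concat_per_chrom <SAMPLE>.genome.insulation.bedGraph.gz $files
```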
This bundle includes genome-wide chromatin loop calls (“pairwise point interactions” or “PPIs”) from the pooled endothelial and cardiomyocyte differentiation Hi-C datasets analyzed in Stem Cell Rep 2023. (Cardiomyocyte data are from Bertero & Fields et al., Nat Commun 2019.)
Loops were called with HiCCUPS (juicer_tools; Juicer v1.9.9, Jcuda 0.8, --cpu). Primary analyses were run at 10-kb resolution on autosomes and chromosome X (as the RUES2 cell line is female). We also provide 20-kb, 40-kb, and cross-resolution merged sets for completeness; these supplementary sets, and all cardioPooled datasets, were not used in the paper.
endoPooled_D{0,2,6,14}.postprocessed_pixels_10000.bedpe.gz
Each timepoint (endoPooled_D{0,2,6,14}, cardioPooled_D{0,2,5,14}) includes:
1. Postprocessed loops: *.postprocessed_pixels_{1,2,4}0000.bedpe.gz
Primary, analysis-ready loop set (the 10-kb endoPooled files were used in the Stem Cell Rep analyses). Each line lists a significant interaction pixel (one loop) passing all HiCCUPS filters after FDR control and local postprocessing. (For more details, see the HiCCUPS wiki maintained by the Aiden Lab as well as the Supplementary Methods in their Rao and Huntley et al., Cell 2014 study.) This is the primary file for loop overlap/annotation work.
Each `.bedpe` file follows the standard HiCCUPS schema:
<pre><code>chrom1 x1 x2 chrom2 y1 y2 name score strand1 strand2 color observed expectedBL expectedDonut expectedH expectedV fdrBL fdrDonut fdrH fdrV numCollapsed centroid1 centroid2 radius
</code></pre>
Variable definitions are available in the [HiCCUPS wiki](https://github.com/aidenlab/juicer/wiki/HiCCUPS).
2. Enriched pixels: *.enriched_pixels_{1,2,4}0000.bedpe.gz
All candidate pixels that pass the enrichment tests (donut/horizontal/vertical/background) before centroid collapsing and final deduplication.
Useful for inspecting the raw peaks underlying the final loop calls and/or redoing or customizing postprocessing. Not used in the paper.
3. FDR thresholds: *.fdr_thresholds_{1,2,4}0000.gz
Per-resolution FDR cutoff tables produced by HiCCUPS. These record the enrichment thresholds used for each local background model (donut, horizontal, vertical, lower-left) at the chosen FDR.
Useful for auditing or reproducing the calling stringency. Not used in the paper.
4. Merged loops: *.merged_loops.bedpe.gz
Non-redundant union of the 10-, 20-, and 40-kb loop calls, with nearby pixels across resolutions merged to a single representative record.
Provided for completeness and cross-resolution validation. Not used in the paper.
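For quick inspection of the postprocessed loops, the sketch below prints each intrachromosomal loop's anchor-to-anchor distance (y1 - x1, using the schema above). It assumes tab-delimited HiCCUPS .bedpe input and skips header/comment lines by requiring a numeric x1 (column 2). The function name `loop_distances` is hypothetical.

```shell
# loop_distances (hypothetical helper): read a HiCCUPS .bedpe on stdin and, for loops
# with chrom1 == chrom2, print chrom, x1, y1 and the distance y1 - x1 (tab-separated).
# Lines whose column 2 is not an integer (headers, comments) are skipped.
loop_distances() {
  awk -F'\t' -v OFS='\t' '$2 ~ /^[0-9]+$/ && $1 == $4 { print $1, $2, $5, $5 - $2 }'
}
# Example:
#   gzip -dc endoPooled_D0.postprocessed_pixels_10000.bedpe.gz | loop_distances | sort -k4,4n
```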
Note: in these files, chromosome names do not have the “chr” prefix used in the various TAD files. If you plan to integrate TAD and loop coordinates, you’ll need to normalize chromosome naming conventions first.
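One way to normalize the naming is to add the prefix to both chromosome columns (1 and 4) of the loop files. Below is a minimal sketch that assumes tab-delimited .bedpe input. Header/comment lines, detected by a non-numeric x1, are passed through unchanged. The function name `add_chr_prefix_bedpe` is hypothetical.

```shell
# add_chr_prefix_bedpe (hypothetical helper): read tab-delimited .bedpe on stdin and
# prepend "chr" to columns 1 and 4 of data rows that do not already carry it.
add_chr_prefix_bedpe() {
  awk -F'\t' -v OFS='\t' '
    $2 ~ /^[0-9]+$/ {
      if ($1 !~ /^chr/) $1 = "chr" $1
      if ($4 !~ /^chr/) $4 = "chr" $4
    }
    { print }'
}
```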
Loops were detected with the Juicer hiccups command; a representative invocation:
juicer_tools hiccups \
--cpu \
-r 10000,20000,40000 \
-f 0.1,0.1,0.1 \
-p 4,2,1 \
-i 7,5,3 \
-t 0.02,1.5,1.75,2.0,2.5 \
<INPUT.hic> \
<OUTPUT_DIR>
Postprocessed loops (postprocessed_pixels_10000.bedpe) were exported from each <SAMPLE>.hiccupsOutput directory and gzip-compressed. Although 20-kb, 40-kb, and multi-resolution (loop-merged) sets were generated, only 10-kb loops were used in the Stem Cell Rep analyses. (Again, happy to share the other data upon request.)
Parameter descriptions can be found in the HiCCUPS wiki.
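The comma-separated lists are read positionally, one value per resolution. The HiCCUPS wiki describes -p (peak width) and -i (window width) as given in bins. Under that assumption, the sketch below pairs the values from the command above and converts those widths to bp. The helper name `show_hiccups_windows` is hypothetical.

```shell
# show_hiccups_windows (hypothetical helper): args are the -r, -f, -p and -i lists.
# Pairs them by position and reports -p and -i both in bins and in bp (bins * resolution).
show_hiccups_windows() {
  awk -v r="$1" -v f="$2" -v p="$3" -v w="$4" 'BEGIN {
    n = split(r, res, ","); split(f, fdr, ","); split(p, peak, ","); split(w, win, ",")
    for (k = 1; k <= n; k++)
      printf "res=%s fdr=%s peak_bins=%s (%d bp) window_bins=%s (%d bp)\n", \
        res[k], fdr[k], peak[k], peak[k] * res[k], win[k], win[k] * res[k]
  }'
}
# Example: show_hiccups_windows 10000,20000,40000 0.1,0.1,0.1 4,2,1 7,5,3
```

Read this way, the chosen -p values keep the peak width at 40 kb across all three resolutions, while the window widens from 70 kb to 120 kb.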
HiCCUPS operates on .hic files, which are binary matrices produced by Juicer’s pre-processing pipeline (e.g., juicer_tools pre). Paul Fields’ and Giancarlo Bonora’s earlier HiC-Pro runs produced validPairs text files, which can be converted to .hic format via the following:
juicer_tools pre -n <HiC-Pro_validPairs> <OUTPUT.hic> <chrom.sizes>
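Two technical caveats apply here.

- **Input format.** juicer_tools pre reads Juicer's own pair formats, such as the "short" format (str1 chr1 pos1 frag1 str2 chr2 pos2 frag2). It does not read HiC-Pro validPairs columns as such. HiC-Pro ships a hicpro2juicebox.sh utility for this conversion.
- **Normalization.** In juicer_tools pre, -n skips calculating normalization vectors. A KR-based HiCCUPS run would then need those vectors added afterwards (e.g., with juicer_tools addNorm), or pre run without -n. It is worth checking this against the Juicer build used.

The sketch below illustrates only the format conversion, under these assumptions:

- validPairs columns 2 to 7 are chr1, pos1, strand1, chr2, pos2, strand2.
- Strands become 0 (+) or 1 (-).
- Dummy fragment IDs 0 and 1 are used.
- Each pair is written with the lexically smaller chromosome first, then sorted so each chromosome pair forms a contiguous block.

The function name `validpairs_to_short` is hypothetical.

```shell
# validpairs_to_short (hypothetical helper): HiC-Pro validPairs on stdin ->
# Juicer short format (str1 chr1 pos1 frag1 str2 chr2 pos2 frag2) on stdout.
# Assumes validPairs columns: 1 read, 2 chr1, 3 pos1, 4 strand1, 5 chr2, 6 pos2, 7 strand2.
validpairs_to_short() {
  awk 'BEGIN { OFS = "\t" }
  {
    s1 = ($4 == "+") ? 0 : 1
    s2 = ($7 == "+") ? 0 : 1
    # Force string comparison so the ordering matches the byte-wise sort below
    if (($2 "") <= ($5 "")) print s1, $2, $3, 0, s2, $5, $6, 1
    else                    print s2, $5, $6, 0, s1, $2, $3, 1
  }' | LC_ALL=C sort -k2,2 -k6,6
}
# Example (hypothetical file names):
#   validpairs_to_short < <SAMPLE>.allValidPairs > <SAMPLE>.short.txt
#   juicer_tools pre <SAMPLE>.short.txt <OUTPUT.hic> <chrom.sizes>
```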
Output files per timepoint:
<TIMEPOINT>.enriched_pixels_10000.bedpe.gz
<TIMEPOINT>.fdr_thresholds_10000.gz
<TIMEPOINT>.postprocessed_pixels_10000.bedpe.gz
<TIMEPOINT>.merged_loops.bedpe.gz
where <TIMEPOINT> $\in$ endoPooled_D{0,2,6,14} or cardioPooled_D{0,2,5,14}.
KR for Knight-Ruiz matrix “balancing”), and HiCCUPS parameters; replication requires the same Juicer build and parameters as above.
Summary of core steps to derive these outputs:
# Convert HiC-Pro validPairs files to Juicer-compatible .hic files
juicer_tools pre -n <HiC-Pro_validPairs> <OUTPUT.hic> <chrom.sizes>
# Run HiCCUPS
juicer_tools hiccups --cpu \
-r 10000,20000,40000 \
-f 0.1,0.1,0.1 \
-p 4,2,1 \
-i 7,5,3 \
-t 0.02,1.5,1.75,2.0,2.5 \
-k KR \
<SAMPLE>.hic <SAMPLE>.hiccupsOutput/
# Compress and rename 10-kb postprocessed loops
gzip -c <SAMPLE>.hiccupsOutput/postprocessed_pixels_10000.bedpe \
> <SAMPLE>.postprocessed_pixels_10000.bedpe.gz