README

Author: Kris Alavattam
Dates: 2025-10-29, 2025-11-06

Zip file of TAD and loop calls

Topologically associating domains (TADs)

Overview

This bundle includes genome-wide insulation scores, boundaries, and TAD annotations from pooled endothelial and cardiomyocyte differentiation Hi-C datasets analyzed in the Stem Cell Rep 2023 study. (Cardiomyocyte data are from Bertero & Fields et al., Nat Commun 2019.)

Analyses were run at 40-kb resolution on autosomes and chromosome X (as the RUES2 stem cell line is female) using the insulation-score method of Crane et al., Nature 2015 as implemented in the cworld-dekker toolkit. We also generated 10-kb TAD calls; these weren’t used in the paper but are provided for reference.

Files used for analyses in the paper

{endoPooled_D{0,2,6,14},cardioPooled_D{0,2,5,14}}.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation--mbs0--mts0.tads.bed.gz
{endoPooled_D{0,2,6,14},cardioPooled_D{0,2,5,14}}.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation.boundaries.bed.gz

Data contents

Each timepoint (endoPooled_D{0,2,6,14}, cardioPooled_D{0,2,5,14}) includes four gzipped, genome-wide files:

1. TAD domains: *.tads.bed.gz
   BED intervals marking each called TAD (chrom, start, end, name, score). This is the primary file for TAD overlap/annotation work.
2. TAD boundaries: *.insulation.boundaries.bed.gz
   BED intervals at local minima of the normalized insulation profile (i.e., boundary calls). This was also used for TAD overlap/annotation work.
3. Insulation score track: *.insulation.bedGraph.gz
   Genome-wide normalized insulation profile used to call boundaries. This enables visualization of bin-wise insulation values and recomputation of TAD boundaries. Not used in the paper.
4. TAD domain scores: *.tads.bedGraph.gz
   bedGraph intervals identical to (1), with column 4 giving a “TAD strength score” that summarizes boundary-adjacent insulation behavior. Not used in the paper.

Note: in these files, chromosome names contain a “chr” prefix, whereas loop files do not contain the prefix. If you plan to integrate TAD and loop coordinates, you’ll need to normalize chromosome naming conventions.
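One way to reconcile the two conventions is to strip the prefix from the TAD files before comparing them with the loop files. The following is a minimal sketch, not part of the original workflow: the function name `strip_chr_prefix` and the demo records are made up for illustration, and it assumes tab-delimited input with the chromosome in column 1 and no header lines.

```shell
#  Hypothetical helper (not from the original workflow): remove a leading
#  "chr" from column 1 of a tab-delimited BED/bedGraph stream on stdin
strip_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" } { sub(/^chr/, "", $1); print }'
}

#  Demo on two made-up TAD records
printf 'chr1\t0\t40000\nchrX\t80000\t120000\n' | strip_chr_prefix
```

For a real file, the function could sit in a pipeline such as `gzip -cd <TIMEPOINT>...tads.bed.gz | strip_chr_prefix`, with the output file name left to the user.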

What insulation and TAD strength scores represent

Insulation scores

Insulation scores quantify local contact levels (i.e., enrichment or depletion) within a window centered on each bin (the insulation square). Lower (more negative) values indicate stronger insulation (i.e., stronger separation between neighboring domains). Values are log₂-transformed and mean-centered; contacts within each insulation square were aggregated by their mean (--im mean).

TAD strength scores

TAD strength scores summarize the relative drop in insulation from the interior of a TAD to its flanking boundaries. Each score is derived with insulation2tads.pl from the normalized insulation values across the bins inside the TAD:

- Let I[0...n-1] be the normalized insulation values (log₂-transformed, mean-centered) across the bins from the left boundary to the right boundary of a TAD.
- Let I_max = max(I) (maximum interior normalized insulation).
- Let I_left = I[0] (value at the left edge bin) and I_right = I[n-1] (value at the right edge bin).
- The TAD score is score = min(I_max − I_left, I_max − I_right).

Essentially, this captures the weaker of the two “drops” from the interior peak to each flanking edge, i.e., a boundary-strength-type summary. TADs with too many NA bins (>25% of bins) are suppressed by the script and not output. NA values are encoded as NaN in the bedGraph.
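As a worked illustration of the formula above, the sketch below computes the score for a single made-up insulation profile. It is not the insulation2tads.pl implementation: the function name `tad_score` and the input values are hypothetical, the input is assumed to be one numeric value per line (left boundary bin first, right boundary bin last), and NA/NaN handling is not attempted.

```shell
#  Illustrative only: read one normalized insulation value per line and
#  print min(I_max - I_left, I_max - I_right), rounded to two decimals
tad_score() {
    awk '
        NR == 1 { left = $1 + 0; max = left }
        {
            val = $1 + 0
            if (val > max) max = val
            right = val
        }
        END {
            if (NR == 0) exit 1
            drop_left = max - left
            drop_right = max - right
            printf("%.2f\n", (drop_left < drop_right) ? drop_left : drop_right)
        }
    '
}

#  Made-up profile: I_max = 0.4, I_left = -0.8, I_right = -0.5
#  Drops are 1.2 (left) and 0.9 (right), so the score is 0.90
printf '%s\n' -0.8 0.1 0.4 0.2 -0.5 | tad_score
```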

Parameters, provenance

All tracks and calls were produced from per-chromosome Hi-C matrices (pooled replicates) at 40 kb; the per-chromosome outputs were later concatenated into genome-wide files. Insulation was computed with the following parameters:

- Insulation square (--is): 520,001 bp
- Delta span (--ids): 320,001 bp
- Smoothing (--ss): 160,001 bp
- Insulation mode (--im): mean
- Noise threshold (--nt): 0.01
- Boundary margin of error (default 0; not expanded in the final outputs)

Representative command (per chromosome and sample):

matrix2insulation.pl \
    -i <SAMPLE>_40000_iced_<CHR>_dense_craned.matrix \
    --is 520001 --ids 320001 --ss 160001 --nt 0.01 --im mean \
    -o <OUTDIR>/chr<CHR>/<SAMPLE>.chr<CHR>.40000/<SAMPLE>.chr<CHR>.40000

TAD assembly from insulation and boundaries:

insulation2tads.pl \
    -i <OUT>.insulation.txt \
    -b <OUT>.insulation.boundaries.txt \
    -o <OUT>.insulation \
    --mbs 0 --mts 0

File naming conventions

<TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation.bedGraph.gz
<TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation.boundaries.bed.gz
<TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation--mbs0--mts0.tads.bed.gz
<TIMEPOINT>.genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation--mbs0--mts0.tads.bedGraph.gz

where <TIMEPOINT> $\in$ endoPooled_D{0,2,6,14} or cardioPooled_D{0,2,5,14}.
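To make the convention concrete, the sketch below enumerates the 32 file names it implies (8 timepoints × 4 file types). The function name `list_tad_files` is made up for illustration; the name components are copied from the patterns above.

```shell
#  Illustrative helper: print every TAD-related file name implied by the
#  naming convention (one per line)
list_tad_files() {
    stem="genome.40000--is520001--nt0.01--ids320001--ss160001--immean.insulation"
    for tp in endoPooled_D0 endoPooled_D2 endoPooled_D6 endoPooled_D14 \
              cardioPooled_D0 cardioPooled_D2 cardioPooled_D5 cardioPooled_D14; do
        for suffix in .bedGraph.gz .boundaries.bed.gz \
                      --mbs0--mts0.tads.bed.gz --mbs0--mts0.tads.bedGraph.gz; do
            printf '%s\n' "${tp}.${stem}${suffix}"
        done
    done
}

#  Show the first two names
list_tad_files | head -n 2
```

Piping the list through a loop such as `while read -r f; do [ -f "$f" ] || echo "missing: $f"; done` would be one way to check an unpacked bundle for completeness.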

Miscellaneous

Visualization: the bedGraph/BED files load in IGV/UCSC. The TAD bedGraph uses NaN for missing values, and IGV/UCSC handles these (if I’m not mistaken—you’ll want to double check and, if not, strip such rows from the files).
Coordinate system: 0-based, half-open intervals (in keeping with BED and bedGraph conventions).
Reproducibility: exact insulation values depend on window geometry, smoothing, and --im setting; you should use the same parameters above for replication.
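If a browser turns out not to accept NaN values, rows carrying them can be dropped before loading. This is a minimal sketch under stated assumptions: the function name `drop_nan_rows` and the demo records are hypothetical, and the input is assumed to be a tab-delimited, four-column bedGraph with the value in column 4.

```shell
#  Hypothetical helper: keep only bedGraph rows whose column 4 is not NaN
#  (the comparison is case-insensitive, so "nan" and "NAN" are dropped too)
drop_nan_rows() {
    awk 'BEGIN { FS = OFS = "\t" } tolower($4) != "nan"'
}

#  Demo on two made-up records; only the first is printed
printf 'chr1\t0\t40000\t0.85\nchr1\t40000\t80000\tNaN\n' | drop_nan_rows
```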

Representative workflow code snippets

Conversion of sparse matrices to dense matrices (HiC-Pro; per chromosome)

${HICPRO_PATH}/utils/sparseToDense.py \
    -b <SAMPLE>/raw/40000/<SAMPLE>_40000_abs.bed \
    --perchr <SAMPLE>/iced/40000/<SAMPLE>_40000_iced.matrix

Conversion of dense matrices to “Crane-formatted matrices”

(AWK call by Giancarlo Bonora.)

awk -v sample=<SAMPLE> -v rez=40000 -v chr=<CHR> '
    #  Header row: one tab-prefixed label per matrix column, formatted as
    #  bin<N>|<sample>|chr<CHR>:<start>-<end>
    NR == 1 {
        for(i = 1; i <= NF; i++) {
            printf("\tbin%s|%s|chr%s:%d-%d", i, sample, chr, ((i - 1) * rez) + 1, (i * rez) + 1)
        }
        printf("\n")
    }
    #  Every row (including the first): prepend the matching bin label
    {
        printf("bin%s|%s|chr%s:%d-%d\t", NR, sample, chr, ((NR - 1) * rez) + 1, (NR * rez) + 1)
        print $0
    }
    ' \
    <SAMPLE>_40000_iced_<CHR>_dense.matrix \
        > <SAMPLE>_40000_iced_<CHR>_dense_craned.matrix

Boundary computation with `matrix2insulation.pl` (as above)

matrix2insulation.pl \
    -i <SAMPLE>_40000_iced_<CHR>_dense_craned.matrix \
    --is 520001 --ids 320001 --ss 160001 --nt 0.01 --im mean \
    -o <OUTDIR>/chr<CHR>/<SAMPLE>.chr<CHR>.40000/<SAMPLE>.chr<CHR>.40000

TAD assembly with `insulation2tads.pl` (as above)

insulation2tads.pl \
    -i <OUT>.insulation.txt \
    -b <OUT>.insulation.boundaries.txt \
    -o <OUT>.insulation \
    --mbs 0 --mts 0

Additional work

Per-chromosome outputs were concatenated to genome-wide tracks and gzip-compressed.
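The concatenation step could look roughly like the sketch below. It is not the command that was actually run: the function name `concat_chroms` and the per-chromosome file layout (`<dir>/<sample>.chr<CHR>.<ext>`) are assumptions made for illustration, and the files are assumed to carry no header lines.

```shell
#  Hypothetical sketch: concatenate per-chromosome files (chr1..chr22, then
#  chrX) into one gzipped genome-wide file; missing files trigger a warning
concat_chroms() {
    dir=$1; sample=$2; ext=$3; out=$4
    for chr in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X; do
        f="${dir}/${sample}.chr${chr}.${ext}"
        if [ -f "$f" ]; then
            cat "$f"
        else
            echo "warning: missing ${f}" >&2
        fi
    done | gzip -c > "$out"
}

#  Demo with two made-up single-record files in a temporary directory
tmp=$(mktemp -d)
printf 'chr2\t0\t40000\n' > "${tmp}/demo.chr2.bed"
printf 'chr1\t0\t40000\n' > "${tmp}/demo.chr1.bed"
concat_chroms "$tmp" demo bed "${tmp}/demo.genome.bed.gz" 2> /dev/null
gzip -cd "${tmp}/demo.genome.bed.gz"
rm -r "$tmp"
```

Looping over an explicit chromosome list, rather than a shell glob, keeps the output in karyotype order (a glob would sort chr10 before chr2).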

Loops

Overview

This bundle includes genome-wide chromatin loop calls (“pairwise point interactions” or “PPIs”) from the pooled endothelial and cardiomyocyte differentiation Hi-C datasets analyzed in Stem Cell Rep 2023. (Cardiomyocyte data are from Bertero & Fields et al., Nat Commun 2019.)

Loops were called with HiCCUPS (juicer_tools; Juicer v1.9.9, Jcuda 0.8, --cpu). Primary analyses were run at 10-kb resolution on autosomes and chromosome X (as the RUES2 cell line is female). We also provide 20-kb, 40-kb, and cross-resolution merged sets for completeness; these supplementary sets, along with all cardioPooled datasets, were not used in the paper.

Files used for analyses in the paper

endoPooled_D{0,2,6,14}.postprocessed_pixels_10000.bedpe.gz

Data contents

Each timepoint (endoPooled_D{0,2,6,14}, cardioPooled_D{0,2,5,14}) includes:

1. Postprocessed loops: *.postprocessed_pixels_{1,2,4}0000.bedpe.gz
   Primary, analysis-ready loop set (the 10-kb endoPooled files were used in the Stem Cell Rep analyses). Each line lists a significant interaction pixel (one loop) passing all HiCCUPS filters after FDR control and local postprocessing. (For more details, see the HiCCUPS wiki maintained by the Aiden Lab as well as the Supplementary Methods of Rao and Huntley et al., Cell 2014.) This is the primary file for loop overlap/annotation work.

   Each `.bedpe` file follows the standard HiCCUPS schema:
<pre><code>chrom1  x1  x2  chrom2  y1  y2  name    score   strand1 strand2 color   observed    expectedBL  expectedDonut   expectedH   expectedV   fdrBL   fdrDonut    fdrH    fdrV    numCollapsed    centroid1   centroid2   radius
</code></pre>

   Variable definitions are available in the [HiCCUPS wiki](https://github.com/aidenlab/juicer/wiki/HiCCUPS).

2. Enriched pixels: *.enriched_pixels_{1,2,4}0000.bedpe.gz
   All candidate pixels that pass the enrichment tests (donut/horizontal/vertical/lower-left) before centroid collapsing and final deduplication. Useful for inspecting the raw peaks underlying the final loop calls and/or redoing or customizing postprocessing. Not used in the paper.
3. FDR thresholds: *.fdr_thresholds_{1,2,4}0000.gz
   Per-resolution FDR cutoff tables produced by HiCCUPS. These record the enrichment thresholds used for each local background model (donut, horizontal, vertical, lower-left) at the chosen FDR. Useful for auditing or reproducing the calling stringency. Not used in the paper.
4. Merged loops: *.merged_loops.bedpe.gz
   Non-redundant union of the 10-, 20-, and 40-kb loop calls, with nearby pixels across resolutions merged to a single representative record. Provided for completeness and cross-resolution validation. Not used in the paper.

Note: in these files, chromosome names do not have the “chr” prefix used in the various TAD files. If you plan to integrate TAD and loop coordinates, you’ll need to normalize chromosome naming conventions first.
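Going in the other direction, the prefix can be added to both anchor columns of a BEDPE stream. This is a rough sketch rather than part of the original workflow: the function name `add_chr_prefix_bedpe` and the demo record are hypothetical, and it assumes tab-delimited input in which any header or comment lines begin with `#`.

```shell
#  Hypothetical helper: prepend "chr" to columns 1 and 4 of a BEDPE stream,
#  passing "#" lines through unchanged and leaving prefixed names alone
add_chr_prefix_bedpe() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         {
             if ($1 !~ /^chr/) $1 = "chr" $1
             if ($4 !~ /^chr/) $4 = "chr" $4
             print
         }'
}

#  Demo on a header line plus one made-up loop record
printf '#chr1\tx1\tx2\tchr2\ty1\ty2\n1\t100000\t110000\t1\t500000\t510000\n' \
    | add_chr_prefix_bedpe
```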

Parameters, provenance

Loop detection was run with the Juicer hiccups command; a representative invocation follows:

juicer_tools hiccups \
    --cpu \
    -r 10000,20000,40000 \
    -f 0.1,0.1,0.1 \
    -p 4,2,1 \
    -i 7,5,3 \
    -t 0.02,1.5,1.75,2.0,2.5 \
    <INPUT.hic> \
    <OUTPUT_DIR>

Postprocessed loops (postprocessed_pixels_10000.bedpe) were exported from each <SAMPLE>.hiccupsOutput directory and gzip-compressed. Although 20-kb, 40-kb, and multi-resolution (loop-merged) sets were generated, only 10-kb loops were used in the Stem Cell Rep analyses. (Happy to share the other data upon request.)

Parameter descriptions can be found in the HiCCUPS wiki.

HiCCUPS operates on .hic files, which are binary matrices produced by Juicer’s pre-processing pipeline (e.g., juicer_tools pre). Paul Fields’ and Giancarlo Bonora’s earlier HiC-Pro runs produced validPairs text files, which can be converted to .hic format via the following:

juicer_tools pre -n <HiC-Pro_validPairs> <OUTPUT.hic> <chrom.sizes>

File naming conventions

<TIMEPOINT>.enriched_pixels_10000.bedpe.gz
<TIMEPOINT>.fdr_thresholds_10000.gz
<TIMEPOINT>.postprocessed_pixels_10000.bedpe.gz
<TIMEPOINT>.merged_loops.bedpe.gz

where <TIMEPOINT> $\in$ endoPooled_D{0,2,6,14} or cardioPooled_D{0,2,5,14}.

Miscellaneous

Visualization: BEDPE loops can be viewed in browsers designed for Hi-C (and related) data—e.g., Juicebox or HiGlass—or converted to long-range interaction arcs for use with the UCSC or IGV browsers.
Coordinate system: 0-based, half-open intervals (per BED/BEDPE conventions).
Interpretation: Each line (representing an individual loop) corresponds to a statistically enriched contact between two 10-kb bins. FDR values for the four local neighborhood models (lower-left, donut, horizontal, vertical) are given in the fdrBL, fdrDonut, fdrH, and fdrV columns (columns 17–20 of the schema above, counting from 1).
Reproducibility: Exact loop sets depend on bin size, normalization (e.g., KR for Knight-Ruiz matrix “balancing”), and HiCCUPS parameters; replication requires the same Juicer build and parameters as above.
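Because each record pairs two anchor bins, a quick sanity check on a loop file is to compute the genomic distance each loop spans. The sketch below is illustrative only: the function name `loop_spans` and the demo record are made up, only the first six BEDPE columns (chrom1, x1, x2, chrom2, y1, y2) are used, and header lines are assumed to begin with `#`.

```shell
#  Illustrative helper: for intrachromosomal BEDPE records, print
#  chrom, x1, y1, and the anchor-to-anchor span (y1 - x1) in bp
loop_spans() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }
         $1 == $4 { print $1, $2, $5, $5 - $2 }'
}

#  Demo on one made-up loop between bins starting at 100 kb and 500 kb
printf '1\t100000\t110000\t1\t500000\t510000\n' | loop_spans
```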

Representative workflow code snippets

Summary of core steps to derive these outputs:

#  Convert HiC-Pro validPairs files to Juicer-compatible .hic files
juicer_tools pre -n <HiC-Pro_validPairs> <OUTPUT.hic> <chrom.sizes>

#  Run HiCCUPS
juicer_tools hiccups --cpu \
    -r 10000,20000,40000 \
    -f 0.1,0.1,0.1 \
    -p 4,2,1 \
    -i 7,5,3 \
    -t 0.02,1.5,1.75,2.0,2.5 \
    -k KR \
    <SAMPLE>.hic <SAMPLE>.hiccupsOutput/

#  Compress and rename 10-kb postprocessed loops
gzip -c <SAMPLE>.hiccupsOutput/postprocessed_pixels_10000.bedpe \
    > <SAMPLE>.hiccups.postprocessed_10kb.bedpe.gz
