Inferring diploid 3D chromatin structures from Hi-C data

Alexandra Gesine Cauer, Gürkan Yardımcı, Jean-Philippe Vert, Nelle Varoquaux, William Stafford Noble. 2019. bioRxiv

The 3D organization of the genome plays a key role in many cellular processes, such as gene regulation, differentiation, and replication. Assays like Hi-C measure DNA-DNA contacts in a high-throughput fashion, and inferring accurate 3D models of chromosomes can yield insights hidden in the raw data. For example, structural inference can account for noise in the data, disambiguate the distinct structures of homologous chromosomes, orient genomic regions relative to nuclear landmarks, and serve as a framework for integrating other data types. Although many methods exist to infer the 3D structure of haploid genomes, inferring a diploid structure from Hi-C data is still an open problem. The diploid case is particularly challenging because Hi-C data typically does not distinguish between homologous chromosomes. We propose a method to infer 3D diploid genomes from Hi-C data. We demonstrate the accuracy of the method on simulated data, and we also use the method to infer 3D structures for mouse chromosome X, confirming that the inactive homolog exhibits a bipartite structure, whereas the active homolog does not.

Software

PASTIS GitHub repository

Simulated data generated for this project

The following files contain all of the simulated data used for this paper.
The 3D structure is a tab-delimited file of (x, y, z) coordinates, with one bead per line, and the counts and chromosome lengths files are in the hiclib format.

Download the chromosome lengths file, containing the number of beads per chromosome, here.
All of the simulated data may be downloaded together in this file.
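
As a rough illustration of the structure file layout described above, the following Python sketch parses tab-delimited coordinate rows and measures the distance between two beads. The function names `read_structure` and `bead_distance` are hypothetical and not part of PASTIS, and the three-line input is made up for the example rather than taken from the simulated data. It assumes one bead per line with exactly three numeric columns and no header.

```python
import csv
import io
import math


def read_structure(handle):
    """Parse tab-delimited rows of x, y, z into a list of float triples.

    Assumption: one bead per line, three columns, no header row.
    """
    coords = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        coords.append(tuple(float(value) for value in row))
    return coords


def bead_distance(a, b):
    """Euclidean distance between two beads given as (x, y, z) triples."""
    return math.dist(a, b)


# Made-up stand-in for a downloaded structure file.
example = io.StringIO("0.0\t0.0\t0.0\n3.0\t4.0\t0.0\n3.0\t4.0\t12.0\n")
structure = read_structure(example)
print(len(structure), "beads")
print(bead_distance(structure[0], structure[1]))
```

To read a downloaded file instead, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.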
                                                                                         
Counts ambiguity, replicate   3D structure     Counts     Metadata   Counts heatmap     3D visualization (Chromosome, Ref only, Alt only)
Ambiguous 0   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 1   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 2   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 3   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 4   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 5   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 6   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 7   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 8   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Ambiguous 9   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 0   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 1   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 2   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 3   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 4   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 5   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 6   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 7   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 8   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Partially ambiguous 9   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 0   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 1   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 2   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 3   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 4   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 5   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 6   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 7   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 8   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only
Unambiguous 9   Structure     Counts     Metadata   Heatmap     Chromosome     Ref only     Alt only

Public datasets used and analyzed

The following files contain all of the public Hi-C data used for this paper.
The 3D structure is a tab-delimited file of (x, y, z) coordinates, with one bead per line, and the counts and chromosome lengths files are in the hiclib format.

Download the chromosome lengths files, containing the number of beads per chromosome, for chromosome X and chromosome 3.
All of the hiclib-formatted public data may be downloaded together in this file.
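
The counts files can be inspected with a few lines of Python. The sketch below is only a starting point and rests on an assumption that should be checked against the downloaded files: that each line of a counts file holds a whitespace-delimited triplet (row bin, column bin, count). The function names `read_sparse_counts` and `total_counts` are hypothetical, and the input is made up for the example.

```python
import io
from collections import defaultdict


def read_sparse_counts(handle):
    """Read (row bin, column bin, count) triplets into a dict keyed by (i, j).

    Assumption: three whitespace-delimited columns per line. Repeated
    (i, j) entries are summed; blank lines and '#' comments are skipped.
    """
    counts = defaultdict(float)
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < 3:
            raise ValueError(f"expected 3 columns, got: {line!r}")
        i, j, value = int(fields[0]), int(fields[1]), float(fields[2])
        counts[(i, j)] += value
    return dict(counts)


def total_counts(counts):
    """Sum of all contact counts in the sparse dict."""
    return sum(counts.values())


# Made-up stand-in for a downloaded counts file.
example = io.StringIO("0 1 5\n1 2 3\n0 1 2\n")
counts = read_sparse_counts(example)
print(counts)
print(total_counts(counts))
```

The row and column indices are kept exactly as read rather than being sorted into an upper triangle, so the same reader works if a counts matrix turns out not to be square.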
     
Chromosome   Counts (Ambiguous, Partially ambiguous, Unambiguous)   Metadata   Counts heatmaps (Ambiguous, Partially ambiguous, Unambiguous)
X   Ambiguous     Partially ambiguous     Unambiguous   Metadata   Ambiguous     Partially ambiguous     Unambiguous
3   Ambiguous     Partially ambiguous     Unambiguous   Metadata   Ambiguous     Partially ambiguous     Unambiguous

Contact

For questions about the project and/or this supplementary page please e-mail:
  • Gesine Cauer (gesine at uw period edu)
  • Nelle Varoquaux (nelle.varoquaux at gmail period com)
  • William Noble (william-noble at uw period edu)