Webpage to accompany the work of "Modeling 3D genome architecture from Hi-C data using PASTIS" by Jacobs et al. (2022)

Repository

https://github.com/Noble-Lab/pastis-protocol

Purpose of this webpage

The purpose of this web page is to describe the data files that accompany the project that are hosted online, in addition to explain how to modify some of the scripts.

If, after reading the document, things are still unclear, feel free to reach out to me (Mozes Jacobs) at: mozesj@cs.washington.edu

Abstract

Chromosome conformation capture methods such as Hi-C provide rich information about the three-dimensional configuration of DNA in a population of cells. This data is most frequently visualized using heatmap representations of the 2D locus-to-locus contact map. In practice, however, projecting the DNA into a three-dimensional representation can offer valuable intuitions and insights that are not always easy to glean from the contact map. The PASTIS software infers, for a given Hi-C contact map, a corresponding consensus 3D structure, where each bead in the structure corresponds to one row (or column) of the Hi-C map. The algorithm models the distances between beads in the struture using a poisson likelihood function and is able to generate full-genome structures of haploid or diploid structures. PASTIS is implemented in Python and requires only basic knowledge of the command line interface on Linux, Windows, or MacOS. In this protocol, we demonstrate how to use PASTIS to infer 3D structures from Hi-C matrices derived from yeast and human samples. We also show how to visualize the resulting structures in various ways, including direct viewing with Python tools, via several PDB viewers, and using two different genome browsers (4D Nucleome Browser and WashU Epigenome Browser).

Description of files hosted at https://noble.gs.washington.edu/proj/pastis-protocol/

File Filesize Description
4DNFII84FBKM.matrix 141 MB Counts matrix used in the protocol. Extracted from 4DNFII84FBKM.hic from 4DN database
chroms_structure_mpl.png 1320 KB Plot of chromosomes generated in protocol.
counts_heatmap.png 997 KB Counts heatmap generated during protocol.
full_structure_mpl.png 215 KB Full structure plot generated during protocol.
hg38AB.chrom.sizes 716 B Chromosome sizes file generated from data/hg38.chrom.sizes during protocol with homologs labeled as A / B.
hg38_notAB.chrom.sizes 336 B Chromosome sizes file generated from data/hg38.chrom.sizes during protocol without homologs labeled separately.
lad_distances_cut.png 57 KB Zoomed in LAD distances plot generated during protocol.
lad_distances_full.png 57 KB Zoomed in LAD distances plot generated during protocol.
lad_notlad_AB.bedGraph 165 KB bedGraph file denoting LAD and not LAD regions of structure with homologs labeled as A / B.
lad_notlad_AB.bw 60 KB bigWig file generated from lad_notlad_AB.bedGraph and hg38AB.chrom.sizes denoting LAD and not LAD regions of structure with homologs labeled as A / B.
lad_notlad.bedGraph 80 KB bedGraph file denoting LAD and not LAD regions of structure with homologs not labeled separately.
lad_notlad.bw 43 KB bigWig file generated from lad_notlad.bedGraph and hg38_notAB.chrom.sizes denoting LAD and not LAD regions of structure with homologs not labeled separately.
struct_inferred.000.coords 403 KB Structure generated by PASTIS during protocol.
struct_inferred.000.g3d 193 KB struct_inferred.000.coords converted to g3d format.
struct_inferred.000.nucle3d 332 KB struct_inferred.000.coords converted to nucle3d format.
struct_inferred.000.pdb 369 KB struct_inferred.000.coords converted to PDB format.

How to modify / re-run / use new files with the existing code

Using an entirely different counts matrix

To use a different counts matrix, I will suppose you have chosen a matrix from the 4DN database.

  1. First, get the open data URL of the file and adjust bin/download_hic_simple.sh to use this new URL.
  2. Run ./bin/download_hic_simple.sh to download the file.
  3. Next, in config.sh, adjust "HIC_PATH", "COUNTS_PATH", and "LENGTHS_PATH" to match this new file.
  4. Then, run ./bin/extract_matrix_lengths.sh to extact the counts matrix and lengths file

Re-running PASTIS

Using a different LAD file name

Using a different chromosome size file (likely will not happen)?

You will not have to use a file other than data/hg38.chrom.sizes unless you are not working with the human genome anymore. If this is the case, you will have to do a few things:

This situation will (hopefully) not arise (since this protocol is specifically for the human genome), but if it does, feel free to reach out to me (Mozes) and I can point you to what to modify.

After doing modifications

After you do the aforementioned modifications and run PASTIS, I recommend making sure the call to "run_pastis.sh" in bin/run_all.sh is COMMENTED OUT (unless you want to re-run PASTIS). Then, run bin/run_all.sh to regenerate result plots and result files (ie PDB files, bigWig files, etc.). This is the easiest way to regenerate results. Furthermore, to generate the genome browser figures, you will have to re-screenshot them and upload them into the manuscript.