- For each dataset we choose a nominal spacing consistent with the distance between measurements in the raw data: TR50, 50bp; RNA, 50bp; H3ac, 1000bp; H3K27me3, 500bp. The raw data are generally unequally spaced, so we interpolate them at equally spaced coordinates in one of two ways, depending on the size of the gap to be interpolated over. For gaps less than 2000bp, data are linearly interpolated using the two immediately flanking data points. For gaps larger than 2000bp, we use an adaptive loess fitting strategy: a linear loess fit is computed at the point of interpolation, using all points in a window whose width is 50 times the gap to be filled. The R function loess is used for this purpose, with default weights. The value of the loess fit at the point of interpolation is taken as the interpolated value there. R routines for doing this interpolation can be found here (bed.R, interp.R); see, in particular, the function UCSC2bed in bed.R. An example R script calling these routines can be found here. The resulting datasets, in BED format, are located here:
- 50bp TR50 data
- 50bp RNA data (files starting with "AffyRnaSignal.HeLa")
- 1000bp H3ac data (files starting with "SangerChip.H3ac.HeLa")
- 500bp H3K27me3 data (files starting with "UcsdChip.H3K27me3")
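The two-regime interpolation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the R code in bed.R or interp.R. The function and parameter names (`interpolate`, `local_linear_fit`, `max_linear_gap`, `window_factor`) are hypothetical. The local fit is a tricube-weighted straight line over a fixed window centred on the interpolation point, a simplified stand-in for R's loess. Sending a gap of exactly 2000bp to the local fit is also an assumption, since the text specifies only "less than" and "larger than".

```python
from bisect import bisect_left


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(xs, ys, x0, half_width):
    """Value at x0 of a tricube-weighted straight-line fit around x0.

    Simplified stand-in for a degree-1 loess fit (assumption): the window
    is fixed at [x0 - half_width, x0 + half_width].
    """
    sw = swd = swy = swdd = swdy = 0.0
    for x, y in zip(xs, ys):
        d = x - x0                      # centre coordinates on x0
        w = tricube(d / half_width)
        sw += w
        swd += w * d
        swy += w * y
        swdd += w * d * d
        swdy += w * d * y
    if sw == 0.0:
        raise ValueError("no data points inside the fitting window")
    det = sw * swdd - swd * swd
    if det <= 1e-12 * sw * swdd:
        # Effectively one distinct x in the window: use the weighted mean.
        return swy / sw
    # Intercept of the weighted least-squares line, i.e. the fit at x0.
    return (swdd * swy - swd * swdy) / det


def interpolate(xs, ys, grid, max_linear_gap=2000, window_factor=50):
    """Interpolate (xs, ys) at each grid coordinate; xs must be ascending.

    Grid points outside the data range are returned as None.
    """
    out = []
    for g in grid:
        i = bisect_left(xs, g)
        if i < len(xs) and xs[i] == g:
            out.append(float(ys[i]))    # grid point coincides with a datum
        elif i == 0 or i == len(xs):
            out.append(None)            # no flanking points on one side
        else:
            x_lo, x_hi = xs[i - 1], xs[i]
            gap = x_hi - x_lo
            if gap < max_linear_gap:
                # Small gap: linear interpolation between flanking points.
                t = (g - x_lo) / gap
                out.append(ys[i - 1] + t * (ys[i] - ys[i - 1]))
            else:
                # Large gap: local fit over a window 50 times the gap wide.
                half_width = window_factor * gap / 2.0
                out.append(local_linear_fit(xs, ys, g, half_width))
    return out
```

For example, `interpolate(positions, scores, range(start, end, 50))` would resample one region onto a 50bp grid, where `positions`, `scores`, `start` and `end` are placeholders for values read from a BED file.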
- Since H3ac is the coarsest of the three datasets, we choose to
interpolate the other datasets at its ordinates. First, however, we
smooth the fine-scale RNA data out to a scale close to 1000bp
using MODWT wavelet smoothing (the la8 wavelet). The closest dyadic
multiple of 50bp to 1000bp is 800bp. The wavelet smooths can
therefore be computed without segmentation using HMMSeg as follows,
java -jar HMMSeg.jar --input-bed --smooth-only 800 [file-list]

where file-list contains a list of all 50bp RNA data files. Note that while the TR50 data are available every 50bp, their effective resolution is much coarser because of the loess smoothing applied when that curve was calculated. Therefore, no wavelet smoothing is required for TR50. The results of interpolating the three datasets (including the wavelet-smoothed RNA data) at the 1000bp coordinates of H3ac are available here.
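As a quick check of the 800bp smoothing scale quoted above, the following Python sketch finds the dyadic multiple of a given spacing that lies closest to a target scale. The function name `closest_dyadic_multiple` is hypothetical, and "closest" is taken to mean smallest absolute difference in base pairs, which is an assumption about how the scale was chosen.

```python
def closest_dyadic_multiple(spacing, target):
    """Return spacing * 2**j (j >= 0) with the smallest |value - target|.

    On a tie the smaller scale is kept (an arbitrary choice).
    """
    best = spacing
    scale = spacing
    # Scales beyond 2 * target can only be farther away than those below it.
    while scale <= 2 * target:
        if abs(scale - target) < abs(best - target):
            best = scale
        scale *= 2
    return best


print(closest_dyadic_multiple(50, 1000))  # prints 800
```

With 50bp spacing the candidates around the target are 800bp (50 x 16) and 1600bp (50 x 32), and 800bp is the nearer of the two to 1000bp.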

/home/..[path to data]../UvaDnaRepTr50.HeLa.hg17.ENm001.50bp.score.bed
/home/..[path to data]../UvaDnaRepTr50.HeLa.hg17.ENm002.50bp.score.bed
. . .
/home/..[path to data]../UvaDnaRepTr50.HeLa.hg17.ENr334.50bp.score.bed

And similarly for

java -jar HMMSeg.jar --num-states 2 --input-bed --smooth 64000 --nstarts 10 --log log.txt \
    tr50.list rna.list h3ac.list h3k27me3.list