Segway documentation

Michael M. Hoffman <mmh1 at uw dot edu>

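Files in the workflow:
  genomedata              input data (GENOMEDATADIR)
  segway.str              model structure (--structure)
  segway.inc              auxiliary file (auxiliary/segway.inc)
  input.master            input master (--input-master)
  input.*.master          per-thread input masters
  params.params           trainable parameters (--trainable-params)
  segway.bed.gz           segmentation bed track (--bed)
  posterior.seg*.wig.gz   posterior task output
  likelihood.*.tab        per-thread training likelihoods (log/)
  segtools                separate toolkit for analyzing segmentations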
segway --track dnasei --track h3k36me3
  uses only the dnasei and h3k36me3 tracks instead of all tracks
--track=dinucleotide
  adds the dinucleotide track
chr1    0    1
hg18
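A BED line such as "chr1    0    1" uses BED's zero-based, half-open intervals, so it covers only the first base of chr1. The following minimal Python sketch shows how --include-coords and --exclude-coords could be applied to single positions. It assumes both files are plain three-column BED. The helper names (read_bed, in_intervals, keep_position) are hypothetical, and this is not Segway's own implementation.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse three-column BED lines into {chrom: [(start, end), ...]}.

    BED intervals are zero-based and half-open: 'chr1 0 1' covers base 0 only.
    """
    intervals = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue  # skip blank, comment and header lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        intervals[chrom].append((start, end))
    return intervals


def in_intervals(intervals, chrom, pos):
    """True if zero-based position pos lies inside any interval on chrom."""
    return any(start <= pos < end for start, end in intervals.get(chrom, ()))


def keep_position(chrom, pos, include=None, exclude=None):
    """Hypothetical filter: keep pos if it is inside include (when given)
    and outside exclude (when given)."""
    if include is not None and not in_intervals(include, chrom, pos):
        return False
    if exclude is not None and in_intervals(exclude, chrom, pos):
        return False
    return True


include = read_bed(["chr1    0    1"])
print(keep_position("chr1", 0, include=include))  # True
print(keep_position("chr1", 1, include=include))  # False
```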
segway.str
input.master
input.master
input.0.master
input.1.master
input.master
input.master
#ifdef
--num-labels=5:20:5
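The --num-labels SLICE value 5:20:5 reads like a Python slice. The following minimal sketch expands such a value into the list of label counts to try. It assumes Python range semantics, where START:STOP:STEP excludes STOP, so 5:20:5 gives 5, 10 and 15. It also assumes that a bare integer means a single count. The helper expand_slice is hypothetical and not part of Segway.

```python
def expand_slice(spec: str) -> list:
    """Expand 'START:STOP[:STEP]' or a bare integer into a list of ints.

    Assumption: Python range semantics, so STOP itself is excluded.
    """
    parts = spec.split(":")
    if len(parts) == 1:
        return [int(parts[0])]  # a single number of labels
    if len(parts) > 3:
        raise ValueError(f"bad slice: {spec!r}")
    start = int(parts[0])
    stop = int(parts[1])
    step = int(parts[2]) if len(parts) == 3 and parts[2] else 1
    return list(range(start, stop, step))


print(expand_slice("5:20:5"))  # [5, 10, 15]
print(expand_slice("2"))       # [2]
```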
--distribution=norm
--distribution=asinh_norm
--distribution=gamma
asinh_norm
label len
1:4   200:2200:200
0     200::200
4:    200::200
label
len
1:4
4:
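The seg-table rows above pair a label slice with a segment-length slice. The following minimal parsing sketch rests on several assumptions that this page does not confirm:

- The len column is MIN:MAX:STEP, and an empty MAX means no upper bound.
- Label slices follow Python semantics, so 1:4 selects labels 1, 2 and 3, and 4: selects label 4 and above.
- Later rows override earlier ones.

The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

LengthRule = Tuple[int, Optional[int], int]  # (min_len, max_len or None, step)


def label_slice(spec: str, num_labels: int) -> List[int]:
    """Labels selected by '0', '1:4' or '4:' out of range(num_labels)."""
    if ":" not in spec:
        return [int(spec)]
    start, stop = [int(x) if x else None for x in spec.split(":")]
    return list(range(num_labels)[start:stop])


def parse_seg_table(text: str, num_labels: int) -> Dict[int, LengthRule]:
    """Map each label to its assumed (min, max, step) length rule."""
    lines = [line for line in text.splitlines() if line.strip()]
    if lines[0].split() != ["label", "len"]:
        raise ValueError("expected header 'label len'")
    rules: Dict[int, LengthRule] = {}
    for line in lines[1:]:
        label_spec, len_spec = line.split()
        min_len, max_len, step = len_spec.split(":")
        rule = (int(min_len), int(max_len) if max_len else None, int(step))
        for label in label_slice(label_spec, num_labels):
            rules[label] = rule  # later rows override earlier ones
    return rules


table = """label len
1:4   200:2200:200
0     200::200
4:    200::200
"""
rules = parse_seg_table(table, num_labels=6)
print(rules[2])  # (200, 2200, 200)
print(rules[5])  # (200, None, 200)
```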
--prior-strength=1
params.3.params.18
  3   training thread
  18  round (the highest number is the latest)
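Trained parameter files are named params.THREAD.params.ROUND, as in params.3.params.18. The following small Python sketch parses such names and picks the newest round for one thread, mirroring the numeric sort in the helpful commands at the end of this page. The helper names are hypothetical.

```python
import re
from typing import Iterable, Tuple

PARAMS_NAME = re.compile(r"^params\.(\d+)\.params\.(\d+)$")


def parse_params_name(name: str) -> Tuple[int, int]:
    """Return (thread, round) for a name like 'params.3.params.18'."""
    match = PARAMS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a per-round params file: {name!r}")
    return int(match.group(1)), int(match.group(2))


def latest_params(names: Iterable[str], thread: int) -> str:
    """Name of the highest-numbered round for the given thread."""
    rounds = []
    for name in names:
        if PARAMS_NAME.match(name):
            name_thread, round_index = parse_params_name(name)
            if name_thread == thread:
                rounds.append((round_index, name))
    if not rounds:
        raise ValueError(f"no params files for thread {thread}")
    return max(rounds)[1]  # compares round numbers as integers


names = ["params.3.params.9", "params.3.params.18", "params.1.params.20"]
print(parse_params_name("params.3.params.18"))  # (3, 18)
print(latest_params(names, thread=3))            # params.3.params.18
```

The numeric comparison matters here. A plain string sort would rank params.3.params.9 after params.3.params.18.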
params.params
chr3    400    800   2
/params/params.params
.gz
segway.bed.gz
segway.sh
run.sh
details.sh
jobs.tab
likelihood.*.tab
jt_info.txt
output
e
o
output/e
log/jobs.tab
accumulators
  acc.*.*.bin
auxiliary
  dont_train.list
  segway.inc
likelihood
  likelihood.*.ll
log
  details.sh
  jobs.tab
  jt_info.txt
  likelihood.*.tab
  run.sh
  segway.sh
observations
  *.*.float32
  *.*.int
  float32.list
  int.list
  observations.tab
output
output/e
output/e/0,1,2,3,4,5,...,identify
output/o
params
  input.*.master
  params.*.params.*
  params.*
posterior
segway.bed.gz
segway.str
triangulation
  segway.str.*.*.trifile
  (the posterior task creates one more file here, not listed)
viterbi
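emt0.1.34.traindir.ed03201cea2047399d4cbcc4b62f9827
  emt                               expectation-maximization training job
  0                                 training thread
  1                                 round
  34                                window
  traindir                          working directory name
  ed03201cea2047399d4cbcc4b62f9827  unique identifier of the run
qdel "*.ed03201cea2047399d4cbcc4b62f9827"       (SGE: kill all jobs of the run)
bkill -J "*.ed03201cea2047399d4cbcc4b62f9827"   (LSF: kill all jobs of the run)
vit, jt: job-name prefixes used by the identify task (vit = Viterbi), e.g.
vit34.identifydir.4f32630d53724f08b34a8fc58793307d
jt34.identifydir.4f32630d53724f08b34a8fc58793307d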
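The following Python sketch splits job names like the ones above into their parts. It assumes that the numeric fields are thread, round and window for emt jobs, and window alone for vit and jt jobs. The regular expressions are hypothetical and are not taken from Segway's source.

```python
import re
from typing import Dict

# Working directory names may contain dots, so (.+) is anchored by the
# 32-hex-digit run identifier at the end of the name.
TRAIN_JOB = re.compile(
    r"^emt(?P<thread>\d+)\.(?P<round>\d+)\.(?P<window>\d+)"
    r"\.(?P<workdir>.+)\.(?P<uuid>[0-9a-f]{32})$"
)
IDENTIFY_JOB = re.compile(
    r"^(?P<prefix>vit|jt)(?P<window>\d+)"
    r"\.(?P<workdir>.+)\.(?P<uuid>[0-9a-f]{32})$"
)


def parse_job_name(name: str) -> Dict[str, str]:
    """Return the named fields of a train or identify job name."""
    for pattern in (TRAIN_JOB, IDENTIFY_JOB):
        match = pattern.match(name)
        if match:
            return match.groupdict()
    raise ValueError(f"unrecognized job name: {name!r}")


def belongs_to_run(name: str, uuid: str) -> bool:
    """Same test as the '*.UUID' pattern passed to qdel or bkill -J."""
    return name.endswith("." + uuid)


job = parse_job_name("emt0.1.34.traindir.ed03201cea2047399d4cbcc4b62f9827")
print(job["thread"], job["round"], job["window"], job["workdir"])  # 0 1 34 traindir
```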
dinucleotide
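Python interface: segway.run.main() takes the same arguments as the segway command line, passed as a list of strings:

from segway import run

GENOMEDATA_DIRNAME = "genomedata"

# equivalent to: segway --no-identify genomedata
run.main(["--no-identify", GENOMEDATA_DIRNAME])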
Mailing lists: segway-users, segway-announce
Usage: segway [OPTION]... GENOMEDATADIR

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit

  Input selection:
    -t TRACK, --track=TRACK
                        append TRACK to list of tracks to use (default all)
    --include-coords=FILE
                        limit to genomic coordinates in FILE
    --exclude-coords=FILE
                        filter out genomic coordinates in FILE

  Model files:
    -i FILE, --input-master=FILE
                        use or create input master in FILE
    -s FILE, --structure=FILE
                        use or create structure in FILE
    -p FILE, --trainable-params=FILE
                        use or create trainable parameters in FILE
    --dont-train=FILE   use FILE as list of parameters not to train
    --seg-table=FILE    load segment hyperparameters from FILE
    --semisupervised=FILE
                        semisupervised segmentation with labels in FILE

  Output files:
    -b FILE, --bed=FILE
                        create bed track in FILE

  Intermediate files:
    -o DIR, --observations=DIR
                        use or create observations in DIR
    -d DIR, --directory=DIR
                        create all other files in DIR
    --old-directory=DIR
                        continue from interrupted run in DIR (identify only)

  Modeling variables:
    -D DIST, --distribution=DIST
                        use DIST distribution
    -r NUM, --random-starts=NUM
                        randomize start parameters NUM times (default 1)
    -N SLICE, --num-labels=SLICE
                        make SLICE segment labels (default 2)
    --num-sublabels=NUM
                        make NUM segment sublabels (default 1)
    --resolution=RES    downsample to every RES bp (default 1)
    --ruler-scale=SCALE
                        ruler marking every SCALE bp (default 10)
    --prior-strength=RATIO
                        use RATIO times the number of data counts as the
                        number of pseudocounts for the segment length prior
                        (default 0)
    --segtransition-weight-scale=SCALE
                        exponent for segment transition probability  (default
                        1)

  Technical variables:
    -m PROGRESSION, --mem-usage=PROGRESSION
                        try each float in PROGRESSION as the number of
                        gibibytes of memory to allocate in turn (default
                        2,3,4,6,8,10,12,14,15)
    -S SIZE, --split-sequences=SIZE
                        split up sequences that are larger than SIZE bp
                        (default 2000000)
    -v NUM, --verbosity=NUM
                        show messages with verbosity NUM
    --cluster-opt=OPT   specify an option to be passed to the cluster manager

  Flags:
    -T, --no-train      do not train model
    -I, --no-identify   do not identify segments
    -P, --no-posterior  do not identify probability of segments
    -k, --keep-going    keep going in some threads even when you have errors
                        in another
    -n, --dry-run       write all files, but do not run any executables
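The --mem-usage option tries each value in PROGRESSION, in gibibytes, in turn. The following minimal sketch shows that retry loop. It assumes a job is simply rerun with the next amount when it fails. The callable passed as submit stands in for a hypothetical job runner, and this page does not show Segway's actual resubmission logic.

```python
from typing import Callable, Sequence

GIB = 2 ** 30  # bytes per gibibyte

DEFAULT_PROGRESSION = "2,3,4,6,8,10,12,14,15"


def parse_progression(spec: str) -> list:
    """'2,3,4' -> [2.0, 3.0, 4.0]; each value is a float number of GiB."""
    return [float(value) for value in spec.split(",")]


def run_with_progression(submit: Callable[[int], bool],
                         progression: Sequence[float]) -> float:
    """Call submit(mem_bytes) with each amount until it returns True.

    Returns the number of GiB that succeeded; raises if every amount fails.
    """
    for gib in progression:
        if submit(int(gib * GIB)):
            return gib
    raise MemoryError("job failed at every memory size in the progression")


def needs_5_gib(mem_bytes: int) -> bool:
    """Fake job that succeeds only with at least 5 GiB."""
    return mem_bytes >= 5 * GIB


print(run_with_progression(needs_5_gib, parse_progression(DEFAULT_PROGRESSION)))  # 6.0
```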
TRAINDIRNAME=<define workdir here>
WINNING_THREAD=$(grep -H "" "$TRAINDIRNAME"/log/likelihood.*.tab | perl -pe 's#^.*/likelihood\.(\d+)\.tab:#$1\t#' | sort -k 2,2g | tail -n 1 | cut -f 1)
LATEST_PARAMS=$(cd "$TRAINDIRNAME/params" && ls params.${WINNING_THREAD}.params.* \
    | sort -t . -k 4,4rn | head -n 1)
cp -v "$TRAINDIRNAME/params/$LATEST_PARAMS" "$TRAINDIRNAME/params/params.params"
cp -v "$TRAINDIRNAME/params/input.${WINNING_THREAD}.master" "$TRAINDIRNAME/params/input.master"
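The same thread selection can be done in Python. This sketch assumes that each log/likelihood.THREAD.tab file holds one log likelihood per line in its first tab-separated column, as the cut -f 1 in these commands implies. Like the shell pipeline, it picks the thread whose best value is highest. The function names are hypothetical.

```python
import glob
import os
import re
from typing import Dict

LIKELIHOOD_NAME = re.compile(r"likelihood\.(\d+)\.tab$")


def best_likelihoods(traindir: str) -> Dict[int, float]:
    """Map each thread to the highest value in its likelihood file."""
    best = {}
    pattern = os.path.join(traindir, "log", "likelihood.*.tab")
    for path in glob.glob(pattern):
        match = LIKELIHOOD_NAME.search(os.path.basename(path))
        if match is None:
            continue
        with open(path) as handle:
            # assumption: no header line, value in the first column
            values = [float(line.split("\t")[0]) for line in handle if line.strip()]
        if values:
            best[int(match.group(1))] = max(values)
    return best


def winning_thread(traindir: str) -> int:
    """Thread with the highest log likelihood in any round."""
    best = best_likelihoods(traindir)
    if not best:
        raise FileNotFoundError("no likelihood.*.tab files found")
    return max(best, key=best.get)
```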
(for X in 20091224.stws1 20091224.stws1000; do
    echo $X/{auxiliary,params/input.master,params/params.params,segway.str,triangulation}
done) | xargs tar zcvf 20091224.params.tar.gz
rsync -rtvz --exclude output --exclude posterior --exclude viterbi --exclude observations --exclude "*.observations" --exclude accumulators REMOTEHOST:REMOTEDIR LOCALDIR
for X in likelihood.*.tab; do dc -e "8 k $(tail -n 2 $X | cut -f 1 | xargs echo | sed -e 's/-//g') sc sl ll lc - ll / p"; done
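The dc one-liner above strips the minus signs from the last two log likelihoods in each file. It then prints (|previous| - |last|) / |previous|, the relative change between the last two rounds, at a precision of eight digits. A positive value means the likelihood improved. The following Python equivalent is a sketch that assumes the same one-value-per-line layout. Note that dc truncates to eight digits, whereas the formatted print below rounds.

```python
from typing import Sequence


def relative_change(log_likelihoods: Sequence[float]) -> float:
    """(|previous| - |last|) / |previous| for the last two values.

    Matches the dc command: minus signs are dropped before dividing, so a
    positive result means the last round's log likelihood improved.
    """
    if len(log_likelihoods) < 2:
        raise ValueError("need at least two rounds")
    previous, last = abs(log_likelihoods[-2]), abs(log_likelihoods[-1])
    return (previous - last) / previous


def relative_change_from_file(path: str) -> float:
    """Read one likelihood.*.tab file and apply relative_change."""
    with open(path) as handle:
        values = [float(line.split("\t")[0]) for line in handle if line.strip()]
    return relative_change(values)


print(f"{relative_change([-1000.0, -990.0]):.8f}")  # 0.01000000
```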

Author:	Michael M. Hoffman <mmh1 at uw dot edu>
Organization:	University of Washington
Address:	Department of Genome Sciences, PO Box 355065, Seattle, WA 98195-5065, United States of America
Copyright:	2009-2010 Michael M. Hoffman

Contents

The workflow
Technical description
Data selection
Tracks
Positions
Resolution
Model generation
Segment duration model
Hard length constraints
Soft length prior
Task selection
Train task
Unsupervised training
Semisupervised training
General options
Recovery
Identify task
Recovery
Posterior task
Technical matters
Working files
Distributed computing
Memory usage
Reporting
Shell scripts
Summary reports
GMTK reports
Task output
Performance
Troubleshooting
Names used by Segway
Files
Job names
Tracks
Python interface
Support
Command-line usage summary
Helpful commands
