Using mixtures of common ancestors for estimating the
probabilities of discrete events in biological sequences
Eleazar Eskin, William Noble Grundy and Yoram Singer
Proceedings of the Ninth International Conference on Intelligent
Systems for Molecular Biology. July 21-25, 2001. To appear.
Abstract
Accurately estimating probabilities from observations is important for
probabilistic approaches to problems in computational biology.
In this paper we present a biologically-motivated method for
estimating probability distributions over discrete alphabets from
observations using a mixture model of common ancestors. The method is
an extension of substitution matrix-based probability estimation
methods. In contrast to previous substitution matrix-based methods, our
method has a simple Bayesian interpretation. The method presented in
this paper has the advantage over Dirichlet mixtures that it is both
effective and simple to compute for large alphabets. The method is
applied to estimate amino acid probabilities based on observed counts
in an alignment and is shown to perform comparably to previous
methods. The method is also applied to estimate probability
distributions over protein families and improves protein
classification accuracy.