Protein family classification using sparse Markov transducers

Eleazar Eskin
William Noble Grundy
Yoram Singer

Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. August 20-23, 2000. pp. 134-135.


In this paper we present a method for classifying proteins into families using sparse Markov transducers (SMTs). Sparse Markov transducers, like probabilistic suffix trees, estimate a probability distribution conditioned on a sequence. SMTs generalize probabilistic suffix trees by allowing wild cards in the conditioning sequences. Because substitutions of amino acids are common in protein families, incorporating wild cards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. We also present efficient data structures that reduce the memory usage of the models. We evaluate SMTs by building family classifiers using the Pfam database.
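To make the idea of a wild card in a conditioning sequence concrete, the following Python sketch counts which symbol follows a context, first for an exact context and then for one with a wild-card position. It is only a toy illustration under stated assumptions, not the SMT estimation procedure of the paper: the `*` symbol, the function names, and the three short sequences are all hypothetical, and the counts are raw frequencies with no smoothing or mixture weighting.

```python
from collections import Counter

WILD = "*"  # hypothetical wild-card symbol: matches any amino acid


def matches(pattern, context):
    """Return True if every non-wild position of pattern equals the context."""
    return len(pattern) == len(context) and all(
        p == WILD or p == c for p, c in zip(pattern, context)
    )


def conditional_counts(sequences, pattern):
    """Count the symbol that follows each context matching the pattern."""
    k = len(pattern)
    counts = Counter()
    for seq in sequences:
        for i in range(k, len(seq)):
            if matches(pattern, seq[i - k:i]):
                counts[seq[i]] += 1
    return counts


def estimate(counts):
    """Convert raw counts into relative frequencies (no smoothing)."""
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()} if total else {}


# Made-up toy "family" in which the second residue varies by substitution.
toy_family = ["ACDK", "AGDK", "ACDR"]

exact = conditional_counts(toy_family, "ACD")   # context must be exactly ACD
sparse = conditional_counts(toy_family, "A*D")  # middle position is a wild card
```

In this toy data the exact context `ACD` is seen in only two of the three sequences, while `A*D` also covers `AGD` and so pools the evidence from all three. That pooling across substituted positions is the intuition behind conditioning on sequences with wild cards.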