Hidden Markov Model Analysis of Motifs in Steroid Dehydrogenases and their Homologs

William N. Grundy
Timothy L. Bailey
Charles P. Elkan
Michael E. Baker

Biochemical and Biophysical Research Communications, 231(3):760-766.


The increasing size of protein sequence databases is straining methods of sequence analysis, even as the increased information offers opportunities for sophisticated analyses of protein structure, function and evolution. Here we describe a method called Meta-MEME that uses artificial intelligence-based algorithms to build models of families of protein sequences. These models can be used to search protein sequence databases for remote homologs. The MEME (Multiple Expectation-maximization for Motif Elicitation) software package identifies motif patterns in a protein family, and these motifs are combined into a hidden Markov model (HMM) that can be used as a database searching tool. Meta-MEME is sensitive and accurate, as well as automated and unbiased, making it suitable for the analysis of large datasets. We demonstrate Meta-MEME on a family of dehydrogenases that includes mammalian 11β-hydroxysteroid and 17β-hydroxysteroid dehydrogenase and their homologs in the short-chain alcohol dehydrogenase family. We chose this dataset because it is large and phylogenetically diverse, providing a good test of the sensitivity and selectivity of Meta-MEME on a protein family of biological interest. Indeed, Meta-MEME identifies at least 350 members of this family in Genpept96 and clearly separates these sequences from non-homologous proteins. We also show how the MEME motif output can be used for phylogenetic analysis.
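
The idea of chaining motif models into a single HMM for database search can be illustrated with a minimal Python sketch. This is not the Meta-MEME implementation: the function names (`motif_log_odds`, `viterbi_score`), the uniform background distribution, the pseudocount, the spacer self-transition probability `stay`, and the toy motif instances are all assumptions chosen for illustration. The model is a linear chain of spacer, motif 1, spacer, motif 2, spacer, and a sequence is scored by its best (Viterbi) log-odds path.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background residue frequencies.
BACKGROUND = {a: 1.0 / len(ALPHABET) for a in ALPHABET}


def motif_log_odds(instances, pseudocount=1.0):
    """Turn equal-length motif instances into per-column log-odds tables."""
    columns = []
    for i in range(len(instances[0])):
        counts = {a: pseudocount for a in ALPHABET}
        for inst in instances:
            counts[inst[i]] += 1
        total = sum(counts.values())
        columns.append(
            {a: math.log((counts[a] / total) / BACKGROUND[a]) for a in ALPHABET}
        )
    return columns


def viterbi_score(sequence, motifs, stay=0.9):
    """Best log-odds path through spacer, motif, spacer, ..., motif, spacer."""
    # Linear state list: None marks a spacer state, a dict marks a motif column.
    states = [None]
    for motif in motifs:
        states.extend(motif)
        states.append(None)
    neg_inf = float("-inf")
    log_stay, log_leave = math.log(stay), math.log(1.0 - stay)
    prev = [neg_inf] * len(states)
    for t, residue in enumerate(sequence):
        cur = [neg_inf] * len(states)
        for j, emit in enumerate(states):
            if t == 0:
                # A path starts in the first spacer or directly in motif 1.
                best = log_stay if j == 0 else log_leave if j == 1 else neg_inf
            else:
                # Spacers may loop on themselves; motif columns may not.
                best = prev[j] + log_stay if emit is None else neg_inf
                if j > 0:
                    step = log_leave if states[j - 1] is None else 0.0
                    best = max(best, prev[j - 1] + step)
            # Spacers emit background residues, so their log-odds is zero.
            cur[j] = best + (0.0 if emit is None else emit.get(residue, 0.0))
        prev = cur
    # A path must end in the final spacer or the last column of the last motif.
    return max(prev[-1], prev[-2])


# Toy motif instances, invented for illustration only.
motif1 = motif_log_odds(["GASSGIG", "GGSSGIG", "GASRGIG"])
motif2 = motif_log_odds(["YSASK", "YCASK", "YSATK"])

hit = viterbi_score("MKLTGASSGIGLAVAEYSASKAAL", [motif1, motif2])
miss = viterbi_score("MKLTPPPPPPPLAVAEPPPPPAAL", [motif1, motif2])
print(hit > miss)
```

In this sketch a sequence containing both motifs in order scores well above one that contains neither, which is the property a database search relies on. A real tool would estimate the background, spacer lengths and transition probabilities from data rather than fixing them by hand.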