Family Pairwise Search with Embedded Motif Models

William Noble Grundy
Timothy L. Bailey

Bioinformatics. 15(6):463-470, 1999.


Motivation: Statistical models of protein families, such as position-specific scoring matrices, profiles and hidden Markov models, have been used effectively to find remote homologs when given a set of known protein family members. Unfortunately, training these models typically requires a relatively large set of training sequences. Recent work (Grundy 1999) has shown that, when only a few family members are known, several theoretically justified statistical modeling techniques fail to provide homology detection performance on a par with Family Pairwise Search (FPS), an algorithm that combines scores from a pairwise sequence similarity algorithm such as BLAST.
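
Illustration: The score-combination idea behind FPS can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the published implementation: the function names (family_pairwise_scores, rank_candidates, shared_kmers) are hypothetical, the combining rule (maximum by default, with the mean as an alternative) is an assumption because the abstract does not state which rule is used, and the shared k-mer count is only a toy stand-in for a real pairwise similarity algorithm such as BLAST or Smith-Waterman.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def shared_kmers(a: str, b: str, k: int = 3) -> float:
    """Toy pairwise similarity: number of distinct k-mers the two sequences share.

    Stand-in for a real pairwise score (e.g. from BLAST); not part of the paper.
    """
    def kmers(s: str) -> set:
        return {s[i:i + k] for i in range(len(s) - k + 1)}

    return float(len(kmers(a) & kmers(b)))


def family_pairwise_scores(
    family: Sequence[str],
    database: Dict[str, str],
    pairwise_score: Callable[[str, str], float] = shared_kmers,
    combine: Callable[[List[float]], float] = max,  # assumed combining rule
) -> Dict[str, float]:
    """Score each database sequence against every known family member,
    then reduce the per-member scores to one score per database sequence."""
    return {
        name: combine([pairwise_score(member, seq) for member in family])
        for name, seq in database.items()
    }


def rank_candidates(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort database sequences from most to least family-like."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    family = ["ACDEFG", "CDEFGH"]  # the few known family members
    database = {"partial": "ACDEFW", "unrelated": "WWWWWW"}
    print(rank_candidates(family_pairwise_scores(family, database)))
    print(rank_candidates(family_pairwise_scores(family, database, combine=mean)))
```

Passing the pairwise scorer and the combining rule as parameters keeps the sketch independent of any particular search tool, so the same ranking step applies whether the scores come from BLAST or from a full Smith-Waterman alignment.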

Results: This paper provides a model-based algorithm that improves FPS by incorporating hybrid motif-based models of the form generated by Cobbler (Henikoff and Henikoff 1997). For the 73 protein families investigated here, this cobbled FPS algorithm provides better homology detection performance than either Cobbler or FPS alone. This improvement is maintained when BLAST is replaced with the full Smith-Waterman algorithm.


