Family Pairwise Search with Embedded Motif Models
William Noble Grundy
Timothy L. Bailey
Bioinformatics. 15(6):463-470, 1999.
Motivation: Statistical models of protein families, such as
position-specific scoring matrices, profiles and hidden Markov models,
have been used effectively to find remote homologs when given a set of
known protein family members. Unfortunately, training these models
typically requires a relatively large set of training sequences.
Recent work (Grundy 1999) has shown that,
when only a few family members are known, several theoretically
justified statistical modeling techniques fail to provide homology
detection performance on a par with Family Pairwise Search (FPS), an
algorithm that combines scores from a pairwise sequence similarity
algorithm such as BLAST.
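
As an illustration of the score-combining idea behind FPS, the following is a minimal sketch only, not the published implementation. It assumes that each database sequence is scored against every known family member and that the per-member scores are reduced to one value with a combining function (max by default; mean is shown as an alternative). The abstract does not say which combining rule is used, so that choice is an assumption here. The function toy_similarity is a hypothetical stand-in for a real pairwise scorer such as BLAST, and all names are illustrative.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a pairwise scorer such as BLAST.

    Counts identical residues at aligned positions; it performs no real
    alignment and is only meant to make the sketch runnable.
    """
    return float(sum(x == y for x, y in zip(a, b)))


def family_pairwise_search(
    family: Sequence[str],
    database: Dict[str, str],
    score: Callable[[str, str], float] = toy_similarity,
    combine: Callable[[List[float]], float] = max,
) -> List[Tuple[str, float]]:
    """Rank database sequences by a combined score against the family.

    The combining rule (max by default) is an assumption of this sketch.
    """
    ranked = []
    for name, seq in database.items():
        # One pairwise score per known family member.
        pairwise = [score(member, seq) for member in family]
        ranked.append((name, combine(pairwise)))
    # Highest combined score first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


family = ["ACDEFG", "ACDQFG"]
database = {"miss": "WWWWWW", "partial": "ACDWWW", "hit": "ACDEFG"}
by_max = family_pairwise_search(family, database)
by_mean = family_pairwise_search(family, database, combine=mean)
```

In practice, score would wrap a real similarity search, and the choice of combine would need to be validated empirically on held-out family members.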
Results: This paper presents a model-based algorithm that
improves FPS by incorporating hybrid motif-based models of the form
generated by Cobbler (Henikoff and Henikoff 1997). For the 73 protein
families investigated here, this cobbled FPS algorithm provides better
homology detection performance than either Cobbler or FPS alone. This
improvement is maintained when BLAST is replaced with the full