Homology Detection via Family Pairwise Search
William Noble Grundy
Journal of Computational Biology, 5(3):479-492.
The function of an unknown biological sequence can often be accurately
inferred by identifying sequences homologous to the original sequence.
Given a query set of known homologs, there exist at least three
general classes of techniques for finding additional homologs:
pairwise sequence comparisons, motif analysis, and hidden Markov
modeling. Pairwise sequence comparisons are typically employed when
only a single query sequence is known. Hidden Markov models (HMMs),
on the other hand, are usually trained with sets of more than 100
sequences. Motif-based methods fall in between these two extremes.
The current work introduces a straightforward generalization of
pairwise sequence comparison algorithms to the case when multiple
query sequences are available. This algorithm, called Family Pairwise
Search (FPS), combines the pairwise sequence comparison scores obtained
from each query sequence. A BLAST implementation of FPS is compared to
representative examples of hidden Markov modeling (HMMER) and motif
modeling (MEME). The three techniques are compared across a wide
range of protein families, using query sets of varying sizes. BLAST
FPS significantly outperforms motif-based and HMM methods.
Furthermore, FPS is much more efficient than the training algorithms
for statistical models.
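The core idea of FPS is to score every database sequence against each member of the query family and then merge those per-query scores into a single family score. The sketch below illustrates this under explicit assumptions. BLAST is replaced by a toy Smith-Waterman local alignment with a hypothetical match/mismatch/linear-gap scheme. The abstract does not specify the combination rule, so the combiner is a parameter (sum or max are shown as plausible choices). The names `family_pairwise_score` and `rank_database` are hypothetical.

```python
from typing import Callable, Iterable, Mapping, Sequence


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty.

    Toy stand-in for BLAST (assumption); the scoring parameters are hypothetical.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def family_pairwise_score(
    target: str,
    queries: Sequence[str],
    pairwise: Callable[[str, str], int] = smith_waterman,
    combine: Callable[[Iterable[int]], int] = sum,
) -> int:
    """Compare the target with every query sequence and combine the scores."""
    return combine(pairwise(q, target) for q in queries)


def rank_database(
    database: Mapping[str, str],
    queries: Sequence[str],
    combine: Callable[[Iterable[int]], int] = sum,
) -> list[tuple[str, int]]:
    """Return (name, family score) pairs, best-scoring sequence first."""
    scores = {
        name: family_pairwise_score(seq, queries, combine=combine)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    family = ["HEAGAWGHEE", "HEAGAWGHEF"]
    db = {"homolog": "PAWHEAGAWGHE", "unrelated": "KKKKLLLLMMMM"}
    print(rank_database(db, family))               # sum of per-query scores
    print(rank_database(db, family, combine=max))  # best single query score
```

Passing the combiner as a parameter keeps the pairwise comparison and the score-merging step separate. This lets a different pairwise tool, or a different way of merging scores, be swapped in without changing the search loop.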