Family-based Homology Detection via Pairwise Sequence Comparison
William Noble Grundy
Proceedings of the Second Annual International Conference
on Computational Molecular Biology, March 22-25, 1998.
The function of an unknown biological sequence can often be accurately
inferred by identifying sequences homologous to the original sequence.
Given a query set of known homologs, there exist at least three
general classes of techniques for finding additional homologs:
pairwise sequence comparisons, motif analysis, and hidden Markov
modeling. Pairwise sequence comparisons are typically employed when
only a single query sequence is known. Hidden Markov models (HMMs),
on the other hand, are usually trained with sets of more than 100
sequences. Motif-based methods fall in between these two extremes.
The current work compares the performance of representative examples
of these three homology detection techniques (the BLAST, MEME, and
HMMER software, respectively) across a wide range of protein families
and query sets of varying sizes. Pairwise sequence comparison outperforms
motif-based and HMM methods for all query set sizes. Furthermore,
heuristic pairwise comparison algorithms are much more efficient than
the training algorithms for statistical models.