Family-based Homology Detection via Pairwise Sequence Comparison

William Noble Grundy

Proceedings of the Second Annual International Conference on Computational Molecular Biology, March 22-25, 1998. pp. 94-100


The function of an unknown biological sequence can often be accurately inferred by identifying sequences homologous to the original sequence. Given a query set of known homologs, there exist at least three general classes of techniques for finding additional homologs: pairwise sequence comparisons, motif analysis, and hidden Markov modeling. Pairwise sequence comparisons are typically employed when only a single query sequence is known. Hidden Markov models (HMMs), on the other hand, are usually trained with sets of more than 100 sequences. Motif-based methods fall in between these two extremes. The current work compares the performance of representative examples of these three homology detection techniques (using the BLAST, MEME, and HMMER software) across a wide range of protein families, using query sets of varying sizes. Pairwise sequence comparison outperforms motif-based and HMM methods for all query set sizes. Furthermore, heuristic pairwise comparison algorithms are much more efficient than the training algorithms for statistical models.