Motif-based protein ranking by network propagation

R. Kuang, J. Weston, W. S. Noble and C. Leslie

Bioinformatics. 21(19):3711-3718, 2005.


Sequence similarity often suggests evolutionary relationships between protein sequences that can be important for inferring similarity of structure or function. The most widely used homology detection tools, such as BLAST and PSI-BLAST, are pairwise sequence comparison algorithms that use sequence-sequence or profile-sequence alignments to return a ranked list of sequences similar to a query. However, these methods often fail to detect less conserved, remotely related targets.

In this paper, we propose a new general graph-based propagation algorithm called MotifProp to detect more subtle similarity relationships than pairwise comparison methods can. MotifProp is based on a protein-motif network, in which edges connect proteins to the k-mer-based motif features that they contain. We show that our new motif-based propagation algorithm can improve ranking over a base algorithm, such as PSI-BLAST, that is used to initialize the ranking. Despite the complex structure of the protein-motif network, MotifProp is an easily interpretable approach. The activation scores that MotifProp assigns to motif nodes can be used to retrieve the top motifs that are important to the ranking, which provides a natural motif selection method for discovering conserved structural components in remote homologs. We can also map the activation scores of these feature nodes onto the query sequence to extract motif-rich regions. By comparing these regions with PDB annotations, we find that the propagation-induced motif-rich regions contain meaningful structural and functional information.
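The idea of propagating scores over a protein-motif network can be illustrated with a minimal sketch. It is not the MotifProp update rule, which the abstract does not state. It assumes exact k-mers as motif features, unweighted edges, simple neighbor averaging, and a restart weight `alpha` that pulls protein scores back toward the initial ranking. The toy sequences and all function names are hypothetical, and the initial scores stand in for the output of a base algorithm such as PSI-BLAST.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of length-k substrings of seq (toy motif features)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_network(proteins, k=3):
    """Build the bipartite network: protein -> motifs and motif -> proteins."""
    p2m = {name: kmers(seq, k) for name, seq in proteins.items()}
    m2p = defaultdict(set)
    for name, motifs in p2m.items():
        for motif in motifs:
            m2p[motif].add(name)
    return p2m, dict(m2p)


def propagate(p2m, m2p, init, alpha=0.5, iters=20):
    """Alternate protein -> motif -> protein averaging (assumed rule).

    Each round, a motif takes the mean score of the proteins containing it,
    and a protein mixes its initial score with the mean of its motifs.
    """
    p = dict(init)
    f = {}
    for _ in range(iters):
        f = {m: sum(p[n] for n in ps) / len(ps) for m, ps in m2p.items()}
        p = {
            n: (1 - alpha) * init[n]
            + alpha * (sum(f[m] for m in ms) / len(ms) if ms else 0.0)
            for n, ms in p2m.items()
        }
    return p, f


# Hypothetical toy data: "near" shares k-mers with the query,
# "far" shares k-mers only with "near", "unrelated" shares none.
proteins = {
    "query": "ACDEFGH",
    "near": "ACDEFKL",
    "far": "EFKLMNP",
    "unrelated": "WWYYWWY",
}
# Stand-in for a base ranking: only the query itself is activated.
init = {"query": 1.0, "near": 0.0, "far": 0.0, "unrelated": 0.0}

p2m, m2p = build_network(proteins, k=3)
scores, motif_act = propagate(p2m, m2p, init)

ranking = sorted(scores, key=scores.get, reverse=True)
top_motifs = sorted(motif_act, key=motif_act.get, reverse=True)[:3]
print(ranking)
print(top_motifs)
```

In this toy network, "far" has no k-mer in common with the query, so a direct pairwise comparison on these features would score it zero. It still receives a positive score through the motifs it shares with "near", while "unrelated" stays at zero. The motif activations in `motif_act` play the role described above: sorting them gives a simple motif selection, and they could be mapped back onto query positions to highlight motif-rich regions.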

Supplementary information