Promoter region-based classification of genes

Paul Pavlidis
Terrence S. Furey
Muriel Liberto
David Haussler
William Noble Grundy

Proceedings of the Pacific Symposium on Biocomputing, January 3-7, 2001. pp. 151-163.


In this paper we consider the problem of extracting information from the upstream untranslated regions of genes to make predictions about their transcriptional regulation. We present a method for classifying genes based on motif-based hidden Markov models (HMMs) of their promoter regions. Sequence motifs discovered in yeast promoters are used to construct HMMs that include parameters describing the number and relative locations of motifs within each sequence. Each model provides a Fisher kernel for a support vector machine, which can be used to predict the classifications of unannotated promoters. We demonstrate this method on two classes of genes from the budding yeast, S. cerevisiae. Our results suggest that the additional sequence features captured by the HMM assist in correctly classifying promoters.
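The Fisher kernel step mentioned above can be illustrated with a minimal sketch. It assumes a small, generic two-state HMM over nucleotides with made-up parameters, not the motif-based HMMs described in the paper, which also model the number and relative locations of motifs. It differentiates the log-likelihood only with respect to the emission probabilities and approximates the Fisher information matrix by the identity. The function names (`forward_backward`, `fisher_score`, `fisher_kernel`) and the toy parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical nucleotide-to-index mapping used throughout this sketch.
ALPHABET = {"A": 0, "C": 1, "G": 2, "T": 3}

Matrix = List[List[float]]


def forward_backward(obs: Sequence[int], start: List[float], trans: Matrix,
                     emit: Matrix) -> Tuple[Matrix, float]:
    """Scaled forward-backward pass.

    Returns gamma[t][j] = P(state_t = j | x) and log P(x | model).
    """
    n, T = len(start), len(obs)
    alpha: Matrix = []
    scales: List[float] = []
    for t in range(T):
        if t == 0:
            row = [start[j] * emit[j][obs[0]] for j in range(n)]
        else:
            prev = alpha[t - 1]
            row = [emit[j][obs[t]] * sum(prev[i] * trans[i][j] for i in range(n))
                   for j in range(n)]
        c = sum(row)  # scaling factor keeps alpha from underflowing
        scales.append(c)
        alpha.append([a / c for a in row])
    beta: Matrix = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(n)) / scales[t + 1] for i in range(n)]
    gamma = []
    for t in range(T):
        row = [alpha[t][j] * beta[t][j] for j in range(n)]
        z = sum(row)
        gamma.append([g / z for g in row])
    # The log-likelihood is the sum of the log scaling factors.
    return gamma, sum(math.log(c) for c in scales)


def fisher_score(seq: str, start: List[float], trans: Matrix,
                 emit: Matrix) -> List[float]:
    """Gradient of log P(x | model) with respect to each emit[j][k].

    d log P / d emit[j][k] = sum_t gamma[t][j] * [x_t == k] / emit[j][k]
    (parameters treated as unconstrained; transition gradients are omitted).
    """
    obs = [ALPHABET[ch] for ch in seq]
    gamma, _ = forward_backward(obs, start, trans, emit)
    score = []
    for j in range(len(start)):
        for k in range(len(ALPHABET)):
            expected = sum(gamma[t][j] for t in range(len(obs)) if obs[t] == k)
            score.append(expected / emit[j][k])
    return score


def fisher_kernel(x: str, y: str, start: List[float], trans: Matrix,
                  emit: Matrix) -> float:
    """K(x, y) = U_x . U_y, with the Fisher information approximated by I."""
    ux = fisher_score(x, start, trans, emit)
    uy = fisher_score(y, start, trans, emit)
    return sum(a * b for a, b in zip(ux, uy))


# Toy model: a background state and a GC-rich state (parameters are made up).
START = [0.9, 0.1]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.3, 0.2, 0.2, 0.3], [0.1, 0.4, 0.4, 0.1]]

if __name__ == "__main__":
    seqs = ["ACGTGGCA", "TTGACGCG", "ATATATAT"]
    gram = [[fisher_kernel(a, b, START, TRANS, EMIT) for b in seqs] for a in seqs]
    for row in gram:
        print(["%.2f" % v for v in row])
```

The resulting Gram matrix could then be passed to any SVM implementation that accepts a precomputed kernel; the kernel normalization and SVM settings used in the paper are not reproduced here.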