Exploring gene expression data with class scores

Paul Pavlidis, Darrin P. Lewis and William Stafford Noble

Proceedings of the Pacific Symposium on Biocomputing, 2002. pp. 474-485.


We address a commonly asked question about gene expression data sets: "What functional classes of genes are most interesting in the data?" In the methods we present, expression data is partitioned into classes based on existing annotation schemes. Each class is then given three separately derived "interest" scores. The first score is based on an assessment of the statistical significance of gene expression changes experienced by members of the class, in the context of the experimental design. The second is based on the co-expression of genes in the class. The third score is based on the learnability of the classification. We show that all three methods reveal significant classes in each of three different gene expression data sets. Many classes are identified by one method but not the others, indicating that the methods are complementary. The classes identified are in many cases of clear relevance to the experiment. Our results suggest that these class scoring methods are useful tools for exploring gene expression data.
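The first two scores can be illustrated with a short resampling sketch in Python using NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the significance-based score of a class is the mean of -log10 p-values of its member genes, that the co-expression score is the mean pairwise Pearson correlation of the members' expression profiles, and that each score is judged against random gene sets of the same size. The function names (`mean_neg_log_p`, `mean_pairwise_correlation`, `resampling_p_value`) and the synthetic data are hypothetical. The third, learnability-based score would require training a classifier and is not shown.

```python
import numpy as np

# Illustrative sketch only: the score definitions below are assumptions,
# not necessarily the exact procedures used in the paper.

def mean_neg_log_p(pvalues, members):
    """Hypothetical score 1: mean -log10(p) of the class members' per-gene p-values
    (e.g. p-values from a test that reflects the experimental design)."""
    return float(np.mean(-np.log10(pvalues[members])))

def mean_pairwise_correlation(expr, members):
    """Hypothetical score 2: mean Pearson correlation over all pairs of member genes.
    expr has one row per gene and one column per sample; needs >= 2 members."""
    corr = np.corrcoef(expr[members])
    upper = np.triu_indices(len(members), k=1)  # each pair once, no diagonal
    return float(np.mean(corr[upper]))

def resampling_p_value(score_fn, data, members, n_iter=1000, seed=None):
    """Empirical p-value: how often a random gene set of the same size
    scores at least as high as the real class."""
    rng = np.random.default_rng(seed)
    n_genes = data.shape[0]
    observed = score_fn(data, members)
    null = np.array([
        score_fn(data, rng.choice(n_genes, size=len(members), replace=False))
        for _ in range(n_iter)
    ])
    # Add-one correction keeps the estimate strictly above zero.
    return float((1 + np.sum(null >= observed)) / (1 + n_iter))

# Synthetic demo: 500 genes x 12 samples, with genes 0-19 forming one class
# whose members are both strongly changed and co-expressed.
rng = np.random.default_rng(0)
n_genes, n_samples, class_size = 500, 12, 20
members = np.arange(class_size)
expr = rng.normal(size=(n_genes, n_samples))
expr[members] += 2.0 * rng.normal(size=n_samples)  # shared profile -> co-expression
pvals = rng.uniform(size=n_genes)
pvals[members] = rng.uniform(0.0, 0.01, size=class_size)  # strongly "changed" genes

p_change = resampling_p_value(mean_neg_log_p, pvals, members, seed=1)
p_coexpr = resampling_p_value(mean_pairwise_correlation, expr, members, seed=1)
print(p_change, p_coexpr)
```

Comparing against random sets of the same size matters because both scores depend on class size: a mean over a few genes fluctuates more than a mean over many. With 1000 iterations the smallest attainable p-value is about 0.001, so more iterations would be needed when many classes are tested at once.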