Supplementary data for "Semi-Supervised Protein Classification using Cluster Kernels."
- ROC-50 scores for all families and all detection methods from the paper in plain text format.
- ROC scores for all families and all detection methods from the paper in plain text format.
- Plain text table specifying the positive and negative training and test sets for each family. Each row is one sequence and each column is one family; entries are coded 0 = not present, 1 = positive train, 2 = negative train, 3 = positive test, 4 = negative test. [Same file, but with no headers]
- Summary of the data splits, giving the number of positive and negative training and test set examples and the amount of unlabeled data for each family.
- Names of the SCOP families.
- Sequence file in FASTA format containing all sequences in SCOP version 1.59 with less than 95% identity.
- 7329x7329 kernel matrices for the methods used in the experiments (the IDs for each row and column are listed separately):
  - BLAST matrix, ASCII text file, gzipped (49 MB).
  - PSI-BLAST matrix using the complete 7329 examples as a database, ASCII text file, gzipped (52 MB).
  - Spectrum Mismatch Kernel, k=5, m=1, ASCII text file, gzipped (79 MB).
- The Spider software used in the experiments, a Matlab-based library of machine learning tools.
- Matlab scripts to run the semi-supervised experiments (using the Spider software).
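
The split table and the kernel matrices listed above are meant to be used together, so a short sketch of how the 0-4 codes could be turned into train/test blocks of a kernel matrix may help. This is only an illustration in Python, not the Matlab scripts distributed here. It assumes the headerless table is whitespace-separated and that the kernel matrix rows and columns follow the same order as the table rows (check the row/column ID list before relying on that). The helper names `parse_table`, `split_indices`, and `submatrix` are hypothetical, and the data are toy values rather than the real files.

```python
# Codes taken from the split table description above.
POS_TRAIN, NEG_TRAIN, POS_TEST, NEG_TEST = 1, 2, 3, 4


def parse_table(text):
    """Parse a headerless table: one row per sequence, one column per family.

    Assumes whitespace-separated integer codes (an assumption about the file).
    """
    return [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]


def split_indices(table, family):
    """Return (train_idx, y_train, test_idx, y_test) for one family column."""
    train_idx, y_train, test_idx, y_test = [], [], [], []
    for row, codes in enumerate(table):
        code = codes[family]
        if code in (POS_TRAIN, NEG_TRAIN):
            train_idx.append(row)
            y_train.append(1 if code == POS_TRAIN else -1)
        elif code in (POS_TEST, NEG_TEST):
            test_idx.append(row)
            y_test.append(1 if code == POS_TEST else -1)
        # Code 0 ("not present") is skipped: the row is in neither labeled set.
    return train_idx, y_train, test_idx, y_test


def submatrix(kernel, rows, cols):
    """Extract the block of a square kernel matrix for the given rows/columns."""
    return [[kernel[r][c] for c in cols] for r in rows]


# Toy stand-in for the table: 6 sequences x 2 families.
toy_table = parse_table("""
1 0
2 3
3 4
4 1
0 2
1 4
""")
train_idx, y_train, test_idx, y_test = split_indices(toy_table, family=0)

# Toy symmetric 6x6 matrix standing in for one of the 7329x7329 kernels.
toy_kernel = [[1.0 / (1 + abs(i - j)) for j in range(6)] for i in range(6)]

K_train = submatrix(toy_kernel, train_idx, train_idx)  # train x train block
K_test = submatrix(toy_kernel, test_idx, train_idx)    # test x train block
```

With the real files, the same indexing would apply to the gzipped matrices once they are decompressed and read into memory; at 7329x7329 entries a dense array type is likely a better fit than nested lists.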