Supplementary data for "Semi-Supervised Protein Classification using Cluster Kernels."
- ROC-50 scores for all families and all detection methods from the paper, in plain text format.
- ROC scores for all families and all detection methods from the paper, in plain text format.
- Plain text table specifying the positive and negative training and test sets for each family. Each row is one sequence, and each column is one family (0 = not present; 1 = positive train; 2 = negative train; 3 = positive test; 4 = negative test). [Same file, but with no headers]
- Summary of data splits, giving the number of positive and negative training and test set examples and the amount of unlabeled data for each family.
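
The label codes in the split table can be decoded with a few lines of Python. This is a minimal sketch, assuming the header-less variant of the file holds whitespace-separated integer codes with one sequence per row; that layout, the function names, and the demo data are illustrative assumptions, not taken from the files themselves.

```python
import io

# Label codes as given in the table description above.
LABELS = {
    0: "not present",
    1: "positive train",
    2: "negative train",
    3: "positive test",
    4: "negative test",
}

def read_label_rows(handle):
    """Parse whitespace-separated integer codes (assumed layout, no headers)."""
    return [[int(tok) for tok in line.split()] for line in handle if line.strip()]

def split_for_family(rows, family_col):
    """Group sequence (row) indices by split for one family column."""
    groups = {name: [] for name in LABELS.values()}
    for i, row in enumerate(rows):
        groups[LABELS[row[family_col]]].append(i)
    return groups

# Made-up example: 5 sequences (rows) x 2 families (columns).
demo = io.StringIO("1 0\n2 3\n3 4\n4 1\n0 2\n")
rows = read_label_rows(demo)
groups = split_for_family(rows, 0)
print(groups)
```

Sequences coded 0 for a family take no part in that family's experiment, so they are collected under "not present" rather than dropped silently.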
of the SCOP families.
- File in FASTA format containing all sequences in SCOP version 1.59 with less than 95% identity.
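
A FASTA file such as the one described above can be read without extra libraries. The following is a minimal sketch using only the Python standard library; the function name and the demo records are made up for illustration and do not come from the SCOP file.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Made-up two-record example; sequence lines may wrap across lines.
demo = io.StringIO(">seq1 made-up example\nMKV\nLAA\n>seq2\nGG\n")
records = list(read_fasta(demo))
print(records)
```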
7329x7329 Kernel matrices for methods used in the experiments:
(here are the IDs by row or column)
- BLAST matrix, ASCII text file, gzipped (49 MB).
- PSI-BLAST matrix using the complete 7329 examples as a database, ASCII text file, gzipped (52 MB).
- Spectrum Mismatch Kernel, k=5, m=1, ASCII text file, gzipped (79 MB).
- The Spider software used in the experiments, a Matlab-based library of machine learning tools.
- Matlab scripts to run the semi-supervised experiments (using the Spider software).
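
The gzipped ASCII kernel matrices listed above can be streamed row by row outside Matlab. This is a minimal Python sketch, assuming each line of the file holds one whitespace-separated matrix row; that layout, the function name, and the demo file are assumptions for illustration, and the real files should be inspected before relying on it.

```python
import gzip
import os
import tempfile
from array import array

def iter_kernel_rows(path):
    """Yield each matrix row as an array of doubles from a gzipped text file."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.strip():
                yield array("d", (float(tok) for tok in line.split()))

# Write a made-up 3x3 matrix to a temporary gzipped file for the demo.
demo_text = "1.0 0.5 0.2\n0.5 1.0 0.1\n0.2 0.1 1.0\n"
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_kernel.txt.gz")
with gzip.open(demo_path, "wt") as fh:
    fh.write(demo_text)

K = list(iter_kernel_rows(demo_path))
n = len(K)

# Basic sanity checks one might run on a kernel matrix.
is_square = all(len(row) == n for row in K)
is_symmetric = all(K[i][j] == K[j][i] for i in range(n) for j in range(n))
print(n, is_square, is_symmetric)
```

Compact `array` rows keep memory use lower than nested Python lists, which matters for a 7329x7329 matrix; a numerical library would be a reasonable alternative if one is available.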