Kernel methods for predicting protein-protein interactions
Asa Ben-Hur and William Stafford Noble
ISMB 2005
Datasets:
- a list of 10,517 physical interactions in yeast from the BIND
  database and a set of 10,517 negative examples generated by choosing
  random pairs of proteins
- the homology, GO, and MCC features for the BIND dataset. Note that
  the MCC feature is provided only for reference: it needs to be
  computed on the fly, so that it does not use information from the
  test examples.
- a dataset of yeast physical interactions vs. co-complexed pairs of
  proteins
- homology, GO and MCC features for the complexVsPhysical dataset
- a list of 750 reliable interactions generated from BIND interactions
- interactions derived from DIP and MIPS, plus negative examples
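
Two points in the list above can be illustrated with a short sketch:
drawing negative examples as random protein pairs, and computing a
network-derived feature from training interactions only. This is not
the code used to build the files. The function names are hypothetical,
excluding known interactions from the random pairs is an assumption,
and the Jaccard-style neighbor overlap is only a stand-in for the MCC
feature, whose exact definition is not given here.

```python
import random


def sample_negative_pairs(proteins, positives, n, seed=0):
    """Draw n distinct random protein pairs that are not known interactions.

    Assumption: known positive pairs are excluded from the negatives.
    """
    rng = random.Random(seed)
    proteins = sorted(set(proteins))
    protein_set = set(proteins)
    known = {frozenset(p) for p in positives}
    total = len(proteins) * (len(proteins) - 1) // 2
    blocked = sum(1 for k in known if len(k) == 2 and k <= protein_set)
    if n > total - blocked:
        raise ValueError("not enough candidate pairs for n negatives")
    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


def neighborhood_overlap(pair, train_interactions):
    """Jaccard overlap of the two proteins' neighbor sets.

    The network is built from training interactions only, so no
    information about test pairs enters the feature.
    """
    neighbors = {}
    for a, b in train_interactions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    u, v = pair
    nu = neighbors.get(u, set()) - {v}
    nv = neighbors.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


if __name__ == "__main__":
    train = [("A", "B"), ("D", "B"), ("A", "C")]
    print(sample_negative_pairs("ABCDE", train, 3, seed=1))
    print(neighborhood_overlap(("A", "D"), train))
```

In a cross-validation setting, the feature would be recomputed for
each fold from that fold's training interactions, which is why a
precomputed file can serve only as a reference.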
Sequence kernels (some of these files are big: >200MB):
Sample code:
Note -- the code is not meant to run as is. It consists of examples of
PyML usage that you will need to modify to reflect the way your system
is set up. You will also need to install PyML.