Pre-training Graph Neural Networks with Kernels
02 Jan 2019

Introduction
- The paper proposes a pretraining technique that can be used with GNN architectures to learn graph representations as induced by powerful graph kernels.
Idea
- Graph kernel methods can learn powerful representations of input graphs, but the learned representation is implicit, since the kernel function only computes the dot product between the representations.
- GNNs are flexible and powerful in terms of the representations they can learn, but they can easily overfit when a large amount of training data is not available, as is commonly the case for graph datasets.
- Kernel methods can therefore be used to learn an unsupervised graph representation that is then fine-tuned with a GNN architecture for the supervised task.
Architecture
- Given a dataset of graphs g1, g2, …, gn, use a relevant kernel function to compute k(gi, gj) for all pairs of graphs.
- A siamese network is used to encode a pair of graphs into representations f(gi) and f(gj) such that dot(f(gi), f(gj)) equals k(gi, gj) (see the sketch after this list).
- The function f is trained to learn a compressed representation of the kernel's feature space.
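A minimal sketch of this pretraining objective, assuming a precomputed kernel matrix K; the names, dimensions, MSE loss, and the small MLP standing in for the GNN encoder are illustrative assumptions, not details from the paper (which uses a GNN such as DGCNN over the graph structure and a WL kernel matrix).

```python
import torch
import torch.nn as nn

# Placeholder setup: per-graph feature vectors and a PSD matrix standing in
# for the precomputed kernel matrix K (in the paper, the WL kernel).
n_graphs, feat_dim, emb_dim = 100, 32, 16
X = torch.randn(n_graphs, feat_dim)            # stand-in per-graph features
K = torch.randn(n_graphs, n_graphs)
K = (K @ K.T) / n_graphs                       # stand-in PSD "kernel" matrix

# Stand-in for the GNN encoder f(.) shared across both branches of the siamese net.
f = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(1000):
    i = torch.randint(0, n_graphs, (256,))     # sample random graph pairs
    j = torch.randint(0, n_graphs, (256,))
    zi, zj = f(X[i]), f(X[j])
    pred = (zi * zj).sum(dim=-1)               # dot(f(gi), f(gj))
    loss = nn.functional.mse_loss(pred, K[i, j])  # regress onto k(gi, gj)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After pretraining, f is fine-tuned on the supervised graph-level task.
```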
Experiments
Datasets
- Biological node-labeled graphs representing chemical compounds - MUTAG, PTC, NCI1
Baselines
- DGCNN
- Graphlet Kernel (GK)
- Random Walk Kernel
- Propagation Kernel
- Weisfeiler-Lehman subtree kernel (WL)
Results
- Pretraining uses the WL kernel.
- The pretrained model performs better than the baselines on 2 of the datasets but lags behind the WL method (which was used for pretraining) on the NCI1 dataset.
Notes
- The idea is straightforward and intuitive. In general, this kind of pretraining should help the downstream model. It would be interesting to try it on more datasets/kernels/GNNs so that more conclusive results can be obtained.