Pre-training Graph Neural Networks with Kernels
02 Jan 2019

Introduction
- The paper proposes a pretraining technique that can be used with GNN architectures to learn graph representations as induced by powerful graph kernels.
Idea
- Graph kernel methods can learn powerful representations of input graphs, but the learned representation is implicit, since the kernel function only computes the dot product between the representations.
- GNNs are flexible and powerful in terms of the representations they can learn, but they can easily overfit when a large amount of training data is not available, as is commonly the case for graph datasets.
- Kernel methods can therefore be used to learn an unsupervised graph representation that is then fine-tuned with a GNN architecture for the supervised task.
Architecture
- Given a dataset of graphs g1, g2, …, gn, use a relevant kernel function to compute k(gi, gj) for all pairs of graphs.
- A siamese network is used to encode a pair of graphs into representations f(gi) and f(gj) such that dot(f(gi), f(gj)) equals k(gi, gj) (see the sketch after this list).
- The function f is trained to learn a compressed representation of the kernel's feature space.
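A minimal sketch of this pretraining objective, assuming a precomputed kernel matrix K; the names, dimensions, MSE loss, and the small MLP standing in for the GNN encoder are illustrative assumptions, not details from the paper (which uses a GNN such as DGCNN over the graph structure and a WL kernel matrix).

```python
import torch
import torch.nn as nn

# Placeholder setup: per-graph feature vectors and a PSD matrix standing in
# for the precomputed kernel matrix K (in the paper, the WL kernel).
n_graphs, feat_dim, emb_dim = 100, 32, 16
X = torch.randn(n_graphs, feat_dim)            # stand-in per-graph features
K = torch.randn(n_graphs, n_graphs)
K = (K @ K.T) / n_graphs                       # stand-in PSD "kernel" matrix

# Stand-in for the GNN encoder f(.) shared across both branches of the siamese net.
f = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(1000):
    i = torch.randint(0, n_graphs, (256,))     # sample random graph pairs
    j = torch.randint(0, n_graphs, (256,))
    zi, zj = f(X[i]), f(X[j])
    pred = (zi * zj).sum(dim=-1)               # dot(f(gi), f(gj))
    loss = nn.functional.mse_loss(pred, K[i, j])  # regress onto k(gi, gj)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After pretraining, f is fine-tuned on the supervised graph-level task.
```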
Experiments
Datasets
- Biological node-labeled graphs representing chemical compounds - MUTAG, PTC, NCI1
Baselines
- DGCNN
- Graphlet Kernel (GK)
- Random Walk Kernel
- Propagation Kernel
- Weisfeiler-Lehman subtree kernel (WL)
Results
- Pretraining uses the WL kernel.
- The pretrained model performs better than the baselines on 2 of the datasets but lags behind the WL method (which was used for pretraining) on the NCI1 dataset.
Notes
- The idea is straightforward and intuitive. In general, this kind of pretraining should help the downstream model. It would be interesting to try it on more datasets/kernels/GNNs so that more conclusive results can be obtained.