Efficient GNN training on disk

Paper Title:

DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training

Published on:

8 May 2024

Primary Category:

Machine Learning

Paper Authors:

Renjie Liu, Yichuan Wang, Xiao Yan, Zhenkun Cai, Minjie Wang, Haitian Jiang, Bo Tang, Jinyang Li

Key Details

• Decouples graph sampling from model computation via offline sampling

• Optimizes data layout with a four-level caching hierarchy

• Avoids read amplification via batched packing

• Overlaps disk I/O and computation with pipelining

• Speeds up baselines by over 8x while matching their best accuracy
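
The pipelining point above can be illustrated with a minimal sketch. It is not DiskGNN's implementation: it only shows the general idea of a background thread loading upcoming batches while the consumer computes on the current one. The names `pipelined_batches`, `load_batch`, and `prefetch` are hypothetical and chosen for illustration.

```python
import queue
import threading


def pipelined_batches(load_batch, num_batches, prefetch=2):
    """Yield batches in order while a background thread loads the next ones.

    load_batch(i) is a hypothetical callable standing in for a disk read.
    prefetch bounds how many loaded batches may wait in memory at once.
    """
    q = queue.Queue(maxsize=prefetch)
    sentinel = object()  # marks the end of the stream

    def producer():
        try:
            for i in range(num_batches):
                q.put(load_batch(i))  # blocks when the queue is full
        finally:
            q.put(sentinel)  # always unblock the consumer

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = q.get()
        if item is sentinel:
            break
        yield item  # the consumer computes here while the producer keeps loading


if __name__ == "__main__":
    for batch in pipelined_batches(lambda i: [i, i + 1], num_batches=3):
        print(batch)
```

The bounded queue is the design choice that matters here: it lets I/O run ahead of computation without letting prefetched data grow without limit.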

Explore the topics in this paper

caching and pipelining

disk-based systems

graph neural networks

large graph processing

sampling and layout optimization

AI generated summary

This paper introduces DiskGNN, a system for training graph neural networks (GNNs) efficiently when the graph data exceeds CPU memory and must be kept on disk. DiskGNN combines four techniques: offline sampling, which decouples sampling from computation and guides the data layout; a four-level caching hierarchy; batched packing; and pipelined training that overlaps disk I/O with computation. In the reported experiments, DiskGNN is over 8x faster than state-of-the-art baselines while matching their best model accuracy.
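
To make the role of offline sampling more concrete, here is a minimal sketch of how knowing the sampled mini-batches in advance could drive cache placement and on-disk layout. It is an illustration under stated assumptions, not the paper's algorithm: the frequency-based placement policy, the function names `assign_tiers` and `disk_layout`, and the tier capacities are all hypothetical.

```python
from collections import Counter


def assign_tiers(batches, capacities):
    """Map each node ID to a tier index using access counts from offline sampling.

    batches: pre-sampled mini-batches, each a list of node IDs.
    capacities: sizes of the bounded tiers, fastest first (assumed values).
    Nodes that fit in no bounded tier go to a final unbounded tier,
    numbered len(capacities).
    """
    freq = Counter(n for batch in batches for n in batch)
    # Hottest nodes first; ties broken by node ID so the result is deterministic.
    ranked = sorted(freq, key=lambda n: (-freq[n], n))
    tiers = {}
    start = 0
    for tier, cap in enumerate(capacities):
        for n in ranked[start:start + cap]:
            tiers[n] = tier
        start += cap
    for n in ranked[start:]:
        tiers[n] = len(capacities)
    return tiers


def disk_layout(batches, tiers, disk_tier):
    """For each batch, list its disk-resident nodes in the order they would be stored together."""
    return [[n for n in batch if tiers[n] == disk_tier] for batch in batches]


if __name__ == "__main__":
    batches = [[1, 2, 3], [1, 2, 4], [1, 5, 6]]
    tiers = assign_tiers(batches, capacities=[1, 1, 2])
    print(tiers)
    print(disk_layout(batches, tiers, disk_tier=3))
```

With three bounded tiers plus the unbounded one, the sketch has four levels in total. Storing each batch's disk-resident features next to each other means a batch can be fetched with a sequential read rather than many scattered ones.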