Seamless Kernel Operations on GPU, with auto-differentiation and without memory overflows

Machine Learning
SciDoLySE Meeting, Lyon University, Villeurbanne (France)

Benjamin Charlier

Ghislain Durif

Jean Feydy

Joan Glaunes

François-David Collin


January 7, 2020

Keywords: “Kernel operation”, “Matrix reduction”, “Autodifferentiation”, “GPU”, “PyTorch”, “Numpy”, “Python”, “Matlab”, “R”, “KeOps”


The KeOps library provides routines to compute generic reductions of large 2D arrays whose entries are given by a mathematical formula. Its C++/CUDA implementation, with GPU support, combines a tiled reduction scheme with an automatic differentiation engine. Because it relies on online map-reduce schemes, it is well suited to the scalable computation of kernel dot products and their gradients, even when the full kernel matrix does not fit into GPU memory.
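To make the idea of a tiled, online map-reduce concrete, here is a minimal pure-Python sketch of a Gaussian kernel dot product computed with and without the full kernel matrix. It is an illustration only, not the KeOps implementation: the function names, the Gaussian formula, the `sigma` and `tile` parameters and the toy data are all assumptions made for this example.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-|x - y|^2 / (2 sigma^2)) for two points given as sequences."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_product_dense(xs, ys, b, sigma=1.0):
    """Reference version: store the full M-by-N kernel matrix, then reduce each row."""
    K = [[gaussian_kernel(x, y, sigma) for y in ys] for x in xs]
    return [sum(k_ij * b_j for k_ij, b_j in zip(row, b)) for row in K]


def kernel_product_tiled(xs, ys, b, sigma=1.0, tile=4):
    """Same reduction, computed tile by tile: the kernel matrix is never stored."""
    out = [0.0] * len(xs)
    for start in range(0, len(ys), tile):
        ys_tile = ys[start:start + tile]
        b_tile = b[start:start + tile]
        for i, x in enumerate(xs):
            # Map: evaluate the formula on the current tile only.
            # Reduce: fold the partial sum into the running result for row i.
            out[i] += sum(
                gaussian_kernel(x, y, sigma) * b_j
                for y, b_j in zip(ys_tile, b_tile)
            )
    return out


# Toy data (hypothetical): 5 target points, 10 source points, in 2D.
xs = [[0.1 * i, 0.2 * i] for i in range(5)]
ys = [[0.3 * j, -0.1 * j] for j in range(10)]
b = [float(j + 1) for j in range(10)]

dense = kernel_product_dense(xs, ys, b)
tiled = kernel_product_tiled(xs, ys, b, tile=4)
```

Both functions return the same vector up to floating-point rounding, but the tiled version only ever touches `tile` columns at a time, so its memory footprint does not grow with the number of kernel entries. A GPU implementation would process the tiles in parallel, a point this sequential sketch leaves out.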

KeOps is all about breaking through this memory bottleneck and making GPU power available for standard mathematical routines. As of 2019, the effort to bring GPU computing to such routines has been mostly restricted to the operations needed to implement Convolutional Neural Networks: linear algebra routines and convolutions on grids, images and volumes. KeOps provides GPU support for custom mathematical operators without the cost of developing a specific CUDA implementation.

To ensure its versatility, KeOps can be used through Matlab, Python (NumPy or PyTorch) and (soon) R backends. This presentation focuses, with usage examples, on the PyTorch interface (which is interoperable with other native PyTorch operations) and on the new R interface.