Machine learning
Seamless Kernel Operations on GPU with auto-differentiation and without memory overflows

Benjamin Charlier

Jean Feydy

Joan Glaunes

Ghislain Durif

François-David Collin


October 9, 2019

Speaker: Ghislain Durif (contribution and maintenance, initial development of RKeOps)

Collaboration with:

  • B. Charlier (IMAG - Univ Montpellier)
  • F.-D. Collin (CNRS - IMAG - Univ Montpellier)
  • J. Feydy (Imperial College, London)
  • J. Glaunès (MAP5 - Univ Paris Descartes)

The KeOps library provides routines to compute generic reductions of large 2D arrays whose entries are given by a mathematical formula. Using a C++/CUDA-based implementation with GPU support, it combines a tiled reduction scheme with an automatic differentiation engine. Relying on online map-reduce schemes, it is perfectly suited to the scalable computation of kernel dot products and the associated gradients, even when the full kernel matrix does not fit into the GPU memory.
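To make the target operation concrete, here is a minimal NumPy sketch of a typical kernel dot product, a_i = Σ_j exp(-‖x_i - y_j‖²) b_j. The sizes and variable names are illustrative; this naive version materializes the full M×N kernel matrix, which is precisely the cost that KeOps's online reduction avoids.

```python
import numpy as np

# Naive version of the reduction a_i = sum_j exp(-||x_i - y_j||^2) * b_j.
# It builds the full (M, N) kernel matrix in memory -- the exact
# bottleneck that KeOps's tiled, online map-reduce scheme removes.
rng = np.random.default_rng(0)
M, N, D = 1000, 2000, 3
x = rng.standard_normal((M, D))   # M query points
y = rng.standard_normal((N, D))   # N data points
b = rng.standard_normal((N, 1))   # signal carried by the y_j's

D_ij = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # (M, N) squared distances
K_ij = np.exp(-D_ij)                                   # (M, N) Gaussian kernel
a = K_ij @ b                                           # (M, 1) kernel dot product
print(a.shape)  # (1000, 1)
```

For D_ij of size M×N this costs O(MN) memory, which is why the naive approach breaks down on large point clouds.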

KeOps is all about breaking through this memory bottleneck and making GPU power available for seamless standard mathematical routine computations. As of 2019, GPU support in mainstream frameworks has been mostly restricted to the operations needed to implement Convolutional Neural Networks: linear algebra routines and convolutions on grids, images and volumes. KeOps brings GPU support to your custom mathematical operators without the cost of developing a specific CUDA implementation for them.
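The idea behind this memory-frugal computation can be sketched in plain NumPy: process the j-index in tiles and accumulate partial sums, so that only one (M, block) tile of the kernel matrix exists at any time. This is a simplified CPU illustration of the tiled scheme, not the actual KeOps CUDA implementation; sizes and the block length are arbitrary.

```python
import numpy as np

# Tiled (online) map-reduce over blocks of j: the full (M, N) kernel
# matrix is never stored, only one (M, block) tile at a time.
rng = np.random.default_rng(0)
M, N, D, block = 500, 4000, 3, 512
x = rng.standard_normal((M, D))
y = rng.standard_normal((N, D))
b = rng.standard_normal((N, 1))

a = np.zeros((M, 1))
for start in range(0, N, block):
    y_blk = y[start:start + block]   # (B, D) tile of the y_j's
    b_blk = b[start:start + block]   # (B, 1) matching tile of b
    d2 = ((x[:, None, :] - y_blk[None, :, :]) ** 2).sum(-1)  # (M, B) tile
    a += np.exp(-d2) @ b_blk         # accumulate the partial reduction
```

Peak memory drops from O(MN) to O(M·block), which is what lets KeOps handle kernel matrices that would never fit on a GPU.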

To ensure its versatility, KeOps can be used through Matlab, Python (NumPy or PyTorch) and (soon) R backends. This presentation focuses, with usage examples, on the PyTorch interface (which is interoperable with native PyTorch operations) and on the new R interface that will be released very soon.


  • Python (PyKeOps)
  • Matlab (KeOpsLab)
  • R (RKeOps)


  • Kernel operation
  • Matrix reduction
  • Automatic differentiation
  • GPU
  • PyTorch
  • NumPy