Theresa Wagner, TU Chemnitz
Kernel matrices are crucial in many learning tasks and are typically dense and large-scale. Depending on the dimension of the feature space, even computing all entries of such a matrix in reasonable time becomes a challenging task. For these dense matrices, the cost of a matrix-vector product scales quadratically with the matrix dimension unless customized methods are applied. In essentially all kernel methods, a linear system must be solved. Our approach exploits the computational power of the non-equispaced fast Fourier transform (NFFT), which is of linear complexity for fixed accuracy. The ANOVA kernel has proved to be a viable tool for grouping the features into smaller pieces that are then amenable to the NFFT-based summation technique. Multiple kernels based on lower-dimensional feature spaces are combined, such that kernel-vector products can be realized by this fast approximation algorithm. Based on this feature grouping approach, the fast matrix-vector product can be embedded into a CG solver within a learning method, and we reach nearly linear scaling. This enables running kernel-based learning tasks on large-scale data on a standard laptop computer in reasonable time, with no or only a very benign loss of accuracy. The technique can be embedded into any method that relies on kernel matrices or even graph Laplacians; examples are support vector machines and graph neural networks, which then benefit from having fast matrix-vector products available. Moreover, we make use of this method for Gaussian process regression, where the covariance matrix is a kernel matrix. By applying our NFFT-based fast summation technique, both fitting the kernel by tuning its hyperparameters and making predictions by posterior inference can be accelerated.
- joint work with Martin Stoll and Franziska Nestler
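The following Python sketch illustrates the structure of the approach, not the authors' NFFT implementation: an additive kernel is assembled from Gaussian kernels on low-dimensional feature groups, and the regularized kernel system is solved matrix-free with CG, which touches the kernel only through matrix-vector products. The helper names (gaussian_kernel, additive_kernel_matvec), the feature groups, and the hyperparameters sigma and ridge are illustrative assumptions; the dense per-group product here merely stands in for the NFFT-based fast summation.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def gaussian_kernel(X, Y, sigma):
    # Dense Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def additive_kernel_matvec(X, groups, sigma, v):
    # Apply K v where K = sum_j K_j and K_j is the Gaussian kernel on the
    # feature group g_j. Each K_j v is computed densely (O(n^2)) here; in the
    # NFFT-based method this per-group product is what gets approximated in
    # near-linear time, exploiting that each group is low-dimensional.
    out = np.zeros_like(v)
    for g in groups:
        out += gaussian_kernel(X[:, g], X[:, g], sigma) @ v
    return out

# Toy problem: n samples, d features split into low-dimensional groups.
rng = np.random.default_rng(0)
n, d = 500, 9
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # illustrative feature grouping
sigma, ridge = 1.0, 1e-2                    # illustrative hyperparameters

# Matrix-free CG solve of (K + ridge * I) alpha = y: the solver only ever
# sees matrix-vector products, so a fast approximate product (e.g. the
# NFFT-based one) can be swapped in without changing anything else.
K_op = LinearOperator(
    (n, n),
    matvec=lambda v: additive_kernel_matvec(X, groups, sigma, v) + ridge * v,
)
alpha, info = cg(K_op, y)
print("CG converged" if info == 0 else f"CG stopped with info={info}")

The design point this sketch is meant to convey: because CG is matrix-free, the quadratic-cost dense products above can be replaced by the NFFT-based fast summation on each small feature group (the regime, typically up to about three features per group, where that summation is efficient), making each iteration nearly linear in the number of data points.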