Sparse grids lead to a substantial storage reduction compared to regular grids.
This comes at a cost as the matrices characterising the solutions of variational
problems involving sparse grids are more dense than in the case of regular
grids. The combination technique reduces the problem on a sparse grid to a set
of problems on regular subgrids. The sparse grid solution is then approximated
by a linear combination of solutions on the subgrids where the combination
coefficients can be obtained from an inclusionexclusion principle. While this
approach works very well in many practical cases, J. Garcke did show in his PhD
thesis that the errors of the combination method can be much larger than the
sparse grid approximation error for some machine learning problems.
Here I will review how an adaptive choice of the coefficients defining the
combination technique can avoid these errors at the cost of the solution of a
dense but relatively small linear system of equations. Experimental evidence
supports this claim for regression in machine learning for which the problem was
first observed and some error bounds are given. The method is then applied to
density estimation using a maximum a posteriori approach. It is seen that the
optimal combination technique can be incorporated in the Newton iteration
technique. It was found experimentally that the combination coefficients
converge towards the ones used in the original combination technique.
In a second part of this talk I will discuss some connections between the
combination technique, statistical backfitting, variational Bayes and mixture
models. In particular, approximations using separable functions and the
KullbackLeibler divergence will be reviewed.
This is joint work with Jochen Garcke and Michael Griebel.
