Limitations of the Empirical Fisher Approximation

Frederik Künstner (École Polytechnique Fédérale de Lausanne)

E1 05 (Leibniz-Saal)

Abstract

Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, has recently received attention as a way to capture partial second-order information. Several works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We caution against this argument by discussing the limitations of the empirical Fisher, showing that, unlike the Fisher, it does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that the pathologies of the empirical Fisher can have undesirable effects. This leaves open the question of why methods based on the empirical Fisher have been shown to outperform gradient descent in some settings. As a step towards understanding this effect, we show that methods based on the empirical Fisher can be interpreted as a way to adapt the descent direction to the variance of the gradients.
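To make the contrast between the Fisher and the empirical Fisher concrete, here is a minimal illustrative sketch (not the speaker's code), assuming a logistic-regression model where both matrices have closed forms. The Fisher averages over labels drawn from the model, giving sum_n p_n (1 - p_n) x_n x_n^T; for this model it also equals the Hessian of the negative log-likelihood. The empirical Fisher instead plugs in the observed labels, giving sum_n (p_n - y_n)^2 x_n x_n^T. The helper names (`fisher`, `empirical_fisher`, `preconditioned_step`), the synthetic data, and the damping value are hypothetical choices made only for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_nll(theta, X, y):
    """Gradient of the summed negative log-likelihood of logistic regression."""
    return X.T @ (sigmoid(X @ theta) - y)


def fisher(theta, X):
    """Fisher: sum_n p_n (1 - p_n) x_n x_n^T (labels averaged under the model)."""
    p = sigmoid(X @ theta)
    return (X * (p * (1.0 - p))[:, None]).T @ X


def empirical_fisher(theta, X, y):
    """Empirical Fisher: sum_n g_n g_n^T with g_n = (p_n - y_n) x_n at the observed labels."""
    G = (sigmoid(X @ theta) - y)[:, None] * X  # row n is the per-example gradient g_n
    return G.T @ G


def preconditioned_step(theta, X, y, M, lr=1.0, damping=1e-3):
    """theta - lr * (M + damping * I)^{-1} grad, with M the Fisher or the empirical Fisher."""
    A = M + damping * np.eye(len(theta))  # damping keeps the system well conditioned
    return theta - lr * np.linalg.solve(A, grad_nll(theta, X, y))


# Hypothetical synthetic data: labels generated by the model at theta_true.
rng = np.random.default_rng(0)
n, theta_true = 50_000, np.array([2.0, -1.0])
X = rng.normal(size=(n, 2))
y = (rng.uniform(size=n) < sigmoid(X @ theta_true)).astype(float)

# At theta_true the model explains the labels, so the two matrices roughly agree.
F_true, EF_true = fisher(theta_true, X), empirical_fisher(theta_true, X, y)
rel_gap_true = np.linalg.norm(EF_true - F_true) / np.linalg.norm(F_true)

# Far from theta_true the model does not explain the labels, and the matrices diverge.
theta_far = -theta_true
F_far, EF_far = fisher(theta_far, X), empirical_fisher(theta_far, X, y)
trace_ratio_far = np.trace(EF_far) / np.trace(F_far)

# One damped step from theta_far with each preconditioner.
theta_next_f = preconditioned_step(theta_far, X, y, F_far)
theta_next_ef = preconditioned_step(theta_far, X, y, EF_far)

print(f"relative gap at theta_true: {rel_gap_true:.3f}")
print(f"trace(EF) / trace(F) at theta_far: {trace_ratio_far:.2f}")
print("Fisher step:", theta_next_f, "empirical Fisher step:", theta_next_ef)
```

The agreement at `theta_true` comes from the identity E[(p_n - y_n)^2] = p_n (1 - p_n), which holds only when y_n is drawn from the model at the current parameters; at `theta_far` the empirical Fisher is noticeably larger (compare the traces), so the two preconditioned steps differ. Dividing the empirical Fisher by n gives the uncentered second moment of the per-example gradients, i.e. their covariance plus the outer product of the mean gradient, which is one way to read the abstract's remark about adapting the descent direction to gradient variance.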

Conference

27.03.19 to 29.03.19

Deep Learning Theory Kickoff Meeting

MPI für Mathematik in den Naturwissenschaften Leipzig E1 05 (Leibniz-Saal)

Valeria Hünniger

Max-Planck-Institut für Mathematik in den Naturwissenschaften

Guido Montúfar

Max Planck Institute for Mathematics in the Sciences