This project aims to identify ways of reducing the search space in learning systems as one route to improving the corresponding learning processes. To this end, we study the geometric properties of various connectionist models known in machine learning, using information geometry and algebraic statistics. Our goal is to find distinguished architectures of learning systems based on their expressive power and learning performance.
This kind of model selection is motivated by experimental and theoretical work on restricted Boltzmann machines and deep belief networks, popular learning systems that increasingly call for a thorough mathematical investigation.
In particular, this project targets the development of design principles for our embodied AI project.
Keywords: information geometry, algebraic statistics, machine learning.
Selection Criteria for Neuromanifolds of Stochastic Dynamics
In many formal models of neuronal systems, individual neurons are modelled as nodes that receive inputs from other nodes in a network and generate an output that can, in general, be stochastic. In this way, the dynamics of the whole network can be described as a stochastic transition in each time step, formalized mathematically as a stochastic matrix.
Well-known models of this kind are Boltzmann machines, their generalizations, and policy matrices in reinforcement learning.
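As a minimal sketch of this description, the following Python snippet builds a row-stochastic matrix from a real-valued parameter matrix and applies one time step to a distribution over network states. The softmax parametrization, the choice of 4 states, and the names `softmax_rows` and `step` are illustrative assumptions and are not taken from the project.

```python
import numpy as np

def softmax_rows(weights):
    """Map a real matrix to a row-stochastic matrix (each row sums to 1)."""
    # Subtract the row maximum for numerical stability; the result is unchanged.
    shifted = weights - weights.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def step(distribution, transition):
    """One time step: push a distribution over states through the matrix."""
    return distribution @ transition

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4))     # hypothetical parameters
K = softmax_rows(weights)             # K[i, j] = P(next state j | current state i)

p0 = np.array([1.0, 0.0, 0.0, 0.0])   # start deterministically in state 0
p1 = step(p0, K)                      # distribution after one transition
print(K.sum(axis=1))
print(p1)
```

Varying `weights` sweeps out a parametrized family of stochastic matrices, which is the kind of object the next paragraph treats geometrically.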
To study such learning systems, it is helpful to consider not just one stochastic matrix but a parametrized family of matrices. Such a family forms a geometric object, referred to as a neuromanifold in information geometry. Learning crucially depends on the shape of the neuromanifold.
This information-geometric view, proposed by Amari, suggests selecting appropriate neuromanifolds and defining the corresponding learning processes as gradient flows on these manifolds.
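In symbols, and assuming that learning maximizes a smooth objective f over the parameters θ of the neuromanifold (f, θ, p_θ, and G are notation introduced here for illustration), the natural gradient flow can be written as:

```latex
\dot{\theta}(t) = G\bigl(\theta(t)\bigr)^{-1} \, \nabla_\theta f\bigl(\theta(t)\bigr),
\qquad
G_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[
  \partial_{\theta_i} \log p_\theta(x) \; \partial_{\theta_j} \log p_\theta(x)
\right].
```

Here G is the Fisher information matrix of the model p_θ. Replacing G by the identity matrix gives the ordinary gradient flow, which depends on the chosen parametrization, whereas the natural gradient flow does not.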
We focus not only on manifolds that are directly induced by a neuronal model, but also study general sets that satisfy natural optimality conditions.
Two-dimensional sets containing all deterministic policies
Deterministic or nearly deterministic policies are optimal for a variety of reinforcement learning problems. They represent dynamics with maximal predictive information, as considered in robotics, as well as dynamics of neural networks with maximal network information flow.
It is always possible to construct a two-dimensional set that reaches all deterministic policies and on which natural gradient optimization works very efficiently.
Learning
These videos show experiments comparing ordinary (magenta) and natural (green) gradient learning in our two-dimensional models. Left: a system with 3 inputs and 2 outputs. Right: a system with 4 inputs and 2 outputs.
The contour lines correspond to the long-term expected reward. The natural gradient method reaches the optimal solution in both experiments.
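The qualitative difference between the two methods can be illustrated with a toy sketch in Python. It is not the two-dimensional model from the experiments above: it assumes a policy with 3 inputs and 2 outputs, one logit per input, a fixed input distribution `mu`, a made-up reward table, and a one-step rather than long-term expected reward. All names and numbers are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def expected_reward(theta, mu, reward):
    """One-step expected reward; theta[s] is the logit of output 1 on input s."""
    p = sigmoid(theta)
    return float(np.sum(mu * ((1 - p) * reward[:, 0] + p * reward[:, 1])))

def ordinary_gradient(theta, mu, reward):
    """Euclidean gradient of the expected reward with respect to the logits."""
    p = sigmoid(theta)
    return mu * p * (1 - p) * (reward[:, 1] - reward[:, 0])

def fisher_diagonal(theta, mu):
    """Fisher information of the joint model mu(s) * pi(a|s); diagonal here."""
    p = sigmoid(theta)
    return mu * p * (1 - p)

def natural_gradient(theta, mu, reward):
    """Ordinary gradient rescaled by the inverse Fisher information."""
    return ordinary_gradient(theta, mu, reward) / fisher_diagonal(theta, mu)

def ascend(direction, theta0, mu, reward, lr=0.5, steps=50):
    """Plain gradient ascent along the given direction function."""
    theta = theta0.copy()
    for _ in range(steps):
        theta = theta + lr * direction(theta, mu, reward)
    return theta

mu = np.array([0.5, 0.3, 0.2])                            # assumed input distribution
reward = np.array([[1.0, 0.0], [0.0, 1.0], [0.2, 0.8]])   # assumed reward[s, a]
theta0 = np.zeros(3)                                      # start at the uniform policy

theta_ord = ascend(ordinary_gradient, theta0, mu, reward)
theta_nat = ascend(natural_gradient, theta0, mu, reward)
r_ord = expected_reward(theta_ord, mu, reward)
r_nat = expected_reward(theta_nat, mu, reward)
print("ordinary:", r_ord)
print("natural: ", r_nat)
```

In this toy setting the ordinary gradient shrinks as the policy approaches a deterministic one, because of the factor p(1 - p), while the natural gradient cancels that factor and keeps moving at a constant rate in the logits. With the same step size and number of steps, the natural gradient run therefore ends closer to the optimal deterministic policy.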
Related Group Publications:
Montúfar, G. and J. Morton: Dimension of marginals of Kronecker product models. SIAM journal on applied algebra and geometry, 1 (2017) 1, p. 126-151. MIS-Preprint: 75/2015
Montúfar, G. and J. Rauh: Geometry of policy improvement. Geometric science of information : Third International Conference, GSI 2017, Paris, France, November 7-9, 2017, proceedings / F. Nielsen... (eds.). Springer, 2017. - P. 282-290 (Lecture notes in computer science ; 10589)
Montúfar, G. and J. Rauh: Hierarchical models as marginals of hierarchical models. International journal of approximate reasoning, 88 (2017), p. 531-546. MIS-Preprint: 27/2016
Montúfar, G. and J. Rauh: Mode poset probability polytopes. Journal of algebraic statistics, 7 (2016) 1, p. 1-13. MIS-Preprint: 22/2015
Ay, N.: Geometric design principles for brains of embodied agents. Künstliche Intelligenz : KI, 29 (2015) 4, p. 389-399
Montúfar, G.: Deep narrow Boltzmann machines are universal approximators. International conference on learning representations - ICLR 2015 : 7-9 May 2015, San Diego, CA, USA. ICLR, 2015. MIS-Preprint: 113/2014
Montúfar, G.: Universal approximation of Markov kernels by shallow stochastic feedforward networks. MIS-Preprint: 23/2015
Montúfar, G.: A comparison of neural network architectures. Deep learning Workshop, ICML '15, Vauban Hall at Lille Grand Palais, France, July 10 and 11, 2015
Montúfar, G.; Ay, N. and K. Ghazi-Zahedi: Geometry and expressive power of conditional restricted Boltzmann machines. Journal of machine learning research, 16 (2015), p. 2405-2436. MIS-Preprint: 16/2014
Montúfar, G.; Ghazi-Zahedi, K. and N. Ay: A theory of cheap control in embodied systems. PLoS computational biology, 11 (2015) 9, e1004427. MIS-Preprint: 70/2014
Montúfar, G.; Ghazi-Zahedi, K. and N. Ay: Geometry and determinism of optimal stationary control in partially observable Markov decision processes. MIS-Preprint: 22/2016
Montúfar, G. and J. Morton: Discrete restricted Boltzmann machines. Journal of machine learning research, 16 (2015), p. 653-672. MIS-Preprint: 106/2014
Montúfar, G. and J. Morton: When does a mixture of products contain a product of mixtures? SIAM journal on discrete mathematics, 29 (2015) 1, p. 321-347. MIS-Preprint: 98/2014
Montúfar, G. and J. Rauh: Hierarchical models as marginals of hierarchical models. Proceedings of the 10th Workshop on Uncertainty Processing WUPES '15, Moninec, Czech Republic, September 16-19th, 2015 / V. Kratochvíl (ed.). Oeconomica, 2015. - P. 131-145. MIS-Preprint: 27/2016
Montúfar, G.: Universal approximation depth and errors of narrow belief networks with discrete units. Neural computation, 26 (2014) 7, p. 1386-1407. MIS-Preprint: 74/2014
Montúfar, G. and J. Morton: Geometry of hidden-visible products of statistical models. Algebraic Statistics 2014 : May 19-22 Illinois Institute of Technology, 2014
Montúfar, G.; Pascanu, R.; Cho, K. and Y. Bengio: On the number of linear regions of deep neural networks. Annual Conference on Neural Information Processing Systems : NIPS 2014, Montreal, Quebec, Canada, December 8th - 13th. Neural Information Processing Systems, 2014. MIS-Preprint: 73/2014
Montúfar, G. and J. Rauh: Scaling of model approximation errors and expected entropy distances. Kybernetika, 50 (2014) 2, p. 234-245
Montúfar, G.; Rauh, J. and N. Ay: On the Fisher metric of conditional probability polytopes. Entropy, 16 (2014) 6, p. 3207-3233. MIS-Preprint: 87/2014
Pascanu, R.; Montúfar, G. and Y. Bengio: On the number of response regions of deep feedforward networks with piecewise linear activations. International conference on learning representations - ICLR 2014 : 14-16 April 2014, Banff, Canada. ICLR, 2014. MIS-Preprint: 72/2014
Rauh, J. and N. Ay: Robustness, canalyzing functions and systems design. Theory in biosciences, 133 (2014) 2, p. 63-78. MIS-Preprint: 66/2012
Ay, N.; Montúfar, G. and J. Rauh: Selection criteria for neuromanifolds of stochastic dynamics. Advances in cognitive neurodynamics III : proceedings of the 3rd International Conference on Cognitive Neurodynamics 2011 ; [June 9-13, 2011, Hilton Niseko Village, Hokkaido, Japan] / Y. Yamaguchi (ed.). Springer, 2013. - P. 147-154 (Advances in cognitive neurodynamics). MIS-Preprint: 15/2011
Montúfar, G.: Mixture decompositions of exponential families - using a decomposition of their sample spaces. Kybernetika, 49 (2013) 1, p. 23-39. MIS-Preprint: 39/2010
Montúfar, G.; Rauh, J. and N. Ay: Maximal information divergence from statistical models defined by neural networks. Geometric science of information : first international conference, GSI 2013, Paris, France, August 28-30, 2013. Proceedings / F. Nielsen... (eds.). Springer, 2013. - P. 759-766 (Lecture notes in computer science ; 8085). MIS-Preprint: 31/2013
Montúfar, G.: On the expressive power of discrete mixture models, restricted Boltzmann machines, and deep belief networks - a unified mathematical treatment. Dissertation, Universität Leipzig, 2012
Montúfar, G. and N. Ay: Refinements of universal approximation results for deep belief networks and restricted Boltzmann machines. Neural computation, 23 (2011) 5, p. 1306-1319. MIS-Preprint: 23/2010
Montúfar, G.; Rauh, J. and N. Ay: Expressive power and approximation errors of restricted Boltzmann machines. Advances in neural information processing systems 24 : 25th Annual Conference on Neural Information Processing Systems 2011, Granada, Spain December 12th - 15th ; NIPS 2011 / J. Shawe-Taylor (ed.). Neural Information Processing Systems, 2011. - P. 415-423. MIS-Preprint: 27/2011
Kahle, T.: On boundaries of statistical models. Dissertation, Universität Leipzig, 2010
Montúfar, G.: Mixture models and representational power of RBM's, DBN's, and DBM's. Deep Learning and Unsupervised Feature Learning Workshop : NIPS 2010 ; December 19, 2010, Hilton, Vancouver, Canada
Ay, N. and A. Knauf: Maximizing multi-information. Kybernetika, 42 (2006) 5, p. 517-538. MIS-Preprint: 42/2003
Ay, N.: Aspekte einer Theorie pragmatischer Informationsstrukturierung. Dissertation, Universität Leipzig, 2001