This project aims to identify ways of reducing the search space in learning systems as one way to improve the corresponding learning processes. To this end, we use information geometry and algebraic statistics to study the geometric properties of various connectionist models from machine learning. Our goal is to find distinguished architectures of learning systems based on their expressive power and learning performance.
This kind of model selection is motivated by experimental and theoretical work on restricted Boltzmann machines and deep belief networks, popular learning systems that increasingly call for a thorough mathematical investigation.
In particular, this project targets the development of design principles for our embodied AI project.
Keywords: information geometry, algebraic statistics, machine learning.
Selection Criteria for Neuromanifolds of Stochastic Dynamics
In many formal models of neuronal systems, individual neurons are modelled as nodes that receive inputs from other nodes in a network and generate an output, which in general can be stochastic. In this way, the dynamics of the whole network can be described as a stochastic transition in each time step, formalized mathematically as a stochastic matrix.
Well-known models of this kind are Boltzmann machines, their generalizations, and policy matrices in reinforcement learning.
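As a rough illustration of the description above, the following sketch treats one time step of a small network as sampling from a row of a stochastic matrix. The convention that rows index the current state and columns the next state, the helper names, and the numbers in the 3-state matrix are assumptions made for this example only.

```python
import random

def is_stochastic(P, tol=1e-9):
    """Check that every row of P is a probability distribution."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )

def step(P, state, rng):
    """Sample the next state from row `state` of P (one stochastic transition)."""
    return rng.choices(range(len(P[state])), weights=P[state], k=1)[0]

def propagate(P, dist):
    """Push a distribution over current states one time step forward (dist @ P)."""
    return [sum(dist[i] * P[i][j] for i in range(len(P)))
            for j in range(len(P[0]))]

# Hypothetical 3-state example; the entries are illustrative only.
P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.0, 1.0]]
```

Calling `propagate(P, [1.0, 0.0, 0.0])` returns the first row of `P`, and `step(P, 2, random.Random(0))` always returns 2 because the last row is deterministic.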
To study such learning systems, it is helpful to consider not just one stochastic matrix but a parametrized family of matrices. Such a family forms a geometric object, referred to within information geometry as a neuromanifold. Learning crucially depends on the shape of the neuromanifold.
This information-geometric view, proposed by Amari, suggests selecting appropriate neuromanifolds and defining the corresponding learning processes as gradient flows on these manifolds.
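To make the idea of a gradient flow on a parametrized family concrete, here is a minimal sketch on the simplest possible family: a single binary choice with probability p = sigmoid(theta) of picking the second action. The reward values `r0` and `r1`, the step size, and all function names are assumptions for illustration; this is not the setup used in the project. The natural gradient rescales the ordinary gradient by the inverse Fisher information, which for this family is p(1 - p).

```python
import math

def sigmoid(theta):
    return 1.0 / (1.0 + math.exp(-theta))

def expected_reward(theta, r0, r1):
    """Expected reward when action 1 is chosen with probability sigmoid(theta)."""
    p = sigmoid(theta)
    return (1.0 - p) * r0 + p * r1

def ordinary_gradient(theta, r0, r1):
    """Derivative of the expected reward with respect to theta."""
    p = sigmoid(theta)
    return (r1 - r0) * p * (1.0 - p)

def fisher_information(theta):
    """Fisher information of a Bernoulli distribution in the logit parameter."""
    p = sigmoid(theta)
    return p * (1.0 - p)

def natural_gradient(theta, r0, r1):
    """Ordinary gradient rescaled by the inverse Fisher information."""
    return ordinary_gradient(theta, r0, r1) / fisher_information(theta)

def ascend(grad_fn, theta, r0, r1, lr=0.5, steps=20):
    """Plain gradient ascent with a fixed step size (a discretized gradient flow).

    Note: for very large |theta| the Fisher information underflows to zero,
    so this toy version is only meant for a moderate number of steps.
    """
    for _ in range(steps):
        theta += lr * grad_fn(theta, r0, r1)
    return theta
```

In this toy family the natural gradient equals `r1 - r0` for every `theta`, so the parameter moves at a constant rate towards the deterministic optimum, whereas the ordinary gradient shrinks as the policy approaches determinism.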
We focus not only on manifolds that are directly induced by a neuronal model, but also study more general sets that satisfy natural optimality conditions.
Two-dimensional sets containing all deterministic policies
Deterministic or near-deterministic policies are optimal for a variety of reinforcement learning problems. They represent dynamics with maximal predictive information, as considered in robotics, and also dynamics of neural networks with maximal network information flow.
It is always possible to construct a two-dimensional set that reaches all deterministic policies and on which natural gradient optimization works very efficiently.
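To give a sense of what such a set has to reach, the sketch below enumerates all deterministic policies of a system with a given number of inputs and outputs, each written as a 0/1 stochastic matrix with one row per input. It does not reproduce the two-dimensional construction itself, which is not spelled out here; the function name and the matrix layout are assumptions for illustration.

```python
from itertools import product

def deterministic_policies(n_inputs, n_outputs):
    """List every deterministic policy as an n_inputs x n_outputs 0/1 matrix.

    Each policy assigns exactly one output to every input, so there are
    n_outputs ** n_inputs of them.
    """
    policies = []
    for choice in product(range(n_outputs), repeat=n_inputs):
        policy = [[1.0 if a == chosen else 0.0 for a in range(n_outputs)]
                  for chosen in choice]
        policies.append(policy)
    return policies
```

For 3 inputs and 2 outputs this yields 2^3 = 8 policies, and for 4 inputs and 2 outputs it yields 2^4 = 16, so a two-dimensional set has to pass close to all of these points.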
Learning
These videos show experiments comparing ordinary (magenta) and natural (green) gradient learning on our two-dimensional models. Left: a system with 3 inputs and 2 outputs. Right: a system with 4 inputs and 2 outputs.
The contour lines correspond to the long-term expected reward. The natural gradient method reaches the optimal solution in both experiments.
Related Group Publications:
Ay, N.:
On the locality of the natural gradient for learning in deep Bayesian networks.
Information geometry, Vol. not yet known, pp. not yet known
MIS-Preprint: 59/2020

Müller, J.:
On the space-time expressivity of ResNets.
ICLR 2020 workshop on integration of deep neural models and differential equations : Millennium Hall, Addis Ababa, Ethiopia ; 26th April 2020 ICLR, 2020

Müller, J. and M. Zeinhofer:
Deep Ritz revisited.
ICLR 2020 workshop on integration of deep neural models and differential equations : Millennium Hall, Addis Ababa, Ethiopia ; 26th April 2020 ICLR, 2020

Várady, C. ; Volpi, R. ; Malagò, L. and N. Ay:
Natural wake-sleep algorithm.
MIS-Preprint: 84/2020

Várady, C. ; Volpi, R. ; Malagò, L. and N. Ay:
Natural reweighted wake-sleep.

Ay, N. ; Rauh, J. and G. Montúfar:
A continuity result for optimal memoryless planning in POMDPs.
RLDM 2019 : 4th multidisciplinary conference on reinforcement learning and decision making ; July 7-10, 2019 ; Montréal, Canada University, 2019. - P. 362-365

Montúfar, G. ; Rauh, J. and N. Ay:
Task-agnostic constraining in average reward POMDPs.
Task-agnostic reinforcement learning : workshop at ICLR, 06 May 2019, New Orleans ICLR, 2019

Montúfar, G.:
Illustration of maxout layer upper bound [Suppl. to: On the number of linear regions of deep neural networks].

Montúfar, G. ; Rauh, J. and N. Ay:
Uncertainty and stochasticity of optimal policies.
Proceedings of the 11th workshop on uncertainty processing WUPES '18, June 6-9, 2018 / V. Kratochvíl (ed.). MatfyzPress, 2018. - P. 133-140

Montúfar, G.:
Notes on the number of linear regions of deep neural networks.
2017 international conference on sampling theory and applications (SampTA) / G. Anbarjafari... (eds.). IEEE, 2017. - P. 156-159

Montúfar, G. and J. Morton:
Dimension of marginals of Kronecker product models.
SIAM journal on applied algebra and geometry, 1 (2017) 1, p. 126-151
MIS-Preprint: 75/2015

Montúfar, G. ; Morton, J. and J. Rauh:
Restricted Boltzmann machines [In: Algebraic statistics ; 16 April - 22 April 2017 ; report no. 20/2017].
Oberwolfach reports, 14 (2017) 2, p. 1241-1242

Montúfar, G. and J. Rauh:
Geometry of policy improvement.
Geometric science of information : Third International Conference, GSI 2017, Paris, France, November 7-9, 2017, proceedings / F. Nielsen... (eds.). Springer, 2017. - P. 282-290 (Lecture notes in computer science ; 10589)

Montúfar, G. and J. Rauh:
Hierarchical models as marginals of hierarchical models.
International journal of approximate reasoning, 88 (2017), p. 531-546
MIS-Preprint: 27/2016

Montúfar, G. and J. Rauh:
Mode poset probability polytopes.
Journal of algebraic statistics, 7 (2016) 1, p. 1-13
MIS-Preprint: 22/2015

Ay, N.:
Geometric design principles for brains of embodied agents.
Künstliche Intelligenz : KI, 29 (2015) 4, p. 389-399

Montúfar, G.:
Deep narrow Boltzmann machines are universal approximators.
Third international conference on learning representations - ICLR 2015 : May 7-9 2015, San Diego, CA. USA ICLR, 2015
MIS-Preprint: 113/2014

Montúfar, G.:
Universal approximation of Markov kernels by shallow stochastic feedforward networks.
MIS-Preprint: 23/2015

Montúfar, G.:
A comparison of neural network architectures.
Deep learning Workshop, ICML '15, Vauban Hall at Lille Grande Palais, France, July 10 and 11, 2015

Montúfar, G. ; Ay, N. and K. Ghazi-Zahedi:
Geometry and expressive power of conditional restricted Boltzmann machines.
Journal of machine learning research, 16 (2015), p. 2405-2436
MIS-Preprint: 16/2014

Montúfar, G. ; Ghazi-Zahedi, K. and N. Ay:
A theory of cheap control in embodied systems.
PLoS computational biology, 11 (2015) 9, e1004427
MIS-Preprint: 70/2014

Montúfar, G. ; Ghazi-Zahedi, K. and N. Ay:
Geometry and determinism of optimal stationary control in partially observable Markov decision processes.
MIS-Preprint: 22/2016

Montúfar, G. and J. Morton:
Discrete restricted Boltzmann machines.
Journal of machine learning research, 16 (2015), p. 653-672
MIS-Preprint: 106/2014

Montúfar, G. and J. Morton:
When does a mixture of products contain a product of mixtures?
SIAM journal on discrete mathematics, 29 (2015) 1, p. 321-347
MIS-Preprint: 98/2014

Montúfar, G. and J. Rauh:
Hierarchical models as marginals of hierarchical models.
Proceedings of the 10th workshop on uncertainty processing WUPES '15, Moninec, Czech Republic, September 16-19, 2015 / V. Kratochvíl (ed.). Oeconomica, 2015. - P. 131-145
MIS-Preprint: 27/2016

Montúfar, G.:
Universal approximation depth and errors of narrow belief networks with discrete units.
Neural computation, 26 (2014) 7, p. 1386-1407
MIS-Preprint: 74/2014

Montúfar, G. and J. Morton:
Geometry of hidden-visible products of statistical models.
Algebraic Statistics 2014 : May 19-22 Illinois Institute of Technology, 2014

Montúfar, G. ; Pascanu, R. ; Cho, K. and Y. Bengio:
On the number of linear regions of deep neural networks.
NIPS'14 Proceedings of the 27th international conference on neural information processing systems - volume 2 ; Montreal, Quebec, Canada, December 8th-13th MIT Press, 2014. - P. 2924-2932
MIS-Preprint: 73/2014

Montúfar, G. and J. Rauh:
Scaling of model approximation errors and expected entropy distances.
Kybernetika, 50 (2014) 2, p. 234-245

Montúfar, G. ; Rauh, J. and N. Ay:
On the Fisher metric of conditional probability polytopes.
Entropy, 16 (2014) 6, p. 3207-3233
MIS-Preprint: 87/2014

Pascanu, R. ; Montúfar, G. and Y. Bengio:
On the number of inference regions of deep feed forward networks with piece-wise linear activations.
Second international conference on learning representations - ICLR 2014 : 14-16 April 2014, Banff, Canada ICLR, 2014
MIS-Preprint: 72/2014

Rauh, J. and N. Ay:
Robustness, canalyzing functions and systems design.
Theory in biosciences, 133 (2014) 2, p. 63-78
MIS-Preprint: 66/2012

Ay, N. ; Montúfar, G. and J. Rauh:
Selection criteria for neuromanifolds of stochastic dynamics.
Advances in cognitive neurodynamics III : proceedings of the 3rd International Conference on Cognitive Neurodynamics 2011 ; [June 9-13, 2011, Hilton Niseko Village, Hokkaido, Japan] / Y. Yamaguchi (ed.). Springer, 2013. - P. 147-154 (Advances in cognitive neurodynamics)
MIS-Preprint: 15/2011

Montúfar, G.:
Mixture decompositions of exponential families using a decomposition of their sample spaces.
Kybernetika, 49 (2013) 1, p. 23-39
MIS-Preprint: 39/2010

Montúfar, G. ; Rauh, J. and N. Ay:
Maximal information divergence from statistical models defined by neural networks.
Geometric science of information : first international conference, GSI 2013, Paris, France, August 28-30, 2013. Proceedings / F. Nielsen... (eds.). Springer, 2013. - P. 759-766 (Lecture notes in computer science ; 8085)
MIS-Preprint: 31/2013

Montúfar, G.:
On the expressive power of discrete mixture models, restricted Boltzmann machines, and deep belief networks - a unified mathematical treatment.
Dissertation, Universität Leipzig, 2012

Montúfar, G. and J. Rauh:
Scaling of model approximation errors and expected entropy distances.
Proceedings of the 9th workshop on uncertainty processing WUPES '12 : Marianske Lazne, Czech Republik ; 12-15th September 2012 Academy of Sciences of the Czech Republik / Institute of Information Theory and Automation, 2012. - P. 137-148

Montúfar, G. and N. Ay:
Refinements of universal approximation results for deep belief networks and restricted Boltzmann machines.
Neural computation, 23 (2011) 5, p. 1306-1319
MIS-Preprint: 23/2010

Montúfar, G. ; Rauh, J. and N. Ay:
Expressive power and approximation errors of restricted Boltzmann machines.
Advances in neural information processing systems 24 : 25th annual conference on neural information processing systems 2011, Granada, Spain December 12th - 15th ; NIPS 2011 / J. Shawe-Taylor (ed.). Neural Information Processing Systems, 2011. - P. 415-423
MIS-Preprint: 27/2011

Kahle, T.:
On boundaries of statistical models.
Dissertation, Universität Leipzig, 2010

Montúfar, G.:
Mixture models and representational power of RBM's, DBN's, and DBM's.
NIPS 2010 : Deep learning and unsupervised feature learning workshop ; December 19, 2010, Hilton, Vancouver, Canada NIPS, 2010. - P. 1-9

Ay, N. and A. Knauf:
Maximizing multi-information.
Kybernetika, 42 (2006) 5, p. 517-538
MIS-Preprint: 42/2003

Ay, N.:
Aspekte einer Theorie pragmatischer Informationsstrukturierung.
Dissertation, Universität Leipzig, 2001