This project aims to identify ways of reducing the search space in learning systems, as one route to improving the corresponding learning processes. To this end, we study the geometric properties of various connectionist models known in machine learning, using information geometry and algebraic statistics. Our goal is to find distinguished architectures of learning systems based on their expressive power and learning performance. This kind of model selection is motivated by experimental and theoretical work on restricted Boltzmann machines and deep belief networks, popular learning systems that increasingly demand a thorough mathematical investigation. In particular, the project targets the development of design principles for our embodied AI project.
Selection Criteria for Neuromanifolds of Stochastic Dynamics
Within many formal models of neuronal systems, individual neurons are modelled as nodes that receive inputs from other nodes in a network and generate an output which, in general, can be stochastic. The dynamics of the whole network can then be described as a stochastic transition in each time step, mathematically formalized in terms of a stochastic matrix. Well-known models of this kind are Boltzmann machines, their generalizations, and policy matrices in reinforcement learning. In order to study such learning systems it is helpful to consider not a single stochastic matrix but a parametrized family of matrices, which forms a geometric object, referred to within information geometry as a neuromanifold. Learning crucially depends on the shape of the neuromanifold. This information-geometric view, proposed by Amari, suggests selecting appropriate neuromanifolds and defining the corresponding learning processes as gradient flows on these manifolds. We focus not only on manifolds that are directly induced by a neuronal model, but also study general sets that satisfy natural optimality conditions.
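To make this picture concrete, the following is a minimal numerical sketch in Python/NumPy, illustrative only and not taken from the project: the statistics F, the input distribution mu, and the reward R are arbitrary placeholders, and the family p_theta(y|x) proportional to exp(theta . F(x,y)) is just one convenient way to parametrize a set of stochastic matrices. The sketch computes the Fisher information of this conditional model and runs natural-gradient ascent, i.e. a discretized gradient flow with respect to the Fisher metric.

    import numpy as np

    # Hedged sketch: a parametrized family of stochastic matrices
    # p_theta(y|x) ~ exp(theta . F(x,y)), its Fisher metric, and
    # natural-gradient ascent on an expected reward. All quantities
    # (F, mu, R) are arbitrary illustrative placeholders.

    rng = np.random.default_rng(0)
    nx, ny, d = 3, 3, 2                    # states, outputs, parameters
    F = rng.standard_normal((nx, ny, d))   # sufficient statistics F(x, y)
    mu = np.full(nx, 1.0 / nx)             # input distribution over states
    R = rng.standard_normal((nx, ny))      # placeholder reward R(x, y)

    def policy(theta):
        """Stochastic matrix with rows p(.|x) = softmax of theta . F(x, .)."""
        logits = F @ theta                              # shape (nx, ny)
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def fisher(theta):
        """Fisher metric G = sum_x mu(x) sum_y p(y|x) s s^T, where
        s = F(x, y) - E_{y|x} F(x, y) is the score of the model."""
        p, G = policy(theta), np.zeros((d, d))
        for x in range(nx):
            S = F[x] - p[x] @ F[x]                      # centered statistics
            G += mu[x] * (S * p[x][:, None]).T @ S
        return G

    def grad(theta):
        """Euclidean gradient of J = sum_x mu(x) sum_y p(y|x) R(x, y)."""
        p, g = policy(theta), np.zeros(d)
        for x in range(nx):
            g += mu[x] * ((R[x] * p[x]) @ (F[x] - p[x] @ F[x]))
        return g

    theta = np.zeros(d)
    for _ in range(200):
        G = fisher(theta) + 1e-8 * np.eye(d)            # regularize near boundary
        theta += 0.1 * np.linalg.solve(G, grad(theta))  # natural-gradient step

    print(policy(theta))  # optimized stochastic matrix within the family

Preconditioning with the inverse Fisher matrix makes the update invariant under reparametrization of the family; this invariance is the practical content of Amari's proposal.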
Two-dimensional sets containing all deterministic policies
Deterministic or nearly deterministic policies are optimal for a variety of reinforcement learning problems; they represent dynamics with maximal predictive information, as considered in robotics, as well as dynamics of neural networks with maximal network information flow. It is always possible to construct a two-dimensional set that reaches all deterministic policies and on which natural gradient optimization works very efficiently, as illustrated in the sketch after this entry.
In: Proceedings of the 11th workshop on uncertainty processing WUPES '18, June 6-9, 2018 / Václav Kratochvíl (ed.) Praha : MatfyzPress, 2018. - pp. 133-140
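As a toy illustration of the claim above (not the general construction of the paper, which handles arbitrary state and action spaces), consider two states and two actions. Two softmax parameters, one acting on each state, already give a two-dimensional family whose closure contains all 2^2 = 4 deterministic policies; driving the parameters to plus or minus infinity selects each of them.

    import numpy as np

    # Toy sketch, not the paper's construction: a two-parameter softmax
    # family over 2 states x 2 actions. Statistic 1 acts only on state 0,
    # statistic 2 only on state 1, so each parameter independently tilts
    # one row of the policy matrix.

    def policy(theta):
        """Rows are p(.|x); here p(a=1|x) = sigmoid(theta[x])."""
        p1 = 1.0 / (1.0 + np.exp(-np.asarray(theta, dtype=float)))
        return np.stack([1.0 - p1, p1], axis=1)

    # The four limits (+/-inf, +/-inf) are the four deterministic policies;
    # large finite parameters already come arbitrarily close.
    for t0 in (-30.0, 30.0):
        for t1 in (-30.0, 30.0):
            print(np.round(policy((t0, t1)), 3))

For larger state and action spaces the two parameters no longer decouple per state; the point of the result quoted above is that a suitable choice of two statistics still reaches every deterministic policy in the closure of the family, and that natural gradient optimization behaves well on it.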
In: Geometric science of information : Third International Conference, GSI 2017, Paris, France, November 7-9, 2017, proceedings / Frank Nielsen... (eds.) Cham : Springer, 2017. - pp. 282-290 (Lecture notes in computer science ; 10589)
Hierarchical models as marginals of hierarchical models
In: Proceedings of the 10th workshop on uncertainty processing WUPES '15, Moninec, Czech Republic, September 16-19, 2015 / Václav Kratochvíl (ed.) Praha : Oeconomica, 2015. - pp. 131-145
Notes on the number of linear regions of deep neural networks
In: 2017 international conference on sampling theory and applications (SampTA) / Gholamreza Anbarjafari... (eds.) Piscataway, NJ : IEEE, 2017. - pp. 156-159
In: Proceedings of the 10th workshop on uncertainty processing WUPES '15, Moninec, Czech Republic, September 16-19, 2015 / Václav Kratochvíl (ed.) Praha : Oeconomica, 2015. - pp. 147-154
Guido Montúfar, Razvan Pascanu, Kyunghyun Cho and Yoshua Bengio
On the number of linear regions of deep neural networks
In: NIPS 2014 : Proceedings of the 27th international conference on neural information processing systems - volume 2 ; Montreal, Quebec, Canada, December 8-13, 2014. Cambridge, MA : MIT Press, 2014. - pp. 2924-2932
Maximal information divergence from statistical models defined by neural networks
In: Geometric science of information : first international conference, GSI 2013, Paris, France, August 28-30, 2013. Proceedings / Frank Nielsen... (eds.) Berlin [et al.] : Springer, 2013. - pp. 759-766 (Lecture notes in computer science ; 8085)
Selection criteria for neuromanifolds of stochastic dynamics
In: Advances in cognitive neurodynamics III : proceedings of the 3rd International Conference on Cognitive Neurodynamics 2011 ; [June 9-13, 2011, Hilton Niseko Village, Hokkaido, Japan] / Yoko Yamaguchi (ed.) Dordrecht : Springer, 2013. - pp. 147-154 (Advances in cognitive neurodynamics)
Scaling of model approximation errors and expected entropy distances
In: Proceedings of the 9th workshop on uncertainty processing WUPES '12 : Mariánské Lázně, Czech Republic, September 12-15, 2012. Praha : Academy of Sciences of the Czech Republic / Institute of Information Theory and Automation, 2012. - pp. 137-148
Expressive power and approximation errors of restricted Boltzmann machines
In: Advances in neural information processing systems 24 : NIPS 2011 ; 25th annual conference on neural information processing systems 2011, Granada, Spain, December 12-15, 2011 / John Shawe-Taylor (ed.) La Jolla, CA : Neural Information Processing Systems, 2011. - pp. 415-423