From Causal Inference to Autoencoders, Memorization and Gene Regulation
- Caroline Uhler (Massachusetts Institute of Technology)
Abstract
Recent progress in genomics makes it possible to perform perturbation experiments at a very large scale. This motivates the development of a causal inference framework that is based on both observational and interventional data. We characterize the causal relationships that are identifiable and present the first provably consistent algorithm for learning a causal network from such data. I will then couple gene expression with the 3D organization of the genome. In particular, we will discuss autoencoder-based approaches for integrating different data modalities, such as sequencing or imaging. We end with a theoretical analysis of autoencoders that links overparameterization to memorization. Specifically, we will show that overparameterized single-layer fully connected autoencoders, as well as deep convolutional autoencoders, memorize images, i.e., they produce outputs in the span of the training images. Collectively, this talk will highlight the symbiosis between biology and machine learning, showing how biology can lead to new theorems, which in turn can guide biological experiments.
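The memorization statement at the end of the abstract can be illustrated numerically in a much simpler setting than the one the talk covers. The sketch below is an assumption-laden toy, not the speaker's method: it uses a purely linear single-layer autoencoder, random Gaussian vectors in place of images, zero initialization, and plain gradient descent. The names `train_linear_autoencoder` and `distance_to_span` are hypothetical helpers introduced only for this illustration. With more input dimensions than training examples, the trained map sends an arbitrary new input into the span of the training examples.

```python
import numpy as np

def train_linear_autoencoder(X, steps=2000):
    """Fit W to minimize ||W X - X||_F^2 by gradient descent.

    X has shape (d, n): each column is one training example.
    Assumption: W starts at zero, so every update is of the form
    (something) @ X.T and only ever acts on the span of the training data.
    """
    d, _ = X.shape
    W = np.zeros((d, d))
    # Step size chosen from the largest singular value to keep descent stable.
    lr = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(steps):
        residual = W @ X - X
        W -= lr * 2.0 * residual @ X.T  # gradient of the squared error
    return W

def distance_to_span(v, X):
    """Euclidean distance from v to the column span of X."""
    coeffs, *_ = np.linalg.lstsq(X, v, rcond=None)
    return np.linalg.norm(v - X @ coeffs)

rng = np.random.default_rng(0)
d, n = 50, 5                      # overparameterized: d is much larger than n
X = rng.standard_normal((d, n))   # toy stand-in for training images
W = train_linear_autoencoder(X)

z = rng.standard_normal(d)        # an input unrelated to the training set
output = W @ z

reconstruction_error = np.linalg.norm(W @ X - X)
span_distance = distance_to_span(output, X)
print("training reconstruction error:", reconstruction_error)
print("distance of output from span of training data:", span_distance)
```

Both printed quantities should be close to zero up to floating-point error: the training examples are reconstructed, and the output for the unseen input lies in their span. The nonlinear and convolutional cases mentioned in the abstract are not covered by this toy.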