Least Squares Denoising: Non-IID Data, Transfer Learning and Under-parameterized Double Descent
- Rishi Sonthalia (UCLA)
Studying the generalization abilities of linear models with real data is a central question in statistical learning. While a limited number of prior works do validate theoretical results on real data, they are constrained by technical assumptions, such as a well-conditioned covariance matrix and independent and identically distributed (i.i.d.) data. Additionally, prior works that address distributional shift usually make technical assumptions on the joint distribution of the train and test data, and do not test on real data. Previous work has also shown that double descent can occur in the over-parameterized regime, while the standard bias-variance trade-off is believed to hold in the under-parameterized regime.
In an attempt to address these issues and better model real data, we look at data that is not i.i.d. but has a low-rank structure. Further, we address distributional shift by decoupling the assumptions on the training and test distributions. We provide asymptotically exact analytical formulas for the generalization error of the denoising problem. These are used to derive theoretical results for linear regression, data augmentation, principal component regression, and transfer learning. We validate all of our theoretical results on real data and obtain a low relative mean squared error of around 1% between the empirical risk and our estimated risk. Further, we present a simple example in this paradigm that provably exhibits double descent in the under-parameterized regime.
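As an illustration of the setting described above, the following is a minimal sketch of a least squares denoising experiment on synthetic low-rank data. It rests on several assumptions that the abstract does not state: the denoiser is taken to be the minimum-norm least squares solution computed with a pseudoinverse, the noise is Gaussian with unit variance per entry, and the train and test data share a subspace but differ in scale as a simple stand-in for distributional shift. All function names and parameter values are hypothetical, and the data is synthetic rather than the real data used for validation.

```python
import numpy as np


def low_rank_data(basis, n, scale, rng):
    """Draw n points in the column span of `basis` (d x r), stored as a d x n matrix."""
    coeffs = scale * rng.standard_normal((basis.shape[1], n))
    return basis @ coeffs


def fit_denoiser(X_trn, A_trn):
    """Assumed formulation: minimum-norm W minimizing ||X_trn - W (X_trn + A_trn)||_F^2."""
    return X_trn @ np.linalg.pinv(X_trn + A_trn)


def denoising_error(W, X_tst, A_tst):
    """Average squared error per test point when denoising X_tst + A_tst with W."""
    residual = X_tst - W @ (X_tst + A_tst)
    return np.sum(residual ** 2) / X_tst.shape[1]


def run_demo(d=50, r=3, n_trn=200, n_tst=1000, seed=0):
    """Under-parameterized case (d < n_trn) with a scale shift between train and test."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis of an r-dimensional subspace of R^d (assumed shared by train and test).
    basis, _ = np.linalg.qr(rng.standard_normal((d, r)))
    X_trn = low_rank_data(basis, n_trn, scale=5.0, rng=rng)
    A_trn = rng.standard_normal((d, n_trn))
    # Test data: same subspace, different scale (hypothetical distribution shift).
    X_tst = low_rank_data(basis, n_tst, scale=2.0, rng=rng)
    A_tst = rng.standard_normal((d, n_tst))
    W = fit_denoiser(X_trn, A_trn)
    learned = denoising_error(W, X_tst, A_tst)
    # Baseline: leave the noisy input untouched (W = identity).
    baseline = denoising_error(np.eye(d), X_tst, A_tst)
    return W, learned, baseline
```

Calling `run_demo()` returns the learned denoiser together with its test error and the error of the do-nothing baseline. Sweeping `n_trn` or `d` in this sketch gives an empirical risk curve of the kind that the asymptotic formulas are meant to predict; it is not a reproduction of the reported experiments.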