MiS Preprint
66/2016
From Typical Sequences to Typical Genotypes
Omri Tal, Tat Dat Tran and Jacobus Willem Portegies
Abstract
We demonstrate an application of a core notion of information theory, that of typical sequences and their related properties, to the analysis of population genetic data. Based on the asymptotic equipartition property (AEP) for non-stationary discrete-time sources producing independent symbols, we introduce the concepts of typical genotypes, population entropy rate, and cross-entropy rate. We analyze three perspectives on typical genotypes: a set perspective on the interplay of typical sets of genotypes from two populations, a geometric perspective on their structure in high-dimensional space, and a statistical learning perspective on the prospects of constructing typical-set-based classifiers. In particular, we show that such classifiers have a surprising resilience to noise originating from small population samples, and highlight the potential for further links between inference and communication.
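To make the notion of a typical genotype concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes haploid, biallelic loci that are mutually independent, with known allele frequencies; the function names, the tolerance `eps`, and the simulated frequencies are all hypothetical choices made for illustration, and the paper's own definitions may differ in detail. Under these assumptions, a genotype is treated as typical when its per-locus negative log-probability lies within `eps` of the population entropy rate.

```python
import math
import random

def binary_entropy(p):
    """Entropy in bits of a biallelic locus with allele frequency p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_rate(freqs):
    """Average per-locus entropy over independent loci (assumed model)."""
    return sum(binary_entropy(p) for p in freqs) / len(freqs)

def cross_entropy_rate(freqs_p, freqs_q):
    """Average per-locus cross-entropy of population P relative to Q."""
    total = 0.0
    for p, q in zip(freqs_p, freqs_q):
        total += -p * math.log2(q) - (1 - p) * math.log2(1 - q)
    return total / len(freqs_p)

def log_prob_rate(genotype, freqs):
    """Per-locus negative log2-probability of a genotype (0/1 alleles)."""
    total = 0.0
    for x, p in zip(genotype, freqs):
        total += math.log2(p if x == 1 else 1 - p)
    return -total / len(freqs)

def is_typical(genotype, freqs, eps):
    """True if the genotype's log-probability rate is within eps of the entropy rate."""
    return abs(log_prob_rate(genotype, freqs) - entropy_rate(freqs)) < eps

def sample_genotype(freqs, rng):
    """Draw one genotype, each locus independently from its allele frequency."""
    return [1 if rng.random() < p else 0 for p in freqs]

# Hypothetical simulated population: 2000 loci with frequencies in [0.05, 0.95].
rng = random.Random(0)
freqs = [rng.uniform(0.05, 0.95) for _ in range(2000)]
samples = [sample_genotype(freqs, rng) for _ in range(200)]
fraction_typical = sum(is_typical(g, freqs, 0.1) for g in samples) / len(samples)
print(f"entropy rate: {entropy_rate(freqs):.3f} bits/locus")
print(f"fraction of sampled genotypes that are typical: {fraction_typical:.3f}")
```

With many loci, most genotypes sampled from the population should fall in the typical set, which is the behavior the AEP describes. A typical-set-based classifier in this spirit could assign a genotype to whichever population's typical set it falls in, using the cross-entropy rate to reason about genotypes drawn from the other population.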