

Preprint 66/2016
From Typical Sequences to Typical Genotypes
Omri Tal, Tat Dat Tran, and Jacobus Willem Portegies
Submission date: 28. Sep. 2016
Pages: 61
Published in: Journal of Theoretical Biology, 419 (2017), p. 159-183
DOI number (of the published article): 10.1016/j.jtbi.2017.02.010
Keywords and phrases: typical sequences, typical genotypes, population entropy rate, classification
Abstract:
We demonstrate an application of a core notion of information theory, that of
typical sequences and their related properties, to the analysis of population genetic data.
Based on the asymptotic equipartition property (AEP) for non-stationary discrete-time
sources producing independent symbols, we introduce the concepts of typical
genotypes and population entropy rate and cross-entropy rate. We analyze three
perspectives on typical genotypes: a set perspective on the interplay of typical sets
of genotypes from two populations, a geometric perspective on their structure in
high dimensional space, and a statistical learning perspective on the prospects of
constructing typical-set based classifiers. In particular, we show that such classifiers
have a surprising resilience to noise originating from small population samples, and
highlight the potential for further links between inference and communication.
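
As a rough illustration of the typicality notion the abstract refers to, the following Python sketch assumes a simplified model that is not taken from the paper: each locus is an independent binary symbol with its own allele frequency, so the source is non-stationary but produces independent symbols. Under that assumption, the entropy rate is the average of the per-locus entropies, and a genotype counts as typical when its normalized negative log-probability lies within a tolerance eps of that rate. The function names, the per-locus Bernoulli model, and eps are hypothetical choices made for this example.

```python
import math


def locus_entropy(p):
    """Binary entropy, in bits, of one locus with allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def entropy_rate(freqs):
    """Average per-locus entropy (assumed model: independent loci)."""
    return sum(locus_entropy(p) for p in freqs) / len(freqs)


def log_prob_rate(genotype, freqs):
    """Normalized negative log2-probability of a 0/1 genotype vector."""
    total = 0.0
    for x, p in zip(genotype, freqs):
        q = p if x == 1 else 1.0 - p
        if q <= 0.0:
            return math.inf  # genotype is impossible under these frequencies
        total -= math.log2(q)
    return total / len(freqs)


def is_typical(genotype, freqs, eps):
    """True if the genotype is within eps of the entropy rate."""
    return abs(log_prob_rate(genotype, freqs) - entropy_rate(freqs)) < eps
```

For example, with every allele frequency equal to 0.5, every genotype has the same probability and is therefore typical for any positive eps, whereas with frequencies of 0.9 a genotype made up entirely of the rare allele falls far outside the typical set.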