From Typical Sequences to Typical Genotypes
- Omri Tal (MPI MiS, Leipzig)
The notion of typical sets in information theory is central to the design of efficient coding schemes for communication. We describe novel conceptual and mathematical links between this core information-theoretic notion, together with its associated asymptotic properties, and the properties of genotypes viewed as long sequences of polymorphic markers sampled from multiple populations. We demonstrate that a population assignment scheme based on the set-typicality of genetic sequences, and on the entropy and cross-entropy rates of populations, is theoretically viable, and may be of particular interest in cases where 'noise' is introduced by small samples.
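
To make the idea of typicality-based assignment concrete, the following is a minimal sketch of one possible reading, not the scheme presented in the talk. It assumes haploid genotypes over independent biallelic loci, coded 0/1, and known per-locus allele frequencies strictly between 0 and 1. All function names and the toy frequencies are hypothetical. A genotype is treated as typical of a population when its per-locus negative log-probability under that population is close to the population's entropy rate; for a genotype drawn from a different population, the same quantity tends instead toward the cross-entropy rate.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a biallelic locus with allele frequency p (0 < p < 1)."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_rate(freqs):
    """Average per-locus entropy of a population (loci assumed independent)."""
    return sum(binary_entropy(p) for p in freqs) / len(freqs)


def neg_log_prob_rate(genotype, freqs):
    """Per-locus negative log2-probability of a 0/1 genotype under a population."""
    total = 0.0
    for allele, p in zip(genotype, freqs):
        total += math.log2(p if allele == 1 else 1 - p)
    return -total / len(genotype)


def typicality_gap(genotype, freqs):
    """Distance between the genotype's log-probability rate and the entropy rate.

    The genotype lies in the epsilon-typical set of the population exactly
    when this gap is at most epsilon.
    """
    return abs(neg_log_prob_rate(genotype, freqs) - entropy_rate(freqs))


def assign(genotype, populations):
    """Assign the genotype to the population for which it is most typical."""
    return min(populations, key=lambda name: typicality_gap(genotype, populations[name]))


# Toy example: two hypothetical populations over 10 loci.
populations = {"A": [0.1] * 10, "B": [0.4] * 10}
genotype = [1] + [0] * 9  # one derived allele out of ten loci
label = assign(genotype, populations)
```

In this toy setting the genotype carries the derived allele at one locus in ten, which matches the allele frequency of population A, so its typicality gap for A is essentially zero. Note that this rule compares only log-probability rates with entropy rates; unlike a likelihood-based classifier, it can be fooled when a cross-entropy rate happens to coincide with an entropy rate.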