

Preprint 81/2013
Inhomogeneous Parsimonious Markov Models
Ralf Eggeling, Andre Gohr, Pierre-Yves Bourguignon, Edgar Wingender, and Ivo Grosse
Contact the author: Please use for correspondence this email.
Submission date: 14. Aug. 2013
published in: Machine learning and knowledge discovery in databases : European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part 1 / H. Blockeel ... (eds.)
Berlin [u. a.] : Springer, 2013. - P. 321 - 336
(Lecture notes in artificial intelligence ; 8188)
DOI number (of the published article): 10.1007/978-3-642-40988-2_21
Bibtex
Keywords and phrases: Parsimonious Markov models, Sequence analysis, Statistical dependencies
Abstract:
We introduce inhomogeneous parsimonious Markov models
for modeling statistical patterns in discrete sequences. These models are
based on parsimonious context trees, which are a generalization of con-
text trees, and thus generalize variable order Markov models. We follow
a Bayesian approach, consisting of structure and parameter learning.
Structure learning is a challenging problem due to an overexponential
number of possible tree structures, so we describe an exact and efficient
dynamic programming algorithm for finding the optimal tree structures.
We apply model and learning algorithm to the problem of model-
ing binding sites of the human transcription factor C/EBP, and find an
increased prediction performance compared to fixed order and variable
order Markov models. We investigate the reason for this improvement
and find several instances of context-specific dependences that can be
captured by parsimonious context trees but not by traditional context
trees.