Cancer progression is an evolutionary process that is driven by mutation and selection in a population of tumor cells. We discuss mathematical models of cancer progression, starting from traditional multistage theory. Each stage is associated with the occurrence of genetic alterations and their fixation in the population. We describe the accumulation of mutations using conjunctive Bayesian networks, an exponential family of waiting time models in which the occurrence of mutations is constrained by a partial order. Two opposing limit cases arise if mutations either follow a linear order or occur independently. We derive analytical expressions for the waiting time until a specific number of mutations have accumulated and show how the waiting time relates to the dependency structure among mutations.
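The two limit cases can be illustrated with a toy Monte Carlo simulation (our own sketch, not the conjunctive Bayesian network model itself): assume each mutation waits an independent Exp(λ) time once it is enabled. Under a linear order the waiting times add, so the expected time until all n mutations have occurred is n/λ; under full independence the total waiting time is the maximum of n exponentials, with expectation H_n/λ, where H_n is the n-th harmonic number.

```python
import random

def waiting_time_linear(n, lam, rng):
    # linear order: each mutation waits for its predecessor, times add up
    return sum(rng.expovariate(lam) for _ in range(n))

def waiting_time_independent(n, lam, rng):
    # independence: all clocks run in parallel, wait for the last mutation
    return max(rng.expovariate(lam) for _ in range(n))

rng = random.Random(0)
n, lam, trials = 10, 1.0, 20000
lin = sum(waiting_time_linear(n, lam, rng) for _ in range(trials)) / trials
ind = sum(waiting_time_independent(n, lam, rng) for _ in range(trials)) / trials
h_n = sum(1.0 / i for i in range(1, n + 1))  # harmonic number H_n
print(f"linear order:  {lin:.2f}  (theory {n / lam:.2f})")
print(f"independent:   {ind:.2f}  (theory {h_n / lam:.2f})")
```

Any partial order between these extremes yields a waiting time between the two theoretical values, which is the sense in which the waiting time reflects the dependency structure among mutations.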
Partitioning a data set into groups is an important preprocessing step for compression, prototype extraction, or outlier removal. Various criteria of connectedness or proximity have been proposed to group data according to structural similarity, but in general it is unclear which method or model to use. In the spirit of information theory, we propose a decision process that determines the information extractable from data conditioned on a hypothesis class of structures. Maximizing the amount of information that can be reliably learned from data in the presence of noise selects an appropriate model. Empirical evidence for this model-selection concept is provided by cluster validation in bioinformatics and in computer security, i.e., the analysis of microarray data and the multilabel clustering of Boolean data for role-based access control.
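One way to make "reliably learned in the presence of noise" concrete is a stability score: cluster two independently perturbed samples of the same source and measure how often corresponding points receive the same label. The sketch below is a simplified illustration of that idea only, not the decision process of the abstract; all function names and the tiny 1-D k-means are our own.

```python
import numpy as np

def kmeans_1d(x, k, iters=100):
    # minimal 1-D k-means with deterministic quantile initialization;
    # labels are re-indexed by sorted center position so they are comparable
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    order = np.argsort(centers)
    relabel = np.empty(k, dtype=int)
    relabel[order] = np.arange(k)
    return relabel[labels]

def stability(k, n_pairs=20, noise=0.5, seed=0):
    # fraction of points assigned the same label in two independent
    # noisy samples drawn around the same ground-truth structure
    rng = np.random.default_rng(seed)
    base = np.repeat([0.0, 5.0, 10.0], 60)  # three well-separated groups
    agree = []
    for _ in range(n_pairs):
        l1 = kmeans_1d(base + rng.normal(0, noise, base.size), k)
        l2 = kmeans_1d(base + rng.normal(0, noise, base.size), k)
        agree.append(np.mean(l1 == l2))
    return float(np.mean(agree))

print({k: round(stability(k), 3) for k in (2, 3, 4)})
```

A hypothesis class whose structure matches the data (here k = 3) reproduces its labels across noisy replicates; mismatched classes tend to assign boundary points inconsistently.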
In many applications, such as hypothesis testing and disclosure limitation, one wants to investigate the set of probability measures or contingency tables satisfying a given set of linear constraints. The main examples of such constraints are fixed expectation values, marginal distributions, and conditional distributions of subsystems. Markov bases can be used to explore these sets. However, Markov bases are not always applicable, for example when the linear constraints are not given by integral equations. Some ideas on how to generalize the Markov basis technique are presented.
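For the standard case of fixed row and column sums of a contingency table, a Markov basis is given by the classical ±1 moves on the corners of 2×2 subrectangles; adding such a move leaves all margins unchanged, and rejecting moves that create negative entries yields a walk over the fiber of tables with the given margins. A minimal sketch (function names are ours):

```python
import random
from itertools import combinations

def markov_basis_moves(rows, cols):
    # basic moves for fixed row/column sums of an I x J table:
    # +1/-1 on the corners of a 2 x 2 subrectangle, in both signs
    moves = []
    for r1, r2 in combinations(range(rows), 2):
        for c1, c2 in combinations(range(cols), 2):
            m = [[0] * cols for _ in range(rows)]
            m[r1][c1] = m[r2][c2] = 1
            m[r1][c2] = m[r2][c1] = -1
            moves.append(m)
            moves.append([[-v for v in row] for row in m])
    return moves

def walk(table, moves, steps, rng):
    # random walk on the fiber: reject moves that create negative entries
    visited = set()
    for _ in range(steps):
        m = rng.choice(moves)
        cand = [[t + d for t, d in zip(tr, mr)] for tr, mr in zip(table, m)]
        if all(v >= 0 for row in cand for v in row):
            table = cand
        visited.add(tuple(map(tuple, table)))
    return visited

rng = random.Random(1)
start = [[2, 0], [0, 2]]
fiber = walk(start, markov_basis_moves(2, 2), 2000, rng)
# every visited table has the same row and column sums as the start
```

For this start the fiber consists of the three tables [[a, 2-a], [2-a, a]] with a = 0, 1, 2, and the walk visits all of them. When the constraints are not integral, no such finite set of lattice moves exists, which is the obstruction the abstract refers to.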
We address the problem of optimizing a pseudo-Boolean function by introducing a stochastic relaxation based on exponential families. The original discrete optimization problem is replaced by the optimization of the expected value of the pseudo-Boolean function with respect to a distribution in a parametric statistical model. We study the relation between the two problems by providing results about stationary conditions and critical points. We prove that the presence of local minima depends on the relationship between the function to be optimized and the statistical model employed in the relaxation, and we discuss the results on simple examples. Finally, we show how the marginal polytope gives useful insights into the relation between a stochastic relaxation and the local search techniques used in integer programming.
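A toy example (our own, not taken from the paper) shows how such critical points arise. For f(x) = 1 - x1 - x2 + 3·x1·x2 on {0,1}², the minima are f(1,0) = f(0,1) = 0. Under the independent Bernoulli model with P(x_i = 1) = t_i, the expected value is the multilinear extension F(t) = 1 - t1 - t2 + 3·t1·t2, and gradient descent on F can stall at an interior stationary point of the relaxation:

```python
# f(x) = 1 - x1 - x2 + 3*x1*x2 on {0,1}^2 has minima f(1,0) = f(0,1) = 0.
# Under an independent Bernoulli model with P(x_i = 1) = t_i, the expected
# value is the multilinear extension F(t) = 1 - t1 - t2 + 3*t1*t2.

def F(t1, t2):
    return 1 - t1 - t2 + 3 * t1 * t2

def grad_F(t1, t2):
    return (-1 + 3 * t2, -1 + 3 * t1)

def descend(t1, t2, lr=0.05, steps=2000):
    # projected gradient descent on the relaxed objective over [0,1]^2
    for _ in range(steps):
        g1, g2 = grad_F(t1, t2)
        t1 = min(1.0, max(0.0, t1 - lr * g1))
        t2 = min(1.0, max(0.0, t2 - lr * g2))
    return t1, t2

sym = descend(0.5, 0.5)   # symmetric start: stalls at the saddle (1/3, 1/3)
asym = descend(0.6, 0.4)  # asymmetric start: reaches the vertex (1, 0)
print(sym, asym, F(*asym))
```

From the symmetric start the iterates converge to the interior critical point (1/3, 1/3) with F = 2/3, which corresponds to no point of {0,1}²; any asymmetric start escapes along the unstable direction to a vertex of the cube encoding an optimal x. Whether such spurious stationary points exist depends on the interplay between f and the chosen model, which is the phenomenon the abstract analyzes.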