Sequence and structure matching: applications of probability theory

  • Amir Dembo (Stanford University)
Felix-Klein-Hörsaal (Raum 4-24) Universität Leipzig (Leipzig)


Assessing the significance of rare phenomena that involve scoring schemes is one area in which probability theory has been quite useful for biomolecular data analysis. A key reason for the success of this line of research is the ability to focus on questions and models that retain generality and relevance for applications while introducing enough structure to be of theoretical interest and beauty. I shall explore this interplay while reviewing a few contributions made in this direction. For example, gapless local alignment is linked to the asymptotics of large exceedances in random sequences, which are closely related to queueing theory and sequential statistics. Under somewhat different assumptions, it leads instead to asymptotics of waiting times that are highly relevant to information theory. Assessing the significance of approximate local matching for 3D protein structures results in an asymptotic theory for maxima of partial sums indexed by geometrical structures. Finally, theoretical considerations of local optimality yield, for a certain parameter regime, both logarithmic growth of the gapped local alignment score and a bound on its p-value.
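
As a rough illustration of the link between gapless local alignment and queueing-type recursions, the following minimal Python sketch computes a gapless local alignment score as the largest segment sum along any diagonal of the pairwise score matrix. The running sum, reset at zero, follows the same recursion as a waiting time in a single-server queue or a CUSUM statistic. The function names and the +1/-1 scoring scheme are assumptions made for this example only; they are not taken from the talk.

```python
def max_segment_score(scores):
    """Largest sum over contiguous segments of `scores` (0 if all are negative).

    The running value obeys W_k = max(0, W_{k-1} + X_k), the reflected
    random-walk recursion familiar from queueing and sequential analysis.
    """
    best = 0
    running = 0
    for s in scores:
        running = max(0, running + s)
        best = max(best, running)
    return best


def simple_score(x, y):
    """Hypothetical scoring scheme: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


def gapless_local_score(a, b, score=simple_score):
    """Best gapless local alignment score of sequences a and b.

    Each diagonal (fixed offset j - i) is scanned independently, since a
    gapless alignment pairs a[i+k] with b[j+k] along one diagonal.
    """
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        i0 = max(0, -offset)   # start index in a
        j0 = max(0, offset)    # start index in b
        n = min(len(a) - i0, len(b) - j0)
        diagonal = [score(a[i0 + k], b[j0 + k]) for k in range(n)]
        best = max(best, max_segment_score(diagonal))
    return best
```

For instance, `gapless_local_score("AACGT", "TTCGA")` picks out the shared segment `CG` on the main diagonal. Significance questions of the kind discussed above then concern how large such a score is likely to be when both sequences are random.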