Phylogenetic gathering

MPI für Mathematik in den Naturwissenschaften Leipzig

G3 10 (Lecture hall)

conference

23.05.04 - 28.05.04

The Max Planck Institute for Mathematics in the Sciences (MiS) and the Interdisciplinary Center for Bioinformatics (IZBI) in Leipzig will sponsor a (comparatively small and very informal) workshop on "Phylogenetic Combinatorics". Topics will range from the integration of palaeontological and molecular data to tree-space analysis to rank-based clustering methods and the analysis of proteomics data, e.g. micro-array data and fluorescence data.

The idea is that people would start flocking in from Sunday, May 23, on to meet in smaller or larger groups for absolutely informal discussions and joint work on whatever the participants may want to work on, and that an officially announced workshop would start on Thursday, May 27.

From outside Germany, the following scientists have already agreed to join this meeting: Alex Grossmann (Paris), David Epstein (Warwick), Mike Steel (Christchurch, NZ), and Bill Atchley (Raleigh, NC, USA).

Program

20:00 -	Concluding concert of this year's Bach Festival

	Part I: Workshop Discussions
09:30 - 10:00	Welcome and Program Discussion
10:00 - 12:00	Non-tree-like Evolution: with contributions from David Bryant on "Phylogenetic networks and probabilistic models" and from Stefan Grünewald and Vince Moulton on "Cyclic split systems".
12:00 - 14:00	Lunch break
14:00 - 16:00	Language Evolution: with contributions from David Bryant on "Some problems applying phylogenetics to linguistics" and from Søren Wichmann on "The distribution of language family sizes".
	Abstract (Søren Wichmann): When the sizes of language families of the world, measured by the number of languages contained in each family, are plotted against their size ranking, it is seen that the distribution approximates the curve y = ax^(-b), which is a straight line on a log-log plot. It is suggested that this apparent power-law distribution of language family sizes is of relevance when evaluating over-all classifications of the world's languages, for the analysis of taxonomic structures, and for developing hypotheses concerning the prehistory of the world's languages. It seems that three different major generating models will all eventually lead to power-law distributions: preferential attachment (in the case of networks), the Galton-Watson stochastic branching process (or some version thereof), and (some version of) the sand-pile model. General ingredients in a simulation procedure will be discussed, and the results of an initial attempt to simulate the distribution of language family sizes using the second of the models mentioned will be shown.
16:00 - 16:30	Coffee break / Tea time
16:30 - 18:30	Combinatorial Aspects of Phylogenetic Analysis: with contributions from Arndt von Haeseler on "Reconstruction of trees for large datasets", Mihai Albu / Andreas Dress on "Rank-based Methods in Tree Reconstruction (Generalized Apresjan Clusters)", and Alex Grossmann on "Ranks of amino acid mutabilities as phylogenetic markers".
	Abstract (Alex Grossmann): Consider an arbitrary set of square matrices with non-negative entries. One may ask whether they can be represented in a way that mimics the logarithmic-scale display of one-dimensional data. Such a representation can be obtained by the use of elementary properties of matrix-valued logarithms. The set of matrices may consist of the count matrices obtained from pairwise subalignments of an alignment of N sequences. The method associates a "rate" matrix to any pair of sequences, and makes it possible to consider two artificial continuous "evolutions" leading from one sequence to the other, if these "evolutions" exist. The negative trace of the "rate" matrix is the LogDet distance between the two sequences. The information contained in a "rate" matrix is richer than just the trace. The aim of this work is to exploit this additional information. This can be done, e.g., by focusing on the order of values in the entries of the "rate" matrix, i.e. by rank methods. We have used this method on proteins coded by a set of 123 mitochondrial genomes of metazoa.
	Report on Rank-based Analyses of B. subtilis Micro Array Data
	Monday "After-Dinner-Discussion": Spatial aspects of genome organisation and gene expression: with a contribution from Klaus Scherrer on "What does DNA 'know' about 3-D space beyond proteins and their 'crystallisation'?"
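
The rank-size relation y = ax^(-b) from the abstract on language family sizes becomes linear after taking logarithms (log y = log a - b log x), so a and b can be estimated by ordinary least squares on the log-log values. The following is only a minimal sketch of that idea: the function name `fit_power_law` and the sample sizes are hypothetical and are not taken from the talk.

```python
import math

def fit_power_law(sizes):
    """Estimate a and b in y = a * x**(-b) by least squares on log-log values.

    sizes: positive family sizes in any order; rank 1 is the largest family.
    At least two sizes are needed.
    """
    ranked = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log y = intercept + slope * log x, hence a = exp(intercept) and b = -slope
    return math.exp(intercept), -slope

# Hypothetical sizes generated exactly from y = 1000 * x**(-1.5)
example_sizes = [1000 * rank ** -1.5 for rank in range(1, 21)]
a, b = fit_power_law(example_sizes)
```

On real data a log-log least-squares fit is only a first look; it does not by itself show that a power law is the right model.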

09:30 - 10:00	Program Discussion
10:00 - 12:00	Genomics and Phylogenies: with contributions from Mike Steel on "What does genomic data (gene content, gene order, duplications, SINEs etc) tell us about species phylogeny, and how can we best analyse this data?" and Klaus Scherrer on "How to define a Gene today? Towards unambiguous terminology useful in genetics, phylogeny and ontology", "What is really meant by 'epigenetic'?", and "Spatial aspects of genome organisation and gene expression".
12:00 - 14:00	Lunch break
14:00 - 15:30	Protein super families: with contributions from Bill Atchley on "When is a protein a Myc?". This should be complemented by Hermann Ragg referring to his work on serpins.
	Abstract (Bill Atchley): A sequence signature is a small set of amino acid sites that, when considered simultaneously, provides an accurate identification of a specific set of proteins. Sequence signatures or predictive motifs can be powerful tools in proteomics. I will describe how we have integrated information theory, multivariate statistics and fuzzy logic searching procedures to find small contiguous sets of amino acids that give very accurate probabilistic identifications of protein families.
15:30 - 16:00	Identification of sequences from partial sequence information: with a contribution from Peter Erdős on "Subwords in Reverse Complement Order".
	Abstract (Peter Erdős): Given two natural numbers n, i, let F(n,i) denote the set of all strictly monotone increasing maps from {1,2,...,i} into {1,2,...,n}. Further, given an alphabet A := {a,a',b,b'} and a sequence s = s(1) s(2) ... s(n) of length n over A, let s' denote the sequence s' := s(n)' s(n-1)' ... s(1)' with x' := a' for x=a, x' := a for x=a', x' := b' for x=b, and x' := b for x=b'. Finally, with n, i, A, and s as above, let Subword(s|i) denote the set Subword(s|i) := {s(f(1)) ... s(f(i)) : f in F(n,i)}. Then, one has s=t or s=t' for any two sequences s,t of length n with entries from A if and only if the union of the two sets Subword(s|[2n/3]) and Subword(s'|[2n/3]) coincides with the union of the two sets Subword(t|[2n/3]) and Subword(t'|[2n/3]).
16:00 - 16:30	Coffee break / Tea time
16:30 - 18:30	Horizontal Gene Transfer: with contributions from Arndt von Haeseler on "Estimating Lateral Gene Transfer" and Will Casey on "Horizontal Gene Flow in Bacteria".
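
The definitions in the abstract on subwords in reverse complement order (the sequence s', the set Subword(s|i), and the union of the two subword sets) can be written out directly for small examples. This is a brute-force illustration, not an efficient method. Two encodings are assumptions made here and not part of the abstract: the primed letters a' and b' are written as the upper-case letters A and B, and [2n/3] is read as the integer part of 2n/3. The function names are hypothetical.

```python
from itertools import combinations

# Assumption: the primed letters a', b' are encoded as upper-case A, B.
COMPLEMENT = {"a": "A", "A": "a", "b": "B", "B": "b"}

def reverse_complement(s):
    """Return s' = s(n)' s(n-1)' ... s(1)'."""
    return "".join(COMPLEMENT[x] for x in reversed(s))

def subwords(s, i):
    """Return Subword(s|i): all length-i subsequences of s.

    combinations() yields strictly increasing index tuples, which play the
    role of the maps f in F(n,i).
    """
    return {"".join(s[k] for k in positions)
            for positions in combinations(range(len(s)), i)}

def subword_union(s, i):
    """Union of Subword(s|i) and Subword(s'|i)."""
    return subwords(s, i) | subwords(reverse_complement(s), i)

s = "abAB"
i = (2 * len(s)) // 3  # assumption: [2n/3] read as the integer part of 2n/3
same_union = subword_union(s, i) == subword_union(reverse_complement(s), i)
```

Since (s')' = s, a sequence and its reverse complement always give the same union, so `same_union` is true for any s; the statement in the abstract concerns the converse direction.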

09:30 - 10:00	Program Discussion
10:00 - 12:00	Quartet Methods: with contributions from Peter Erdős on "Quartet inference problems", Mihai Albu on "The 'Distance Quartet Puzzling' algorithm", and Stefan Grünewald on "Quartet Joining".
	Abstract (Mihai Albu): Phylogenetic analysis increasingly employs sophisticated mathematical tools ranging from stochastic modelling of Markov processes, principal component analysis, or integer programming to various branches of combinatorics, including extremal combinatorics and combinatorial analysis of multivariate relationships, in particular those derived from (dis)similarity data. The work presented here deals with the latter topics, that is, the construction of phylogenetic trees from quartets (resolved trees on four leaves). Most formulations of the problem are NP-hard. Here we consider a new version that has a polynomial time solution. We present applications of this algorithm for identifying putative clades and for elucidating spurious phylogenetic relationships. Also, we note that our algorithm can be applied to weighted sets of quartets. Finally, we will present some output trees and differences using the four variants of the algorithm.
12:00 - 14:00	Lunch break
14:00 - 15:30	Cladogenesis: with a contribution from Dietrich Stoyan on "Statistical modelling of cladogenetic processes and A.H. Müller's Formen-Maxima Rule".
	Abstract (Dietrich Stoyan): Does it make sense to spend time on refined modelling of cladogenesis in the spirit of A.H. Müller's Formen-Maxima rule?
15:30 - 16:00	Coffee break / Tea time
16:00 - 17:30	The relationship between population processes (short time scale) and phylogeny (long time scale): with contributions from Ellen Baake on "The present and the past in mutation-selection models".
17:30 - 18:30	Current problems in multiple Alignment: with contributions from Burkhard Morgenstern.
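
As background to the session on quartet methods: a quartet (a resolved tree on four leaves) is commonly inferred from pairwise distances with the four-point criterion, which picks the pairing ab|cd whose sum d(a,b) + d(c,d) is smallest. The sketch below shows only this standard criterion; it is not the Distance Quartet Puzzling algorithm presented in the talk, and the function name and example distances are hypothetical.

```python
def quartet_from_distances(dist, a, b, c, d):
    """Return the pairing of {a, b, c, d} with the smallest within-pair distance sum.

    dist maps frozenset({x, y}) to the distance between leaves x and y.
    Ties are resolved in favour of the first candidate listed.
    """
    def get(x, y):
        return dist[frozenset((x, y))]

    candidates = [
        ((a, b), (c, d)),
        ((a, c), (b, d)),
        ((a, d), (b, c)),
    ]
    return min(candidates, key=lambda q: get(*q[0]) + get(*q[1]))

# Hypothetical distances induced by the tree ((A,B),(C,D)) with all edge lengths 1
example = {
    frozenset(("A", "B")): 2, frozenset(("C", "D")): 2,
    frozenset(("A", "C")): 3, frozenset(("A", "D")): 3,
    frozenset(("B", "C")): 3, frozenset(("B", "D")): 3,
}
result = quartet_from_distances(example, "A", "B", "C", "D")
```

For distances that come exactly from a tree, the two larger sums are equal and the smallest sum identifies the true split; with noisy data the choice can be unreliable.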

	Part II: Workshop Lectures
09:30 - 09:40	Welcome
09:40 - 10:30	Walter Fitch (University of California, USA): Important unsolved mathematical problems about evolution
	Abstract: I plan to bring up a set of problems of importance to evolutionists that needs help from the mathematicians.
10:30 - 10:50	Coffee break / Tea time
10:50 - 11:35	Mike Steel (University of Canterbury, New Zealand): Some new results in phylogenetic combinatorics
	Abstract: I describe some recent work with Charles Semple on 'phylogenetic clocks' which are a particularly simple type of phylogenetic network possessing some attractive combinatorial properties. I will also describe some recent work with Andreas Dress on the 'path index' of a tree and other combinatorial concepts (such as parsimony) that can be usefully investigated by taking an algebraic perspective.
11:40 - 12:25	David Epstein (University of Warwick, United Kingdom): Tree Space
	Abstract: (Joint work with my graduate student Jonathan Ingram.) There is a well-known topology on the space of all trees with given taxa, where each edge has a length. (These are usually called weighted trees.) I will describe this topology and define some metrics on it, including the L^2 metric of Billera, Holmes and Vogtmann. We have not studied these different metrics extensively in a biological context, but it seems as though the L^1 metric is the most biologically informative. To compute the L^1 distance between two trees is computationally trivial. Computing L^2 distances is a difficult problem, and it may turn out to be NP, in one of the many senses of NP. On the other hand, I like the L^2 distance a lot, because it has beautiful mathematical properties (discovered by Billera, Holmes and Vogtmann). I will present a nice algorithm for computing the L^2-distance, much quicker than any naive algorithm, and will sketch the non-trivial proof that the algorithm gives the correct answer. If I can prepare the results before the meeting, I may present some applications in biology of some of our computer programs.
12:30 - 14:15	Lunch Break
14:15 - 15:00	Klaus Scherrer (CNRS, Universités 6 et 7, France): The Genomic "Troika": about the existence and possible origin of 3 inter-dependent but distinct systems of genomic information in eucaryotes
15:00 - 15:30	Guido Fritzsch (Universität Leipzig, Germany): Deep metazoan phylogenies
15:30 - 16:00	Coffee break / Tea time
16:00 - 16:30	Wilfried Imrich (Montanuniversität Leoben, Austria), Sandi Klavzar: Median Networks
	Abstract: Median networks, alias median graphs, have been rediscovered many times (in different mathematical languages). It is thus clear that they form a useful tool in many areas and are interesting on their own. In the first part of the talk (given by S. Klavzar) we will recall some classical results on median graphs, like the fact that they are precisely the retracts of hypercubes. Then we will follow with some more recent results, for instance the so-called Euler-type formulas and the concept of cube polynomials (and its relation to median graphs). In the second part (by W. Imrich) the focus will be on algorithms for the recognition of median graphs and related classes of graphs. Conceptually simple and easy to implement algorithms, more efficient but still practical ones and algorithms of purely theoretical interest will be treated. The talk ends with a list of open problems.
16:45 - 17:15	Michael Joswig (Technische Universität Berlin, Germany): What are and how do we compute Tropical Polytopes (software for tight spans and other tropical polytopes)?
	Abstract: Tropical polytopes have recently been introduced by Develin and Sturmfels. They form a class of polyhedral complexes which generalize tight spans of finite metric spaces (and this is one reason why they are relevant for phylogenetic analysis). This renewed discrete geometry viewpoint on the subject paves the way to employ established algorithmic methods and to use existing software for standard problems in phylogenetics. We report on recent implementations within the open source software framework polymake; see www.math.tu-berlin.de/polymake. These will become available with the new version 2.1, scheduled to be available by the time of this talk.
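
The talk on median networks recalls that median graphs are precisely the retracts of hypercubes. In a hypercube itself, the median of three vertices is their coordinate-wise majority, which is the one vertex lying on a shortest path between every pair of the three. The sketch below illustrates this for hypercube vertices only; encoding vertices as integer bit strings and the function names are assumptions made here, not part of the talk.

```python
def hamming(u, v):
    """Graph distance in the hypercube: number of differing coordinates."""
    return bin(u ^ v).count("1")

def hypercube_median(u, v, w):
    """Coordinate-wise majority of three hypercube vertices (bitwise)."""
    return (u & v) | (u & w) | (v & w)

def is_between(x, u, v):
    """True if x lies on some shortest path between u and v."""
    return hamming(u, x) + hamming(x, v) == hamming(u, v)

# Three vertices of the 3-dimensional hypercube, written as bit strings
u, v, w = 0b110, 0b101, 0b011
m = hypercube_median(u, v, w)
on_all_geodesics = (is_between(m, u, v) and is_between(m, u, w)
                    and is_between(m, v, w))
```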

09:30 - 10:15	Alex Grossmann (Centre national de la recherche scientifique, France): A Report on Rank-based Analyses of E. coli and B. subtilis Micro Array Data (joint work with and )
	Abstract: I shall talk about preliminary results obtained by the examination of Kendall correlation matrices for expression data of E. coli and B. subtilis.
10:15 - 10:45	Will Casey (University of Warwick, United Kingdom): Algorithms for sequencing the genome; Horizontal Gene Flow in Bacteria
10:45 - 11:15	Coffee break / Tea time
11:15 - 11:45	Sonja Prohaska (Universität Leipzig, Germany): Tracking down phylogenetic signals in non-coding cluster sequences of Hox clusters
	Abstract: Usually, phylogenetic analyses make use of coding sequences. The reasons are historical ones: for a long time, geneticists focused on the evolution of coding sequences. They collected large amounts of orthologous genes with high levels of homology from distantly related species. Accordingly, the knowledge about evolutionary mechanisms and strategies for multiple alignment and phylogenetic reconstructions were also deduced from gene sequences. With the emergence of non-coding sequences in the databases and adequate alignment tools, one could start to extract phylogenetic signals from orthologous non-coding sequences. These signals are more or less conserved non-coding nucleotides (CNCNs) that occur clustered at so-called phylogenetic footprints. In cases where the gene framework is highly conserved, large amounts of footprints can be extracted from adjacent non-coding sequences. We used phylogenetic signals within the non-coding sequences of Hox clusters to look at sequence phylogenies and to shed light on the sequence of cluster duplication events during vertebrate evolution. Furthermore, we could use the large amounts of data together with statistical methods to study the loss and retention of conserved non-coding regions.
11:45 - 12:15	Martin Middendorf (Universität Leipzig, Germany): Combined phylogenies: hosts and their parasites
	Abstract: Hosts and their parasites are prominent model systems for studying coevolutionary processes. A fundamental problem in the theory of comparing host-parasite phylogenies is the reconstruction of past associations between hosts and parasites. Event-based methods for solving the reconstruction problem take advantage of knowledge about the likelihood of possible evolutionary events. In this talk, we give a short overview of event-based methods for the reconstruction problem, and present a tool that we have developed for solving this problem when extinction events are possible.
12:15 - 14:15	Lunch Break
14:15 - 15:15	Bernard Comrie (Max Planck Institute for Evolutionary Anthropology, Germany): Problems regarding the "phylogenetic" analysis of language evolution
	Abstract: In my brief exposition, I will present (a) some of the reasons why historical-comparative linguists have been reluctant to adopt phylogenetic methods regularly used in other disciplines, and (b) some of the reasons why I think linguistics can benefit from considering such methods and perhaps even contribute to the refinement of such methods. One problem is the pervasiveness of "horizontal transmission", i.e. the effects of language contact, which can make it difficult or even impossible to arrive at a strictly cladistic account of language differentiation. Traditionally, linguists have used painstaking methods that give very accurate answers to certain questions (e.g. relative chronology of phonetic changes), but that leave many questions unanswered, and are (at least possibly) inapplicable beyond a relatively shallow time-depth (say, around 10,000 years). If historical linguistics is to break through this "time barrier", it is essential to consider methods that provide approximations at greater time-depths, even if the results of applying these methods are less solid than those linguists have been accustomed to considering.
15:15 - 15:45	Coffee break / Tea time
15:45 - 16:30	Daniel Huson (Universität Tübingen, Germany): Phylogenetic Networks
	Abstract: In 1992, Bandelt and Dress introduced the split decomposition method that takes as input a distance matrix and produces as output a collection of splits that are not necessarily compatible. This method was the motivation for the SplitsTree program, and different versions have been written by Rainer Wetzel, Daniel Huson and David Bryant. In practice, split decomposition has turned out to be a very conservative method and it becomes more and more timid as the size of the input set grows. More recent methods such as Neighbor-net, consensus networks, bootstrap networks or the Z-closure network are much more potent at producing splits and this puts new demands on algorithms for representing them. In this talk we will illustrate this problem and discuss some ideas for addressing it.
16:30 - 17:15	Bill Atchley (North Carolina State University, USA): Regulatory Protein Networks
	Abstract: I will discuss recent work on the evolution and structure of the MYC-MAX-MAD transcription network.
17:15 - 17:30	Concluding remarks
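
The talk on rank-based analyses of micro-array data refers to Kendall correlation matrices for expression data. As a rough illustration of what such a matrix contains, the sketch below computes Kendall's tau (the simple tau-a variant, in which tied pairs count as neither concordant nor discordant) for every pair of profiles. Which variant the speakers used is not stated; the function names and the example profiles are hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1      # pair ordered the same way in both profiles
            elif product < 0:
                score -= 1      # pair ordered in opposite ways
    return score / (n * (n - 1) / 2)

def kendall_matrix(profiles):
    """Matrix of pairwise Kendall correlations between expression profiles."""
    return [[kendall_tau(p, q) for q in profiles] for p in profiles]

# Hypothetical expression profiles of three genes over four conditions
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 9.0],
    [4.0, 3.0, 2.0, 1.0],
]
matrix = kendall_matrix(profiles)
```

Because tau depends only on the order of the values, the first two profiles are perfectly correlated although their magnitudes differ.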

Participants

Mihai Albu

Universität Bielefeld

Bill Atchley

North Carolina State University, Raleigh, North Carolina

Ellen Baake

Universität Wien

Hans Binder

Universität Leipzig

Andrej Borissenko

Universität Bielefeld

David Bryant

McGill University, Montreal

Will Casey

University of Warwick

Dietmar Cieslik

Universität Greifswald

Bernard Comrie

Max Planck Institute for Evolutionary Anthropology, Leipzig

Tobias Dezulian

Universität Tübingen

David Epstein

University of Warwick

Peter Erdős

Hungarian Academy of Sciences, Budapest

Walter Fitch

University of California, Irvine

Hans-Joachim Fritz

Universität Göttingen

Guido Fritzsch

Universität Leipzig

Georg Fuellen

Universität Münster

Alex Grossmann

Centre national de la recherche scientifique, Paris

Stefan Grünewald

University of Canterbury

Arndt von Haeseler

Universität Düsseldorf & Forschungszentrum Jülich

Stefan Henz

Max-Planck-Institut für Entwicklungsbiologie, Tübingen

Katharina Huber

Uppsala University and Swedish University of Agricultural Sciences

Daniel Huson

Universität Tübingen

Wilfried Imrich

Montanuniversität Leoben

Michael Joswig

Technische Universität Berlin

Sandi Klavzar

University of Maribor

Tobias Kloepper

Universität Tübingen

Eduardo Mendoza

Ludwig-Maximilians-Universität, München

Martin Middendorf

Universität Leipzig

Kristin Missal

Universität Leipzig

Burkhard Morgenstern

Universität Göttingen

Vincent Moulton

Uppsala University

Kay Nieselt

Universität Tübingen

Sonja Prohaska

Universität Leipzig

Klaus Scherrer

CNRS, Universités 6 et 7, Paris

Eberhard Schmitt

Institut für Molekulare Biotechnologie, Jena

Mike Steel

University of Canterbury, Christchurch

Dietrich Stoyan

TU Bergakademie Freiberg

Andrea Tanzer

Universität Leipzig

Armin Uhlmann

Universität Leipzig

Søren Wichmann

Max Planck Institute for Evolutionary Anthropology, Leipzig

Organizers

Andreas Dress

Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig

Jürgen Jost

Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig

Peter Stadler

Leipzig University

Administrative Contact

Antje Vandenberg

Max-Planck-Institut für Mathematik in den Naturwissenschaften