Topological Data Analysis (TDA) allows us to extract powerful topological and higher-order information on the global shape of a data set or point cloud. Tools like Persistent Homology or the Euler Transform give a single complex description of the global structure of the point cloud. However, sometimes, we are interested in translating this global information back to local node-level features, where the individual points have some real-world meaning.
This talk will be about how we can achieve this using the Hodge Laplacian and concepts from TDA, Topological Signal Processing, and Differential Geometry. I will also talk about what we can learn from persistent homology by varying the distance function on the underlying space and analysing the corresponding shifts in the persistence diagrams.
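As a point of reference for the Hodge Laplacian mentioned above, the following minimal sketch builds the standard first Hodge Laplacian $L_1 = B_1^\top B_1 + B_2 B_2^\top$ for a toy simplicial complex and reads off the number of independent holes from its kernel. The complex (four vertices, five edges, one filled triangle), the orientation convention, and all helper names are assumptions chosen for illustration; this is not the method presented in the talk.

```python
from fractions import Fraction

# Toy complex (hypothetical example): vertices 0..3,
# edges e0=(0,1), e1=(0,2), e2=(1,2), e3=(1,3), e4=(2,3),
# and one filled triangle (0,1,2). The cycle 1-2-3 is left unfilled.

# Vertex-edge boundary matrix: an edge (u, v) with u < v gets -1 at u, +1 at v.
B1 = [
    [-1, -1,  0,  0,  0],
    [ 1,  0, -1, -1,  0],
    [ 0,  1,  1,  0, -1],
    [ 0,  0,  0,  1,  1],
]

# Edge-triangle boundary matrix: boundary of (0,1,2) is (1,2) - (0,2) + (0,1).
B2 = [[1], [-1], [1], [0], [0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rank(m):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


# First Hodge Laplacian: acts on signals defined on the edges.
L1 = add(matmul(transpose(B1), B1), matmul(B2, transpose(B2)))

# The kernel dimension of L1 equals the first Betti number (number of 1-dimensional holes).
betti_1 = len(L1) - rank(L1)
print(betti_1)
```

The kernel of $L_1$ consists of harmonic edge signals; here it is one-dimensional because exactly one cycle of the toy complex is not filled by a triangle.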
Genes evolve within their species by means of gene duplication, gene loss, and occasional horizontal gene transfer, leading to a family of related genes distributed over a set of species. In each speciation event, all genes are faithfully transmitted into the separating lineages. As long as horizontal transfer is the exception rather than the rule for most gene families, species are well-defined and, like their genes, evolve along trees. The history of a gene family is then defined as the mapping of the gene tree $T$ into the species tree $S$, such that inner vertices that represent speciations in $T$ are mapped to inner vertices of $S$. Such evolutionary scenarios can be used to define several useful vertex-colored graphs that represent partial information on the gene family history. For example, in the orthology graph, two vertices are adjacent if the corresponding genes have a speciation event as their last common ancestor. In the best match graph, a directed edge connects $x$ to $y$ from different species if $y$ is, in its species, a closest relative of $x$. In the LDT graph, $x$ and $y$ are adjacent if their last common ancestor is younger than the last common ancestor of the species in which they reside. The interest in these and other graphs stems from the fact that they can be inferred more or less directly from sequence similarity data without the need to construct the gene tree $T$ or the species tree $S$. We discuss to what extent gene family histories are determined by these graphs. Since these graphs have very specific mathematical structures, correcting empirical estimates to conform to these structures provides a powerful way of reducing noise in the data.
Taken together, this suggests a graph-based approach to gene family histories that avoids many of the practical issues with classical phylogenetic approaches that require accurate gene and species trees as a first step.
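To make the definition of the orthology graph concrete, the sketch below derives it from a small event-labeled gene tree: two genes are joined when their last common ancestor is labeled as a speciation. The nested-tuple tree encoding, the event labels, and the gene names are hypothetical choices for illustration. Note that this goes in the opposite direction to the approach described above, which infers such graphs from sequence similarity data without constructing the tree.

```python
from itertools import combinations

# Hypothetical encoding: a leaf is a gene name (str);
# an inner vertex is a pair (event, children) with event in
# {"speciation", "duplication"}.
gene_tree = (
    "speciation",
    [
        ("duplication", ["a1", "a2"]),   # two paralogs in species A
        ("speciation", ["b1", "c1"]),    # one gene each in species B and C
    ],
)


def leaves(node):
    """Return all gene names below a vertex."""
    if isinstance(node, str):
        return [node]
    _event, children = node
    return [leaf for child in children for leaf in leaves(child)]


def orthology_edges(node):
    """Collect pairs of genes whose last common ancestor is a speciation.

    Two leaves have `node` as their last common ancestor exactly when
    they lie below different children of `node`.
    """
    if isinstance(node, str):
        return set()
    event, children = node
    edges = set()
    for child in children:
        edges |= orthology_edges(child)
    if event == "speciation":
        for left, right in combinations(children, 2):
            for x in leaves(left):
                for y in leaves(right):
                    edges.add(frozenset((x, y)))
    return edges


edges = orthology_edges(gene_tree)
```

In this toy tree, `a1` and `a2` are not adjacent because their last common ancestor is a duplication, while every other pair of genes is joined by an edge.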
Tandem duplication random loss (TDRL) and inverse tandem duplication random loss (iTDRL) are mechanisms of mitochondrial genome rearrangement that can be modeled as simple operations on signed permutations. Informally, they comprise the duplication of a subsequence of a permutation, where in the case of iTDRL the copy is inserted with inverted order and signs. In the second step, one copy of each duplicated element is removed, such that the result is again a signed permutation. The TDRL/iTDRL sorting problem consists in finding the minimum number of TDRL or iTDRL operations necessary to convert the identity permutation $\iota$ into a given permutation $\pi$. We introduce a simple signature, called the misc-encoding, of a permutation $\pi$. This construction is used to design an $\mathcal{O}(n\log n)$ algorithm to solve the TDRL/iTDRL sorting problem.
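The two operations described above can be sketched as a single function on signed permutations stored as Python lists. The function name, its parameters, and the convention that the copy is placed directly after the duplicated segment are assumptions made for illustration. The sketch only applies one operation; it does not implement the misc-encoding or the sorting algorithm.

```python
def tdrl(perm, start, end, keep_first, inverse=False):
    """Apply one TDRL (or iTDRL if inverse=True) step to a signed permutation.

    perm       -- list of nonzero ints, e.g. [1, -3, 2]
    start, end -- the segment perm[start:end] is duplicated
    keep_first -- absolute values of elements retained in the original
                  copy; all other segment elements are retained in the
                  duplicate instead (the "random loss" step)
    """
    segment = perm[start:end]
    if inverse:
        # iTDRL: the copy has inverted order and inverted signs.
        copy = [-x for x in reversed(segment)]
    else:
        copy = list(segment)
    kept_original = [x for x in segment if abs(x) in keep_first]
    kept_copy = [x for x in copy if abs(x) not in keep_first]
    return perm[:start] + kept_original + kept_copy + perm[end:]


identity = [1, 2, 3, 4, 5]
after_tdrl = tdrl(identity, 0, 5, {1, 3, 5})                  # [1, 3, 5, 2, 4]
after_itdrl = tdrl(identity, 0, 5, {1, 3, 5}, inverse=True)   # [1, 3, 5, -4, -2]
```

Because every duplicated element survives in exactly one of the two copies, the result is again a signed permutation of the same elements.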