Tarzan - Phylogeny software for determining cophylogenies

Cophylogeny of host and parasite trees
Cophylogeny of gene and species trees

Tarzan development team: Steffen Junick, Daniel Merkle, Martin Middendorf
(Thanks go to Roman Legat for his work on a first version of Tarzan)

The name Tarzan is inspired by the fact that Tarzan finds reconstructions as subtrees of a data structure that contains associations of nodes in the parasite tree with nodes or edges in the host tree (a similar data structure is used by the program TreeMap and was called Jungle by M.A. Charleston [Math. Biosciences, 149 (1998)]).


Screenshot showing the four main windows of Tarzan. Shown is a cost-minimal reconstruction for a small gopher-lice example.

Screenshot showing the shifting of a switch due to the ranks of the nodes. Shifted switches are drawn in pink. The landing site is shifted such that parasite 5 lands before node 25. Landing on the edge between nodes 25 and 27 is not allowed.

Short description:
Tarzan uses an event-based method to find cost-minimal reconstructions or reconstructions that have a minimal (or maximal) number of certain evolutionary events.
Five different types of evolutionary events are considered: cospeciation, duplication, sorting, switching, and extinction. For host-parasite systems, cospeciation events refer to simultaneous host and parasite speciation, duplication events are independent parasite speciations, sorting events correspond to lineage sorting, and switches correspond to host shifts.
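
As an illustration of the event-based idea, the following Python sketch computes the cost of a single reconstruction from its event counts. It assumes that the total cost is the sum of the event counts weighted by user-defined event costs, which is the usual convention for event-based methods; the function name, the cost values, and the event counts are hypothetical and are not taken from Tarzan.

```python
# Minimal sketch of an event-based cost computation.
# Assumption: total cost = sum over event types of (count * cost per event).
# All names and numbers below are hypothetical, not part of Tarzan.

EVENT_TYPES = ("cospeciation", "duplication", "sorting", "switch", "extinction")


def reconstruction_cost(event_counts, event_costs):
    """Return the weighted sum of event counts for one reconstruction.

    Event types missing from event_counts are treated as occurring 0 times.
    """
    return sum(event_costs[e] * event_counts.get(e, 0) for e in EVENT_TYPES)


# Hypothetical cost scheme chosen by a user.
costs = {"cospeciation": 0, "duplication": 1, "sorting": 1, "switch": 2, "extinction": 1}

# Hypothetical event counts of one reconstruction.
counts = {"cospeciation": 5, "duplication": 1, "sorting": 2, "switch": 1, "extinction": 0}

total = reconstruction_cost(counts, costs)
print(total)  # 0*5 + 1*1 + 1*2 + 2*1 + 1*0 = 5
```

With a different cost scheme the same event counts yield a different total, which is why the resulting costs are only available after the event costs have been set.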

Tarzan has a graphical user interface that consists of the following four main windows.

1. Tree editor window: used to define and interactively edit the phylogenetic trees. Nodes of the trees can be labelled, e.g., with the corresponding species names. Divergence times can be defined by a time zone labelling for one tree and a time interval labelling for the other tree. The mapping function Phi, which defines the current relations between the leaves of one tree and the nodes of the other tree, can be defined simply by drawing lines between the related nodes. Alternatively, the trees, their names, the divergence time information, and the mapping function can be defined by modifying a corresponding text file.

2. Association triple viewer: shows the candidate data structure containing the association triples (can be calculated after the phylogenetic trees and the mapping function have been defined).

3. Reconstruction table window: shows the calculated reconstructions with the numbers of the different types of events and the resulting costs (after the event costs have been set); can show all reconstructions or only the cheapest reconstructions.

4. Reconstruction viewer window: double-clicking a row in the reconstruction table window depicts the corresponding reconstruction in this window (moreover, all association triples used for the reconstruction are marked in the association triple viewer). The listed reconstructions can be sorted with respect to costs or the numbers of the different events (by clicking on the corresponding column head).

Note: Switches can lead to timing incompatibilities within a reconstruction. Therefore, Tarzan automatically checks every reconstruction for switch incompatibilities and tries to resolve them by pulling back the landing sites of switches so that only a minimal number of sortings has to be introduced. However, because the corresponding problem is NP-complete and Tarzan is meant to be a fast tool, it is not guaranteed that Tarzan can resolve all incompatibilities (see the paper for more details). Tarzan lists the incompatibilities between switches that have been resolved, together with the corresponding possible move-back operations.

Additional features of Tarzan are:
i) Tarzan can compute not only a cheapest reconstruction but also reconstructions that are optimal with respect to other criteria. Moreover, a hierarchy of criteria can be defined: optimization is done first with respect to the most important criterion, then, among all optimal solutions found, with respect to the second most important criterion, and so forth. Possible criteria (each can be minimized or maximized) are: cost, number of cospeciations, number of duplications, number of sortings, number of host switches, and number of extinctions.
ii) The maximal number of cheapest reconstructions that are computed by Tarzan can be set by the user.
iii) Tarzan can also list all possible reconstructions, which can be of interest in cases where not too many reconstructions exist.
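
The hierarchy of criteria described in item i) amounts to a lexicographic filter over a set of reconstructions. The following Python sketch shows the idea on an explicit list of already computed reconstructions. It is only an illustration under that assumption and does not reflect how Tarzan searches its association data structure internally; the function name, the dictionary keys, and the sample values are hypothetical.

```python
# Minimal sketch of hierarchical (lexicographic) optimization over a list of
# reconstructions. Names, keys, and numbers are hypothetical, not Tarzan's.


def select_by_hierarchy(reconstructions, criteria):
    """Filter reconstructions by a hierarchy of criteria.

    criteria is a list of (key, mode) pairs, most important first,
    where mode is "min" or "max". After each step only the reconstructions
    that are optimal for that criterion are kept for the next step.
    """
    candidates = list(reconstructions)
    for key, mode in criteria:
        if mode not in ("min", "max"):
            raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
        if not candidates:
            break
        pick = min if mode == "min" else max
        best = pick(r[key] for r in candidates)
        candidates = [r for r in candidates if r[key] == best]
    return candidates


# Hypothetical reconstructions with their cost and some event counts.
recs = [
    {"id": "A", "cost": 7, "cospeciations": 4, "switches": 1},
    {"id": "B", "cost": 7, "cospeciations": 5, "switches": 2},
    {"id": "C", "cost": 9, "cospeciations": 6, "switches": 0},
    {"id": "D", "cost": 7, "cospeciations": 5, "switches": 1},
]

# Minimize cost first, then maximize cospeciations, then minimize switches.
hierarchy = [("cost", "min"), ("cospeciations", "max"), ("switches", "min")]
selected = [r["id"] for r in select_by_hierarchy(recs, hierarchy)]
print(selected)  # ['D']
```

In this example, reconstruction C has the most cospeciations but is discarded in the first step because it is not cost-minimal, which shows that the order of the criteria matters.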