Archetype

Ant reconstruction one homology at a time

Bigger is better: the largest phylogenetic tree reconstructed.

Sunday, May 3rd, 2009 | Cladistics, Personalities, Phylogeny, Uncategorized

GenBank, the standard database for genetic information maintained by the National Center for Biotechnology Information, has been accumulating DNA sequences for some three decades now. Since its creation in the early 1980s, it has become the de facto repository for genetic information: genetic data must now be submitted to GenBank for a paper to be accepted for publication. Most of the accumulated sequence data are the sum of many “local” taxonomic studies, each targeting a particular group of organisms for a relatively small but well-known collection of genes. Its contents now span hundreds of genes across all of life’s domains. So, what would happen if you were to take all the sequence information contained in GenBank and analyze it phylogenetically, all together, in a single, one-step study? Well, that is what Pablo A. Goloboff and coworkers just did, and the results were published in last week’s online early edition of Cladistics, the international journal of the Willi Hennig Society.

The phylogenetic analysis comprises an astonishing 73,060 terminal eukaryotic taxa and 9535 molecular characters and, for good measure, the authors threw in 604 morphological characters. It is therefore the largest phylogenetic analysis published to date, almost six times larger than the former world record. Such a feat presented many technical challenges. The logistics required automating every step of the analysis via computer scripts: retrieving and sorting thousands of GenBank entries, aligning the sequences to construct the data matrix, performing the actual searches for the optimal solutions, and interpreting the mammoth-size phylogenetic trees. The crux of the analysis, the search for the optimal phylogenetic trees, was done with the powerful parsimony program TNT running in parallel on three multi-processor computers for 2.5 months.

Fig. 1. Pruned strict consensus tree for the combined data set (seven trees, 1879 taxa excluded). The bar shows the span of 5000 species.

The resulting phylogeny recovers most traditional taxonomic groups. This is interesting for various reasons. First, as noted above, our understanding of the tree of life is the result of many taxonomically localized efforts that have been informally pasted together [1]. This is the first time a phylogeny of this scope has been reconstructed from scratch, letting the data speak for itself, unconstrained, without assuming a priori that certain evolutionary relationships must be true. Second, it shows that there is enough historical information in the data that the optimal solution is not a complete mess or a largely unresolved answer; consider that there are 9 × 10^345,593 possible trees for the number of terminals included. Third, we do have the capacity, both in software and in hardware, to carry out such a large analysis. And last, but related to the previous two points, parsimony methods for phylogenetic reconstruction are up to the task. The latter point is worth noting because early simulations, based on just a few taxa (a grand total of four, actually), scared systematists into thinking that parsimony methods may result in erroneous reconstructions. Later studies using real data and a much larger collection of species have shown that this is not the case, and this 73,060-taxon analysis serves as the largest of these test cases.
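For readers who want to check the tree count quoted above (9 × 10^345,593), here is a minimal sketch in Python. It rests on an assumption the post does not state: that the figure counts unrooted, fully bifurcating trees, of which there are (2n − 5)!! for n terminals. The function names are made up for this illustration. Because the number is far too large for a floating-point value, the sketch works with its logarithm.

```python
import math


def log10_unrooted_binary_trees(n_taxa: int) -> float:
    """Return log10 of (2n - 5)!!, the number of unrooted binary trees on n taxa.

    Assumption: trees are unrooted and fully bifurcating.
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    # (2n - 5)!! = (2n - 4)! / (2**(n - 2) * (n - 2)!), and lgamma(x + 1) = ln(x!)
    ln_count = (
        math.lgamma(2 * n_taxa - 3)
        - (n_taxa - 2) * math.log(2)
        - math.lgamma(n_taxa - 1)
    )
    return ln_count / math.log(10)


def as_scientific(log10_value: float) -> tuple[float, int]:
    """Split a base-10 logarithm into (mantissa, exponent)."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent


if __name__ == "__main__":
    mantissa, exponent = as_scientific(log10_unrooted_binary_trees(73_060))
    print(f"about {mantissa:.1f} x 10^{exponent} unrooted binary trees")
```

Under that assumption the result for 73,060 terminals should land in the same range as the figure in the text. For small cases the formula is easy to verify by hand: four taxa give 3 unrooted trees and five taxa give 15.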

The authors are no strangers to the computer implementation of phylogenetic methods. James S. Farris is a pioneer in the field who developed the algorithmic foundations and produced some of the first phylogenetic programs in the late 1960s. Back then, the character information for each taxon to be analyzed was contained in a punch card, and a random addition sequence for phylogenetic tree construction meant that the set of cards was shuffled by hand before being fed into the terminal connected to the mainframe. Likewise, Pablo A. Goloboff has been responsible for many of the rapid search techniques developed from the 1990s to the present, which seek to cover the searchable tree-space in a fast and efficient way.

It seems that, for phylogenetics, the only limit that remains is the availability of data.

References and notes

Goloboff, P., Catalano, S., Mirande, J. M., Szumik, C., Arias, J. S., Källersjö, M., & Farris, J. (2009). Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groups. Cladistics. DOI: 10.1111/j.1096-0031.2009.00255.x

  1. Only more recently have “supertree” methods been developed, which seek to construct a large phylogeny from the consensus of multiple small, partially overlapping trees, following a more precise set of rules.

Tags: GenBank, James Farris, Pablo Goloboff, TNT, Willi Hennig Society

7 Comments to Bigger is better: the largest phylogenetic tree reconstructed.

1
The Red Notebook » Waddya know? It’s a tree!
May 3, 2009

[...] Meanwhile, the National Center for Biotechnology Information, which has been amassing genetic sequences for three decades recently published the largest phylogenetic tree ever constructed. [...]

2
David Marjanovi?
May 4, 2009

That branch between Mammalia and Lepidosauria is intriguing. Does it represent the turtles?

“This is the first time a phylogeny has been reconstructed from scratch, letting the data speak unconstrained for itself without assuming that certain evolutionary relationships must be true a priori.”

Er… no. What do you mean? I don’t get it. In every phylogenetic analysis, the only assumption is that the ingroup is monophyletic with respect to the outgroup; and this assumption was made here, too.

3
David Marjanović
May 4, 2009

Stupid Safari.

4
Roberto Keller
May 4, 2009

“In every phylogenetic analysis, the only assumption is that the ingroup is monophyletic with respect to the outgroup; and this assumption was made here, too.”

You are right, ingroup monophyly with respect to the outgroup is the minimal assumption required to add directionality (rooting) to the tree. It is thus an inescapable parameter, common to all phylogenetic analyses, including this one.

Still, the tree by Goloboff and coworkers can be seen as the result of a global, unconstrained analysis when compared against the numerous local efforts produced over the years that form the multiple pieces of the Tree of Life puzzle as we know it.

5
Marcos
May 6, 2009

Yes, this clade is composed of the 189 turtles we analyzed, forming the sister group of Lepidosauria + Archosauria (the untagged sister group of Aves is the crocs).

6
Gunnar
July 30, 2009

Would it be possible to publish the final tree online to make further investigation possible? It would interest me very much to look at the inferred Arthropod relationships.

7
Roberto Keller
July 31, 2009

Gunnar, I don’t have access to a high-resolution version of this tree, but you should definitely request such a figure from one of the original authors of the paper at his blog.

Archetype by Roberto Keller is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.