<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archetype &#187; Willi Hennig Society</title>
	<atom:link href="http://roberto.kellerperez.com/tag/willi-hennig-society/feed/" rel="self" type="application/rss+xml" />
	<link>http://roberto.kellerperez.com</link>
	<description>Ant reconstruction one homology at a time</description>
	<lastBuildDate>Tue, 21 Dec 2010 16:19:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Bigger is better: the largest phylogenetic tree reconstructed.</title>
		<link>http://roberto.kellerperez.com/2009/05/bigger-is-better-the-largest-phylogenetic-tree-reconstructed/</link>
		<comments>http://roberto.kellerperez.com/2009/05/bigger-is-better-the-largest-phylogenetic-tree-reconstructed/#comments</comments>
		<pubDate>Sun, 03 May 2009 11:45:18 +0000</pubDate>
		<dc:creator>Roberto Keller</dc:creator>
				<category><![CDATA[Cladistics]]></category>
		<category><![CDATA[Personalities]]></category>
		<category><![CDATA[Phylogeny]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[GenBank]]></category>
		<category><![CDATA[James Farris]]></category>
		<category><![CDATA[Pablo Goloboff]]></category>
		<category><![CDATA[TNT]]></category>
		<category><![CDATA[Willi Hennig Society]]></category>

		<guid isPermaLink="false">http://roberto.kellerperez.com/?p=853</guid>
		<description><![CDATA[GenBank, the standard database for genetic information maintained by the National Center for Biotechnology Information, has been accumulating DNA sequences for some three decades now. Since its creation in the early 1980s, it has become the de facto repository for genetic information: genetic data must now be submitted to GenBank for a paper to be accepted [...]]]></description>
			<content:encoded><![CDATA[<p><span style="float: left; padding: 5px;"><a href="http://www.researchblogging.org"><img style="border:0;" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" alt="ResearchBlogging.org" /></a></span><a href="http://www.ncbi.nlm.nih.gov/Genbank/index.html">GenBank</a>, the standard database for genetic information maintained by the <a href="http://www.ncbi.nlm.nih.gov/">National Center for Biotechnology Information</a>, has been accumulating DNA sequences for some three decades now. Since its creation in the early 1980s, it has become the <em>de facto</em> repository for genetic information: genetic data must now be submitted to GenBank for a paper to be accepted for publication. Most of the accumulated sequence data are the sum of many &#8220;local&#8221; taxonomic studies, each targeting a particular group of organisms for a relatively small but well-known collection of genes. Its contents now span hundreds of genes across all of life&#8217;s domains. So, what would happen if you were to take all the sequence information contained in GenBank and analyze it phylogenetically in a single, one-step study? Well, that is what Pablo A. Goloboff and coworkers just did, and their results were published in last week&#8217;s online early edition of <a href="http://www3.interscience.wiley.com/journal/118512781/home">Cladistics</a>, the international journal of the <a href="http://www.cladistics.org/">Willi Hennig Society</a>.</p>
<p><span id="more-853"></span></p>
<p>The phylogenetic analysis comprises an astonishing 73,060 terminal eukaryotic taxa and 9535 molecular characters and, for good measure, the authors threw in 604 morphological characters. It is therefore the largest phylogenetic analysis published to date, almost six times larger than the former world record. Such a feat presented many technical challenges. The logistics required automating every step of the analysis via computer scripts: retrieving and sorting thousands of GenBank entries, aligning the sequences to construct the data matrix, performing the actual searches for the optimal solutions, and interpreting the mammoth-size phylogenetic trees. The crux of the analysis, the search for the optimal phylogenetic trees, was done with the powerful parsimony program <a href="http://www.zmuc.dk/public/phylogeny/TNT/">TNT</a> running in parallel on three multi-processor computers for 2.5 months.</p>
<div id="attachment_879" class="wp-caption aligncenter" style="width: 407px"><img class="size-full wp-image-879" title="nf1" src="http://roberto.kellerperez.com/wp-content/uploads/2009/05/nf1.gif" alt="nf1" width="397" height="666" /><p class="wp-caption-text">Fig. 1. Pruned strict consensus tree for the combined data set (seven trees, 1879 taxa excluded). The bar shows the span of 5000 species.</p></div>
<p>The resulting phylogeny recovers most traditional taxonomic groups. This is interesting for various reasons. First, as noted above, our understanding of the tree of life is the result of many taxonomically localized efforts that have been informally pasted together<sup class='footnote'><a href='#fn-853-1' id='fnref-853-1'>1</a></sup>. This is the first time a phylogeny has been reconstructed from scratch, letting the data speak for itself, unconstrained, without assuming <em>a priori</em> that certain evolutionary relationships must be true. Second, it shows that there is enough historical information contained in the data so that the optimal solution is not a complete mess or a largely unresolved answer; consider that there are 9 &#215; 10<sup>345,593</sup> possible trees for the number of terminals included. Third, it shows that we have the current capacity, in terms of both software and hardware, to carry out such a large analysis. And last, but related to the previous two points, it shows that parsimony methods for phylogenetic reconstruction are up to the task. The latter point is worth noting because early simulations, based on just a few taxa (a grand total of four, actually), scared systematists into thinking that parsimony methods may result in erroneous reconstructions. Later studies using real data and a much larger collection of species have shown that this is not the case, and this 73,060-taxon analysis serves as the largest of these test cases.</p>
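<p>As a rough sanity check on that number of possible trees, here is a minimal sketch in Python. It assumes the figure counts unrooted, fully bifurcating trees on labelled terminals, for which the standard count is (2n &#8722; 5)!!; that reading is my assumption, not something stated above, and the function names are made up for illustration. The count is far too large to hold as an ordinary number, so the sketch works with logarithms throughout.</p>

```python
import math


def log10_unrooted_tree_count(n_taxa: int) -> float:
    """Return log10 of (2n - 5)!!, the number of distinct unrooted,
    fully bifurcating trees on n_taxa labelled terminals."""
    if n_taxa < 3:
        raise ValueError("need at least three terminals")
    # Identity: (2n - 5)!! = (2n - 4)! / (2**(n - 2) * (n - 2)!)
    # math.lgamma(k + 1) equals ln(k!), which keeps everything in log space.
    ln_count = (
        math.lgamma(2 * n_taxa - 3)
        - (n_taxa - 2) * math.log(2)
        - math.lgamma(n_taxa - 1)
    )
    return ln_count / math.log(10)


def as_scientific(log10_value: float) -> tuple[float, int]:
    """Split a log10 value into (mantissa, exponent) with 1 <= mantissa < 10."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


# 73,060 is the number of terminals reported for the analysis.
mantissa, exponent = as_scientific(log10_unrooted_tree_count(73060))
print(f"about {mantissa:.1f} x 10^{exponent:,} unrooted trees")
```

<p>Under that assumption the result has the same order of magnitude as the figure quoted above. Counting rooted trees instead, (2n &#8722; 3)!!, would multiply the total by 2n &#8722; 3 and add about five more digits to the exponent.</p>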
<p>The authors are no strangers when it comes to computer implementation of phylogenetic methods. <a href="http://www.nrm.se/en/menu/researchandcollections/departments/molecularsystematics/staff/jamesstevenfarris.1179_en.html">James S. Farris</a> is a pioneer in the field who developed the algorithmic foundations and produced some of the first phylogenetic programs in the late 1960s. Back then, the character information for each taxon to be analyzed was contained in a <a href="http://en.wikipedia.org/wiki/Punch_card">punch card</a>, and a random addition sequence for the phylogenetic tree construction meant that the set of cards was shuffled by hand before being fed into the terminal connected to the mainframe. Likewise, Pablo A. Goloboff has been responsible for many of the rapid search techniques developed from the 1990s to the present, which seek to cover the searchable tree-space in a fast and efficient way.</p>
<p>It seems that, for phylogenetics, the only limit that remains is the availability of data.</p>
<p><strong>References and notes</strong></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Cladistics&amp;rft_id=info%3Adoi%2F10.1111%2Fj.1096-0031.2009.00255.x&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Phylogenetic+analysis+of+73+060+taxa+corroborates+major+eukaryotic+groups&amp;rft.issn=07483007&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=0&amp;rft.epage=0&amp;rft.artnum=http%3A%2F%2Fblackwell-synergy.com%2Fdoi%2Fabs%2F10.1111%2Fj.1096-0031.2009.00255.x&amp;rft.au=Goloboff%2C+P.&amp;rft.au=Catalano%2C+S.&amp;rft.au=Marcos+Mirande%2C+J.&amp;rft.au=Szumik%2C+C.&amp;rft.au=Salvador+Arias%2C+J.&amp;rft.au=K%C3%A4llersj%C3%B6%2C+M.&amp;rft.au=Farris%2C+J.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science%2CTaxonomy%2C+Phylogeny%2C+Cladistics">Goloboff, P., Catalano, S., Marcos Mirande, J., Szumik, C., Salvador Arias, J., Källersjö, M., &amp; Farris, J. (2009). Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groups <span style="font-style: italic;">Cladistics</span> DOI: <a rev="review" href="http://dx.doi.org/10.1111/j.1096-0031.2009.00255.x">10.1111/j.1096-0031.2009.00255.x</a></span></p>
<div class='footnotes'>
<div class='footnotedivider'></div>
<ol>
<li id='fn-853-1'>Only more recently have &#8220;supertree&#8221; methods been developed; these seek to construct a large phylogeny from the consensus of multiple small, partially overlapping trees, following a more precise set of rules. <span class='footnotereverse'><a href='#fnref-853-1'>&#8617;</a></span></li>
</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://roberto.kellerperez.com/2009/05/bigger-is-better-the-largest-phylogenetic-tree-reconstructed/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

