<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archetype &#187; Direct optimization</title>
	<atom:link href="http://roberto.kellerperez.com/tag/direct-optimization/feed/" rel="self" type="application/rss+xml" />
	<link>http://roberto.kellerperez.com</link>
	<description>Ant reconstruction one homology at a time</description>
	<lastBuildDate>Tue, 21 Dec 2010 16:19:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Phylogenetics through videoconferencing</title>
		<link>http://roberto.kellerperez.com/2009/12/phylogenetics-through-videoconferencing/</link>
		<comments>http://roberto.kellerperez.com/2009/12/phylogenetics-through-videoconferencing/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 17:11:43 +0000</pubDate>
		<dc:creator>Roberto Keller</dc:creator>
				<category><![CDATA[Cladistics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[Direct optimization]]></category>
		<category><![CDATA[Dynamic homology]]></category>
		<category><![CDATA[Frederick Matsen]]></category>
		<category><![CDATA[phyloseminar]]></category>
		<category><![CDATA[POY]]></category>
		<category><![CDATA[Ward Wheeler]]></category>

		<guid isPermaLink="false">http://roberto.kellerperez.com/?p=1882</guid>
		<description><![CDATA[Last night, from Lisbon, I attended a talk given by Ward Wheeler at the AMNH in New York City and moderated by Frederick Matsen from his home institution in Berkeley, California. The talk was the second in a series of talks on phylogenetics held via videoconferencing. The idea behind phyloseminar.org is to hold regular live [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1897" title="phyloseminar1" src="http://roberto.kellerperez.com/wp-content/uploads/2009/12/phyloseminar1.jpg" alt="phyloseminar1" width="267" height="63" />Last night, from Lisbon, I attended a talk given by <a href="http://research.amnh.org/scicomp/ward_wheeler.html">Ward Wheeler</a> at the AMNH in New York City and moderated by Frederick Matsen from his home institution in Berkeley, California. The talk was the second in a series of talks on phylogenetics held via videoconferencing.</p>
<p>The idea behind <a href="http://phyloseminar.org/">phyloseminar.org</a> is to hold regular <em>live</em> online seminars on phylogenetic methodology open to anyone around the globe. This is a challenge given the time zone differences among the possible participants, but it does make the whole event fun: I watched it after dinner at 9:00pm; the presenter gave it at his 4:00pm; and the moderator was there after lunch at his 1:00pm. I saw at least one person in the audience who watched it from the future, after breakfast in New Zealand at 10:00am the next day.<span id="more-1882"></span></p>
<p>We used the software <a href="http://evo.caltech.edu/evoGate/">EVO</a>, a free tool specifically designed for scientific communication (unlike the Internets, which was designed for&#8230; never mind). Prior to a seminar, you need to install it and create a user account so you can then join the phyloseminar channel. It works really well. You see a window with the slideshow and a window with a video stream for each participant (to keep things simpler, only the presenter and the moderator had video enabled last night).</p>
<p>For Wheeler&#8217;s talk there were twelve of us and, judging by the user accounts (where you can set your location), there were people listening in at least California, Kansas, New York, Lisbon and New Zealand. The talk was 45 minutes long and was followed by another 15 minutes of discussion. We could type questions using the software&#8217;s chat tool, and the moderator then read them out (again, to keep things simpler, rather than having each person talk).</p>
<div id="attachment_1899" class="wp-caption aligncenter" style="width: 510px"><img class="size-full wp-image-1899" title="phyloseminar2" src="http://roberto.kellerperez.com/wp-content/uploads/2009/12/phyloseminar2.jpg" alt="Looking right into Wheeler's desktop." width="500" height="530" /><p class="wp-caption-text">Looking right into Wheeler&#39;s desktop.</p></div>
<p>Wheeler&#8217;s talk, <em>Dynamic homology and phylogenetic systematics</em>, was about alignment, or rather about methods that avoid having to perform an alignment for phylogenetic inference altogether, something he has been championing for many years now. The idea behind these methods, called <em>direct optimization</em> methods, is easy to understand: when you are comparing DNA sequences in order to reconstruct how species (or genes) are related to each other, you need to match them together to determine which positions along one sequence correspond to which positions in another, a process called <a href="http://en.wikipedia.org/wiki/Sequence_alignment">sequence alignment</a>. Only then can you assess whether different species have the same or a different base at each position, which is the raw evidence for evolutionary relatedness. But it happens that, because those sequences are the result of a process of mostly branching evolution (where one species splits to give rise to two descendant ones), the proper format for comparison between multiple sequences is not a matrix of rows and columns but a phylogenetic tree. The problem is that we don&#8217;t know the shape of this tree because that is what we seek to reconstruct in the first place.</p>
<p>The most common way to address this problem is to perform alignments using tree shapes that we know are good approximations. Once we find a satisfactory match between our sequences, we proceed with the phylogenetic reconstruction proper, searching for the tree(s) that maximize our optimality criterion (e.g., parsimony, likelihood). But one caveat of the procedure I just caricatured is that by running the analysis in two steps (alignment and tree search), you impose a restriction on the number of possible combinations you will evaluate. Direct optimization lifts this restriction by performing the sequence matching and tree evaluation in just one step, with the potential result that you may find more optimal solutions. In other words, direct optimization methods are able to perform a more thorough exploration of the space of possible solutions.</p>
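<p>To make the idea of &#8220;matching positions&#8221; a bit more concrete, here is a minimal sketch of how the cost of the best correspondence between just two sequences can be computed by dynamic programming. It is only the two-sequence building block, not Wheeler&#8217;s method or POY&#8217;s implementation: the function name <code>alignment_cost</code> and the unit costs for substitutions and indels are my own assumptions for illustration, and direct optimization proper has to optimize such costs across a whole tree, which this sketch does not attempt.</p>
<pre>
```python
def alignment_cost(seq_a, seq_b, substitution=1, indel=1):
    """Minimum cost of matching seq_a to seq_b.

    Hypothetical helper for illustration only. The cost parameters are
    assumptions: each base substitution costs `substitution` and each
    inserted or deleted base costs `indel`.
    """
    # previous[j] is the cheapest way to match the part of seq_a
    # processed so far against the first j bases of seq_b.
    previous = [j * indel for j in range(len(seq_b) + 1)]
    for i, base_a in enumerate(seq_a, start=1):
        current = [i * indel]  # first i bases of seq_a against nothing
        for j, base_b in enumerate(seq_b, start=1):
            change = 0 if base_a == base_b else substitution
            current.append(min(
                previous[j - 1] + change,  # pair base_a with base_b
                previous[j] + indel,       # base_a has no partner in seq_b
                current[j - 1] + indel,    # base_b has no partner in seq_a
            ))
        previous = current
    return previous[-1]
```
</pre>
<p>For example, <code>alignment_cost("ACGT", "AGT")</code> is 1, because a single missing base explains the difference. Note that the function returns a cost without ever committing to one fixed matrix of rows and columns.</p>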
<p>Now, while the method is easy enough to describe in a post, its mathematical and computational implementation is not simple at all. The number of operations needed to evaluate just a single tree shape increases exponentially in comparison with vanilla tree searches, and you will be better off performing these calculations on a computer cluster.</p>
<p>The aspect that caused the most unease during the talk was Wheeler&#8217;s explanation of the difference between truth and optimality inherent in all these methods (direct optimization or not). Apparently, when you simulate sequence data in order to run it through different programs and evaluate how well each alignment method does, they all invariably find solutions that are more optimal (more parsimonious, more likely or more probable) than the simulated one. That is, most of the time the optimal solution is different from the true one. The consequence of this is that, since in phylogenetic reconstruction we will never know the true evolutionary history for certain, we are forced to abandon the search for the true solution and will have to content ourselves with finding the optimal one.</p>
<p>If this talk was representative of the series, I&#8217;d say the seminars are not for a general audience: you need a very good grasp of phylogenetic theory, and if you know Ward Wheeler you know that his brain runs as fast as his supercomputers. A good thing is that the seminars are being recorded and can be revisited anytime. You can watch the first one by <a href="http://phyloseminar.org/recorded.html">Marc A. Suchard here</a> and the one by Ward Wheeler there soon. The next seminar will address the problem of reconciling gene trees with species trees, and the following three seminars will be decided by <a href="http://phyloseminar.org/vote.html">popular vote</a>.</p>
<p>The whole experience was a first for me, and it was really fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://roberto.kellerperez.com/2009/12/phylogenetics-through-videoconferencing/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The evolution of the web</title>
		<link>http://roberto.kellerperez.com/2009/04/the-evolution-of-the-web/</link>
		<comments>http://roberto.kellerperez.com/2009/04/the-evolution-of-the-web/#comments</comments>
		<pubDate>Fri, 03 Apr 2009 13:43:01 +0000</pubDate>
		<dc:creator>Roberto Keller</dc:creator>
				<category><![CDATA[Cladistics]]></category>
		<category><![CDATA[Phylogeny]]></category>
		<category><![CDATA[Direct optimization]]></category>
		<category><![CDATA[Implied weights]]></category>

		<guid isPermaLink="false">http://roberto.kellerperez.com/?p=696</guid>
		<description><![CDATA[Spider web, that is. This is an excellent example of the way systematic papers should be. In the latest issue of the Proceedings of the National Academy of Sciences (USA), Blackledge and coworkers assembled a comprehensive data set for cladistic analysis of orb web spiders that includes six different molecular loci, 143 morphological characters and [...]]]></description>
			<content:encoded><![CDATA[<p><span style="float: left; padding: 5px;"><a href="http://www.researchblogging.org"><img style="border:0;" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" alt="ResearchBlogging.org" /></a></span>Spider web, that is.</p>
<p>This is an excellent example of the way systematics papers should be. In the latest issue of the <a href="http://www.pnas.org/">Proceedings of the National Academy of Sciences</a> (USA), <a href="http://www.pnas.org/content/106/13/5229.abstract">Blackledge and coworkers</a> assembled a comprehensive data set for cladistic analysis of orb web spiders that includes six different molecular loci, 143 morphological characters and behavior in the form of characters derived from web architecture.</p>
<p><span id="more-696"></span></p>
<p style="text-align: center;">
<div id="attachment_698" class="wp-caption aligncenter" style="width: 537px"><img class="size-full wp-image-698" title="Blackledge et al. 2009 - figure 2" src="http://roberto.kellerperez.com/wp-content/uploads/2009/04/blackledge_etal2009fig2.jpg" alt="Hypothesis of web architecture evolution as optimized in preferred phylogeny." width="527" height="671" /><p class="wp-caption-text">Hypothesis of web architecture evolution as optimized in preferred phylogeny (from Blackledge et al. 2009: fig. 2).</p></div>
<p>The resulting picture supports a single origin of aerial orb webs from irregular webs constructed on the ground. The more costly, regular orb types then gave way to more economical, irregular aerial web architectures at least three times independently. And there seems to be an instance of evolution towards the simplified aerial web spun by bolas spiders.</p>
<p>However, apart from the nice evolutionary story, the real treat in this paper is hidden in the small text and supplementary information. All the data compiled would have been useless if analyzed with poor methods. Instead the authors applied a series of sophisticated phylogenetic techniques that, besides vanilla Bayesian and parsimony analyses, included implied weighting for the morphology partition and direct optimization for the molecular data. Morphology and molecular data were analyzed separately and in combination for an impressive total of 64 different types of phylogenetic analyses.</p>
<p>Implied weights and direct optimization analyses are worth remarking on here because they are not well known and are still rarely used in <span style="text-decoration: line-through;">flashy</span> high-ranking papers. While in a regular parsimony analysis all characters are given equal weights regardless of how well or how poorly they fit a given tree, in implied weights analysis characters are downweighted as a function of the amount of homoplasy (extra steps) that is required to explain their distribution on any given tree topology during the tree-search phase. It is an <em>a posteriori</em> type of character weighting. One of the rationales behind this method is that one extra step in a character that already performs very poorly (is very homoplasious) should not count the same as one extra step in a character with almost perfect fit. The method was developed in the early 1990s but it has had a resurgence of late for morphological data, curiously because with the rise of molecular analyses many authors have noticed that the results of analyses of morphology under implied weights mirror the molecular phylogenies more closely.</p>
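<p>A small sketch may help show why an extra step counts for less in an already homoplasious character. It assumes the concave cost function commonly given for implied weighting, e / (e + k), where e is the number of extra steps in a character and k is a concavity constant. Neither the formula nor the value k = 3 is stated in this post or taken from the paper, and the function names are hypothetical, so treat this as an illustration rather than a description of the authors&#8217; analysis.</p>
<pre>
```python
from fractions import Fraction


def homoplasy_cost(extra_steps, k=3):
    """Cost e / (e + k) of a character with `extra_steps` extra steps.

    Assumption: k is the concavity constant; 3 is used purely for
    illustration. Exact fractions keep the arithmetic easy to check.
    """
    return Fraction(extra_steps, extra_steps + k)


def marginal_cost(extra_steps, k=3):
    """Added cost of one more extra step on top of `extra_steps`."""
    return homoplasy_cost(extra_steps + 1, k) - homoplasy_cost(extra_steps, k)


def tree_cost(extra_steps_per_character, k=3):
    """Total cost of a tree: the sum of the per-character costs."""
    return sum(homoplasy_cost(e, k) for e in extra_steps_per_character)
```
</pre>
<p>With k = 3, the first extra step in a perfectly fitting character costs <code>marginal_cost(0)</code> = 1/4, while a sixth extra step in a character that already has five costs only <code>marginal_cost(5)</code> = 1/24. Under equal weights both steps would count the same.</p>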
<p>For the molecular data, the use of direct optimization techniques is simply a way to push the limits on finding the optimal correspondences among the positions of the DNA sequences under comparison. <a href="http://en.wikipedia.org/wiki/Multiple_sequence_alignment">Multiple sequence alignment</a> methods tend to find just one of the many possible optimal solutions (if not a suboptimal one), from which a regular phylogenetic analysis (parsimony, maximum likelihood or Bayesian) is then performed during a second phase. In contrast, direct optimization side-steps alignment altogether by searching for the optimal correspondences among sequences during tree search in a simultaneous, single-step process, thus performing a much more aggressive evaluation of the multiple possible alternatives. It is computationally intensive, but that is hardly an excuse not to do good science, as this paper exemplifies.</p>
<p>The result of applying these methods is a well-supported phylogeny that allows the authors to make a rigorous reconstruction of the evolution of spider silk, bringing together information from silk chemistry, spider morphology and behavioral ecology.</p>
<p>Don&#8217;t forget to take a look at the supplementary information. There are lots of nice pictures showing all the types of web architectures. Oh, and it&#8217;s open access, so no subscription is required.</p>
<p><strong>Reference</strong></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences&amp;rft_id=info%3Adoi%2F10.1073%2Fpnas.0901377106&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Reconstructing+web+evolution+and+spider+diversification+in+the+molecular+era&amp;rft.issn=0027-8424&amp;rft.date=2009&amp;rft.volume=106&amp;rft.issue=13&amp;rft.spage=5229&amp;rft.epage=5234&amp;rft.artnum=http%3A%2F%2Fwww.pnas.org%2Fcgi%2Fdoi%2F10.1073%2Fpnas.0901377106&amp;rft.au=Blackledge%2C+T.&amp;rft.au=Scharff%2C+N.&amp;rft.au=Coddington%2C+J.&amp;rft.au=Szuts%2C+T.&amp;rft.au=Wenzel%2C+J.&amp;rft.au=Hayashi%2C+C.&amp;rft.au=Agnarsson%2C+I.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CTaxonomy%2C+Zoology%2C+Phylogeny%2C+Evolutionary+Biology">Blackledge, T., Scharff, N., Coddington, J., Szuts, T., Wenzel, J., Hayashi, C., &amp; Agnarsson, I. (2009). Reconstructing web evolution and spider diversification in the molecular era <span style="font-style: italic;">Proceedings of the National Academy of Sciences, 106</span> (13), 5229-5234 DOI: <a rev="review" href="http://dx.doi.org/10.1073/pnas.0901377106">10.1073/pnas.0901377106</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://roberto.kellerperez.com/2009/04/the-evolution-of-the-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

