
Open Access

Research Article

Gene Family Evolution across 12 Drosophila Genomes

</sec></div> <span property="dc:date" content="2007-11-09" datatype="xsd:date" rel="dc:identifier" href="http://dx.doi.org/10.1371/journal.pgen.0030197"></span> <span property="dc:subject" content="Evolutionary Biology"></span> <form action=""> <input type="hidden" name="journalDisplayName" id="journalDisplayName" value="PLoS Genetics" /> <input type="hidden" name="crossRefPageURL" id="crossRefPageURL" value="/article/crossref/info%3Adoi%2F10.1371%2Fjournal.pgen.0030197" /> <input type="hidden" name="metricsTabURL" id="metricsTabURL" value="/article/metrics/info%3Adoi%2F10.1371%2Fjournal.pgen.0030197" /> <input type="hidden" name="doi" id="doi" value="info:doi/10.1371/journal.pgen.0030197" /> <input type="hidden" name="articleTitleUnformatted" id="articleTitleUnformatted" value="Gene%20Family%20Evolution%20across%2012%20Drosophila%20Genomes%20" /> 
<input type="hidden" name="articlePubDate" id="articlePubDate" value="1194595200000" /> </form> <p xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" class="authors" xpathLocation="noSelect"><span property="dc:creator">Matthew W. Hahn</span><sup><a href="#aff1"> 1 </a></sup><sup>,</sup><sup><a href="#aff2">2</a></sup><sup><a href="#cor1" class="fnoteref">*</a></sup>, <span property="dc:creator">Mira V. Han</span><sup><a href="#aff2"> 2 </a></sup>, <span property="dc:creator">Sang-Gook Han</span><sup><a href="#aff2"> 2 </a></sup></p><p xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" class="affiliations" xpathLocation="noSelect"><a name="aff1" id="aff1"></a><strong>1</strong> Department of Biology, Indiana University, Bloomington, Indiana, United States of America, <a name="aff2" id="aff2"></a><strong>2</strong> School of Informatics, Indiana University, Bloomington, Indiana, United States of America</p><div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" class="abstract" xpathLocation="/article[1]/front[1]/article-meta[1]/abstract[1]"><a id="abstract0" name="abstract0" toc="abstract0" title="Abstract"></a><h2 xpathLocation="noSelect">Abstract <a href="#top">Top</a></h2><p xpathLocation="/article[1]/front[1]/article-meta[1]/abstract[1]/p[1]">Comparison 
of whole genomes has revealed large and frequent changes in the size of gene families. These changes occur because of high rates of both gene gain (via duplication) and loss (via deletion or pseudogenization), as well as the evolution of entirely new genes. Here we use the genomes of 12 fully sequenced <i>Drosophila</i> species to study the gain and loss of genes at unprecedented resolution. We find large numbers of both gains and losses, with over 40% of all gene families differing in size among the <i>Drosophila</i>. Approximately 17 genes are estimated to be duplicated and fixed in a genome every million years, a rate on par with that previously found in both yeast and mammals. We find many instances of extreme expansions or contractions in the size of gene families, including the expansion of several sex- and spermatogenesis-related families in <span class="genus-species">D. melanogaster</span> that also evolve under positive selection at the nucleotide level. Newly evolved gene families in our dataset are associated with a class of testes-expressed genes known to have evolved de novo in a number of cases. Gene family comparisons also allow us to identify a number of annotated <span class="genus-species">D. melanogaster</span> genes that are unlikely to encode functional proteins, as well as to identify dozens of previously unannotated <span class="genus-species">D. melanogaster</span> genes with conserved homologs in the other <i>Drosophila</i>. Taken together, our results demonstrate that the apparent stasis in total gene number among species has masked rapid turnover in individual gene gain and loss. 
It is likely that this genomic revolving door has played a large role in shaping the morphological, physiological, and metabolic differences among species.</p> </div><div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" class="abstract" xpathLocation="/article[1]/front[1]/article-meta[1]/abstract[2]"><a id="abstract1" name="abstract1" toc="abstract1" title="Author Summary"></a> <h2 xpathLocation="noSelect">Author Summary <a href="#top">Top</a></h2> <p xpathLocation="/article[1]/front[1]/article-meta[1]/abstract[2]/sec[1]/p[1]">Though comparative genome sequencing has revealed vast similarities in the total number of genes contained within closely related species, this similarity hides enormous complexities in the identity and number of constituent proteins. Species can differ in their complement of genes through both gene duplication and loss. Here we investigated the gain and loss of genes from the genomes of 12 fully sequenced <i>Drosophila</i> (fruit flies). We find high rates of gain and loss in all species and estimate that approximately one new gene is gained or lost every 60,000 years. We also find several hundred cases of extremely rapid gene turnover, with dozens of genes gained or lost in only a few million years. The highest turnover in gene number occurs in genes involved in sex and reproduction. Taken together, our results demonstrate that the apparent stasis in total gene number among species has masked rapid turnover in individual gene gain and loss. 
It is likely that this evolutionary revolving door has played a large role in shaping the morphological, physiological, and metabolic differences among species.</p> </div> <div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" class="articleinfo" xpathLocation="noSelect"><p><strong>Citation: </strong>Hahn MW, Han MV, Han S-G (2007) Gene Family Evolution across 12 <i>Drosophila</i> Genomes . PLoS Genet 3(11): e197. doi:10.1371/journal.pgen.0030197</p><p><strong>Editor: </strong>Gil McVean, University of Oxford, United Kingdom</p><p></p><p><strong>Received:</strong> May 11, 2007; <strong>Accepted:</strong> September 26, 2007; <strong>Published:</strong> November 9, 2007</p><p><strong>Copyright:</strong> © 2007 Hahn et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</p><p><strong>Funding:</strong> This research was supported by grants from the National Institutes of Health (R01-GM076643A) to S.V. Nuzhdin and MWH, and from the National Science Foundation (DBI-0543586) to MWH.</p><p><strong>Competing interests:</strong> The authors have declared that no competing interests exist.</p><p><strong>Abbreviations: </strong>EST, expressed sequence tag; FRB, fuzzy reciprocal BLAST; GO, Gene Ontology; MRCA, most recent common ancestor; -p, -parameter</p><p><a name="cor1"></a>* To whom correspondence should be addressed. 
E-mail: <a href="mailto:mwh@indiana.edu">mwh@indiana.edu</a></p></div> <div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" id="section1" xpathLocation="/article[1]/body[1]/sec[1]"><a id="s1" name="s1" toc="s1" title="Introduction"></a><h3 xpathLocation="noSelect">Introduction <a href="#top">Top</a></h3><p xpathLocation="/article[1]/body[1]/sec[1]/p[1]">A major goal of evolutionary genetics is to understand the molecular changes underlying phenotypic variation within and between species. The sequencing of whole genomes has made it possible to study not just individual mutations between orthologous sequences, but large-scale differences in gene complements between species. Such comparative genomic studies have found large disparities among organisms in the number of copies of genes involved in distinct cellular and developmental processes (e.g., [<a href="#pgen-0030197-b001">1</a>,<a href="#pgen-0030197-b002">2</a>]) and have even revealed the loss of entire gene families from individual lineages (e.g., [<a href="#pgen-0030197-b003">3</a>,<a href="#pgen-0030197-b004">4</a>]). Though these studies begin to offer some insight into the molecular basis for phenotypic evolution, the timescales considered are often too long to provide evidence for the role of any single change (but see, e.g., [<a href="#pgen-0030197-b005">5</a>–<a href="#pgen-0030197-b008">8</a>]). The sequencing of the genomes of 12 <i>Drosophila</i> species—whose most recent common ancestor (MRCA) lived only 60 million years ago [<a href="#pgen-0030197-b009">9</a>]—offers the ability to study changes in the genomic complement of genes at an unprecedented resolution.</p> <p xpathLocation="/article[1]/body[1]/sec[1]/p[2]">Changes in the number of genes and proteins devoted to specific biological processes may arise in a number of different ways. 
First, gene duplication along any lineage will increase the number of genes, resulting in gene families containing multiple copies that are partially or completely overlapping in function. These gene duplicates may subsequently diverge in function by taking on new roles or by dividing up old roles [<a href="#pgen-0030197-b010">10</a>–<a href="#pgen-0030197-b012">12</a>]. There are now numerous examples in <i>Drosophila</i> of individual gene families with duplicates differentiated in both protein sequences (e.g., [<a href="#pgen-0030197-b013">13</a>–<a href="#pgen-0030197-b016">16</a>]) and gene expression domains (e.g., [<a href="#pgen-0030197-b017">17</a>]). A second reason for differences in gene complement among species is that genes may be lost along a lineage when disabling mutations in them are not selected against. Such gene losses can even be directly advantageous [<a href="#pgen-0030197-b018">18</a>], consistent with the so-called “less is more” hypothesis of Olson and colleagues [<a href="#pgen-0030197-b019">19</a>]. Finally, the de novo creation of genes through various processes (e.g., [<a href="#pgen-0030197-b020">20</a>–<a href="#pgen-0030197-b022">22</a>])—while certainly quite rare—may contribute to lineage-specific differences in the number and function of constituent proteins.</p> <p xpathLocation="/article[1]/body[1]/sec[1]/p[3]">To provide a <i>Drosophila</i>-wide perspective on gene family evolution, we applied two different computational methods that estimate the rate and number of gene gains and losses. The first is a likelihood approach that estimates the average rate of gene gain and loss, the number of gains and losses on each branch of a phylogeny, and assigns <i>p-</i>values to large changes [<a href="#pgen-0030197-b023">23</a>]. 
The second is the nonparametric gene tree/species tree reconciliation approach [<a href="#pgen-0030197-b024">24</a>–<a href="#pgen-0030197-b027">27</a>], which counts the number of gains and losses on each branch of the phylogeny without a specific probability model. While previous estimates of genome-wide rates of duplication in <span class="genus-species">D. melanogaster</span> [<a href="#pgen-0030197-b028">28</a>,<a href="#pgen-0030197-b029">29</a>] have offered a snapshot of one of the major mechanisms contributing to genome evolution, our analyses afford a wider view of this process. We show that genes have been gained and lost in all species at varying rates; that several hundred gene families exhibit significantly large expansions or contractions in number suggestive of adaptive natural selection; and that approximately equal numbers of gene families have either been lost completely in a species or are present only in a subset of the species considered here, information that can be used to improve the annotation of the <span class="genus-species">D. melanogaster</span> genome. 
Throughout the analyses we examine the effect that heterogeneity in both assembly and annotation quality among the 12 genomes can have on evolutionary inferences.</p> </div> <div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" id="section2" xpathLocation="/article[1]/body[1]/sec[2]"><a id="s2" name="s2" toc="s2" title="Results/Discussion"></a><h3 xpathLocation="noSelect">Results/Discussion <a href="#top">Top</a></h3> <h4 xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/title[1]">Gene Families in <i>Drosophila</i></h4> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/p[1]">Using the predicted gene sets from all 12 <i>Drosophila</i> species, fuzzy reciprocal BLAST (FRB) was used to cluster genes into gene families on the basis of protein sequence similarity (Materials and Methods). All 188,868 genes in the dataset are assigned membership to a single family; the gene families are therefore nonoverlapping. Excluding lineage-specific families and likely annotation artifacts (see below), there are 11,434 gene families inferred to have been present in the <i>Drosophila</i> MRCA (“Analysis” in <a href="#pgen-0030197-t001">Table 1</a>). The mean number of genes in each family is 12.97 (i.e., there is slightly more than one copy per species), with the largest family containing 144 copies across all 12 genomes. Although the term “gene family” often only refers to multiple, closely related paralogs within a species, we use the term here to denote groups of related genes that include both paralogs within the same species and orthologs and paralogs from other species. 
This broader definition makes it possible to study the evolution of gene families across species, as every sensu stricto gene family must have first appeared as a single-copy family [<a href="#pgen-0030197-b023">23</a>].</p> <div class="figure" xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/table-wrap[1]"><a name="pgen-0030197-t001" id="pgen-0030197-t001" title="Click for larger image " href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.t001" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><img xpathLocation="noSelect" border="1" src="/article/fetchObject.action?uri=info:doi/10.1371/journal.pgen.0030197.t001&representation=PNG_S" align="left" alt="thumbnail" class="thumbnail"></a><p><strong xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/table-wrap[1]/label[1]"><a href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.t001" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><span xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/table-wrap[1]/label[1]">Table 1. </span></a></strong></p><p xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/table-wrap[1]/caption[1]/p[1]">Number of Genes and Families in Each <i>Drosophila</i> Species</p> <span xpathLocation="noSelect">doi:10.1371/journal.pgen.0030197.t001</span><div class="clearer"></div></div><p xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/p[2]">Of the 11,434 families, 4,693 (41.0%) have changed size in at least one species. There are no Gene Ontology (GO) terms that are over-represented among the families that have changed in size relative to the whole genome. 
The 4,693 families represent the minimum number that have undergone the gain or loss of genes, as equal numbers of gains and losses along a lineage will not result in a net change in family size. Different definitions of gene families may also affect results, as more stringent similarity thresholds make families smaller on average and less stringent thresholds make families larger [<a href="#pgen-0030197-b008">8</a>]. To study the effect of changing gene family definitions, we reclustered the <i>Drosophila</i> genes by varying the BLAST similarity threshold used by an order of magnitude higher and lower (Materials and Methods). As expected, a more stringent similarity criterion caused there to be more, smaller families overall, but fewer families inferred to have been in the MRCA (8.0% fewer families), while a more lenient criterion caused there to be more families in the MRCA (9.8% more families). Changing the clustering thresholds also slightly changed the proportion of families changing in size in the expected directions—1.9% fewer changed when there were smaller families, while 2.1% more changed with larger families.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/p[3]">Any analysis of gene presence and absence must also consider the quality of the genomic data used to infer gene gains and losses [<a href="#pgen-0030197-b008">8</a>]. There are two main sources of differences in data quality among the <i>Drosophila</i> genomes considered here: heterogeneity in gene prediction (“annotation”) and heterogeneity in genome coverage (“assembly”). We discuss the effect of each of these in turn.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/p[4]">The first <i>Drosophila</i> genome to be sequenced, <span class="genus-species">D. melanogaster</span> [<a href="#pgen-0030197-b030">30</a>], is 99% complete at the sequence level and is in its fifth major annotation release after a number of years of manual curation [<a href="#pgen-0030197-b031">31</a>]. 
For the purposes of the comparative analyses undertaken by the consortium analyzing the 12 <i>Drosophila</i> genomes [<a href="#pgen-0030197-b032">32</a>,<a href="#pgen-0030197-b033">33</a>], the most recent versions of the genome assembly and gene annotations are taken as the <span class="genus-species">D. melanogaster</span> gene complement (Berkeley <i>Drosophila</i> Genome Project release 5, <a href="http://www.fruitfly.org">http://www.fruitfly.org</a> ). The ab initio gene prediction programs used to find genes in the other <i>Drosophila</i> species were not used as a basis for the final gene set from <span class="genus-species">D. melanogaster</span>. Likewise, similarity-based searches for finding genes in the other <i>Drosophila</i> species utilized already predicted genes from <span class="genus-species">D. melanogaster</span>, but not vice versa (but see [<a href="#pgen-0030197-b033">33</a>] for an additional list of newly annotated <span class="genus-species">D. melanogaster</span> genes not included in release 5). The result of this heterogeneity in gene annotation is consistent with the known high false-positive rate of ab initio predictors: <span class="genus-species">D. melanogaster</span> is predicted to have the fewest genes of any genome by far (<a href="#pgen-0030197-t001">Table 1</a>). Many more of the genes in the other 11 species are also found in gene families by themselves and are called annotation artifacts in our analyses (<a href="#pgen-0030197-t001">Table 1</a>). In fact, there is a significant correlation between the total gene count from each genome and the number of single-gene, single-species families (<i>r</i> = 0.62, <i>p</i> = 0.033). Removing the thousands of genes without significant similarity to any others brings the predicted gene numbers among species much closer to one another. 
Importantly, the overprediction due to ab initio gene-finding software does not affect our main analyses as we eliminate such annotation artifacts from the dataset considered.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[1]/p[5]">While ab initio gene prediction has a unidirectional effect on gene number (i.e., more genes), low-quality genome assemblies can lead to both the addition and subtraction of genes. Genes may be missing simply because there are large holes in the assembled genome, while genes can be added if allelic diversity within the sequenced strain is wrongly assembled as duplicated loci (e.g., [<a href="#pgen-0030197-b034">34</a>]). The majority of <i>Drosophila</i> genomes were sequenced to greater than 8× coverage (i.e., the number of nucleotides sequenced is equal to eight times the total genome length), though the <span class="genus-species">D. sechellia</span> and <span class="genus-species">D. persimilis</span> genomes were sequenced only to 4×, as their close relationships to high-coverage genomes were thought to mitigate the need for deeper sequencing. In addition, the <span class="genus-species">D. simulans</span> genome assembly is a “mosaic” assembly of low-coverage sequencing of six inbred lines of this species [<a href="#pgen-0030197-b035">35</a>]. As might be expected from the lower-quality sequence assemblies that result from lower sequence coverage, both <span class="genus-species">D. sechellia</span> and <span class="genus-species">D. persimilis</span> are predicted to have a high number of annotation artifacts (1,991 and 2,718 genes, respectively). <span class="genus-species">D. sechellia</span>, which is only ~5 million years diverged from <span class="genus-species">D. melanogaster</span>, is initially predicted to have 2,483 more genes than this well-annotated genome; we do not believe that there is any evidence outside the ab initio gene prediction programs for this massive increase in proteomic complexity. 
Furthermore, many of the genes initially identified as pseudogenes in the <span class="genus-species">D. sechellia</span> and <span class="genus-species">D. simulans</span> genomes have subsequently been found to be sequencing errors ([<a href="#pgen-0030197-b036">36</a>]; C. Jones, personal correspondence). Because errors in both genome assembly and gene annotation will lead to errors in the number of inferred gains and losses, we have repeated many of the analyses that follow excluding <span class="genus-species">D. sechellia</span> and <span class="genus-species">D. persimilis</span>.</p> <h4 xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/title[1]">Estimating Gene Gain and Loss via Maximum Likelihood</h4> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[1]">Our likelihood approach estimates the average rate of gene turnover across the <i>Drosophila</i>, λ, to be 0.0012 gains and losses/gene/million years; this is the rate at which the size of a gene family is expected to either expand or contract over time because of gene gain or loss (see <a href="#s3">Materials and Methods</a> and [<a href="#pgen-0030197-b023">23</a>]). Varying the definition of gene families resulted in a change in rate of only ~2%. In comparison, Lynch and Conery [<a href="#pgen-0030197-b028">28</a>] estimated the rate of gene gain in <span class="genus-species">D. melanogaster</span> via an independent method as 0.0023 duplications/gene/million years, an estimate consistent with the one presented here. Our rate is also similar to the rate of gene gain and loss estimated from both yeast (λ = 0.0020; [<a href="#pgen-0030197-b023">23</a>]) and mammals (λ = 0.0016; [<a href="#pgen-0030197-b008">8</a>]) using the same likelihood method. These data therefore indicate a remarkably similar rate of gene duplication and loss across eukaryotes, which suggests common molecular mechanisms among species. 
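</p> <p>To make the rate λ concrete, the following is a minimal sketch that simulates the size of a single gene family through time. It assumes a linear birth-death process in which every gene copy is gained and lost at the same per-gene rate λ, so a family with n copies experiences events at a total rate of 2λn. The function name <code>simulate_family_size</code> and its arguments are hypothetical, and the sketch is an illustration of that assumed model, not the likelihood method of [<a href="#pgen-0030197-b023">23</a>].</p>

```python
import random


def simulate_family_size(n0, lam, t_total, rng):
    """Simulate one gene family under an assumed linear birth-death process.

    n0      -- starting number of gene copies
    lam     -- per-gene rate of gain and (equally) of loss, per million years
    t_total -- elapsed time in million years
    rng     -- a random.Random instance, passed in for reproducibility
    """
    n, t = n0, 0.0
    while n > 0:  # a family with zero copies cannot regain members
        # Waiting time to the next event: each of n copies gains or loses at rate lam.
        t += rng.expovariate(2.0 * lam * n)
        if t > t_total:
            break
        # Gains and losses are assumed equally likely.
        n += 1 if rng.random() < 0.5 else -1
    return n
```

<p>For example, <code>simulate_family_size(1, 0.0012, 60.0, random.Random(0))</code> draws one possible present-day size for a family that was single-copy 60 million years ago; repeating the call many times gives the spread of family sizes expected by chance alone under these assumptions.</p> <p>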
The estimated rate of gene duplication and loss in <i>Drosophila</i> implies that within a single genome, there are approximately 17 new duplicates and 17 new losses fixed every million years (0.0012 gains and losses/gene/million years × 14,000 genes). A study of duplicate genes formed by retrotransposition in <i>Drosophila</i> found a much lower rate: only 0.51 new duplicates per million years [<a href="#pgen-0030197-b037">37</a>]. These data appear to indicate that the rate of functional gene duplication via unequal crossing-over and transposition is higher than that via retrotransposition.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[2]">Estimating only the average rate of change across the phylogeny will mask any heterogeneity in evolutionary rates among species (e.g., [<a href="#pgen-0030197-b038">38</a>]). We therefore attempted to estimate a fully parameterized model with 22 different values of λ, one for each branch of the tree, with an updated version of the program CAFE [<a href="#pgen-0030197-b039">39</a>]. Though the likelihoods of estimated 22-parameter (22-p) models were consistently higher than that of the 1-p model, the results did not converge to a single global maximum (unpublished data). It is likely that the search space is simply too large to find such a maximum with 22 parameters. Instead, we created a 3-p model by assigning branches to one of three rate categories—fast (λ<sub> 1</sub>), medium (λ<sub> 2</sub>), and slow (λ<sub> 3</sub>)—on the basis of the best branch-specific rate estimates from the 22-p model. This model always converged to a single maximum (λ<sub> 1</sub> = 0.0193, λ<sub> 2</sub> = 0.0022, and λ<sub> 3</sub> = 0.0006) and fit the data significantly better than the 1-p model (−2ΔL = 15,156; <i>p</i> < 1.0 × 10<sup>−16</sup>; df = 2; <a href="#pgen-0030197-g001">Figure 1</a>). 
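</p> <p>The likelihood ratio comparison reported above can be checked by hand, because for 2 degrees of freedom the upper-tail probability of a χ<sup>2</sup> variable has the closed form exp(−x/2). The sketch below assumes that the test statistic −2ΔL is χ<sup>2</sup>-distributed with df = 2; the function names are hypothetical and chosen only for illustration.</p>

```python
import math


def lrt_p_value_df2(stat):
    """Upper-tail chi-square probability for df = 2, which equals exp(-stat / 2)."""
    return math.exp(-stat / 2.0)


def critical_value_df2(alpha):
    """Chi-square critical value for df = 2 at significance level alpha."""
    return -2.0 * math.log(alpha)


# Reported test statistic for the 3-p model versus the 1-p model.
stat = 15156.0
p_value = lrt_p_value_df2(stat)       # underflows to 0.0 in double precision
threshold = critical_value_df2(0.05)  # roughly 5.99

print(f"p-value: {p_value}")
print(f"5% critical value: {threshold:.2f}")
```

<p>With −2ΔL = 15,156 the computed probability is far below the smallest representable double and evaluates to zero, which is consistent with the reported <i>p</i> &lt; 1.0 × 10<sup>−16</sup>; the statistic also exceeds the 5% critical value of about 5.99 by more than three orders of magnitude.</p> <p>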
Although more parameter-rich models can be constructed, the distribution of rates estimated in the 22-p model suggested a natural division into three parameter classes; we also did not find that finer divisions offered any more biological insight than a 3-p model. The “fast” branches of the 3-p tree include the terminal lineages leading to <span class="genus-species">D. simulans</span>, <span class="genus-species">D. sechellia</span>, <span class="genus-species">D. pseudoobscura</span>, and <span class="genus-species">D. persimilis</span>. The “slow” branches include the terminal lineages leading to <span class="genus-species">D. virilis</span>, <span class="genus-species">D. mojavensis</span>, <span class="genus-species">D. willistoni</span>, and <span class="genus-species">D. ananassae</span>. Different definitions of gene families always significantly favored the 3-p model over the 1-p model.</p> <div class="figure" xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/fig[1]"><a name="pgen-0030197-g001" id="pgen-0030197-g001" title="Click for larger image " href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.g001" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><img xpathLocation="noSelect" border="1" src="/article/fetchObject.action?uri=info:doi/10.1371/journal.pgen.0030197.g001&representation=PNG_S" align="left" alt="thumbnail" class="thumbnail"></a><p><strong xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/fig[1]/label[1]"><a href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.g001" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><span 
xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/fig[1]/label[1]">Figure 1. </span></a> <span xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/fig[1]/caption[1]/title[1]">Gene Family Evolution in <i>Drosophila</i></span></strong></p><p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/fig[1]/caption[1]/p[1]">On each branch of the tree the number of gene gains/losses is given. The colors of the numbers denote the estimated rate of gene gain and loss. Numbers in boxes are identifiers for internal branches of the phylogeny.</p> <span xpathLocation="noSelect">doi:10.1371/journal.pgen.0030197.g001</span><div class="clearer"></div></div><p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[3]">It is important to note that the four rapidly evolving lineages are all either low-coverage genomes or are sister to low-coverage genomes (<span class="genus-species">D. sechellia</span> and <span class="genus-species">D. persimilis</span>); this is likely to contribute to the apparent rate increases. To ask whether the inclusion of these species has had a large effect on our inferences, we reestimated a 1-p model without <span class="genus-species">D. sechellia</span> and <span class="genus-species">D. persimilis</span>. As expected, the estimated average rate of gene gain and loss was lower without these two species, at λ = 0.0010 (compared to λ = 0.0012).</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[4]">To ask whether the low-quality assemblies and annotations in these species have an effect on the number of gains and losses in closely related taxa, we compared two further models. In the first we estimated one rate for the <span class="genus-species">D. melanogaster</span> lineage (λ<sub> mel</sub>) and one for all other branches (λ<sub> background</sub>), including data from <span class="genus-species">D. sechellia</span> and <span class="genus-species">D. persimilis</span>. 
In the second model we estimated the same parameters but excluded the <span class="genus-species">D. sechellia</span> and <span class="genus-species">D. persimilis</span> data. This analysis reveals little difference in the estimated rate in <span class="genus-species">D. melanogaster</span>. Including the two questionable genomes gives λ<sub> mel</sub> = 0.0054 and λ<sub> background</sub> = 0.0011; excluding these two species gives λ<sub> mel</sub> = 0.0050 and λ<sub> background</sub> = 0.0010. These analyses demonstrate that the rate of gene turnover inferred in <span class="genus-species">D. melanogaster</span> is likely not an artifact of its relationship to <span class="genus-species">D. sechellia</span>, though the reduced dataset still includes the mosaic assembly of <span class="genus-species">D. simulans</span>. We therefore conclude that while poor annotation and assembly can have insidious effects on the inferred rate of gene gain and loss in affected genomes, these consequences should not reach far beyond the implicated lineages.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[5]">One further pattern revealed in the heterogeneous rates of gene gain and loss across lineages is the apparent relationship between branch length and rate. Though our previous analyses suggest that the high rates on the very short <span class="genus-species">D. sechellia</span>, <span class="genus-species">D. simulans</span>, <span class="genus-species">D. persimilis</span>, and <span class="genus-species">D. pseudoobscura</span> lineages are likely due to problems of annotation, many of the “medium” rate branches are also short in length (<a href="#pgen-0030197-g001">Figure 1</a>). 
To ensure that the higher rates estimated on shorter branches of the tree are not due to a methodological artifact of our likelihood method, we simulated 1,000 datasets across the <i>Drosophila</i> tree under a 1-p model and then estimated rates of change under the same 3-p model as above (Materials and Methods). The average ratio of λ<sub>1</sub>/λ<sub>3</sub> in these simulations was 1.00 and the maximum was 1.25, compared to the observed value of λ<sub>1</sub>/λ<sub>3</sub> = 32.2. Also as expected if the likelihood ratio tests are χ<sup>2</sup>-distributed with 2 df, 5.7% of the simulated datasets had −2ΔL > 5.99 (i.e., <i>p</i> < 0.05). These simulations imply that the observed likelihood ratio (−2ΔL = 15,156) is highly significant (<i>p</i> << 0.001). Together, our results strongly suggest that the observed rate heterogeneity in the data is not due to a methodological problem.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[6]">Though the apparent negative correlation between rate of gene turnover and branch length is not due to an artifact, it is worthwhile to consider biological explanations for this relationship beyond the effects of genome annotation. Many of the shortest branches in the <i>Drosophila</i> phylogeny are also those closest to the tips of the tree. Because all comparative genomic studies—whether of nucleotide substitutions or gene gains and losses—use only a single genome from each species, estimates of divergence by necessity also include the polymorphisms present in the individual chosen for sequencing (even when this individual is highly inbred). If many segregating polymorphisms are slightly deleterious, then estimates of rates on tip branches may be higher than for deeper branches [<a href="#pgen-0030197-b040">40</a>], though population sizes must be extremely large for this explanation to hold [<a href="#pgen-0030197-b041">41</a>]. 
As studies of both humans (e.g., [<a href="#pgen-0030197-b042">42</a>]) and <i>Drosophila</i> (J. J. Emerson and M. Cardoso-Moreira, personal communication) have uncovered large numbers of polymorphic duplications and deletions of genes in natural populations, it is possible that these polymorphisms play a role in the higher rates of change seen in more recent lineages.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[7]">By estimating the maximum likelihood value for the size of gene families at internal nodes of the phylogenetic tree, we can infer the minimum number of gene gains and losses along each branch by comparing parent and daughter nodes [<a href="#pgen-0030197-b008">8</a>]. Making this comparison for each branch of the <i>Drosophila</i> tree and summing across families allows us to estimate the total number of genes gained and lost along every lineage (<a href="#pgen-0030197-g001">Figure 1</a>). Gains and losses of genes have occurred on all but one branch of the <i>Drosophila</i> tree (branch 3), and each terminal lineage leading to an extant species includes hundreds of gains and losses.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[8]">On the terminal lineage leading to <span class="genus-species">D. melanogaster</span>, we infer the gain of 94 genes and the loss of 505 genes in the ~5 million years since the split with the simulans/sechellia clade. Running our analyses using alternative tree topologies [<a href="#pgen-0030197-b043">43</a>] produced very similar results (unpublished data). The most common GO terms associated with gene families that have expanded in <span class="genus-species">D. melanogaster</span> are proteolysis, defense response, cytoskeleton, extracellular transport, response to toxin, and trypsin activity. 
The most common GO terms associated with contracting gene families are regulation of transcription, protein binding, transcription factor activity, zinc ion binding, nucleus DNA binding, and mesoderm development. There are no significantly over-represented terms among these families.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[2]/p[9]">The observed “revolving door” of gene gain and loss [<a href="#pgen-0030197-b008">8</a>] has important implications for divergence among <i>Drosophila</i> species. For instance, even though the average synonymous site distance between <span class="genus-species">D. simulans</span> and <span class="genus-species">D. melanogaster</span> is 0.117 [<a href="#pgen-0030197-b035">35</a>], <span class="genus-species">D. melanogaster</span> also has 856 genes that are not found in <span class="genus-species">D. simulans</span> (94 gains in <span class="genus-species">D. melanogaster</span> + 762 losses in <span class="genus-species">D. simulans</span>), and <span class="genus-species">D. simulans</span> has 800 genes not found in <span class="genus-species">D. melanogaster</span> (295 gains in <span class="genus-species">D. simulans</span> + 505 losses in <span class="genus-species">D. melanogaster</span>). This amounts to 5.9% divergence at the level of whole genes: (856 + 800)/(2 × 14,000 genes) = 1,656/28,000. These results imply that both changes in homologous nucleotides and the gain and loss of genetic material may be important in the differentiation of these two species (e.g., [<a href="#pgen-0030197-b044">44</a>]).</p> <h4 xpathLocation="/article[1]/body[1]/sec[2]/sec[3]/title[1]">Estimating Gene Gain and Loss via Gene Tree/Species Tree Reconciliation</h4> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[3]/p[1]">An alternative method for inferring the history of gene gain and loss among genomes is to reconcile the species tree with the gene tree of each family [<a href="#pgen-0030197-b024">24</a>–<a href="#pgen-0030197-b027">27</a>]. 
As this method does not assume a particular probability model for gains and losses, it is a valuable independent approach to estimating gene gains and losses. Tree reconciliation has frequently been used to infer gains and losses in individual families (e.g., [<a href="#pgen-0030197-b045">45</a>]), but has been used less often to infer whole genome patterns of gene turnover (e.g., [<a href="#pgen-0030197-b038">38</a>,<a href="#pgen-0030197-b046">46</a>]). We built 11,390 gene trees from the 11,434 families using protein distances and the neighbor-joining algorithm [<a href="#pgen-0030197-b047">47</a>]. We did not build trees for families with greater than 250 copies in total. We reconciled the 11,390 gene trees with the <i>Drosophila</i> species tree (as well as the two alternative species tree topologies) to map gene gains and losses to individual branches of the phylogeny (<a href="#pgen-0030197-sg001">Figure S1</a>). As a way of checking for consistency between the likelihood and gene/species tree approaches, we compared the number of inferred gene gains on informative branches from each (see <a href="#s3">Materials and Methods</a> and [<a href="#pgen-0030197-b038">38</a>]). The number of losses inferred by tree reconciliation methods can be highly biased because incorrect gene tree topologies will always add additional loss events towards the tips of the species tree [<a href="#pgen-0030197-b038">38</a>], and therefore we do not use these estimates here. The correlation between the two methods was high (<i>r</i> = 0.90, <i>p</i> < 0.00001; <a href="#pgen-0030197-g002">Figure 2</a>), indicating that our estimates of the number of gene duplications along each lineage are likely to be quite accurate. We inferred the gain of 89 genes in <span class="genus-species">D. 
melanogaster</span> since its split with simulans/sechellia using the tree reconciliation approach, compared to the estimate of 94 genes using the likelihood method.</p> <div class="figure" xpathLocation="/article[1]/body[1]/sec[2]/sec[3]/fig[1]"><a name="pgen-0030197-g002" id="pgen-0030197-g002" title="Click for larger image " href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.g002" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><img xpathLocation="noSelect" border="1" src="/article/fetchObject.action?uri=info:doi/10.1371/journal.pgen.0030197.g002&representation=PNG_S" align="left" alt="thumbnail" class="thumbnail"></a><p><strong xpathLocation="/article[1]/body[1]/sec[2]/sec[3]/fig[1]/label[1]"><a href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.g002" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><span xpathLocation="/article[1]/body[1]/sec[2]/sec[3]/fig[1]/label[1]">Figure 2. </span></a> <span xpathLocation="/article[1]/body[1]/sec[2]/sec[3]/fig[1]/caption[1]/title[1]">Correlation between the Number of Gene Gains on Informative Branches of the Phylogeny Inferred from the Likelihood Method and from the Tree Reconciliation Method</span></strong></p><span xpathLocation="noSelect">doi:10.1371/journal.pgen.0030197.g002</span><div class="clearer"></div></div><p xpathLocation="/article[1]/body[1]/sec[2]/sec[3]/p[2]">The comparison between the tree reconciliation and likelihood methods also allows us to make some tentative conclusions regarding the frequency of gene conversion among <i>Drosophila</i> gene duplicates. 
Because gene conversion between duplicated genes will cause them to be highly similar, gene trees built from such genes will tend to show many more recent duplications. Even when there has been no change in the number of genes in a particular family, gene conversion will cause tree reconciliation methods to infer multiple, parallel duplications across lineages. This implies that rampant gene conversion will cause reconciliation methods to estimate many more duplications than our likelihood method, which is based only on the size of gene families. However, this is not seen (<a href="#pgen-0030197-g002">Figure 2</a>): in fact, the ratio of genes estimated via reconciliation to that estimated via likelihood is 1.01, and more genes are estimated via reconciliation on only three of the 12 tip branches. Though these data certainly cannot rule out a role for gene conversion in individual families, they strongly suggest that it is at most a minor role genome-wide.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[3]/p[3]">As a further check on the number of duplicates specific to <span class="genus-species">D. melanogaster</span> inferred from the 11,390 trees, we calculated synonymous site distances between all candidate pairs of duplicates in this species. If <i>d</i><sub>S</sub> = 0.117 is the average synonymous distance between <span class="genus-species">D. melanogaster</span> and <span class="genus-species">D. simulans</span> [<a href="#pgen-0030197-b035">35</a>], then melanogaster-specific duplicates should be more similar than this. There are two explanations for why pairs of duplicates with greater divergence than expected (i.e., <i>d</i><sub>S</sub> > 0.117) can be inferred to be melanogaster specific using the tree reconciliation method. 
They may in fact be melanogaster specific but are evolving more rapidly at the nucleotide level than the average pair of orthologs; or the duplication event may pre-date the melanogaster-simulans split, but both <span class="genus-species">D. simulans</span> paralogs have been lost. As it is difficult to distinguish between these two possibilities, we have chosen to be conservative and to only count those pairs with <i>d</i><sub>S</sub> < 0.117. Of the 89 genes initially considered to be melanogaster-specific duplicates by tree reconciliation, 77 of them followed this rule. These should be considered a minimum estimate for the number of duplications unique to the <span class="genus-species">D. melanogaster</span> genome from these gene families.</p> <h4 xpathLocation="/article[1]/body[1]/sec[2]/sec[4]/title[1]">Accelerated Evolution of Gene Families</h4> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[4]/p[1]">The likelihood approach to studying gene family evolution allows us to identify individual gene families that are evolving at rates of gain and loss significantly higher than the genome-wide average [<a href="#pgen-0030197-b023">23</a>]. Such families can exhibit either larger-than-expected expansions or contractions, which may be confined to either a single lineage of the phylogeny or may reflect large changes across the tree. Of the 11,434 gene families inferred to have been present in the <i>Drosophila</i> MRCA, 342 exhibit significant expansions or contractions (<i>p</i> < 0.0001; <a href="#pgen-0030197-st001">Table S1</a>). At this significance level, only slightly more than one family is expected by chance. 
We are especially interested in families with large, lineage-specific expansions, as it is likely that adaptive natural selection acts on lineage-specific traits through these changes [<a href="#pgen-0030197-b008">8</a>,<a href="#pgen-0030197-b048">48</a>,<a href="#pgen-0030197-b049">49</a>].</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[4]/p[2]">Rapidly evolving families are associated with many biological processes, but the most common GO terms found among them are defense response, proteolysis, trypsin activity, protein binding, and zinc ion binding. Only one term, response to chemical stimulus (GO:0042221), was significantly over-represented. Interestingly, many families in these categories have previously been identified as having large differences in copy number both between <span class="genus-species">D. melanogaster</span> and the mosquito, <span class="genus-species">Anopheles gambiae</span> [<a href="#pgen-0030197-b050">50</a>], and between <span class="genus-species">D. melanogaster</span> and the nematode, <span class="genus-species">Caenorhabditis elegans</span> [<a href="#pgen-0030197-b002">2</a>]. Our results demonstrate that there is significant variation in copy number even among closely related <i>Drosophila</i> species. It is also important to point out that genes involved in many of these processes (defense response, proteolysis, and trypsin activity) evolve rapidly at the protein level as well [<a href="#pgen-0030197-b032">32</a>]. The parallel evolution of these proteins in sequence and copy number suggests that natural selection may act on multiple types of molecular changes to produce similar adaptive outcomes.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[4]/p[3]">Of the 342 rapidly evolving families, we were able to identify 22 that showed large changes in copy number on the terminal branch leading to <span class="genus-species">D. melanogaster</span> (<a href="#pgen-0030197-st002">Table S2</a>). 
Significant contractions occurred in 18 of the families and significant expansions in the remaining four (Dfam250, Dfam1703, Dfam2187, and Dfam6175). Four of the contracting families are made up of zinc-finger proteins, and all of the contractions in these four families result in complete loss of the family (i.e., there are no copies in the <span class="genus-species">D. melanogaster</span> genome). Family Dfam2548 has gone from five copies to one copy; the one remaining gene in <span class="genus-species">D. melanogaster</span> is longitudinals lacking (<i>lola</i>) and is involved in axon growth and guidance [<a href="#pgen-0030197-b051">51</a>]. Another family to show a significant contraction (Dfam3206) was reduced from four copies to one copy (<i>pipe</i>) in <span class="genus-species">D. melanogaster</span> and is reported to be involved in embryonic pattern formation. There are many additional families that have been lost from <span class="genus-species">D. melanogaster</span> (see Loss of Entire Gene Families, below), but none show such dramatic reductions in number in the last five million years.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[4]/p[4]">The four families with significant expansions have varying biological functions, though all may be involved in reproduction: one contains analogs of the protein kinase CK2 complex (Dfam2187), one is the Sdic (sperm-specific dynein intermediate chain) gene family (Dfam6175), and two are proteolysis/trypsin families (Dfam250 and Dfam1703). (The Dfam database, containing descriptions of the families, alignments, gene trees, and links to FlyBase, can be found at <a href="http://www.bio.indiana.edu/~hahnlab/Databases.html">http://www.bio.indiana.edu/~hahnlab/Databases.html</a>.) The family annotated as protein kinases has expanded in number from four to 14 in <span class="genus-species">D. melanogaster</span>. 
This family contains the gene Stellate (<i>Ste</i>), which is involved in male fertility and meiotic drive [<a href="#pgen-0030197-b052">52</a>,<a href="#pgen-0030197-b053">53</a>] and is arranged in tandem repeats on the X chromosome in <span class="genus-species">D. melanogaster</span> [<a href="#pgen-0030197-b054">54</a>]. It was previously thought to have been absent from other species in the melanogaster group of <i>Drosophila</i> [<a href="#pgen-0030197-b054">54</a>], though we find homologs in all 12 <i>Drosophila</i> genomes considered here. New gene duplicates in the Sdic gene family were previously reported to have been fixed by adaptive natural selection [<a href="#pgen-0030197-b055">55</a>,<a href="#pgen-0030197-b056">56</a>]. This family is made up largely of duplicated genes that originated as a chimeric fusion between the <i>Cdic</i> and <i>AnnX</i> genes, and that are newly expressed in the testes of male <span class="genus-species">D. melanogaster</span> [<a href="#pgen-0030197-b055">55</a>,<a href="#pgen-0030197-b057">57</a>]. Here we find that this family has expanded from two copies (including the progenitor <i>Cdic</i> genes) to five copies in <span class="genus-species">D. melanogaster</span>.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[4]/p[5]">The two other families that show rapid expansions in <span class="genus-species">D. melanogaster</span> also have reproduction-related functions. Both families of proteolysis/trypsin genes have gained two gene duplicates; Dfam250 has gone from five to seven copies and Dfam1703 from seven to nine copies. Dfam250 shows some evidence for positive selection on the melanogaster-specific protein sequences (<i>p</i> = 0.05), while Dfam1703 does not. 
As discussed earlier, proteins with trypsin activity are often found to evolve via adaptive natural selection; it is likely that this high rate of sequence evolution is due to their role in male–female sexual antagonism [<a href="#pgen-0030197-b058">58</a>]. Consistent with our observation of rapid evolution in this family in both copy number and protein sequence, we found another family containing trypsin genes that had a significant expansion along lineages leading to <span class="genus-species">D. melanogaster</span>. Dfam239 experienced an expansion from 20 to 28 copies along the branch leading to the melanogaster group (branch 6; <a href="#pgen-0030197-g001">Figure 1</a>) and a second large expansion from 28 to 46 on the branch leading to the melanogaster subgroup (branch 8; there are 46 members of this family found in the <span class="genus-species">D. melanogaster</span> genome). We also found strong evidence for positive selection on the protein sequences of this family (p < 0.001).</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[4]/p[6]">The coincidence of positive selection on protein sequences with expansion of gene number in the above families led us to investigate this relationship further. We analyzed all 49 families that contained <span class="genus-species">D. melanogaster</span>-specific duplications for evidence of positive selection (these families contain the 77 new gene duplicates). Again comparing nonsynonymous to synonymous distances among the paralogs, we found that models including positively selected sites (M2a in PAML) were significantly favored over models without positive selection (M1a) in ten families (20.4%; <i>p</i> < 0.05, df = 2). Of these, six were significant after Bonferroni correction (<i>p</i> < 0.001). 
Friedman and Hughes [<a href="#pgen-0030197-b059">59</a>] found a similarly high fraction of positively selected duplicates in a comparison of human and mouse, but interpreted their result as a bias in the likelihood method. They further proposed that this bias becomes worse as divergence times grow between sequences. As a comparison, therefore, we examined the frequency of positive selection found among single-copy orthologs in <i>Drosophila</i> using the same methods [<a href="#pgen-0030197-b032">32</a>]. As expected, only 309 (3.6%) of 8,510 sets of orthologs showed evidence for positive selection. As the orthologs have much deeper divergence times than the melanogaster-specific duplicates, we believe that our results uncover a real biological pattern and are not the result of biased methods. However, despite the fact that we have found little evidence for gene conversion among duplicates, if present it may cause false rejection of the null hypothesis [<a href="#pgen-0030197-b060">60</a>]. The high fraction of positively selected duplicates observed in <span class="genus-species">D. melanogaster</span> is consistent with genome-wide comparisons in rhesus macaque [<a href="#pgen-0030197-b049">49</a>] and a number of individual studies from <i>Drosophila</i> (e.g., [<a href="#pgen-0030197-b015">15</a>,<a href="#pgen-0030197-b021">21</a>,<a href="#pgen-0030197-b061">61</a>]). Whether this selection acts initially to fix duplicates or acts after fixation on unconstrained protein sequences is unknown; either way, it suggests that adaptive protein evolution is a frequent feature of duplicate gene evolution [<a href="#pgen-0030197-b010">10</a>].</p> <h4 xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/title[1]">Loss of Entire Gene Families</h4> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/p[1]">Gene loss occurs in almost every family that changes in size. 
Sometimes this results in complete loss of a family: 2,220 of the 11,434 families inferred to have been present in the <i>Drosophila</i> MRCA have had such an extinction event along at least one lineage. The remaining 9,214 families are present in all 12 <i>Drosophila</i> genomes and should be considered the “core” proteome of these species. In total, we infer a minimum of 4,399 contractions that result in the complete loss of a family (multiple extinctions can occur within a single family along distinct lineages), occurring on every branch of the phylogeny (<a href="#pgen-0030197-g003">Figure 3</a>). This number represents a rate of 12 extinctions per million years (=4,399 extinctions/367 million years total in the tree). Varying the similarity threshold used to define gene families did affect the number of extinctions, but order-of-magnitude changes in this threshold only changed the number of extinctions 6%–7% in either direction.</p> <div class="figure" xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/fig[1]"><a name="pgen-0030197-g003" id="pgen-0030197-g003" title="Click for larger image " href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.g003" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><img xpathLocation="noSelect" border="1" src="/article/fetchObject.action?uri=info:doi/10.1371/journal.pgen.0030197.g003&representation=PNG_S" align="left" alt="thumbnail" class="thumbnail"></a><p><strong xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/fig[1]/label[1]"><a href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.g003" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><span 
xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/fig[1]/label[1]">Figure 3. </span></a> <span xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/fig[1]/caption[1]/title[1]">Lineage-Specific and Extinct Gene Families</span></strong></p><p xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/fig[1]/caption[1]/p[1]">On each branch the numbers of lineage-specific families/extinct families are given. Numbers in boxes are identifiers for internal branches of the phylogeny.</p> <span xpathLocation="noSelect">doi:10.1371/journal.pgen.0030197.g003</span><div class="clearer"></div></div><p xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/p[2]">The <span class="genus-species">D. melanogaster</span> genome has lost 668 entire gene families that were present at the root of the <i>Drosophila</i> tree; 357 of these families have been lost from only the <span class="genus-species">D. melanogaster</span> genome (<a href="#pgen-0030197-g003">Figure 3</a>). Families that are lost from the <span class="genus-species">D. melanogaster</span> genome have many of the same functions as those that are lost from other species. The most common GO categories among extinctions across the <i>Drosophila</i> include zinc ion binding, proteolysis, protein binding, and transcription factor activity. None of these are significantly over-represented.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/p[3]">The loss of entire gene families has been previously observed in many taxa (e.g., [<a href="#pgen-0030197-b004">4</a>,<a href="#pgen-0030197-b005">5</a>,<a href="#pgen-0030197-b008">8</a>]). Results from these studies indicate that while the apparent loss of whole gene families can result from the true loss of all functional genes, there are multiple alternative explanations, including artifacts of the threshold used for clustering [<a href="#pgen-0030197-b004">4</a>,<a href="#pgen-0030197-b008">8</a>] and missed annotations of genes present in completed genomes. 
For the families that appear to be extinct in <span class="genus-species">D. melanogaster</span>, we attempted to distinguish among true extinctions, clustering artifacts, and possible missed annotations.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[5]/p[4]">Of the 357 families that appear to have gone extinct along the <span class="genus-species">D. melanogaster</span> branch, 292 have a homologous gene present in <span class="genus-species">D. simulans</span>. We used TBLASTN to search the <span class="genus-species">D. melanogaster</span> genome for sequences with high similarity to these <span class="genus-species">D. simulans</span> genes, and further asked whether matching sequences were syntenic with the <span class="genus-species">D. simulans</span> genes. If matching <span class="genus-species">D. melanogaster</span> sequences were not previously annotated as genes, we used GeneWise [<a href="#pgen-0030197-b062">62</a>] to predict gene models (see <a href="#pgen-0030197-sg002">Figure S2</a> for a summary of results). Though there are many ambiguous cases, we found four extinctions (1.4% of all extinctions) that appear to be artifacts of the clustering algorithm: previously predicted <span class="genus-species">D. melanogaster</span> genes that were syntenic with the <span class="genus-species">D. simulans</span> query sequence and that were members of families with more <span class="genus-species">D. melanogaster</span> than <span class="genus-species">D. simulans</span> genes (such that additional extinctions did not have to be introduced by shifting genes between families). One of these <span class="genus-species">D. melanogaster</span> genes (CG6908) is evolving at ~3.5 times the average nonsynonymous rate and may therefore represent an “extinction” of function without loss of a physical gene. Of the 292 extinctions, we were further able to predict 98 previously unannotated genes in <span class="genus-species">D. 
melanogaster</span> that had both good matches to predicted genes from <span class="genus-species">D. simulans</span> and expressed sequence tag (EST) or other expression evidence (<a href="#pgen-0030197-st003">Table S3</a>). Of these, 62 match novel gene predictions made using other methods [<a href="#pgen-0030197-b033">33</a>], and 17 match third-party annotations at the National Center for Biotechnology Information (NCBI) that were not included in FlyBase (<a href="#pgen-0030197-sg002">Figure S2</a>; <a href="#pgen-0030197-st003">Table S3</a>) [<a href="#pgen-0030197-b063">63</a>]. The majority of previously unidentified genes reside in the 5′ UTRs of annotated genes and are therefore likely to be missed by ab initio gene prediction programs. Our results suggest that while there may be many true losses of entire gene families, taking advantage of comparative genomic data may help to uncover many previously unannotated genes. Although these data indicate that we have overestimated the number of extinctions because of missed annotations, this problem may be largely confined to the <span class="genus-species">D. melanogaster</span> genome, where ab initio gene predictors were not used.</p> <h4 xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/title[1]">Lineage-Specific Gene Families</h4> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/p[1]">When the MRCA of the <i>Drosophila</i> is not inferred to have contained any members in a gene family, we conclude that the family evolved subsequent to the MRCA of the species considered. Only species descended from the ancestor in which the family evolved would then have any gene copies. 
Such lineage-specific families (also called “orphans” [<a href="#pgen-0030197-b064">64</a>–<a href="#pgen-0030197-b066">66</a>]) may arise for a number of reasons: (1) the de novo evolution of new genes [<a href="#pgen-0030197-b067">67</a>]; (2) rapid protein evolution in previously existing genes so that they are no longer identified as being part of a pre-existing family [<a href="#pgen-0030197-b008">8</a>,<a href="#pgen-0030197-b065">65</a>,<a href="#pgen-0030197-b066">66</a>]; (3) artifacts of the clustering process [<a href="#pgen-0030197-b008">8</a>,<a href="#pgen-0030197-b064">64</a>]; (4) horizontal gene transfer [<a href="#pgen-0030197-b068">68</a>]; (5) extinctions on a majority of lineages considered [<a href="#pgen-0030197-b008">8</a>]; or (6) incorrect annotations of sequenced genomes [<a href="#pgen-0030197-b065">65</a>].</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/p[2]">We considered families to be lineage specific if they were not found in at least one species of both the <i>Sophophora</i> and <i>Drosophila</i> subgenera and were also present in at least two copies (see <a href="#s3">Materials and Methods</a>). These criteria result in 4,129 families that we considered to be lineage specific, implying the creation of 11 new gene families per million years (=4,129 lineage-specific families/367 million years total in the tree). These families have evolved on every branch of the tree and in every species (“Lineage Specific” in <a href="#pgen-0030197-t001">Table 1</a> and <a href="#pgen-0030197-g003">Figure 3</a>). 
As expected [<a href="#pgen-0030197-b008">8</a>], varying the similarity threshold used to define gene families also changed the apparent number of lineage-specific families: a more stringent threshold led to 1.4% more lineage-specific families, while a less stringent threshold led to 1.9% fewer.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/p[3]">Of the 493 lineage-specific families in the subgenus <i>Drosophila</i>, 226 are found in all three species. Of the 3,636 lineage-specific families in the subgenus <i>Sophophora</i>, 288 are found in all nine species. The large difference in the number of families unique to each subgenus is likely due to the unequal sampling of species: extinctions on the relatively longer branch leading to the subgenus <i>Drosophila</i> species, for instance, will result in many families that appear to be specific to the <i>Sophophora</i>. Similarly, the way in which we define lineage-specific families relative to annotation artifacts—that they must be present in multiple copies—likely leads to a large number of lineage-specific families apparently originating on the lineages leading to <span class="genus-species">D. pseudoobscura</span>/<span class="genus-species">D. persimilis</span> and <span class="genus-species">D. simulans</span>/<span class="genus-species">D. sechellia</span>: close relationships between these sister species mean that even spurious gene predictions will have highly similar homologs.</p> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/p[4]">We found three families with multiple gene copies that are unique to <span class="genus-species">D. melanogaster</span> (Dfam12771, Dfam14517, and Dfam15564). The largest of these families has five members (Dfam12771), but no known annotation in FlyBase or via a search of the Pfam database [<a href="#pgen-0030197-b069">69</a>]. Pfam annotations of the other <span class="genus-species">D. 
melanogaster</span>-specific families reveal proteins involved in puparial adhesion and exocytosis. Over-represented GO terms associated with lineage-specific families in all species include trypsin activity, proteolysis, and postmating behavior (<a href="#pgen-0030197-sg003">Figure S3</a>; <a href="#pgen-0030197-t002">Table 2</a>). These terms are noteworthy, as previous work has uncovered evidence for the evolution of truly de novo proteins with the same functions (e.g., [<a href="#pgen-0030197-b022">22</a>]), though they are also a rapidly evolving group of proteins at the nucleotide level. Many of these de novo genes are expressed in the accessory glands of male <i>Drosophila</i> and are likely to have arisen from previously noncoding DNA [<a href="#pgen-0030197-b022">22</a>]. Supporting this result, we find that our lineage-specific families contain proteins that are on average 50% shorter than the majority of <i>Drosophila</i> proteins (277 versus 551 amino acids; <i>p</i> = 2.6 × 10<sup>−59</sup>).</p> <div class="figure" xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/table-wrap[1]"><a name="pgen-0030197-t002" id="pgen-0030197-t002" title="Click for larger image " href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.t002" onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><img xpathLocation="noSelect" border="1" src="/article/fetchObject.action?uri=info:doi/10.1371/journal.pgen.0030197.t002&representation=PNG_S" align="left" alt="thumbnail" class="thumbnail"></a><p><strong xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/table-wrap[1]/label[1]"><a href="/article/slideshow.action?uri=info:doi/10.1371/journal.pgen.0030197&imageURI=info:doi/10.1371/journal.pgen.0030197.t002" 
onclick="window.open(this.href,'plosSlideshow','directories=no,location=no,menubar=no,resizable=yes,status=no,scrollbars=yes,toolbar=no,height=600,width=850');return false;"><span xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/table-wrap[1]/label[1]">Table 2. </span></a></strong></p><p xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/table-wrap[1]/caption[1]/p[1]">Over-represented GO Terms among Lineage-Specific Families</p> <span xpathLocation="noSelect">doi:10.1371/journal.pgen.0030197.t002</span><div class="clearer"></div></div><p xpathLocation="/article[1]/body[1]/sec[2]/sec[6]/p[5]">As noted above, previous work has found that some lineage-specific <span class="genus-species">D. melanogaster</span> genes appear to be incorrect annotations [<a href="#pgen-0030197-b065">65</a>]. As the sequencing of multiple <i>Drosophila</i> genomes affords a much deeper comparative genomic dataset with which to address this question, we attempted to identify additional gene models from the <span class="genus-species">D. melanogaster</span> genome that have little evolutionary or functional support (see also [<a href="#pgen-0030197-b033">33</a>]). We concentrated on genes found within single-gene, single-species families (“annotation artifacts”). Of the 1,074 genes (families) we previously called annotation artifacts in <span class="genus-species">D. melanogaster</span>, 716 were found to be RNA genes upon closer inspection. Of the 358 remaining genes, 94 had no EST support and no tBLASTX match in the <span class="genus-species">D. simulans</span> genome (<a href="#pgen-0030197-sg004">Figure S4</a>; <a href="#pgen-0030197-st004">Table S4</a>). Many of these genes are quite short (average length of 319 amino acids), and are highly likely to be incorrectly annotated <span class="genus-species">D. melanogaster</span> genes. A total of 34 of these genes were also marked as bad annotations using other methods [<a href="#pgen-0030197-b033">33</a>]. 
Finally, we found 15 cases in which a <span class="genus-species">D. melanogaster</span> gene that we called an annotation artifact met four criteria: (1) it was syntenic with a similar <span class="genus-species">D. simulans</span> gene; (2) it had EST matches in GenBank; (3) it had <i>d</i><sub>S</sub> < 0.20 to the matching <span class="genus-species">D. simulans</span> gene; and (4) the family containing the <span class="genus-species">D. simulans</span> homolog had more copies in <span class="genus-species">D. simulans</span> than in <span class="genus-species">D. melanogaster</span> (suggesting that the “annotation artifact” might explain an apparent loss in <span class="genus-species">D. melanogaster</span> if included in this family). These genes have an average <i>d</i><sub>N</sub> = 0.041, compared to the average across all genes between these two species of <i>d</i><sub>N</sub> = 0.016 [<a href="#pgen-0030197-b035">35</a>], and four have <i>d</i><sub>N</sub>/<i>d</i><sub>S</sub> > 1. Though we have called these genes annotation artifacts, it appears more likely that they are simply extremely rapidly evolving genes.</p> <h4 xpathLocation="/article[1]/body[1]/sec[2]/sec[7]/title[1]">Conclusions</h4> <p xpathLocation="/article[1]/body[1]/sec[2]/sec[7]/p[1]">By studying the gain and loss of genes, we hope to better understand the forces that shape morphological, physiological, and metabolic differences among species. We have shown here that even among 12 closely related <i>Drosophila</i>, there have been a large number of gene gains and losses along each lineage, in proteins involved in a wide range of biological functions. Whole gene families have also been gained and lost, at approximately equal rates across the <i>Drosophila</i>. In the past 5 million years of <span class="genus-species">D. melanogaster</span> evolution, at least 94 duplicated genes have been gained, some of these likely evolving by adaptive natural selection. 
In addition to garnering novel insights into genome evolution, studies of the gene complements of multiple <i>Drosophila</i> species can help to annotate the <span class="genus-species">D. melanogaster</span> genome. As demonstrated here, such analyses can improve the <span class="genus-species">D. melanogaster</span> annotation by either adding or removing genes from this genome. Though comparative genome sequencing has revealed vast similarities in the total number of genes among taxa, this similarity hides enormous complexities in the identity and number of constituent proteins.</p> </div> <div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" id="section3" xpathLocation="/article[1]/body[1]/sec[3]"><a id="s3" name="s3" toc="s3" title="Materials and Methods"></a><h3 xpathLocation="noSelect">Materials and Methods <a href="#top">Top</a></h3> <h4 xpathLocation="/article[1]/body[1]/sec[3]/sec[1]/title[1]">Data.</h4> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[1]/p[1]">Gene models across all 12 species are taken from the consensus set defined by the <i>Drosophila</i> Genome Sequencing and Analysis Consortium [<a href="#pgen-0030197-b032">32</a>,<a href="#pgen-0030197-b033">33</a>]. Gene families were assembled by a modified reciprocal BLAST method (FRB, [<a href="#pgen-0030197-b032">32</a>]). Briefly, FRB proceeds by first performing all-by-all comparisons between the 12 genomes using BLASTP. Rather than taking only the top hit as the putative ortholog—as is done in most reciprocal BLAST methods—FRB considers proteins to be in the same “rank” if the absolute difference in successive BLAST E-values is less than two orders of magnitude (i.e., a difference in score of 100). This E-value threshold was changed when the data were reclustered to either a difference in E-values of 10 or a difference of 1,000. 
Genes in the same rank are potentially homologous, and the clustering step of FRB traverses the graph of pairwise relationships to find the maximally connected clusters that are disjoint from one another while discarding nonreciprocal relationships. These clusters include both orthologs and paralogs and are the gene families used in our analyses (description of FRB courtesy of V. Iyer).</p> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[1]/p[2]">In total this method identified 50,042 gene families in all 12 species, including 223,963 genes. After filtering out gene models predicted to be derived from transposable elements, the total numbers were reduced to 38,634 families containing 188,868 genes. We determined whether families were present in the MRCA, and if not, on which branch the family had originated. A family was defined as being present in the MRCA (with at least one gene copy), if it was found in at least one species of both the <i>Drosophila</i> (<span class="genus-species">D. virilis</span>, <span class="genus-species">D. mojavensis</span>, and <i>D. grimshawi</i>) and <i>Sophophora</i> (<i>D. willistoni, D. persimilis, D. pseudoobscura, D. ananassae, D. erecta, D. yakuba, D. melanogaster, D. sechellia,</i> and <i>D. simulans</i>) subgenera. The branch on which families originated was determined by parsimony rules: if leaf branches share a family, the MRCA of those branches is regarded as the point of origin of the family. These are the same criteria by which losses of families were mapped onto the tree.</p> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[1]/p[3]">Using these rules, we found 23,070 families that consisted of a single gene and that appeared to have evolved on a terminal lineage (i.e., they are found in only a single species). These single-gene families were regarded as artifacts of the annotation process, and were removed from further analysis. 
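</p> <p>The presence/absence rules used to sort families can be illustrated with a short Python sketch. It is not the authors' code: the function name, the category labels, and the input format (a mapping from species name to gene copy number) are assumptions made for illustration. It applies the criteria stated in the text: a family is treated as ancestral if it occurs in both subgenera, as an annotation artifact if it is a single gene in a single species, and as lineage specific otherwise.</p>

```python
# Subgenus membership as listed in the Data section.
DROSOPHILA = {"D. virilis", "D. mojavensis", "D. grimshawi"}
SOPHOPHORA = {
    "D. willistoni", "D. persimilis", "D. pseudoobscura", "D. ananassae",
    "D. erecta", "D. yakuba", "D. melanogaster", "D. sechellia", "D. simulans",
}


def classify_family(copies):
    """Classify one family from a {species: copy number} mapping (hypothetical format).

    Returns "ancestral", "annotation_artifact", or "lineage_specific".
    """
    present = {species for species, n in copies.items() if n > 0}
    unknown = present - (DROSOPHILA | SOPHOPHORA)
    if unknown:
        raise ValueError("unrecognized species: " + ", ".join(sorted(unknown)))
    if not present:
        raise ValueError("a family must contain at least one gene")

    in_drosophila = not present.isdisjoint(DROSOPHILA)
    in_sophophora = not present.isdisjoint(SOPHOPHORA)
    if in_drosophila and in_sophophora:
        # Found in both subgenera: inferred to be present in the MRCA.
        return "ancestral"

    total_genes = sum(copies[species] for species in present)
    if total_genes == 1:
        # A single gene in a single species.
        return "annotation_artifact"
    # Confined to one subgenus, with multiple species or multiple copies.
    return "lineage_specific"
```

<p>Under these assumptions, a family with five copies found only in D. melanogaster is classified as lineage specific, whereas a single-copy family found only in that species is classified as an annotation artifact.</p> <p>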
We also found 4,129 families that arose after the split between the main two subgenera, but that were either found in multiple species or had multiple copies in one species. Since our likelihood analysis assumes that there is at least one ancestral gene in the MRCA (see below), we separated these families from the likelihood analysis. This left 11,435 families with at least two genes across both subgenera. Close examination of the data revealed one family (Dfam8) predicted to be made up of >85% transposable elements. As it seems likely that the remaining ~15% of genes in this family are also transposable elements, this family was removed from all downstream analyses, leaving 11,434 families for the final dataset used in the likelihood analysis.</p> <h4 xpathLocation="/article[1]/body[1]/sec[3]/sec[2]/title[1]">Likelihood analysis of gene gain and loss.</h4> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[2]/p[1]">To estimate the average gene gain/loss rate and to identify gene families that have undergone significant size changes, we applied the probabilistic framework developed by Hahn et al. [<a href="#pgen-0030197-b023">23</a>]. By using a stochastic birth and death model for the gene gain and loss across species and a probabilistic graphical model for the dependence relationship between branches of the phylogeny, this framework can infer the rate and direction of the change in gene family size. 
Assuming that all genes have equal probability λ of gain (birth) and loss (death), the conditional probability of going from an initial number of genes <i>X</i><sub>0</sub> = <i>s</i> to size <i>c</i> during time <i>t</i>, is given as, <br><a name="pgen-0030197-e001" id="pgen-0030197-e001"></a><span class="equation"><img src="/article/fetchObject.action?uri=info:doi/10.1371/journal.pgen.0030197.e001&representation=PNG"></span><br>where, <span class="capture-id" id="pgen-0030197-ex001"><img src="/article/fetchObject.action?uri=info:doi/10.1371/journal.pgen.0030197.ex001&representation=PNG" border="0"></span> . Since <i>X</i><sub>0</sub> = 0 will result in a probability of zero for birth and death, we restrict our analysis to families in which <i>X</i><sub>0</sub> > 0. That means we exclude lineage-specific families from our likelihood analysis. A total of 11,434 families including 148,326 genes were analyzed. The phylogeny for the analysis was based on the tree found in [<a href="#pgen-0030197-b032">32</a>]. </p> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[2]/p[2]">The rate of gene gain and loss, λ, was estimated by an expectation-maximization algorithm that maximizes the sum of the log-likelihoods of each family. The likelihoods we want to maximize are the conditional likelihood of the observed family sizes given the root size. The ancestral family sizes at internal nodes are computed by averaging over all possible assignments during this maximization. For further details see Hahn et al. [<a href="#pgen-0030197-b023">23</a>] and De Bie et al. [<a href="#pgen-0030197-b039">39</a>]. We estimated three different models with varying numbers of parameters. A model with one global λ gave us a consistent result, while a model with 22 λ-parameters (one for each branch of the phylogeny) failed to converge to a single, consistent global maximum. 
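</p> <p>As a minimal sketch of the birth and death transition probability described above, the following Python function evaluates the closed-form expression published for this model by Hahn et al. [<a href="#pgen-0030197-b023">23</a>], on the assumption that it is the expression rendered in the equation image, with α = λt/(1 + λt). The function name and argument names are illustrative and are not taken from the authors' software.</p>

```python
from math import comb


def transition_probability(s, c, lam, t):
    """P(X_t = c | X_0 = s) for a birth-death process with equal rates lam.

    s   -- initial family size (must be at least 1)
    c   -- family size after time t
    lam -- per-gene rate of gain and of loss
    t   -- elapsed time, in the same units as 1/lam
    """
    if not s >= 1:
        raise ValueError("the initial family size s must be at least 1")
    alpha = lam * t / (1.0 + lam * t)
    total = 0.0
    # Sum over j from 0 to min(s, c), as in the closed-form expression.
    for j in range(min(s, c) + 1):
        total += (
            comb(s, j)
            * comb(s + c - j - 1, s - 1)
            * alpha ** (s + c - 2 * j)
            * (1.0 - 2.0 * alpha) ** j
        )
    return total
```

<p>For a single-copy family (s = 1) the function returns α when c = 0, which is the probability that the family is lost entirely over the interval. The restriction to s of at least 1 mirrors the exclusion of lineage-specific families from the likelihood analysis.</p> <p>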
On the basis of the best results for the 22-p model, we categorized branches into three rate categories: fast (>0.001), medium (0.001–0.0001), and slow (<0.0001).</p> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[2]/p[3]">To test for biases in parameter estimation, we used the estimated rate for the 1-p model (λ = 0.0012) to simulate data over the <i>Drosophila</i> phylogeny for each of the 11,434 gene families. Each of 1,000 simulations starts by setting the root sizes for all 11,434 families equal to the maximum likelihood size estimated from the dataset, and then evolving these families over the tree according to the birth–death probability model described above. For each of the 1,000 simulated datasets we then estimate λ-values under both the 1-p and 3-p models. As the data were generated under a 1-p model, these simulations act as a null hypothesis against which results from the 3-p model can be compared.</p> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[2]/p[4]">To calculate the number of gene gains and losses on each branch of the tree, we compared the sizes of all parent–daughter node pairs (using the maximum likelihood ancestral gene family sizes). The difference in size between these two values was inferred to be the number of genes gained or lost: larger daughter sizes imply gene gains, while smaller daughter sizes imply gene losses. These numbers are minimum estimates, as gains and losses in the same family will result in fewer observable events. Total gains and losses were summed across all 11,434 families on all lineages.</p> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[2]/p[5]">Our likelihood approach also allows us to set up a null hypothesis against which we can compare the rate of evolution of individual gene families. Using the maximum likelihood parameters of the 3-p model, we ran Monte Carlo simulations to test for significant rate accelerations in all 11,434 families [<a href="#pgen-0030197-b023">23</a>]. 
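</p> <p>The parent–daughter comparison used above to count gains and losses can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names and the dictionary-based tree representation are hypothetical, and the node sizes are taken as given (in the analysis they are the maximum likelihood ancestral family sizes).</p>

```python
def branch_changes(parent_of, sizes):
    """Return {daughter: (gains, losses)} for one gene family.

    parent_of -- {daughter node: parent node} for every branch of the tree
    sizes     -- {node: family size}, observed at tips and inferred at
                 internal nodes
    """
    changes = {}
    for daughter, parent in parent_of.items():
        difference = sizes[daughter] - sizes[parent]
        # A larger daughter implies gains; a smaller daughter implies losses.
        changes[daughter] = (max(difference, 0), max(-difference, 0))
    return changes


def total_changes(families):
    """Sum gains and losses over all branches of all families.

    families -- iterable of (parent_of, sizes) pairs, one per family
    """
    total_gains = 0
    total_losses = 0
    for parent_of, sizes in families:
        for gains, losses in branch_changes(parent_of, sizes).values():
            total_gains += gains
            total_losses += losses
    return total_gains, total_losses
```

<p>Because only the net difference along each branch is counted, a gain and a loss in the same family on the same branch cancel out, which is why the resulting totals are minimum estimates.</p> <p>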
Using <i>p</i> < 0.0001, we expect there to be approximately one significant result by chance; the observation of 342 families with lower <i>p</i>-values implies a false discovery rate of approximately 0.3% (about one expected false positive among 342 significant families). To identify the branch of the <i>Drosophila</i> tree with the most unlikely amount of change for these 342 families, we calculated the exact <i>p</i>-values for transitions over every branch (the “Viterbi” method in [<a href="#pgen-0030197-b039">39</a>]). We called individual branches significant at <i>p</i> < 0.005.</p> <h4 xpathLocation="/article[1]/body[1]/sec[3]/sec[3]/title[1]">Reconciling gene trees and species trees.</h4> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[3]/p[1]">Alignments among proteins in each of the gene families were generated by MUSCLE [<a href="#pgen-0030197-b070">70</a>]. A neighbor-joining tree was built for each family on the basis of the alignment and JTT protein distances using PHYLIP [<a href="#pgen-0030197-b071">71</a>]. We were only able to construct gene trees for 11,390 of the 11,434 families (PHYLIP could not handle trees with more than ~250 genes). Using the rooted species tree, we compared each gene tree with the species tree to map each node in the gene tree as either a speciation or a duplication event. With this information we can bound the date of each gene duplication to the resolution of each speciation event. The reconciliation of gene tree and species tree was done using the software NOTUNG [<a href="#pgen-0030197-b027">27</a>] with 100% bootstrap cutoffs to collapse poorly supported topologies. By inferring the placement of duplications, we were able to estimate the number of gains on each branch of the species tree. 
Nodes with three or more descendant lineages are prone to overestimate the number of duplications on the branches ancestral to them [<a href="#pgen-0030197-b038">38</a>]; we therefore excluded branches 2, 3, 5, 6, 8, and 9 from comparisons between the likelihood and tree reconciliation methods.</p> <h4 xpathLocation="/article[1]/body[1]/sec[3]/sec[4]/title[1]">Positive selection on nucleotide sequences.</h4> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[4]/p[1]">We asked whether there was evidence for positive selection on the nucleotide sequences of <i>D. melanogaster</i>-specific duplicates using the ratio of nonsynonymous (<i>d</i><sub>N</sub>) to synonymous (<i>d</i><sub>S</sub>) substitutions per site. If <i>d</i><sub>N</sub>/<i>d</i><sub>S</sub> > 1, then adaptive natural selection must be acting to fix nonsynonymous mutations. We compared the likelihood of models with no positive selection (M1a) to the likelihood of models with positive selection (M2a) in the program PAML [<a href="#pgen-0030197-b072">72</a>]. The M1a/M2a comparison was used rather than more complex branch-site models so that the same test could be used on all <span class="genus-species">D. melanogaster</span>-specific duplicates: M1a/M2a does not require an outgroup to detect positive selection along the <i>melanogaster</i> lineage. The likelihood ratio test conservatively assumes 2 df because of boundary effects in parameter estimation [<a href="#pgen-0030197-b073">73</a>].</p> <h4 xpathLocation="/article[1]/body[1]/sec[3]/sec[5]/title[1]">Annotation of gene families.</h4> <p xpathLocation="/article[1]/body[1]/sec[3]/sec[5]/p[1]">The basic annotations for each gene family were based on the FlyBase GO term database (FlyBase 4.3, <a href="http://flybase.bio.indiana.edu/">http://flybase.bio.indiana.edu/</a>). We searched this database using the <span class="genus-species">D. melanogaster</span> proteins. 
The most common GO terms in cellular component/function/process were identified, and a consensus set of terms was used if genes in the same family had different GO terms associated with them. If no annotation was retrieved for any of the genes in a family, we searched Pfam for matching protein domains. In total we were able to annotate 9,752 of the families, 7,460 via FlyBase and 2,292 via Pfam. The program GOstat [<a href="#pgen-0030197-b074">74</a>] was used to find over-represented GO terms at each level in the GO hierarchy.</p> </div> <div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" id="section4" xpathLocation="/article[1]/body[1]/sec[4]"><a id="s4" name="s4" toc="s4" title="Supporting Information"></a><h3 xpathLocation="noSelect">Supporting Information <a href="#top">Top</a></h3><a name="pgen-0030197-sg001" id="pgen-0030197-sg001"></a><p><strong xPathLocation="noSelect"><a href="/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.0030197.sg001">Figure S1. </a>Gene Gain and Loss Using Tree Reconciliation Methods</strong></p><p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[1]/caption[1]/p[1]">On each branch of the tree the number of gene gains/losses inferred by gene tree/species tree reconciliation is given. The number of gene losses using this method is highly biased [<a href="#pgen-0030197-b038">38</a>].</p> <p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[1]/caption[1]/p[2]">(59 KB TIF)</p> <a name="pgen-0030197-sg002" id="pgen-0030197-sg002"></a><p><strong xPathLocation="noSelect"><a href="/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.0030197.sg002">Figure S2. </a>Extinctions in <span class="genus-species">D. 
melanogaster</span></strong></p><p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[2]/caption[1]/p[1]">The Venn diagram summarizes the results of searching for 292 extinct genes in <span class="genus-species">D. melanogaster</span> using <span class="genus-species">D. simulans</span> homologs. Genes predicted to be pseudogenes in each category are not shown. D.mel, <span class="genus-species">D. melanogaster</span>; D.sim, <span class="genus-species">D. simulans</span>; nr db, NCBI nonredundant database.</p> <p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[2]/caption[1]/p[2]">(90 KB TIF)</p> <a name="pgen-0030197-sg003" id="pgen-0030197-sg003"></a><p><strong xPathLocation="noSelect"><a href="/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.0030197.sg003">Figure S3. </a>GO Hierarchy for Significant Terms</strong></p><p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[3]/caption[1]/p[1]">GO terms significantly over-represented among lineage-specific families are highlighted in yellow.</p> <p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[3]/caption[1]/p[2]">(6.1 MB TIF)</p> <a name="pgen-0030197-sg004" id="pgen-0030197-sg004"></a><p><strong xPathLocation="noSelect"><a href="/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.0030197.sg004">Figure S4. </a>Annotation Artifacts in <span class="genus-species">D. melanogaster</span></strong></p><p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[4]/caption[1]/p[1]">The Venn diagram summarizes the results of searching for the 1,074 genes in <span class="genus-species">D. melanogaster</span> that were in families by themselves against the <span class="genus-species">D. simulans</span> genome. D.mel, <span class="genus-species">D. melanogaster</span>; D.sim, <i>D. 
simulans.</i></p> <p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[4]/caption[1]/p[2]">(66 KB TIF)</p> <a name="pgen-0030197-st001" id="pgen-0030197-st001"></a><p><strong xPathLocation="noSelect"><a href="/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.0030197.st001">Table S1. </a>Rapidly Evolving Gene Families in <i>Drosophila</i></strong></p><p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[5]/caption[1]/p[1]">The tree-wide <i>p</i>-values are given, as well as the individual <i>p</i>-values for changes along each branch of the tree, the inferred size of each family at the bottom of each branch, and the inferred amount of change on each branch.</p> <p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[5]/caption[1]/p[2]">(271 KB XLS)</p> <a name="pgen-0030197-st002" id="pgen-0030197-st002"></a><p><strong xPathLocation="noSelect"><a href="/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.0030197.st002">Table S2. </a>Rapidly Evolving Gene Families in <span class="genus-species">D. melanogaster</span></strong></p><p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[6]/caption[1]/p[1]">The current size of the families and the inferred number of changes since the split from the <i>simulans</i>/<i>sechellia</i> ancestor are given.</p> <p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[6]/caption[1]/p[2]">(23 KB XLS)</p> <a name="pgen-0030197-st003" id="pgen-0030197-st003"></a><p><strong xPathLocation="noSelect"><a href="/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.0030197.st003">Table S3. </a>Newly Predicted Genes in <span class="genus-species">D. melanogaster</span></strong></p><p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[7]/caption[1]/p[1]">Genes overlapping with new predictions from Stark et al. 
[<a href="#pgen-0030197-b033">33</a>] are listed with their CONGO IDs, while genes overlapping with third-party annotations from Hild et al. [<a href="#pgen-0030197-b063">63</a>] are labeled “TPA.” NCBI identifiers for the EST matches to predicted genes, GeneWise prediction scores, and the <span class="genus-species">D. simulans</span> putative homolog IDs are also given.</p> <p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[7]/caption[1]/p[2]">(32 KB XLS)</p> <a name="pgen-0030197-st004" id="pgen-0030197-st004"></a><p><strong xPathLocation="noSelect"><a href="/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.0030197.st004">Table S4. </a>Genes from <span class="genus-species">D. melanogaster</span> Predicted to Be Incorrect Annotations</strong></p><p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[8]/caption[1]/p[1]">Genes overlapping with predictions of incorrect annotations from Stark et al. [<a href="#pgen-0030197-b033">33</a>] are listed with their CG number.</p> <p xpathLocation="/article[1]/body[1]/sec[4]/supplementary-material[8]/caption[1]/p[2]">(25 KB XLS)</p> <h4 xpathLocation="/article[1]/body[1]/sec[4]/sec[1]/title[1]">Accession Numbers</h4> <p xpathLocation="/article[1]/body[1]/sec[4]/sec[1]/p[1]">The FlyBase (<a href="http://flybase.bio.indiana.edu/">http://flybase.bio.indiana.edu/</a>) accession number for <i>CG6908</i> is FBgn0037936.</p> </div> <div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" xpathLocation="noSelect"><a id="ack" name="ack" toc="ack" title="Acknowledgments"></a><h3 xpathLocation="noSelect">Acknowledgments <a href="#top">Top</a></h3> <p xpathLocation="/article[1]/back[1]/ack[1]/p[1]">We thank R. Kwok for assistance gathering and analyzing the data; J. Costello for help with the analysis of gene ontologies; J. Demuth, T. Turner, and D. 
Begun for comments on the manuscript; D. Pollard and V. Iyer for answering many questions about the genome annotations; and A. Clark, M. Eisen, M. Kellis, M. Lin, T. Kauffman, W. Gelbart, D. Smith, and the rest of the consortium for many of the accompanying analyses that made this work possible. G. McVean and four anonymous reviewers also gave comments that substantially improved the manuscript.</p> </div><div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" class="contributions"><a id="authcontrib" name="authcontrib" toc="authcontrib" title="Author Contributions"></a><h3 xpathLocation="noSelect">Author Contributions <a href="#top">Top</a></h3><p xpathLocation="noSelect"><span class="capture-id"> MWH and MVH conceived and designed the experiments. MVH and SGH performed the experiments. All authors analyzed the data. MWH wrote the paper.</span></p></div><div xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:aml="http://topazproject.org/aml/" xpathLocation="noSelect"><a id="references" name="references" toc="references" title="References"></a><h3 xpathLocation="noSelect">References <a href="#top">Top</a></h3><ol class="references" xpathLocation="noSelect"><li xpathLocation="noSelect"><a name="pgen-0030197-b001" id="pgen-0030197-b001"></a><span class="authors">Tatusov RL, Koonin EV, Lipman DJ</span> (1997) A genomic perspective on protein families. Science 278: 631–637. <a class="find" href="/article/findArticle.action?author=Tatusov&title=A genomic perspective on protein families."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b002" id="pgen-0030197-b002"></a><span class="authors">Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, et al. </span> (2000) Comparative genomics of the eukaryotes. 
Science 287: 2204–2215. <a class="find" href="/article/findArticle.action?author=Rubin&title=Comparative genomics of the eukaryotes."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b003" id="pgen-0030197-b003"></a><span class="authors">Roelofs J, Van Haastert PJM</span> (2001) Genes lost during evolution. Nature 411: 1013–1014. <a class="find" href="/article/findArticle.action?author=Roelofs&title=Genes lost during evolution."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b004" id="pgen-0030197-b004"></a><span class="authors">Hughes AL, Friedman R</span> (2004) Shedding genomic ballast: extensive parallel loss of ancestral gene families in animals. J Mol Evol 59: 827–833. <a class="find" href="/article/findArticle.action?author=Hughes&title=Shedding genomic ballast: extensive parallel loss of ancestral gene families in animals."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b005" id="pgen-0030197-b005"></a><span class="authors">Aravind L, Watanabe H, Lipman DJ, Koonin EV</span> (2000) Lineage-specific loss and divergence of functionally linked genes in eukaryotes. Proc Natl Acad Sci U S A 97: 11319–11324. <a class="find" href="/article/findArticle.action?author=Aravind&title=Lineage-specific loss and divergence of functionally linked genes in eukaryotes."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b006" id="pgen-0030197-b006"></a><span class="authors">McLysaght A, Baldi PF, Gaut BS</span> (2003) Extensive gene gain associated with adaptive evolution of poxviruses. Proc Natl Acad Sci U S A 100: 15655–15660. 
<a class="find" href="/article/findArticle.action?author=McLysaght&title=Extensive gene gain associated with adaptive evolution of poxviruses."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b007" id="pgen-0030197-b007"></a><span class="authors">Fortna A, Kim Y, MacLaren E, Marshall K, Hahn G, et al. </span> (2004) Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biology 2: e207 doi: <a href="http://dx.doi.org/10.1371/journal.pbio.0020207">10.1371/journal.pbio.0020207</a>. </li><li xpathLocation="noSelect"><a name="pgen-0030197-b008" id="pgen-0030197-b008"></a><span class="authors">Demuth JP, De Bie T, Stajich JE, Cristianini N, Hahn MW</span> (2006) The evolution of mammalian gene families. PLoS ONE 1: e85 doi: <a href="http://dx.doi.org/10.1371/journal.pone.0000085">10.1371/journal.pone.0000085</a>. </li><li xpathLocation="noSelect"><a name="pgen-0030197-b009" id="pgen-0030197-b009"></a><span class="authors">Tamura K, Subramanian S, Kumar S</span> (2004) Temporal patterns of fruit fly (Drosophila) evolution revealed by mutation clocks. Mol Biol Evol 21: 36–44. <a class="find" href="/article/findArticle.action?author=Tamura&title=Temporal patterns of fruit fly (Drosophila) evolution revealed by mutation clocks."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b010" id="pgen-0030197-b010"></a><span class="authors">Ohno S</span> (1970) Evolution by gene duplication. Berlin: Springer-Verlag. </li><li xpathLocation="noSelect"><a name="pgen-0030197-b011" id="pgen-0030197-b011"></a><span class="authors">Hughes AL</span> (1994) The evolution of functionally novel proteins after gene duplication. Proc R Soc Lond B Biol Sci 256: 119–124. 
<a class="find" href="/article/findArticle.action?author=Hughes&title=The evolution of functionally novel proteins after gene duplication."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b012" id="pgen-0030197-b012"></a><span class="authors">Force A, Lynch M, Pickett FB, Amores A, Yan Y-l, et al. </span> (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531–1545. <a class="find" href="/article/findArticle.action?author=Force&title=Preservation of duplicate genes by complementary, degenerative mutations."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b013" id="pgen-0030197-b013"></a><span class="authors">Robin C, Russell RJ, Medveczky KM, Oakeshott J</span> (1996) Duplication and divergence of the genes of the alpha-esterase cluster of <span class="genus-species">Drosophila melanogaster</span>. J Mol Evol 43: 241–252. <a class="find" href="/article/findArticle.action?author=Robin&title=Duplication and divergence of the genes of the alpha-esterase cluster of Drosophila melanogaster."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b014" id="pgen-0030197-b014"></a><span class="authors">Ting C-T, Tsaur S-C, Sun S, Browne WE, Chen Y-C, et al. </span> (2004) Gene duplication and speciation in <i>Drosophila</i>: evidence from the <i>Odysseus</i> locus. Proc Natl Acad Sci U S A 101: 12232–12235. <a class="find" href="/article/findArticle.action?author=Ting&title=Gene duplication and speciation in Drosophila: evidence from the Odysseus locus."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b015" id="pgen-0030197-b015"></a><span class="authors">Holloway AK, Begun DJ</span> (2004) Molecular evolution and population genetics of duplicated accessory gland protein genes in <i>Drosophila</i>. Mol Biol Evol 21: 1625–1628. 
<a class="find" href="/article/findArticle.action?author=Holloway&title=Molecular evolution and population genetics of duplicated accessory gland protein genes in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b016" id="pgen-0030197-b016"></a><span class="authors">Quesada H, Ramos-Onsins SE, Aguadé M</span> (2005) Birth-and-death evolution of the cecropin multigene family in <i>Drosophila</i>. J Mol Evol 60: 1–11. <a class="find" href="/article/findArticle.action?author=Quesada&title=Birth-and-death evolution of the cecropin multigene family in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b017" id="pgen-0030197-b017"></a><span class="authors">Oakley TH, Ostman B, Wilson ACV</span> (2006) Repression and loss of gene expression outpaces activation and gain in recently duplicated fly genes. Proc Natl Acad Sci U S A 103: 11637–11641. <a class="find" href="/article/findArticle.action?author=Oakley&title=Repression and loss of gene expression outpaces activation and gain in recently duplicated fly genes."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b018" id="pgen-0030197-b018"></a><span class="authors">Greenberg AJ, Moran JR, Fang S, Wu C-I</span> (2006) Adaptive loss of an old duplicated gene during incipient speciation. Mol Biol Evol 23: 401–410. <a class="find" href="/article/findArticle.action?author=Greenberg&title=Adaptive loss of an old duplicated gene during incipient speciation."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b019" id="pgen-0030197-b019"></a><span class="authors">Olson MV</span> (1999) When less is more: gene loss as an engine of evolutionary change. Am J Hum Genet 64: 18–23. 
<a class="find" href="/article/findArticle.action?author=Olson&title=When less is more: gene loss as an engine of evolutionary change."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b020" id="pgen-0030197-b020"></a><span class="authors">Long M, Langley CH</span> (1993) Natural selection and the origin of <i>jingwei</i>, a chimeric processed functional gene in <i>Drosophila</i>. Science 260: 91–95. <a class="find" href="/article/findArticle.action?author=Long&title=Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b021" id="pgen-0030197-b021"></a><span class="authors">Jones CD, Begun DJ</span> (2005) Parallel evolution of chimeric fusion genes. Proc Natl Acad Sci U S A 102: 11373–11378. <a class="find" href="/article/findArticle.action?author=Jones&title=Parallel evolution of chimeric fusion genes."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b022" id="pgen-0030197-b022"></a><span class="authors">Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ</span> (2006) Novel genes derived from noncoding DNA in <span class="genus-species">Drosophila melanogaster</span> are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci U S A 103: 9935–9939. <a class="find" href="/article/findArticle.action?author=Levine&title=Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b023" id="pgen-0030197-b023"></a><span class="authors">Hahn MW, De Bie T, Stajich JE, Nguyen C, Cristianini N</span> (2005) Estimating the tempo and mode of gene family evolution from comparative genomic data. Genome Res 15: 1153–1160. 
<a class="find" href="/article/findArticle.action?author=Hahn&title=Estimating the tempo and mode of gene family evolution from comparative genomic data."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b024" id="pgen-0030197-b024"></a><span class="authors">Goodman M, Czelusniak J, Moore GW, Romero-Herrera AE, Matsuda G</span> (1979) Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Syst Zool 28: 132–163. <a class="find" href="/article/findArticle.action?author=Goodman&title=Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b025" id="pgen-0030197-b025"></a><span class="authors">Page RD</span> (1998) GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 14: 819–820. <a class="find" href="/article/findArticle.action?author=Page&title=GeneTree: comparing gene and species phylogenies using reconciled trees."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b026" id="pgen-0030197-b026"></a><span class="authors">Zmasek CM, Eddy SR</span> (2001) A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17: 821–828. <a class="find" href="/article/findArticle.action?author=Zmasek&title=A simple algorithm to infer gene duplication and speciation events on a gene tree."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b027" id="pgen-0030197-b027"></a><span class="authors">Durand D, Halldorsson BV, Vernot B</span> (2006) A hybrid micro-macroevolutionary approach to gene tree reconstruction. J Comput Biol 13: 320–335. 
<a class="find" href="/article/findArticle.action?author=Durand&title=A hybrid micro-macroevolutionary approach to gene tree reconstruction."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b028" id="pgen-0030197-b028"></a><span class="authors">Lynch M, Conery JS</span> (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155. <a class="find" href="/article/findArticle.action?author=Lynch&title=The evolutionary fate and consequences of duplicate genes."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b029" id="pgen-0030197-b029"></a><span class="authors">Gu ZL, Cavalcanti A, Chen F-C, Bouman P, Li W-H</span> (2002) Extent of gene duplication in the genomes of <i>Drosophila</i>, nematode, and yeast. Mol Biol Evol 19: 256–262. <a class="find" href="/article/findArticle.action?author=Gu&title=Extent of gene duplication in the genomes of Drosophila, nematode, and yeast."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b030" id="pgen-0030197-b030"></a><span class="authors">Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, et al. </span> (2000) The genome sequence of <span class="genus-species">Drosophila melanogaster</span>. Science 287: 2185–2195. <a class="find" href="/article/findArticle.action?author=Adams&title=The genome sequence of Drosophila melanogaster."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b031" id="pgen-0030197-b031"></a><span class="authors">Grumbling G, Strelets V, Consortium TF</span> (2006) FlyBase: anatomical data, images and queries. Nucleic Acids Res 34: D484–D488. 
<a class="find" href="/article/findArticle.action?author=Grumbling&title=FlyBase: anatomical data, images and queries."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b032" id="pgen-0030197-b032"></a><span class="authors"><i>Drosophila</i> Comparative Genome Sequencing and Analysis Consortium</span> (2007) Evolution of genes and genomes in the context of the <i>Drosophila</i> phylogeny. Nature. In press. </li><li xpathLocation="noSelect"><a name="pgen-0030197-b033" id="pgen-0030197-b033"></a><span class="authors">Stark A, Lin MF, Kheradpour P, Pedersen JS, Parts L, et al. </span> (2007) Discovery of functional elements in 12 <i>Drosophila</i> genomes using evolutionary signatures. Nature. In press. </li><li xpathLocation="noSelect"><a name="pgen-0030197-b034" id="pgen-0030197-b034"></a><span class="authors">Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, et al. </span> (2002) The genome sequence of the malaria mosquito <span class="genus-species">Anopheles gambiae</span>. Science 298: 129–149. <a class="find" href="/article/findArticle.action?author=Holt&title=The genome sequence of the malaria mosquito Anopheles gambiae."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b035" id="pgen-0030197-b035"></a><span class="authors">Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh Y-P, et al. </span> (2007) Population genomics: whole-genome analysis of polymorphism and divergence in <span class="genus-species">Drosophila simulans</span>. PLoS Biology. In press. 
<a class="find" href="/article/findArticle.action?author=Begun&title=Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b036" id="pgen-0030197-b036"></a><span class="authors">McBride CS, Arguello JR</span> (2007) Five <i>Drosophila</i> genomes reveal non-neutral evolution and the signature of host specialization in the chemoreceptor superfamily. Genetics. In press. </li><li xpathLocation="noSelect"><a name="pgen-0030197-b037" id="pgen-0030197-b037"></a><span class="authors">Bai Y, Casola C, Feschotte C, Betran E</span> (2007) Comparative genomics reveals a constant rate of origination and convergent acquisition of functional retrogenes in <i>Drosophila</i>. Genome Biol 8: R11. <a class="find" href="/article/findArticle.action?author=Bai&title=Comparative genomics reveals a constant rate of origination and convergent acquisition of functional retrogenes in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b038" id="pgen-0030197-b038"></a><span class="authors">Hahn MW</span> (2007) Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution. Genome Biol 8: R141. <a class="find" href="/article/findArticle.action?author=Hahn&title=Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b039" id="pgen-0030197-b039"></a><span class="authors">De Bie T, Demuth JP, Cristianini N, Hahn MW</span> (2006) CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22: 1269–1271. 
<a class="find" href="/article/findArticle.action?author=De Bie&title=CAFE: a computational tool for the study of gene family evolution."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b040" id="pgen-0030197-b040"></a><span class="authors">Ho SYW, Phillips MJ, Cooper A, Drummond AJ</span> (2005) Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol 22: 1561–1568. <a class="find" href="/article/findArticle.action?author=Ho&title=Time dependency of molecular rate estimates and systematic overestimation of recent divergence times."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b041" id="pgen-0030197-b041"></a><span class="authors">Woodhams M</span> (2006) Can deleterious mutations explain the time dependency of molecular rate estimates? Mol Biol Evol 23: 2271–2273. <a class="find" href="/article/findArticle.action?author=Woodhams&title=Can deleterious mutations explain the time dependency of molecular rate estimates?"> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b042" id="pgen-0030197-b042"></a><span class="authors">Sebat J, Lakshmi B, Troge J, Alexander J, Young J, et al. </span> (2004) Large-scale copy number polymorphism in the human genome. Science 305: 525–528. <a class="find" href="/article/findArticle.action?author=Sebat&title=Large-scale copy number polymorphism in the human genome."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b043" id="pgen-0030197-b043"></a><span class="authors">Pollard D, Iyer VN, Moses AM, Eisen MB</span> (2006) Widespread discordance of gene trees with species tree in <i>Drosophila</i>: evidence for incomplete lineage sorting. PLoS Genetics 2: e173 doi: <a href="http://dx.doi.org/10.1371/journal.pgen.0020173">10.1371/journal.pgen.0020173</a>. 
</li><li xpathLocation="noSelect"><a name="pgen-0030197-b044" id="pgen-0030197-b044"></a><span class="authors">Masly JP, Jones CD, Noor MAF, Locke J, Orr HA</span> (2006) Gene transposition as a cause of hybrid sterility in <i>Drosophila</i>. Science 313: 1448–1450. <a class="find" href="/article/findArticle.action?author=Masly&title=Gene transposition as a cause of hybrid sterility in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b045" id="pgen-0030197-b045"></a><span class="authors">Nam J, Nei M</span> (2005) Evolutionary change of the numbers of homeobox genes in bilateral animals. Mol Biol Evol 22: 2386–2394. <a class="find" href="/article/findArticle.action?author=Nam&title=Evolutionary change of the numbers of homeobox genes in bilateral animals."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b046" id="pgen-0030197-b046"></a><span class="authors">Blomme T, Vandepoele K, De Bodt S, Simillion C, Maere S, et al. </span> (2006) The gain and loss of genes during 600 million years of vertebrate evolution. Genome Biol 7: R43. <a class="find" href="/article/findArticle.action?author=Blomme&title=The gain and loss of genes during 600 million years of vertebrate evolution."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b047" id="pgen-0030197-b047"></a><span class="authors">Saitou N, Nei M</span> (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406–425. <a class="find" href="/article/findArticle.action?author=Saitou&title=The neighbor-joining method: a new method for reconstructing phylogenetic trees."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b048" id="pgen-0030197-b048"></a><span class="authors">Francino MP</span> (2005) An adaptive radiation model for the origin of new gene functions. Nat Genet 37: 573–578. 
<a class="find" href="/article/findArticle.action?author=Francino&title=An adaptive radiation model for the origin of new gene functions."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b049" id="pgen-0030197-b049"></a><span class="authors">Hahn MW, Demuth JP, Han S-G</span> (2007) Accelerated rate of gene gain and loss in primates. Genetics. In press. </li><li xpathLocation="noSelect"><a name="pgen-0030197-b050" id="pgen-0030197-b050"></a><span class="authors">Zdobnov EM, von Mering C, Letunic I, Torrents D, Suyama M, et al. </span> (2002) Comparative genome and proteome analysis of <span class="genus-species">Anopheles gambiae</span> and <span class="genus-species">Drosophila melanogaster</span>. Science 298: 149–159. <a class="find" href="/article/findArticle.action?author=Zdobnov&title=Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b051" id="pgen-0030197-b051"></a><span class="authors">Giniger E, Tietje K, Jan LY, Jan YN</span> (1994) <i>lola</i> encodes a putative transcription factor required for axon growth and guidance in <i>Drosophila</i>. Development 120: 1385–1398. <a class="find" href="/article/findArticle.action?author=Giniger&title=lola encodes a putative transcription factor required for axon growth and guidance in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b052" id="pgen-0030197-b052"></a><span class="authors">Hurst L</span> (1992) Is <i>Stellate</i> a relict meiotic driver? Genetics 130: 229–230. <a class="find" href="/article/findArticle.action?author=Hurst&title=Is Stellate a relict meiotic driver?"> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b053" id="pgen-0030197-b053"></a><span class="authors">Hurst L</span> (1996) Further evidence consistent with <i>Stellate</i>'s involvement in meiotic drive. 
Genetics 142: 641–643. <a class="find" href="/article/findArticle.action?author=Hurst&title=Further evidence consistent with Stellate's involvement in meiotic drive."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b054" id="pgen-0030197-b054"></a><span class="authors">Livak KJ</span> (1984) Organization and mapping of a sequence on the <span class="genus-species">Drosophila melanogaster</span> X and Y chromosomes that is transcribed during spermatogenesis. Genetics 107: 611–634. <a class="find" href="/article/findArticle.action?author=Livak&title=Organization and mapping of a sequence on the Drosophila melanogaster X and Y chromosomes that is transcribed during spermatogenesis."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b055" id="pgen-0030197-b055"></a><span class="authors">Nurminsky DI, Nurminskaya MV, De Aguiar D, Hartl DL</span> (1998) Selective sweep of a newly evolved sperm-specific gene in <i>Drosophila</i>. Nature 396: 572–575. <a class="find" href="/article/findArticle.action?author=Nurminsky&title=Selective sweep of a newly evolved sperm-specific gene in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b056" id="pgen-0030197-b056"></a><span class="authors">Nurminsky D, De Aguiar D, Bustamante CD, Hartl DL</span> (2001) Chromosomal effects of rapid gene evolution in <span class="genus-species">Drosophila melanogaster</span>. Science 291: 128–130. <a class="find" href="/article/findArticle.action?author=Nurminsky&title=Chromosomal effects of rapid gene evolution in Drosophila melanogaster."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b057" id="pgen-0030197-b057"></a><span class="authors">Ranz JM, Ponce AR, Hartl DL, Nurminsky D</span> (2003) Origin and evolution of a new gene expressed in the <i>Drosophila</i> sperm axoneme. Genetica 118: 233–244. 
<a class="find" href="/article/findArticle.action?author=Ranz&title=Origin and evolution of a new gene expressed in the Drosophila sperm axoneme."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b058" id="pgen-0030197-b058"></a><span class="authors">Lung O, Tram U, Finnerty CM, Eipper-Mains MA, Kalb JM, et al. </span> (2002) The <span class="genus-species">Drosophila melanogaster</span> seminal fluid protein Acp62F is a protease inhibitor that is toxic upon ectopic expression. Genetics 160: 211–224. <a class="find" href="/article/findArticle.action?author=Lung&title=The Drosophila melanogaster seminal fluid protein Acp62F is a protease inhibitor that is toxic upon ectopic expression."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b059" id="pgen-0030197-b059"></a><span class="authors">Friedman R, Hughes AL</span> (2007) Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods. Mol Phylogenet Evol 42: 388–393. <a class="find" href="/article/findArticle.action?author=Friedman&title=Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b060" id="pgen-0030197-b060"></a><span class="authors">Anisimova M, Nielsen R, Yang Z</span> (2003) Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164: 1229–1236. 
<a class="find" href="/article/findArticle.action?author=Anisimova&title=Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b061" id="pgen-0030197-b061"></a><span class="authors">Torgerson DG, Singh RS</span> (2004) Rapid evolution through gene duplication and subfunctionalization of the testes-specific alpha4 proteasome subunits in <i>Drosophila</i>. Genetics 168: 1421–1432. <a class="find" href="/article/findArticle.action?author=Torgerson&title=Rapid evolution through gene duplication and subfunctionalization of the testes-specific alpha4 proteasome subunits in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b062" id="pgen-0030197-b062"></a><span class="authors">Birney E, Clamp M, Durbin R</span> (2004) GeneWise and Genomewise. Genome Res 14: 988–995. <a class="find" href="/article/findArticle.action?author=Birney&title=GeneWise and Genomewise."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b063" id="pgen-0030197-b063"></a><span class="authors">Hild M, Beckmann B, Haas SA, Koch B, Solovyev V, et al. </span> (2003) An integrated gene annotation and transcriptional profiling approach towards the full gene content of the <i>Drosophila</i> genome. Genome Biol 5: R3. <a class="find" href="/article/findArticle.action?author=Hild&title=An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b064" id="pgen-0030197-b064"></a><span class="authors">Domazet-Loso T, Tautz D</span> (2003) An evolutionary analysis of orphan genes in <i>Drosophila</i>. Genome Res 13: 2213–2219. 
<a class="find" href="/article/findArticle.action?author=Domazet-Loso&title=An evolutionary analysis of orphan genes in Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b065" id="pgen-0030197-b065"></a><span class="authors">Schmid KJ, Aquadro CF</span> (2001) The evolutionary analysis of “orphans” from the <i>Drosophila</i> genome identifies rapidly diverging and incorrectly annotated genes. Genetics 159: 589–598. <a class="find" href="/article/findArticle.action?author=Schmid&title=The evolutionary analysis of %E2%80%9Corphans%E2%80%9D from the Drosophila genome identifies rapidly diverging and incorrectly annotated genes."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b066" id="pgen-0030197-b066"></a><span class="authors">Schmid KJ, Tautz D</span> (1997) A screen for fast evolving genes from <i>Drosophila</i>. Proc Natl Acad Sci U S A 94: 9746–9750. <a class="find" href="/article/findArticle.action?author=Schmid&title=A screen for fast evolving genes from Drosophila."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b067" id="pgen-0030197-b067"></a><span class="authors">Long M, Betran E, Thornton K, Wang W</span> (2003) The origin of new genes: glimpses from the young and old. Nat Rev Genet 4: 865–875. <a class="find" href="/article/findArticle.action?author=Long&title=The origin of new genes: glimpses from the young and old."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b068" id="pgen-0030197-b068"></a><span class="authors">Richardson AO, Palmer JD</span> (2007) Horizontal gene transfer in plants. J Exp Bot 58: 1–9. 
<a class="find" href="/article/findArticle.action?author=Richardson&title=Horizontal gene transfer in plants."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b069" id="pgen-0030197-b069"></a><span class="authors">Bateman A, Coin L, Durbin R, Finn RD, Hollich V, et al. </span> (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141. <a class="find" href="/article/findArticle.action?author=Bateman&title=The Pfam protein families database."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b070" id="pgen-0030197-b070"></a><span class="authors">Edgar RC</span> (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797. <a class="find" href="/article/findArticle.action?author=Edgar&title=MUSCLE: multiple sequence alignment with high accuracy and high throughput."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b071" id="pgen-0030197-b071"></a><span class="authors">Felsenstein J</span> (1989) PHYLIP – Phylogeny Inference Package (Version 3.2). Cladistics 5: 164–166. <a class="find" href="/article/findArticle.action?author=Felsenstein&title=PHYLIP %E2%80%93 Phylogeny Inference Package (Version 3.2)."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b072" id="pgen-0030197-b072"></a><span class="authors">Yang Z</span> (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13: 555–556. 
<a class="find" href="/article/findArticle.action?author=Yang&title=PAML: a program package for phylogenetic analysis by maximum likelihood."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b073" id="pgen-0030197-b073"></a><span class="authors">Wong WSW, Yang Z, Goldman N, Nielsen R</span> (2004) Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168: 1041–1051. <a class="find" href="/article/findArticle.action?author=Wong&title=Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites."> Find this article online </a></li><li xpathLocation="noSelect"><a name="pgen-0030197-b074" id="pgen-0030197-b074"></a><span class="authors">Beissbarth T, Speed TP</span> (2004) GOstat: find statistically over-represented Gene Ontologies within a group of genes. Bioinformatics 20: 1464–1465. 
<a class="find" href="/article/findArticle.action?author=Beissbarth&title=GOstat: find statistically over-represented Gene Ontologies within a group of genes."> Find this article online </a></li></ol></div> </div> </div> <div style="display:none"> <div dojoType="dijit.Dialog" id="Rating"> <div class="dialog annotate"> 
<div class="tipu" id="dTipu"></div> <div class="comment"> <h5><span class="commentPublic">Rate This Article</span></h5> <div class="instructions">Please follow our <a href="/static/ratingGuidelines.action">guidelines for rating</a> and review our <a href="/static/competing.action">competing interests policy</a>. Comments that do not conform to our guidelines will be promptly removed and the user account disabled. The following must be avoided: <ol> <li>Remarks that could be interpreted as allegations of misconduct</li> <li>Unsupported assertions or statements</li> <li>Inflammatory or insulting language</li> </ol> </div> <div class="posting pane"> <form name="ratingForm" id="ratingForm" method="post" action=""> <input type="hidden" name="articleURI" value="info:doi/10.1371/journal.pgen.0030197" /> <input type="hidden" name="commentTitle" id="commentTitle" value="" /> <input type="hidden" name="comment" id="commentArea" value="" /> <input type="hidden" name="ciStatement" id="statementArea" value="" /> <input type="hidden" name="isCompetingInterest" id="isCompetingInterest" value="" /> <fieldset> <legend>Compose Your Annotation</legend> <span id="submitRatingMsg" class="error" style="display:none;"></span> <table class="layout"> <tr> <td rowspan="2"> <label for="insight">Insight</label> <ul class="star-rating rating edit" title="Rate insight" id="rateInsight"> <li class="current-rating pct0"></li> <li><a href="javascript:void(0);" title="Bland" class="one-star" onclick="ambra.rating.setRatingCategory(this, 'insight', 1);">1</a></li> <li><a href="javascript:void(0);" title="" class="two-stars" onclick="ambra.rating.setRatingCategory(this, 'insight', 2);">2</a></li> <li><a href="javascript:void(0);" title="" class="three-stars" onclick="ambra.rating.setRatingCategory(this, 'insight', 3);">3</a></li> <li><a href="javascript:void(0);" title="" class="four-stars" onclick="ambra.rating.setRatingCategory(this, 'insight', 4);">4</a></li> <li><a href="javascript:void(0);" 
title="Profound" class="five-stars" onclick="ambra.rating.setRatingCategory(this, 'insight', 5);">5</a></li> </ul> <input type="hidden" name="insight" title="insight" value="" /> <label for="reliability">Reliability</label> <ul class="star-rating rating edit" title="Rate reliability" id="rateReliability"> <li class="current-rating pct0"></li> <li><a href="javascript:void(0);" title="Tenuous" class="one-star" onclick="ambra.rating.setRatingCategory(this, 'reliability', 1);">1</a></li> <li><a href="javascript:void(0);" title="" class="two-stars" onclick="ambra.rating.setRatingCategory(this, 'reliability', 2);">2</a></li> <li><a href="javascript:void(0);" title="" class="three-stars" onclick="ambra.rating.setRatingCategory(this, 'reliability', 3);">3</a></li> <li><a href="javascript:void(0);" title="" class="four-stars" onclick="ambra.rating.setRatingCategory(this, 'reliability', 4);">4</a></li> <li><a href="javascript:void(0);" title="Unassailable" class="five-stars" onclick="ambra.rating.setRatingCategory(this, 'reliability', 5);">5</a></li> </ul> <input type="hidden" name="reliability" title="reliability" value="" /> <label for="style">Style</label> <ul class="star-rating rating edit" title="Rate style" id="rateStyle"> <li class="current-rating pct0"></li> <li><a href="javascript:void(0);" title="Crude" class="one-star" onclick="ambra.rating.setRatingCategory(this, 'style', 1);">1</a></li> <li><a href="javascript:void(0);" title="" class="two-stars" onclick="ambra.rating.setRatingCategory(this, 'style', 2);">2</a></li> <li><a href="javascript:void(0);" title="" class="three-stars" onclick="ambra.rating.setRatingCategory(this, 'style', 3);">3</a></li> <li><a href="javascript:void(0);" title="" class="four-stars" onclick="ambra.rating.setRatingCategory(this, 'style', 4);">4</a></li> <li><a href="javascript:void(0);" title="Elegant" class="five-stars" onclick="ambra.rating.setRatingCategory(this, 'style', 5);">5</a></li> </ul> <input type="hidden" name="style" 
title="style" value="" /> <label for="cTitle" class="commentPublic"><span class="none">Enter your comment title</span><!-- error message text <em>A title is required for all public annotations</em>--></label> <input type="text" name="cTitle" id="cTitle" value="Enter your comment title..." class="title commentPublic" alt="Enter your comment title..." /> <label for="cArea"><span class="none">Enter your comment</span><!-- error message text <em>Please enter your annotation</em>--></label> <textarea name="cArea" id="cArea" value="Enter your comment..." alt="Enter your comment...">Enter your comment...</textarea> </td> <td rowspan="2"> </td> <td class="coi"> <fieldset> <legend>Declare any competing interests.</legend> <ul> <li><label><input id="isCompetingInterestNo" type="radio" name="competingInterest" value="false" /> No, I don't have any competing interests to declare.</label></li> <li><label><input id="isCompetingInterestYes" type="radio" name="competingInterest" value="true" /> Yes, I have competing interests to declare (enter below):</label></li> </ul> <textarea name="ciStatementArea" id="ciStatementArea" disabled value="Enter your competing interests..." title="Enter your competing interests...">Enter your competing interests...</textarea> </fieldset> </td> </tr> <tr> <td class="buttons"> <input type="button" value="Cancel" title="Click to close and cancel" id="btn_cancel_rating"/> <input type="button" value="Submit" title="Click to post your annotation publicly" id="btn_post_rating" class="primary"/> </td> </tr> </table> </fieldset> </form> </div> </div> </div> </div> <div dojoType="ambra.widget.LoadingCycle" id="LoadingCycle" class="loadingCycler"> <img src="/images/loading.gif" width="58" height="58" title="Loading..." 
/> </div> </div> </div> <!-- end : main contents --> </div> <!-- end : container --> <!-- begin : footer --> <div id="ftr"> <p><span>All site content, except where otherwise noted, is licensed under a <a href="http://creativecommons.org/licenses/by/2.5/" title="Creative Commons Attribution License 2.5" tabindex="200">Creative Commons Attribution License</a>.</span></p> <ul> <li><a href="/static/privacy.action" title="PLoS Privacy Statement" tabindex="501">Privacy Statement</a></li> <li><a href="/static/terms.action" title="PLoS Terms of Use" tabindex="502">Terms of Use</a></li> <li><a href="http://www.plos.org/advertise/" title="Advertise With PLoS" tabindex="503">Advertise</a></li> <li><a href="http://www.plos.org/journals/embargopolicy.html" title="PLoS Embargo Policy" tabindex="504">Media Inquiries</a></li> <li><a href="http://www.plos.org/journals/print.html" title="PLoS in Print" tabindex="505">PLoS in Print</a></li> <li><a href="/static/sitemap.action" title="Site Map" tabindex="506">Site Map</a></li> <li><a href="http://www.plos.org" title="PLoS.org" tabindex="507">PLoS.org</a></li> </ul> <div class="powered"> <ul> <li><a href="/static/releaseNotes.action" title="Ambra | Release Notes">Ambra 0.9.4 beta</a></li> <li>Managed Colocation provided by <a href="http://www.unitedlayer.com/" title="UnitedLayer: Built on IP Services">UnitedLayer</a>.</li> </ul> </div> </div> <!-- end : footer --> <script type="text/javascript"> var _namespace=""; var loggedIn = false; var almHost = "http://alm.plos.org"; // Safari v3.1.1 "console.debug" issue (http://trac.dojotoolkit.org/ticket/6849) workaround if (/3[\.0-9]+ Safari/.test(navigator.appVersion)) { window.console = { origConsole: window.console, log: function(s){ this.origConsole.log(s); }, info: function(s){ this.origConsole.info(s); }, error: function(s){ this.origConsole.error(s); }, warn: function(s){ this.origConsole.warn(s); } }; } var djConfig = { // don't debug for IE - as dojo's firebug lite module is error 
prone in IE isDebug: false, parseOnLoad: true }; </script> <script type="text/javascript" src="/javascript/dojo/dojo/dojo.js"></script> <script type="text/javascript" src="/javascript/dojo/dojo/ambra.js"></script> <script type="text/javascript" src="/javascript/init_global.js"></script> <script type="text/javascript" src="/javascript/init_article.js"></script> <script type="text/javascript" src="/javascript/init_ratings.js"></script> <script type="text/javascript" src="/javascript/init_article_body.js"></script> <script type="text/javascript" src="/javascript/init_article_rhc.js"></script> <script type="text/javascript" src="/javascript/alm.js"></script> <script type="text/javascript" src="/javascript/reporting/articleViewsCumulative.js"></script> <script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> var pageTracker = _gat._getTracker("UA-338393-1"); pageTracker._trackPageview(); pageTracker._setDomainName("www.plosgenetics.org"); </script> </body> </html>