Over the past decade evolutionary geneticist Mike Lynch has been articulating a model of genome complexity which relies on stochastic factors as the primary motive force by which genome size increases. The argument is laid out in a 2003 paper, and further elaborated in his book The Origins of Genome Architecture. There are several moving parts in the thesis, some of which require a rather fine-grained understanding of the biophysical structural complexity of the genome, the nature of Mendelian inheritance as a process, and finally, population genetics. But the core of the model is simple: there is an inverse relationship between long term effective population size and genome complexity. Small populations ~ large genomes, in terms of base pairs and counts of genetic elements such as introns.
A quick reminder: effective population size denotes, roughly, the proportion of the population which contributes genes to the next generation. So, in the case of insects with extremely high mortality in the larval stage the effective population size may be orders of magnitude smaller than the census size at any given generation evaluated over all stages of life history. In contrast, with humans a much larger proportion of children end up contributing to the genetic makeup of the subsequent generation. With large organisms I’ve heard you can sometimes use a rule of thumb that effective population size is ~1/3 of census size, though this probably overestimates the effective population size. One reason is that reproductive variance reduces the effective population size, because many individuals contribute far less to the next generation than other individuals. The greater the variance, the more evolutionary genetic variation is shaped by a few individuals within the population at a given generation, reducing the effective population which contributes to the next (the reproductive variance is often assumed to be Poisson, but that likely underestimates the true variance). Additionally, there is the issue of variation over time. Long term effective population size is much more sensitive to low values than high values, so it is liable to be much smaller than the census size at any given period for a species which goes through cycles. Humans for example have a relatively small long term effective population size evaluated over the past 100,000 years because we seem to have expanded from a small initial population. Mathematically, since long term effective population size is given by the harmonic mean of the per-generation values, it stands to reason that the low values would be critical. If that doesn’t make sense to you, remember the outsized impact which population bottlenecks may have on the long term trajectory of a species, in particular by removing genetic variation.
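To make the harmonic mean point concrete, here is a minimal sketch in Python. The population history is invented purely for illustration (nine generations at 1,000 individuals and a single-generation bottleneck of 10), and the function names are my own.

```python
def arithmetic_mean(sizes):
    """Ordinary average of the per-generation population sizes."""
    return sum(sizes) / len(sizes)


def harmonic_mean(sizes):
    """Harmonic mean: the number of generations divided by the sum of 1/N."""
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical history: nine generations at 1,000, then a bottleneck of 10.
history = [1000] * 9 + [10]

print(f"arithmetic mean of sizes:     {arithmetic_mean(history):.0f}")
print(f"harmonic mean (long term Ne): {harmonic_mean(history):.0f}")
```

In this made-up history the arithmetic mean is 901, while the harmonic mean comes out at roughly 92. One bad generation out of ten drags the long term value down by about an order of magnitude.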
How does this influence genome complexity? Basically Lynch’s thesis is that when you reduce effective population size you dampen the power of natural selection, specifically purifying selection, to prevent the addition of non-adaptive complexity through random processes. It isn’t that selection is rendered moot; rather, its signal is overwhelmed by the noise. Here’s the abstract of his 2003 paper:
Complete genomic sequences from diverse phylogenetic lineages reveal notable increases in genome complexity from prokaryotes to multicellular eukaryotes. The changes include gradual increases in gene number, resulting from the retention of duplicate genes, and more abrupt increases in the abundance of spliceosomal introns and mobile genetic elements. We argue that many of these modifications emerged passively in response to the long-term population-size reductions that accompanied increases in organism size. According to this model, much of the restructuring of eukaryotic genomes was initiated by nonadaptive processes, and this in turn provided novel substrates for the secondary evolution of phenotypic complexity by natural selection. The enormous long-term effective population sizes of prokaryotes may impose a substantial barrier to the evolution of complex genomes and morphologies.
The implication here is that prokaryotes with massive population sizes are biased toward smaller genomes by the more efficacious application of natural selection. In contrast, more complex organisms, which have smaller population sizes and so are more impacted by random generation-to-generation fluctuations due to sampling variance, are less streamlined genomically because selection can do only so much against the swelling sea of noise. One intriguing argument of Lynch is that the genomic complexity is then later useful downstream as the building block of phenotypic complexity, but let’s set that aside for now.
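The drift-versus-selection logic can be illustrated with Kimura's standard diffusion approximation for the fixation probability of a single new mutation. This is a sketch under stated assumptions, not Lynch's own calculation: a diploid population whose effective size equals its census size N, and a hypothetical, very slightly deleterious insertion with selection coefficient s = -0.00001 (a value I picked for illustration).

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for one new mutation at frequency 1/(2n).

    s: selection coefficient (negative means deleterious).
    n: population size, assumed here to equal the effective size.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: drift alone
    exponent = -4.0 * n * s
    if exponent > 700:
        # expm1 would overflow; the probability is effectively zero.
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4ns)), written with expm1 for precision.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


s = -1e-5  # hypothetical cost of carrying a bit of extra DNA
for n in (1_000, 100_000, 10_000_000):
    ratio = fixation_probability(s, n) / fixation_probability(0, n)
    print(f"N = {n:>10,}: fixation chance relative to neutral = {ratio:.3g}")
```

With these illustrative numbers the same insertion behaves almost like a neutral variant in the small population, but is all but barred from fixing in the large one. That is the sense in which selection's signal gets lost in the noise as population size falls.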
A new paper in PLoS Genetics challenges the statistical analysis of the original data which Lynch et al. used to make their case. Technically the argument was that there was an inverse relationship between Neu and genome size, where Ne is effective population size and u is the nucleotide mutation rate. Though the argument is technical, the basic objection should be easy to understand: there are other variables which may actually be responsible for the correlation which Lynch et al. discerned. To the paper, Did Genetic Drift Drive Increases in Genome Complexity?:
Genome size (the amount of nuclear DNA) varies tremendously across organisms but is not necessarily correlated with organismal complexity. For example, genome sizes just within the grasses vary nearly 20-fold, but large-genomed grass species are not obviously more complex in terms of morphology or physiology than are the small-genomed species. Recent explanations for genome size variation have instead been dominated by the idea that population size determines genome size: mutations that increase genome size are expected to drift to fixation in species with small populations, but such mutations would be eliminated in species with large populations where natural selection operates at higher efficiency. However, inferences from previous analyses are limited because they fail to recognize that species share evolutionary histories and thus are not necessarily statistically independent. Our analysis takes a phylogenetic perspective and, contrary to previous studies, finds no evidence that genome size or any of its components (e.g., transposon number, intron number) are related to population size. We suggest that genome size evolution is unlikely to be neatly explained by a single factor such as population size.
In the original analysis by Lynch et al. ~66% of the variation in genome size was explained by Neu! That’s a pretty large effect. Figure 1 illustrates how phylogeny could be a confound in adducing a relationship. Here’s some of the text which explains the figure:
In this hypothetical example, eight species have been measured for two traits, x and y, as indicated by pairs of values at the tips of the phylogenetic tree (A). Ordinary least-squares linear regression (OLS) indicates a statistically significant positive relationship (B; r-squared = 0.62, P = 0.02), potentially leading to an inference of a positive evolutionary association between x and y. However, inspection of the scatterplot (B) in relation to the phylogenetic relationships of the species (A) indicates that the association between x and y is negative for the four species within each of the two major lineages. Regression through the origin with phylogenetically independent contrasts…which is equivalent to phylogenetic generalized least squares (PGLS) analysis, accounts for the nonindependence of species and indicates no overall evolutionary relationship between the traits…The apparent pattern across species was driven by positively correlated trait change only at the basal split of the phylogeny; throughout the rest of the phylogeny, the traits mostly changed in opposite directions (A; basal contrast in red)….
The argument then seems to be that the relationship in the original work by Lynch was an artifact due to the evolutionary history of the species which he surveyed to infer the relationship. Instead of a general principle or law then what you have is an outcome of contingent historical processes. Not very neat and clean. You can see the taxa-clustered nature of the relationship in figure 1 from the 2003 paper in Science:
OK, now let’s look at the visualization of the same data set from this paper, as a tree to illustrate the correlations:
The last figure shows the difference between a scatterplot using conventional OLS regression, and the phylogenetic generalized least squares model (PGLS). You go from an obvious linear relationship, which translated into the high r-squared noted above, to basically nothing (r-squared near zero, no statistical significance).
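The way phylogeny can manufacture an OLS correlation is easy to reproduce with a toy calculation. The sketch below is not the authors' data or code. It assumes an invented eight-species tree with two clades of four, where species in the same clade share 90% of their history (covariance rho = 0.9) and species in different clades share none, and it uses trait values I made up so that x and y are negatively related within each clade. The GLS slope is computed by hand using the closed-form inverse of that block covariance matrix.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope, treating species as independent."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def weighted_product(u, v, clades, rho):
    """u' C^-1 v, where C has 1 on the diagonal, rho within clades, 0 between."""
    total = 0.0
    for members in clades:
        n = len(members)
        a = 1.0 / (1.0 - rho)
        b = rho / ((1.0 - rho) * (1.0 - rho + n * rho))
        su = sum(u[i] for i in members)
        sv = sum(v[i] for i in members)
        suv = sum(u[i] * v[i] for i in members)
        total += a * suv - b * su * sv
    return total


def pgls_slope(x, y, clades, rho):
    """Generalized least-squares slope for y = b0 + b1 * x under covariance C."""
    ones = [1.0] * len(x)
    s11 = weighted_product(ones, ones, clades, rho)
    s1x = weighted_product(ones, x, clades, rho)
    sxx = weighted_product(x, x, clades, rho)
    s1y = weighted_product(ones, y, clades, rho)
    sxy = weighted_product(x, y, clades, rho)
    # Solve the 2x2 normal equations for the slope (Cramer's rule).
    return (s11 * sxy - s1x * s1y) / (s11 * sxx - s1x ** 2)


# Hypothetical trait values: negative trend inside each clade,
# but the second clade sits higher on both axes.
x = [1.0, 2.0, 3.0, 4.0, 7.0, 8.0, 9.0, 10.0]
y = [4.0, 3.0, 2.0, 1.0, 10.0, 9.0, 8.0, 7.0]
clades = [[0, 1, 2, 3], [4, 5, 6, 7]]

print(f"OLS slope:  {ols_slope(x, y):+.3f}")
print(f"PGLS slope: {pgls_slope(x, y, clades, rho=0.9):+.3f}")
```

In this toy case the OLS slope is positive while the phylogenetically informed slope is negative, because the whole "trend" across species comes from the single split between the two clades. Setting rho to zero makes the two estimates coincide, which is a handy sanity check.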
The paper itself isn’t that long, and the objection is pretty straightforward. They’re simply claiming that Lynch didn’t correct for an obvious alternative explanation/confound, and that we don’t know what we thought we knew. Additionally, there is the assertion that the idea that effective population size predicts genome size robustly is becoming conventional wisdom within the scientific community. I don’t know about that; this seems like such a young field in flux that I think they oversold how widespread this assumption is to make their rebuttal seem more critical. Certainly the patterns in genome size can be quite perplexing, but my intuition is that an r-squared on the order of 2/3 of the variation in genome size being explained by one predictor variable is rather astounding. Obviously genome size is pretty easy to get in the “post-genomic era,” but Ne and u are harder to come by for many taxa, or even within a given taxon for a set of species of interest. It looks to me like an opportunity for experimental evolutionists, who can control the confounds, and observe changes within a lineage. And yet even if Neu is predictive as an independent variable all things controlled, what if all things are not usually controlled, and random acts of phylogenetic history are more important? Mike Lynch is credited in the acknowledgements, so I assume we’ll be seeing a response from him in the near future.
Citation: Whitney KD, & Garland T Jr (2010). Did Genetic Drift Drive Increases in Genome Complexity? PLoS Genetics. DOI: 10.1371/journal.pgen.1001080