I said yesterday I would say a bit more about the new paper on rapid recent high altitude adaptation among the Tibetans when I’d read the paper. Well, I’ve read it now. Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude:
Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18x per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1), a transcription factor involved in response to hypoxia. One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.
The exome is just the protein-coding part of the genome, so they’re focusing ostensibly on functionally relevant single nucleotide polymorphisms (SNPs). About a month and a half ago a similar paper on Tibetan high altitude adaptations was published in Science (I posted on that too), but their methodology was somewhat different. That group was looking at a set of candidate genes which they assumed might have been under selection and so have functional significance in explaining Tibetan vs. non-Tibetan phenotypes at high altitudes. This second paper takes a more bottom-up approach, scanning the genomes of Tibetans and Han Chinese and trying to spotlight regions which exhibit a great deal of between-population variance, far greater than one might presume from the total genome genetic distances.
As to that last point…the timing of this has been causing a major problem for archaeologists. The supplement lays out the details a bit more than the press reports, so below is figure 2:
It looks like to get a better sense of the model you’ll have to read the cited paper, and I’m not sure that that will satisfy the archaeologists. They did use a large number of neutral markers though, so I’m not too worried about biases in their data set. Some have been confused about the population numbers, but effective population size can be counterintuitive, especially over the long term (it behaves like a harmonic mean across generations, so low values are given much more weight than high values). The small Han value becomes less confusing when you consider a massive demographic expansion from a small founder group, as well as persistent long-term biases in reproductive value within the population (e.g., some males in a given generation are far more fecund than others through polygyny). A higher N for Tibetans may be explained by a more stable population in which reproductive value is more equitably distributed across diverse subsets and across individuals. In other words, an effective population size is a statistic which bundles together a lot of evolutionary history, and is not a simple measure of perceived census sizes (the Tibetans may also be something of a melange of a diverse set of ancient groups which took refuge in the highlands, while the Han are the descendants of early adopters of agriculture which expanded demographically; so they’re at opposite ends of the demographic tunnel).
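To make the point about low values dominating concrete, here is a minimal Python sketch. It uses two textbook approximations, not the model fitted in the paper: long-term effective size as the harmonic mean of per-generation sizes, and the unequal sex ratio formula Ne = 4·Nm·Nf / (Nm + Nf). The function names and all the population numbers are made up for illustration.

```python
def harmonic_mean_ne(sizes_per_generation):
    """Long-term effective size approximated as the harmonic mean
    of the per-generation population sizes."""
    if not sizes_per_generation or any(n <= 0 for n in sizes_per_generation):
        raise ValueError("need a non-empty list of positive sizes")
    return len(sizes_per_generation) / sum(1.0 / n for n in sizes_per_generation)


def sex_ratio_ne(breeding_males, breeding_females):
    """Effective size when the numbers of breeding males and females differ
    (e.g., under polygyny): Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4.0 * breeding_males * breeding_females / (breeding_males + breeding_females)


# Hypothetical history: a small founder group for 10 generations,
# followed by 90 generations at a very large size.
founder_then_boom = [1_000] * 10 + [1_000_000] * 90

arithmetic_mean = sum(founder_then_boom) / len(founder_then_boom)
long_term_ne = harmonic_mean_ne(founder_then_boom)

print(f"arithmetic mean size: {arithmetic_mean:,.0f}")
print(f"harmonic mean (Ne):   {long_term_ne:,.0f}")

# Hypothetical polygynous generation: 100 breeding males, 900 breeding females.
print(f"Ne with skewed sex ratio: {sex_ratio_ne(100, 900):,.0f}")
```

The ten small generations drag the harmonic mean down to roughly the founder scale even though the population spends most of its history being enormous, which is the sense in which a huge census population can carry a small effective size.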
The time of divergence of a little under 3,000 years is important for the rest of the paper, so I suppose other workers had better replicate their findings in the future. Figure 1 is rather striking, so let’s jump to it:
This chart is simply showing frequencies of SNPs in Tibetans and Han. The two are obviously correlated, as is evident from the diagonal. Shading indicates the density of SNPs at a given position. Look to the bottom right, and you see the gene around which much of the paper hinges, EPAS1. It’s an enormous outlier, with SNPs where Tibetans and Han differ a great deal. This is important with regard to looking for genes which may drive adaptation to higher altitudes; if you don’t have different genes then you don’t have different traits. If the Tibetans and Han diverged ~3,000 years ago, then those adaptations may be recent and would have emerged through rapid allele frequency changes (though they observe that the variant may have been drawn from standing variation). The researchers didn’t go looking for EPAS1 as such; rather, it came looking for them. What does it do? From the text:
EPAS1 is also known as hypoxia-inducible factor 2{alpha} (HIF-2{alpha}). The HIF family of transcription factors consist of two subunits, with three alternate {alpha} subunits (HIF-1{alpha}, HIF-2{alpha}/EPAS1, HIF-3{alpha}) that dimerize with a β subunit encoded by ARNT or ARNT2. HIF-1{alpha} and EPAS1 each act on a unique set of regulatory targets…and the narrower expression profile of EPAS1 includes adult and fetal lung, placenta, and vascular endothelial cells…A protein-stabilizing mutation in EPAS1 is associated with erythrocytosis…suggesting a link between EPAS1 and the regulation of red blood cell production.
Next, they dig into the functional significance of EPAS1 variants, in the literature, and in their current sample:
Associations between SNPs at EPAS1 and athletic performance have been demonstrated…Our data set contains a different set of SNPs, and we conducted association testing on the SNP with the most extreme frequency difference, located just upstream of the sixth exon. Alleles at this SNP tested for association with blood-related phenotypes showed no relationship with oxygen saturation. However, significant associations were discovered for erythrocyte count (F test P = 0.00141) and for hemoglobin concentration (F test P = 0.00131), with significant or marginally significant P values for both traits when each village was tested separately (table S5). Comparison of the EPAS1 SNP to genotype data from 48 unlinked SNPs confirmed that its P value is a strong outlier (5) (fig. S4).
The allele at high frequency in the Tibetan sample was associated with lower erythrocyte quantities and correspondingly lower hemoglobin levels…Because elevated erythrocyte production is a common response to hypoxic stress, it may be that carriers of the “Tibetan” allele of EPAS1 are able to maintain sufficient oxygenation of tissues at high altitude without the need for increased erythrocyte levels. Thus, the hematological differences observed here may not represent the phenotypic target of selection and could instead reflect a side effect of EPAS1-mediated adaptation to hypoxic conditions. Although the precise physiological mechanism remains to be discovered, our results suggest that the allele targeted by selection is likely to confer a functionally relevant adaptation to the hypoxic environment of high altitude.
There are random anomalies in nature, but it seems too perfect that this is the outlier in allele frequencies across two populations which differ in adaptations which relate to many of the traits above.
OK, so they found an outlier SNP. The gene seems to have a reasonable probability of being involved in functional pathways relevant to altitude adaptation. But so far we’ve been focusing on the Tibetan-Han difference. If the two populations separated about 3,000 years ago one assumes that genes with SNPs with huge Fsts, where most of the variation can be partitioned between the groups, not within them, are good candidates for having been driven by selection. But it would be nice to compare with an outgroup. So they compared the Tibetans and Han with the Danes, an outgroup which separated from the East Asian cluster about one order of magnitude further back in time (~30,000 years). Next they generated a “population branch statistic” (PBS) from the Fst data (see the supplements). Basically you’re getting a value which describes allele frequency differences normalized to the expected genetic distance as known from population history. I’ve extracted out Panel B from figure 2. T = Tibetans, H = Han, and D = Danes. The smaller tree represents genome average PBS values. It’s what you’d expect: the Danes are the outgroup. Over time genetic difference builds up because of separation between the groups. The Han and Tibetans are very close, as you’d expect from genetically similar populations. But look at the larger tree: the Tibetans are the outgroup by a mile! The Danes and Han differ far less from each other on EPAS1 than they do from the Tibetans. This seems like a clear deviation from the level of allele frequency difference one might be able to generate by neutral random walk processes.
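For readers who want to see the arithmetic behind a population branch statistic, here is a rough Python sketch. The PBS form used is the standard one: each pairwise Fst is turned into a branch length T = -log(1 - Fst), and the Tibetan branch is (T_TH + T_TD - T_HD) / 2. Everything else is a simplifying assumption on my part: Fst is computed from two allele frequencies with the plain heterozygosity definition rather than the estimator used in the supplement, and the three frequencies are illustrative numbers chosen only to match the 78% Tibetan-Han gap from the abstract.

```python
import math


def fst_two_populations(p1, p2):
    """Simple Fst for one biallelic SNP in two equally weighted populations:
    (H_total - H_within) / H_total. A simplification for illustration only."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic across both populations
    return (h_total - h_within) / h_total


def branch_length(fst):
    """Log-transformed distance T = -log(1 - Fst)."""
    fst = min(fst, 1.0 - 1e-12)  # avoid log(0) when the populations are fixed for different alleles
    return -math.log(1.0 - fst)


def pbs(p_focal, p_sister, p_outgroup):
    """Population branch statistic for the focal population."""
    t_focal_sister = branch_length(fst_two_populations(p_focal, p_sister))
    t_focal_out = branch_length(fst_two_populations(p_focal, p_outgroup))
    t_sister_out = branch_length(fst_two_populations(p_sister, p_outgroup))
    return (t_focal_sister + t_focal_out - t_sister_out) / 2.0


# Illustrative (assumed) frequencies of one allele at a single SNP.
tibetan, han, dane = 0.87, 0.09, 0.02

print("PBS Tibetan:", round(pbs(tibetan, han, dane), 3))
print("PBS Han:    ", round(pbs(han, tibetan, dane), 3))
print("PBS Dane:   ", round(pbs(dane, tibetan, han), 3))
```

With frequencies like these the Tibetan branch is much longer than the other two, which is the shape of the larger tree: the allele frequency change is being assigned almost entirely to the Tibetan lineage. On a single SNP the short branches can even come out slightly negative, which is one reason the genome-wide average tree is the useful baseline.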
EPAS1 isn’t the only gene which they found, but it was the most significant, and illustrates the nature of the methodological orientation of this group. Sift through the genome and look for something which is totally unexpected, and put a focus on the peculiar diamond in the rough and see what it can tell you. They conclude with the big picture:
Of the genes identified here, only EGLN1 was mentioned in a recent SNP variation study in Andean highlanders (24). This result is consistent with the physiological differences observed between Tibetan and Andean populations…suggesting that these populations have taken largely distinct evolutionary paths in altitude adaptation.
Several loci previously studied in Himalayan populations showed no signs of selection in our data set…whereas EPAS1 has not been a focus of previous altitude research. Although EPAS1 may play an important role in the oxygen regulation pathway, this gene was identified on the basis of a noncandidate population genomic survey for natural selection, illustrating the utility of evolutionary inference in revealing functionally important loci.
Given our estimate that Han and Tibetans diverged 2750 years ago and experienced subsequent migration, it appears that our focal SNP at EPAS1 may have experienced a faster rate of frequency change than even the lactase persistence allele in northern Europe, which rose in frequency over the course of about 7500 years…EPAS1 may therefore represent the strongest instance of natural selection documented in a human population, and variation at this gene appears to have had important consequences for human survival and/or reproduction in the Tibetan region.
First, natural selection is somewhat stochastic; it can take different tacks to the same process because it doesn’t have infinite power in its search algorithm. Given enough time and gene flow no doubt adaptations would homogenize and converge upon a perfect optimum, but given enough time the universe will devolve into heat death. Evolution has to operate extemporaneously for eternity because the conditions are ever changing. Second, the big headline-grabbing assertion about EPAS1 being the strongest instance of natural selection needs to be modulated by the fact that the conclusion was generated assuming the validity of the inferences of a particular model, and models can be wrong. It does seem like the evolutionary change is likely to be recent; I doubt they’d be off by an order of magnitude. But for lactase persistence we’ve extracted genetic material from ancient remains. The conclusion is much more concrete in that case. Until we get remains from ancient Tibetans and can infer their allele frequencies, there will be some asymmetry in the confidence with which we can make a claim as to when the selection event began.
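To see how much the "strongest selection" claim leans on the divergence date, here is a back-of-the-envelope Python sketch. It is not the paper's method. It assumes a simple deterministic model of genic selection in which the log-odds of the allele frequency rise by about s per generation, ignores drift, migration and dominance, assumes a 25 year generation, and uses made-up start and end frequencies that are merely consistent with a 78% gap.

```python
import math


def logit(p):
    """Log-odds of an allele frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def implied_selection_coefficient(p_start, p_end, years, years_per_generation=25.0):
    """Selection coefficient s implied by a frequency change over a time span,
    under the approximation logit(p_t) = logit(p_0) + s * t (t in generations)."""
    generations = years / years_per_generation
    return (logit(p_end) - logit(p_start)) / generations


# Assumed frequencies: the allele starts at a Han-like 9% and ends at 87%.
p_start, p_end = 0.09, 0.87

# Same frequency change, two different assumed time spans (in years).
for years in (2750, 7500):
    s = implied_selection_coefficient(p_start, p_end, years)
    print(f"{years} years -> s of roughly {s:.3f} per generation")
```

Because the implied s scales inversely with the elapsed time, stretching the Tibetan-Han split from 2,750 years to something like the 7,500 years quoted for lactase persistence would cut the inferred strength of selection by nearly two thirds. That is why the dating model carries so much of the weight here.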
Citation: Yi, X., Liang, Y., Huerta-Sanchez, E., Jin, X., Cuo, Z., Pool, J., Xu, X., Jiang, H., Vinckenbosch, N., Korneliussen, T., Zheng, H., Liu, T., He, W., Li, K., Luo, R., Nie, X., Wu, H., Zhao, M., Cao, H., Zou, J., Shan, Y., Li, S., Yang, Q., Asan, Ni, P., Tian, G., Xu, J., Liu, X., Jiang, T., Wu, R., Zhou, G., Tang, M., Qin, J., Wang, T., Feng, S., Li, G., Huasang, Luosang, J., Wang, W., Chen, F., Wang, Y., Zheng, X., Li, Z., Bianba, Z., Yang, G., Wang, X., Tang, S., Gao, G., Chen, Y., Luo, Z., Gusang, L., Cao, Z., Zhang, Q., Ouyang, W., Ren, X., Liang, H., Zheng, H., Huang, Y., Li, J., Bolund, L., Kristiansen, K., Li, Y., Zhang, Y., Zhang, X., Li, R., Li, S., Yang, H., Nielsen, R., Wang, J., & Wang, J. (2010). Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude. Science, 329 (5987), 75-78. DOI: 10.1126/science.1190371