President William Howard Taft
It is the best of times, it is the worst of times. On the one hand, the medical consequences of human genomics have been underwhelming. This matters because medical payoff is the ultimate reason much of the basic research gets funded. And yet we’ve learned so much. The genetic architecture of skin color has been elucidated, and patterns of natural selection in the human genome have come into much clearer focus. The finding last spring of Neandertal admixture in modern human populations is perhaps the most awesome pure science finding of late, coming close to resolving a decades-old debate in anthropology. This doesn’t cure cancer, but it does connect the dots about the human past, and that’s not trivial. We are a species haunted by our memories, so we might as well get them right!
But all hope is not lost. Research continues. One area that general surveys of genomic variation have often flagged as a target of natural selection, and that also has clear and immediate biomedical relevance, is metabolism. How we eat, and how we process and integrate the food we eat, is obviously relevant to fitness in both the evolutionary and the medical sense. It turns out that there is even variation in our saliva which is probably due to natural selection. Together, the diversity of human cuisine and our susceptibility to the diseases of modern life suggest possible links between past selection pressures and contemporary patterns of genetic variation. Of course one has to tread softly in this area: there are the inevitable environmental confounds, as well as the unfortunate likelihood that any given locus has only a small effect on any given trait.
A new paper in Genome Research reports a SNP which seems to have been subject to natural selection in Eurasians within the last 10,000 years. The variant sits within an exon of GIP, a gene that produces peptides critical in the regulation of various metabolic pathways, in particular insulin response. After the evolutionary genomic preliminaries, the authors explore its possible biomedical relevance for disease susceptibility. Adaptive selection of an incretin gene in Eurasian populations:
Diversities in human physiology have been partially shaped by adaptation to natural environments and changing cultures. Recent genomic analyses have revealed single nucleotide polymorphisms (SNPs) that are associated with adaptations in immune responses, obvious changes in human body forms, or adaptations to extreme climates in select human populations. Here, we report that the human GIP locus was differentially selected among human populations based on the analysis of a nonsynonymous SNP (rs2291725). Comparative and functional analyses showed that the human GIP gene encodes a cryptic glucose-dependent insulinotropic polypeptide (GIP) isoform (GIP55S or GIP55G) that encompasses the SNP and is resistant to serum degradation relative to the known mature GIP peptide. Importantly, we found that GIP55G, which is encoded by the derived allele, exhibits a higher bioactivity compared with GIP55S, which is derived from the ancestral allele. Haplotype structure analysis suggests that the derived allele at rs2291725 arose to dominance in East Asians ∼8100 yr ago due to positive selection. The combined results suggested that rs2291725 represents a functional mutation and may contribute to the population genetics observation. Given that GIP signaling plays a critical role in homeostasis regulation at both the enteroinsular and enteroadipocyte axes, our study highlights the importance of understanding adaptations in energy-balance regulation in the face of the emerging diabetes and obesity epidemics.
This is a paper with several moving parts.
-Genomics (the broad sweep of the genome)
-Genetics (a focus on a few genes and their consequences)
-Biochemistry
-And some allusion to epidemiology, as befits a paper which comes out of a medical department
The first observation is that rs2291725 differs a great deal across populations. As I said, it’s a SNP in an exon of GIP. Not only that, it’s nonsynonymous, which means that it can change the amino acid sequence, and therefore potentially the structure and function, of the peptide the gene ultimately encodes. The T allele is the ancestral variant, while the C allele is the derived one. That means that C arose as a mutation against the background of T. The paper includes a figure showing the geographical distribution of variation at this SNP in the HGDP data set, but I think the HGDP browser produces a crisper display, so here it is:
As you can see, the ancestral allele predominates in Africa; in several populations it is fixed. In contrast, among non-African populations there’s quite a bit of variation. In East Asia the derived variant is at high frequency, though not fixed. In West Eurasia and North Africa the two variants are in rough balance. Finally, in the New World the derived variant is found in appreciable proportions, but the ancestral variant is found at much higher proportions than in other non-African populations. Seeing as Amerindians derive from a branch of East Eurasians, common descent from an ancestral population that already carried the derived allele at East Asian frequencies cannot explain the discrepancy. Interestingly, the HGDP Melanesians have amongst the highest frequencies of the derived allele in the data set.
In any case, most of the analysis was not done with the HGDP sample, but with the first two phases of the HapMap. The marker density is richer in this sample, and obviously it is easier to compare a few populations than dozens. So the primary populations of comparison in this study were the Chinese + Japanese (ASN), Utah Whites (CEU), and Yoruba from Nigeria (YRI). In pairwise comparisons between these HapMap populations, the GIP SNP immediately stood out: its between-population difference was exceptional when set against that of other nonsynonymous SNPs. The chart below shows the SNP in red, with the full distribution of Fst (the proportion of genetic variance attributable to differences between populations) illustrated by the bars in blue. rs2291725 is in the top 0.5% of Fst values between ASN and YRI.
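To make the Fst comparison concrete, here is a minimal sketch of a single-SNP Fst for two populations, using Nei’s heterozygosity-based definition Fst = (H_T - H_S) / H_T, where H_T is the pooled expected heterozygosity and H_S the mean within-population heterozygosity. That choice is my assumption for illustration: HapMap-era studies usually use sample-size-corrected estimators such as Weir and Cockerham’s, and I’m not claiming this is the paper’s estimator. The allele frequencies in the usage line are hypothetical, not the published values, and the helper `upper_tail_fraction` just shows what "top 0.5%" means against a background distribution.

```python
def fst_two_pops(p1, p2):
    """Nei-style Fst for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele (e.g. the derived C) in each population.
    No sample-size correction is applied (illustrative only).
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # pooled expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    if h_t == 0:
        return 0.0  # both populations fixed for the same allele: no differentiation
    return (h_t - h_s) / h_t


def upper_tail_fraction(value, background):
    """Fraction of background values >= value; 0.005 would mean 'top 0.5%'."""
    return sum(1 for x in background if x >= value) / len(background)


# Hypothetical derived-allele frequencies, NOT the paper's numbers.
print(round(fst_two_pops(0.9, 0.05), 3))  # -> 0.724
```

Single-SNP Fst is noisy, which is why the sensible move, as in the paper, is to rank the SNP against the genome-wide distribution of comparable (here nonsynonymous) SNPs rather than read the raw value in isolation.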
The expected Fst between continental races is on the order of 0.15. The ASN vs. YRI difference is far greater than that, and even more exceptional when you note the skew of the distribution. As it happens, there’s HapMap3 data on this SNP as well. It doesn’t add much beyond the HGDP, but it does confirm the general findings:
Population descriptors:
ASW (A): African ancestry in Southwest USA
CEU (C): Utah residents with Northern and Western European ancestry from the CEPH collection
CHB (H): Han Chinese in Beijing, China
CHD (D): Chinese in Metropolitan Denver, Colorado
GIH (G): Gujarati Indians in Houston, Texas
JPT (J): Japanese in Tokyo, Japan
LWK (L): Luhya in Webuye, Kenya
MEX (M): Mexican ancestry in Los Angeles, California
MKK (K): Maasai in Kinyawa, Kenya
TSI (T): Tuscan in Italy
YRI (Y): Yoruba in Ibadan, Nigeria
Now that they’ve established between-population variation at the SNP, what about the structure around the SNP? Remember, the SNP is one base pair: T in the ancestral state, C in the derived. The patterns of variation flanking the SNP in GIP can tell us a lot. What they found was this:
– Africans have several different haplotypes around the T allele. A haplotype is just a set of alleles at neighboring markers inherited together on one chromosome
– The C allele in East Asians seems to be embedded within one haplotype, or set of markers
– There was a lot of linkage disequilibrium around the C allele in East Asians
In East Asians both EHH (extended haplotype homozygosity) and iHS (integrated haplotype score) were consistent with, if not necessarily suggestive of, selection. A plausible scenario is that the C allele was subject to a powerful bout of natural selection recently, and rose so rapidly in frequency that the selective sweep dragged along the flanking regions of the genome. This would reduce the variation in that genic region within the population in question (East Asians), as the numerous other haplotypes would decline in proportion. To show the relationships of the various haplotypes within the three HapMap populations being analyzed, the authors produced an unrooted tree. Observe that the haplotype in which the derived variant is embedded is carried only by Asians and Europeans, and sits on a separate branch by itself:
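EHH is easy to state concretely: among chromosomes carrying a given core allele, it is the probability that two randomly chosen ones are identical across the whole stretch from the core SNP out to some marker. A sweep of the kind just described keeps EHH high around the derived allele over long distances, while an old allele sitting on many haplotypes shows EHH decaying quickly. The sketch below computes it from phased haplotypes written as strings. The toy haplotypes are invented to mimic the pattern described (one haplotype around C, many around T), and iHS, which integrates and standardizes the EHH curves of both alleles, is not implemented here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end, allele):
    """Extended haplotype homozygosity for `allele` at position `core`,
    measured over the interval between `core` and `end` (inclusive)."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    lo, hi = min(core, end), max(core, end)
    # Group carriers by their sequence over the interval.
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Pairs of identical haplotypes / all pairs of carriers.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Hypothetical phased haplotypes; the core SNP is at index 2 (C derived, T ancestral).
haps = [
    "AACGG", "AACGG", "AACGG", "AACGG",  # derived C: a single long haplotype
    "GATCA", "AATGG", "CCTAT", "GATCG",  # ancestral T: many haplotypes
]

print([round(ehh(haps, 2, x, "C"), 3) for x in range(2, 5)])  # -> [1.0, 1.0, 1.0]
print([round(ehh(haps, 2, x, "T"), 3) for x in range(2, 5)])  # -> [1.0, 0.167, 0.0]
```

The contrast between the two printed curves is the signature the EHH test looks for: homozygosity that stays high around the derived allele while it collapses around the ancestral one.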
As I hinted above, just because there is a lot of linkage disequilibrium and haplotype block structure in this region of the genome, it doesn’t necessarily follow that it was a target of natural selection. Stochastic phenomena may have produced these results, in which case our inference would be a false positive. To check for this, the authors ran several models and simulations which varied demographic parameters under neutral (non-selective) conditions, and for the Asian sample the simulated iHS scores were generally not as low as the one for the SNP of interest. This does not “prove” that demography cannot explain these results, but it does shift the probability further toward natural selection.
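The logic of that check, comparing an observed statistic against its distribution under neutral simulations, can be sketched with a far simpler model than the one the authors used. As an assumption for illustration, the statistic here is just the frequency a brand-new neutral mutation reaches after a fixed number of generations in a constant-size Wright-Fisher population. The population size, generation count and "observed" value are hypothetical, and none of this reproduces the paper’s demographic models or its iHS-based test.

```python
import random


def wf_neutral_final_freq(two_n, generations, rng):
    """Neutral Wright-Fisher drift of one new mutation (1 copy among two_n genes)."""
    count = 1
    for _ in range(generations):
        if count == 0 or count == two_n:
            break  # lost or fixed: the frequency can no longer change
        p = count / two_n
        # Binomial resampling of the next generation's gene copies.
        count = sum(1 for _ in range(two_n) if rng.random() < p)
    return count / two_n


def empirical_p(observed, null_stats):
    """Upper-tail empirical p-value with the usual +1 correction."""
    exceed = sum(1 for x in null_stats if x >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
null = [wf_neutral_final_freq(200, 320, rng) for _ in range(1000)]
p_value = empirical_p(0.6, null)  # hypothetical observed frequency of 0.6
print(p_value)
```

Most new neutral mutations are lost within a few generations, so very few simulated runs reach a high frequency; a small p-value says only that this particular toy null is a poor fit, and a realistic null needs realistic demography, which is exactly what the authors varied.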
The circumstantial evidence presented above is that the derived allele rose in frequency relatively recently (in general LD decays rapidly over time, so these tests detect more recent selective or demographic events). The authors ran a simulation under neutral parameters and found that it would take 100,000-500,000 years, starting from the initial mutant gene copy, for the derived haplotype to reach the frequencies which we see in the various populations. The latter figure is outside the bounds of modern humanity, while the former probably pre-dates the “Out of Africa” event. It is also implausible that so much haplotype structure could be preserved over such a span, because recombination over the generations breaks apart associations between markers. Using the recombination rates, which would slowly degrade long haplotypes in the genome, the authors inferred that the C allele and its haplotype began to rise in frequency on the order of 2,000-12,000 years before the present.
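The recombination-clock idea behind that last estimate can be shown with a deliberately simplified model, which is my assumption and not the authors’ exact procedure: if a fraction p_intact of derived-allele chromosomes still carry the original haplotype unbroken out to a marker that recombines with the SNP with probability c per generation, then (1 - c)^t ≈ p_intact, so t = ln(p_intact) / ln(1 - c). Converting generations to years needs an assumed generation time (25 years below), and the inputs in the usage line are hypothetical.

```python
import math


def age_from_haplotype_decay(p_intact, c, generation_years=25.0):
    """Generations and years since a haplotype began spreading, assuming the
    fraction still intact at a marker decays as (1 - c) ** t.

    p_intact: fraction of derived-allele chromosomes with the haplotype unbroken.
    c: per-generation recombination probability between SNP and marker.
    generation_years: assumed generation time (an assumption, not a measurement).
    """
    if not 0.0 < p_intact < 1.0:
        raise ValueError("p_intact must be strictly between 0 and 1")
    if not 0.0 < c < 1.0:
        raise ValueError("c must be strictly between 0 and 1")
    t = math.log(p_intact) / math.log(1.0 - c)
    return t, t * generation_years


# Hypothetical: 4% of derived chromosomes intact at a marker 1 cM away.
generations, years = age_from_haplotype_decay(0.04, 0.01)
print(f"{generations:.0f} generations, about {years:.0f} years")
```

Real estimates combine many markers and account for mutation and the sample’s genealogy, so treat this only as intuition for why long, unbroken haplotypes imply a young allele.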
Why would an allele rise in frequency within the past 10,000 years? The authors gave the game away in the abstract: with the rise of agriculture, humans shifted to new modes of food production and new diets. This is where the role of GIP in producing peptides that regulate our biochemistry becomes relevant. GIP belongs to a class of intestinal hormones called incretins:
Incretins are a group of gastrointestinal hormones that cause an increase in the amount of insulin released from the beta cells of the islets of Langerhans after eating, even before blood glucose levels become elevated. They also slow the rate of absorption of nutrients into the blood stream by reducing gastric emptying and may directly reduce food intake. As expected, they also inhibit glucagon release from the alpha cells of the Islets of Langerhans….
Increased insulin reduces blood sugar. Diabetes is a breakdown of this system: either too little insulin is released, or tissues respond poorly to it, so blood sugar rises because cells don’t take up glucose. Glucagon has the opposite effect, increasing blood sugar. But just because there is a change at a nonsynonymous position in an exon of a gene relevant to the pathway, it doesn’t follow that the change necessarily affects the pathway illustrated to the left. And for natural selection to get any traction, a variant needs to have an impact on some concrete biological process (unless we’re talking about intra-genomic competition of some sort).
It turns out that rs2291725 is actually just outside the region coding for the primary mature GIP peptide. For it to be a functional variant there needs to be more to the story. As it happens, there are other, less common isoforms which are modified by changes at this SNP: GIP55S and GIP55G. The first is produced by the ancestral T allele, and the second by the derived C allele. GIP55S and GIP55G are also found in the intestine, though they constitute only a few percent of total GIP.
But here’s where it gets really interesting: GIP55G exhibits more bioactivity over the long term. In other words, it seems to be more potent than generic GIP or GIP55S, the ancestral variant. The authors have gone from supposition based on the functional significance of the broader gene to a concrete functional difference associated with the T→C transition of the last 10,000 years. It may be that those with GIP55G would have a stronger insulin response, and so reduce blood sugar faster, than those without.
It doesn’t take a genius to figure out where they’re going with this. The relationship between insulin response and carbohydrates in our day and age is fraught. We already suspect that carbs have reshaped the human genome through copy number variation in the amylase gene. It is interesting, though, that the derived variant has not fixed; that is, it hasn’t replaced the ancestral variant. This may be due to dominance, so that one copy is almost as efficacious as two and selection against the remaining ancestral alleles, now mostly hidden in heterozygotes, becomes weak; or it may be due to balancing selection of some sort, which the authors suggest in the text. At this point it’s time to jump to the discussion and let the authors speak for themselves. They start out well:
Based on the gene age estimation and biochemical analyses, our study revealed a functional mutation that is associated with the selection of the GIP locus in East Asian populations ~8100 yr ago and the presence of a cryptic GIP isoform. Specifically, we showed that the inventory of human GIP peptides has recently diverged and that individuals could express three different combinations of GIP isoforms (GIP, GIP55S, and GIP55G) with distinct bioactivity profiles. Future study of how this phenotypic variation affects glucose and lipid homeostasis in response to different diets and of which physiological variations in humans can be attributed to prior gene–environmental interactions at the GIP locus is crucial to a better understanding of human adaptations in energy-balance regulation.
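The two explanations I raised before that passage for why the derived allele hasn’t fixed, dominance and balancing selection, are easy to see in the standard deterministic one-locus selection recursion (infinite population, random mating, no drift). The fitness values and selection coefficient below are hypothetical, chosen only to make the contrast visible; nothing here is estimated from the paper. With dominance, the ancestral allele lingers in heterozygotes where selection can’t see it, so it declines only slowly; with heterozygote advantage, the derived allele settles at an intermediate equilibrium.

```python
def next_freq(p, w_dd, w_da, w_aa):
    """One generation of selection on a biallelic locus (random mating, no drift).

    p: derived-allele frequency; w_dd, w_da, w_aa: relative fitnesses of the
    derived/derived, derived/ancestral and ancestral/ancestral genotypes.
    """
    q = 1.0 - p
    w_bar = p * p * w_dd + 2 * p * q * w_da + q * q * w_aa  # mean fitness
    return (p * p * w_dd + p * q * w_da) / w_bar


def run(p0, generations, w_dd, w_da, w_aa):
    """Iterate next_freq from starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, w_dd, w_da, w_aa)
    return p


s = 0.05  # hypothetical advantage of the derived homozygote
additive = run(0.01, 1000, 1 + s, 1 + s / 2, 1.0)  # heterozygote halfway
dominant = run(0.01, 1000, 1 + s, 1 + s, 1.0)      # one copy as good as two
# Heterozygote advantage: derived homozygote pays a cost of 0.1, ancestral 0.2;
# theory predicts the equilibrium p* = 0.2 / (0.1 + 0.2) = 2/3.
balanced = run(0.01, 5000, 0.9, 1.0, 0.8)

print(f"ancestral freq left, additive: {1 - additive:.2e}")
print(f"ancestral freq left, dominant: {1 - dominant:.3f}")
print(f"derived freq, overdominant:    {balanced:.4f}")
```

After the same 1,000 generations the additive case has essentially purged the ancestral allele, while the dominant case still carries it at a few percent, and the overdominant case never fixes at all.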
As I observed above, many of the researchers have a biomedical background, and the NIH is funding this. The evolutionary anthropological findings, cautious as they are, are fascinating and of deep interest. But I don’t think this is going to go anywhere:
It was hypothesized by Neel almost 50 yr ago that mismatches between prior physiological adaptations and contemporary environments can lead to health risks because the ancestral variants that have been selected for the organism’s fitness or reproductive success may not be optimal for the individual’s health in the new environment…In support of this thrifty genotype hypothesis, a number of genes in humans and house mice have been implied to have coevolved with the emergence of agricultural societies…and a rapid shift in diets is associated with the detrimental effects on human survival in a number of human populations…Conceptually, the serum-resistant GIP55G carried by the GIP103C haplotype may have been beneficial for individuals who have unconstrained access to the food supply in many agricultural societies by preventing severe hyperglycemia. As selection pressure changed in these societies, the ancient GIP103T haplotype could have become a liability and conferred a loss of fitness in the new environment. In addition, we speculate that the selection of GIP in East Asians may contribute to the heterogeneity in the risk of diabetes among major ethnic groups at the present time….
Do you believe that the Han Chinese have had a surfeit of food compared to Africans over the past 10,000 years? Or compared to Europeans? That Indians have had more food than Africans? That the populations of the New World have lived in a food-poor environment? This doesn’t make any sense as an evolutionary explanation, because the stable state for most of human history has been Malthusian. Only a few people had a lot of food, hence the association of wealth with corpulence. Granted, one can imagine that societies transitioning between modes of production would go through a period when land was in surplus and food was plentiful. But for most of history life was grinding. This is simply an unbelievable story. Moreover, this SNP can’t explain most of the variation in diabetes. South Asians have some of the highest rates in the world, yet they carry appreciable proportions of the derived variant. I am of the CC (derived-derived) genotype myself (I just checked on 23andMe), and I have a family risk of diabetes, so I know to ignore the relevance of these findings for myself when it comes to personal risk assessment.
There is probably not going to be one gene that explains diabetes, or obesity, etc. We already knew that, but there is a strange kabuki theater which goes on whereby research groups play up the significance of a single locus, because how is it going to look to a granting agency if you’re out to explain ~1% of the variance in a trait, with trivial predictive value? And yet they’re usually honest enough in the discussion to suggest that one finding needs to be integrated into a broader picture…as in the hundreds of other genes of interest!?!?!
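The "~1% of the variance" point is simple arithmetic under the standard additive model (my framing, not a calculation from the paper): a biallelic SNP with allele frequency p in Hardy-Weinberg equilibrium and per-allele effect beta contributes 2p(1 - p)beta² of variance, so its share of the phenotypic variance V_P is 2p(1 - p)beta² / V_P. The second helper inverts this to ask how big an effect one locus would need to explain a given share. The inputs below are hypothetical.

```python
import math


def variance_explained(p, beta, v_p=1.0):
    """Share of phenotypic variance from one additive SNP under Hardy-Weinberg.

    p: allele frequency; beta: effect per copy (trait units); v_p: phenotypic variance.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a frequency in [0, 1]")
    return 2 * p * (1 - p) * beta ** 2 / v_p


def beta_needed(p, share, v_p=1.0):
    """Per-allele effect required for one SNP to explain `share` of the variance."""
    return math.sqrt(share * v_p / (2 * p * (1 - p)))


# A common variant (p = 0.5) shifting the trait by 0.1 SD per copy:
print(round(variance_explained(0.5, 0.1), 4))  # -> 0.005, i.e. half a percent
# Effect per copy needed for that same variant to explain 1%:
print(round(beta_needed(0.5, 0.01), 3))  # -> 0.141
```

Even a respectable per-allele effect at an intermediate-frequency variant buys well under a percent of the variance, which is why single-locus prediction for complex traits is so weak.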
This paper is fascinating as a work of human evolutionary history. They don’t have a good story, but they have results which need to be integrated into the bigger framework. But the paper is also a story of the culture of science today, driven by claims of biomedical relevance which are often simply phantoms.
Citation: Chang CL, Cai JJ, Lo C, Amigo J, Park JI, & Hsu SY (2010). Adaptive selection of an incretin gene in Eurasian populations. Genome Research. PMID: 20978139