Anyone with a passing familiarity with human population genetics will know of the Duffy system, and the fact that there is a huge difference between Sub-Saharan Africans and other populations at this locus. Specifically, the classical Duffy allele exhibits a nearly disjoint distribution between Africa and the rest of the world. It was naturally one of the illustrations in The Genetics of Human Populations, a classic textbook from the 1960s.
Today we know a lot more about human variation. At most loci we don’t see such sharp distinctions. Almost certainly the detection of these very differentiated alleles early on in human genetics was partly a function of selection bias. The methods, techniques, and samples were underpowered and limited, so only the largest differences would be visible. Today we often use single base pair variations, single nucleotide polymorphisms (SNPs), and the frequency differences are much more modest on average. This is why only a minority of genetic variation is partitioned across geographic races.
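To make the contrast concrete, here is a minimal sketch of Wright's FST for a single biallelic locus in two equally weighted populations. The allele frequencies are hypothetical round numbers chosen for illustration, not estimates from any dataset, and the function name `fst_two_pops` is made up for this sketch.

```python
def fst_two_pops(p1, p2):
    """Wright's FST for one biallelic locus in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # locus is monomorphic overall, so there is nothing to partition
    return (h_total - h_within) / h_total

# Hypothetical frequencies: a Duffy-like locus versus a more typical SNP.
print(fst_two_pops(0.99, 0.01))
print(fst_two_pops(0.40, 0.60))
```

With these made-up inputs, nearly all of the variation at the first locus sits between populations (FST of about 0.96), while at the second only about 4% does.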
Why is Duffy different? Obviously it could be random. Assuming you have a polymorphism, you’ll get a range of frequencies across populations, and in some cases those frequencies will map onto different geographic zones just by chance. Imagine constant mutation, and bottlenecks in highly structured populations. You could get a sequence of derived mutations fixing in populations one after the other, just by chance.
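The "just by chance" scenario can be sketched with a bare-bones Wright-Fisher simulation of pure drift. Everything here is an assumption for illustration: the population size, the number of generations, the starting frequency, and the function name `wright_fisher_drift`. There is no selection, mutation, or migration in this toy model.

```python
import random

def wright_fisher_drift(p0, pop_size, generations, rng):
    """Return the allele frequency after pure drift in a diploid population."""
    copies = 2 * pop_size  # 2N gene copies per generation
    p = p0
    for _ in range(generations):
        # Each gene copy in the next generation is sampled from the current pool.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        if p in (0.0, 1.0):  # allele lost or fixed; drift cannot move it further
            break
    return p

rng = random.Random(42)
# Ten hypothetical isolated populations that all start at the same frequency.
final = [wright_fisher_drift(0.5, 50, 200, rng) for _ in range(10)]
print(final)
```

In small isolated populations like these, the same starting frequency can end up lost in some populations and fixed in others with no selection at all, which is the null model any adaptive story has to beat.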
This is probably not the case with Duffy. I’ll quote from Wikipedia:
The Duffy antigen is located on the surface of red blood cells, and is named after the patient in which it was discovered. The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The protein is also the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system.
Malaria is one of the strongest selection pressures known to humanity. The balancing selection which results in sickle-cell disease is well known even among the general public. But the likely selection pressures due to the vivax variety are less commonly talked about, partly because they don’t induce a serious disease as a side-effect. Duffy may be canonical if you are a human population geneticist, but it is of less interest more generally.
But a recent paper in PLOS Genetics shows just how dynamic the evolutionary genetic past of our species was, through the lens of the Duffy system: “Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans.” Here’s the author summary:
Infectious diseases have undoubtedly played an important role in ancient and modern human history. Yet, there are relatively few regions of the genome involved in resistance to pathogens that show a strong selection signal in current genome-wide searches for this kind of signal. We revisit the evolutionary history of a gene associated with resistance to the most common malaria-causing parasite, Plasmodium vivax, and show that it is one of regions of the human genome that has been under strongest selective pressure in our evolutionary history (selection coefficient: 4.3%). Our results are consistent with a complex evolutionary history of the locus involving selection on a mutation that was at a very low frequency in the ancestral African population (standing variation) and subsequent differentiation between European, Asian and African populations.
Why is it that regions of the genome subject to selection due to co-evolution with pathogens are hard to detect in scans for selection? My response would be that it’s because selection and adaptation are always happening in these regions, constantly erasing their own footprints.
You may be familiar with the fact that the genes of the major histocompatibility complex (MHC) are among the most diverse regions of the genome. That’s because negative frequency dependent selection makes it so that rare variants tend not to go extinct, as the rarer they get the more favored they are.
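A toy model shows why this kind of selection protects rare variants. The sketch below is a deterministic two-allele model, which is a deliberate simplification (the MHC has many alleles), and the fitness rule, the value of `s`, and the function names are assumptions made for illustration.

```python
def nfds_next(p, s):
    """One generation of a two-allele model where an allele's fitness falls as it gets common."""
    w_a = 1 - s * p          # fitness of allele A declines with its own frequency
    w_b = 1 - s * (1 - p)    # the same rule applied to the alternative allele
    mean_w = p * w_a + (1 - p) * w_b
    return p * w_a / mean_w

def nfds_trajectory(p0, s, generations):
    """Return the list of allele frequencies from generation 0 onward."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(nfds_next(freqs[-1], s))
    return freqs

# Start allele A rare in one run and common in the other (hypothetical values).
rare = nfds_trajectory(0.01, 0.1, 500)
common = nfds_trajectory(0.99, 0.1, 500)
print(round(rare[-1], 3), round(common[-1], 3))
```

Whichever allele starts out rare is pushed back toward an intermediate frequency, so in this model neither allele is lost.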
Many classical and modern techniques for detecting selection assume much less protean dynamics in the model they attempt to detect. Basically, many of the standard selection detection methods are looking for a simple perturbation in the pattern of variation that’s expected. A strong, recent sweep on a single mutation is like the spherical cow of evolutionary genetics. It happens. And it’s easy to model and detect. But it may not be nearly as important as our ability to detect these “hard sweeps” may suggest to us.
In contrast, if selection targets a larger number of independent mutations, then you get a “soft sweep,” which is harder to detect, because it is no singular event. Complexity is the enemy of detection. As a thought experiment, if you selected for height within a population you may catch some large effect alleles that would leave strong signals, but most of the dynamic would leave a polygenic footprint, distributed across innumerable genes.
The Duffy locus is somewhat in the middle. The authors distinguish between selection on standing variation (the allele is already present at a frequency higher than that of a single new mutation when selection begins) and a soft sweep, where multiple variants on different haplotype backgrounds are subject to selection. Their models and results strongly support selection on standing variation for the FY*O variant, and perhaps selection for the FY*A variant.
These selection events were very old, and very strong. Selection coefficients on the order of 4% are hard to believe in a natural environment. Curiously the coalescence times for the haplotypes of some of these alleles indicate that selection was contemporaneous with the emergence of modern humans out of Africa, ~50,000 years ago. From their sequence data analysis the different alleles have been segregating for a long time in the collective human population, and powerful sweeps fixed FY*O in both the ancestors of the Bantu and Pygmies before they diverged from each other. In contrast the Khoisan samples suggest that FY*O introgressed into their population from newcomers, while variants of FY*A are ancestral.
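To get a feel for how strong a coefficient of this size is, here is a rough deterministic sketch of how quickly a favored allele rises. It assumes simple genic selection (carriers have relative fitness 1 + s), which is a simplification and not the model fitted in the paper. The starting frequency and the "nearly fixed" threshold are assumptions picked for illustration, and the function name is hypothetical.

```python
def generations_to_reach(p0, p_target, s):
    """Count generations for a favored allele to rise from p0 to p_target."""
    if not (0 < p0 < 1) or not (0 < p_target < 1) or s <= 0:
        raise ValueError("need 0 < p0 < 1, 0 < p_target < 1, and s > 0")
    p = p0
    generations = 0
    while p < p_target:
        # Genic selection: the allele's frequency is reweighted by 1 + s.
        p = p * (1 + s) / (1 + s * p)
        generations += 1
    return generations

s = 0.043          # selection coefficient quoted in the author summary
p_start = 0.001    # assumed starting frequency, for illustration only
p_end = 0.99       # assumed threshold for "nearly fixed"
print(generations_to_reach(p_start, p_end, s))
```

Under these assumptions the allele goes from rare to nearly fixed in a few hundred generations, which is very fast on evolutionary timescales.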
The big picture here is that selection is ancient, that it is powerful, and it was a dynamic even before our species diversified into various lineages.
If you read the paper, and you should, it’s pretty clear that a lot of the adaptive story was already suspected. It’s just that with modern genomics and fancy approximate Bayesian computation (ABC) methods you can put point estimates and intervals on these hunches. But another issue, as they note in the piece, is that we have a better grasp of African population structure today than in the past, and this allows for better framing.
But it is here that I have a note of caution. At one point, citing a 2012 paper, the authors suggest “The KhoeSan peoples are a highly diverse set of southern African populations that diverged from all other populations approximately 100 kya.” I can tell you that some credible researchers who have access to whole genome sequences and have been looking at this question peg the divergence date closer to 200,000 years. Some of the issue here is that you need to account for later gene flow, which will reduce the distance between populations. Easier said than done.
The genetic prehistory of the African continent is almost certainly much more complex than what is presented in the paper, largely due to lack of ancient DNA within Africa. Northern Eurasia turned out to be far more complex than had earlier been guessed…and it is likely that Northern Eurasia has had a simpler history because of its much shorter time of habitation.
If I had to guess, the ancestors of the Khoisan as we understand them were a separate and distinct group who diverged between ~100,000 and ~200,000 years ago from other extant African populations. But I suspect our clarity is very low in relation to the sort of structure which eventually resulted in the shake-out of only a few large groups of Sub-Saharan Africans aside from the Khoisan.