Genetic differences within European populations

One of the more popular posts on this weblog (going by StumbleUpon and search engine referrers) focuses on genetic variation in Europe as a function of geography. In some ways the results are common sense: populations closer to each other are more genetically related. Why not? Historically people have married their neighbors, and so gene flow is often well modeled as isolation by distance. The scientific rationale for these studies is to smoke out population stratification in medical genetics research programs which attempt to find associations between genes and particular diseases. By population stratification I mean the fact that different populations will naturally have different gene frequencies, and if those populations also exhibit different frequencies of the disease or trait under investigation, then one may have to deal with spurious correlations. If, for example, your study population includes many people of African and European descent, presumably cautious researchers would immediately be aware of this problem and attempt to take it into account. But what about populations which are genetically closer, or whose genetic differences are not so manifest in physical characteristics which might clue you in to the issue of stratification?
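To make the stratification problem concrete, here is a minimal sketch in Python. All of the numbers are invented for illustration and come from no study: two populations differ in both the frequency of an allele and the rate of a disease, and within each population the allele has no effect on the disease at all. Pooling the two still produces an association.

```python
def expected_table(n_people, carrier_freq, disease_rate):
    """Expected 2x2 counts when carrier status and disease are independent.

    Returns (carrier cases, carrier controls, non-carrier cases, non-carrier controls).
    """
    carriers = n_people * carrier_freq
    non_carriers = n_people - carriers
    return (carriers * disease_rate, carriers * (1 - disease_rate),
            non_carriers * disease_rate, non_carriers * (1 - disease_rate))

def odds_ratio(table):
    """Odds ratio of disease for carriers versus non-carriers."""
    carrier_case, carrier_ctrl, non_case, non_ctrl = table
    return (carrier_case * non_ctrl) / (carrier_ctrl * non_case)

# Hypothetical populations: A has a common allele and a common disease,
# B has a rare allele and a rare disease.
pop_a = expected_table(1000, carrier_freq=0.8, disease_rate=0.10)
pop_b = expected_table(1000, carrier_freq=0.2, disease_rate=0.02)

# Pooling simply adds the two tables cell by cell.
pooled = tuple(a + b for a, b in zip(pop_a, pop_b))

or_a = odds_ratio(pop_a)
or_b = odds_ratio(pop_b)
or_pooled = odds_ratio(pooled)
print(f"within A: {or_a:.2f}, within B: {or_b:.2f}, pooled: {or_pooled:.2f}")
```

Within each population the odds ratio is 1, but in the pooled sample it comes out at roughly 2.5. The allele is simply a marker of ancestry here, which is exactly the sort of spurious correlation researchers want to rule out.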

That’s why the sorts of results which might seem common sense in the aggregate are useful. One can ask questions as to the genetic closeness of Irish and English, or Irish and Spanish, in a rigorous sense. In the United States, research programs which are constrained to white cases and controls may hide population stratification because of the ethnic diversity of the American population. A primary motivation for studies of Jewish genetics is the cluster of “Jewish diseases” which are common within that population. In our age it is fashionable to focus on what binds us together as a species, but genetic differences matter a great deal. Ask the parents of multiracial children who require bone marrow transplants.

A new paper in Human Heredity examines a large sample from five European populations, and goes over the between-population allele frequency differences with a fine-tooth comb. Genetic Differences between Five European Populations:

We sought to examine the magnitude of the differences in SNP allele frequencies between five European populations (Scotland, Ireland, Sweden, Bulgaria and Portugal) and to identify the loci with the greatest differences…We found 40,593 SNPs which are genome-wide significantly…The largest differences clustered in gene ontology categories for immunity and pigmentation. Some of the top loci span genes that have already been reported as highly stratified: genes for hair color and pigmentation (HERC2, EXOC2, IRF4), the LCT gene, genes involved in NAD metabolism, and in immunity (HLA and the Toll-like receptor genes TLR10, TLR1, TLR6). However, several genes have not previously been reported as stratified within European populations, indicating that they might also have provided selective advantages: several zinc finger genes, two genes involved in glutathione synthesis or function, and most intriguingly, FOXP2, implicated in speech development. Conclusion: Our analysis demonstrates that many SNPs show genome-wide significant differences within European populations and the magnitude of the differences correlate with the geographical distance. At least some of these differences are due to the selective advantage of polymorphisms within these loci

They looked at ~350,000 SNPs across the five populations. The sample sizes were pretty large: 1,129 individuals from Bulgaria, 1,142 from Ireland, 656 from Scotland, 620 from Sweden, and 563 from Portugal. The supplements include a figure which displays the genetic variation along the two largest principal components for their sample, color-coded by region of origin. Next to this they superimposed the PCA onto a map of Europe.
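For readers who haven't seen how such a plot is made, here is a rough sketch of the idea in plain Python. It is not the authors' pipeline: the genotypes are simulated, the two "populations" and their allele frequencies are invented, and I use simple power iteration to get only the first principal component rather than a full decomposition.

```python
import random

def simulate_genotypes(n_people, allele_freqs, rng):
    """Draw diploid genotypes: each entry counts copies (0, 1 or 2) of one allele."""
    return [[int(rng.random() < f) + int(rng.random() < f) for f in allele_freqs]
            for _ in range(n_people)]

def first_pc_scores(genotypes, rng, iterations=100):
    """Project individuals onto the leading principal component (power iteration)."""
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP column on its mean genotype.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Random starting direction, then repeatedly apply X^T X and renormalize.
    v = [rng.gauss(0, 1) for _ in range(m)]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        w = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]

rng = random.Random(0)
n_snps = 200
# Hypothetical populations whose allele frequencies differ at every SNP.
north = simulate_genotypes(20, [0.7] * n_snps, rng)
south = simulate_genotypes(20, [0.3] * n_snps, rng)

scores = first_pc_scores(north + south, rng)
north_scores, south_scores = scores[:20], scores[20:]
print("north range:", min(north_scores), max(north_scores))
print("south range:", min(south_scores), max(south_scores))
```

With frequencies this far apart the two groups fall on opposite sides of the first component with no overlap. Real European populations differ far more subtly, which is why it takes hundreds of thousands of SNPs to pull them apart.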


This confirms previous findings that the largest component of variation in Europe is north-south (at least evaluating to the west of a particular geographical cutoff), with a secondary east-west dimension. But the focus of the paper wasn’t really phylogenetic relationships between the populations as such, but the patterns of genetic differences across them. Table 1 shows the population-to-population differences in SNPs. “Rescaled” here means that the results were adjusted for sample size, which differed between populations. The table also gives the value after a Bonferroni correction.
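As a sketch of what a single pairwise comparison with a Bonferroni correction might look like, the Python below tests one SNP's allele counts in two populations with a 2x2 chi-square test and compares the p-value against 0.05 divided by the number of SNPs. The allele counts are hypothetical, the 0.05 significance level is my assumption, and the authors' actual test and rescaling procedure may well differ. Only the sample sizes (Ireland and Portugal, two alleles per person) and the ~350,000 SNP figure come from the paper.

```python
import math

def allele_count_chi2(alt_a, total_a, alt_b, total_b):
    """Chi-square statistic (1 df) for a 2x2 table of allele counts."""
    a, b = alt_a, total_a - alt_a
    c, d = alt_b, total_b - alt_b
    n = total_a + total_b
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_pvalue_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Bonferroni: divide the significance level by the number of tests.
n_snps = 350_000
threshold = 0.05 / n_snps

# Two alleles per person: 1,142 Irish and 563 Portuguese individuals.
irish_alleles, portuguese_alleles = 2 * 1142, 2 * 563

# Hypothetical SNP with a large frequency difference (about 0.30 vs 0.20).
p_strong = chi2_pvalue_1df(
    allele_count_chi2(685, irish_alleles, 225, portuguese_alleles))
# Hypothetical SNP with a negligible difference (about 0.30 vs 0.29).
p_weak = chi2_pvalue_1df(
    allele_count_chi2(685, irish_alleles, 330, portuguese_alleles))

print(f"threshold: {threshold:.2e}")
print(f"strong SNP: p = {p_strong:.2e}, significant: {p_strong < threshold}")
print(f"weak SNP:   p = {p_weak:.2e}, significant: {p_weak < threshold}")
```

The point of the correction is that with hundreds of thousands of tests, a nominal p-value of 0.05 would flag thousands of SNPs by chance alone, so the bar for "genome-wide significant" has to be set far lower.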


The pairwise differences are what you’d expect from the PCA. Most of the between-population difference is probably due to history; populations random-walk into their own gene frequencies through isolation by distance. But there’s more to the story than that, as is clear in table 2.
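The "random walk" can be illustrated with a toy Wright-Fisher simulation in Python. This is a minimal sketch under strong assumptions that I have chosen for illustration: two completely isolated populations of constant size, no selection, no mutation, and no migration. Real isolation by distance involves reduced rather than absent gene flow, so treat this as the extreme case.

```python
import random

def drift_trajectory(start_freq, n_individuals, generations, rng):
    """Allele frequency over time under pure genetic drift (Wright-Fisher)."""
    n_alleles = 2 * n_individuals  # diploid population
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each allele copy in the next generation is drawn at random
        # from the current generation's gene pool.
        copies = sum(rng.random() < freq for _ in range(n_alleles))
        freq = copies / n_alleles
        trajectory.append(freq)
    return trajectory

rng = random.Random(1)
# Two hypothetical populations that start at the same frequency.
pop_a = drift_trajectory(0.5, n_individuals=500, generations=100, rng=rng)
pop_b = drift_trajectory(0.5, n_individuals=500, generations=100, rng=rng)
print(f"after 100 generations: A = {pop_a[-1]:.3f}, B = {pop_b[-1]:.3f}")
```

Neither population is being pushed anywhere, yet their frequencies wander apart through sampling alone. Smaller populations wander faster, and any gene flow between them pulls the frequencies back together.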


As noted by the authors, genes in specific categories or classes are overrepresented among those with large between-population differences. In particular, they focus on genes related to immune function and pigmentation. The reason for variation in the former is relatively straightforward: research on patterns of natural selection in the human genome has long pinpointed loci implicated in immune function as having been particularly shaped by selection, no doubt because disease resistance has a major impact on reproductive fitness. Additionally, it seems likely that immune-related function is constantly being buffeted by selection because of the prominence of frequency-dependent dynamics. As for pigmentation, it has also shown up as a major target of natural selection in many of the more recent papers, and it’s a trait whose genetic architecture we have a reasonably good grasp of now.

They also found that the NAD synthetase 1 gene was stratified. They note that this impacts metabolism and has been found to have a relationship to the disease pellagra. Loci related to diet also seem to be disproportionately affected by natural selection, and that stands to reason, as the shift to agriculture was relatively recent and many populations may still be going through transients (e.g., gluten sensitivity). The densities and diets of European populations even today vary a great deal. Italy is about an order of magnitude more dense in population than Sweden, and this has likely been the case for many millennia due to differences in primary agricultural productivity.

Finally, the authors observe that FOXP2 is also stratified. This is the famous “language gene,” which makes the press every few years. The short of it is that FOXP2 seems to be involved in complex vocalization, and has been subject to selection in tetrapod lineages where vocal ability is pronounced (birds, humans, etc.). They don’t make much of the variation in the paper, but it seemed warranted to note that the gene had popped up in their tests.

The authors freely admit that their findings are provisional:

Our paper focuses on the top 11 loci and suggests plausible mechanisms for most of them. However, the total number of genome-wide significant SNPs is 150,000 and the top hits clustered in several GO categories. We cannot judge which ones are due to the effects of selection or to other mechanisms. We present a full list of genes with the best and median p values for SNPs within them (separately for the full sample and for controls only), so that others can make use of this information in future studies…

Citation: Moskvina V, Smith M, Ivanov D, Blackwood D, Stclair D, Hultman C, Toncheva D, Gill M, Corvin A, O’Dushlaine C, Morris DW, Wray NR, Sullivan P, Pato C, Pato MT, Sklar P, Purcell S, Holmans P, O’Donovan MC, Owen MJ, & Kirov G (2010). Genetic Differences between Five European Populations. Human Heredity, 70 (2), 141-149. PMID: 20616560