Visualizing intra-European phylogenetic distances

Neighbor-joining tree of genetic distances between populations

In The History and Geography of Human Genes, L. L. Cavalli-Sforza used genetic distances between population groups, measured as F_ST values, to generate a series of visualizations, which then allowed him to infer historical processes. Basically, you look at the total genetic variation and see how much of it can be attributed to differences between groups. If none of it can, then in a population genetic sense it doesn’t make much sense to speak of distinctive groups; they’re basically one breeding population. The higher the F_ST statistic, the more of the variation is partitioned between the groups.
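To make the partitioning idea concrete, here is a minimal sketch in Python. It assumes a single biallelic locus, two equally sized populations, and Wright's heterozygosity-based definition, F_ST = (H_T - H_S) / H_T. The function name and the example frequencies are made up for illustration. Real estimators correct for sample size and combine many loci, so treat this as a toy.

```python
def fst_two_pops(p1, p2):
    """Toy F_ST for one biallelic locus in two equally sized populations.

    p1, p2 are the frequencies of the same allele in each population.
    (Hypothetical helper for illustration; not a sample-size-corrected estimator.)
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both groups were pooled into one population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Average expected heterozygosity within each group.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # The locus does not vary at all, so there is nothing to partition.
        return 0.0
    return (h_total - h_within) / h_total


# Made-up frequencies: identical groups, mildly different, fixed differences.
for freqs in [(0.5, 0.5), (0.4, 0.6), (0.0, 1.0)]:
    print(freqs, fst_two_pops(*freqs))
```

Identical frequencies give 0 (no variation is between groups), and a fixed difference gives 1 (all of it is).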

Roughly, F_ST is used as a proxy for genetic distance and for evolutionary divergence. The longer two populations have been separated, the more genetic differences they’ll accumulate, inflating the F_ST value. There are a lot of subtleties that I’m eliding here (see Estimating and interpreting F_ST: the impact of rare variants for a survey of the recent literature on the topic and pathways forward), but for a long time F_ST was the go-to statistic for making phylogenetic inferences on a within-species scale.

Today we have other techniques: Structure, Treemix, fineStructure, and various local ancestry packages.

But F_ST is still useful to give one a Gestalt sense of population genetic differences. Cavalli-Sforza admits in The History and Geography of Human Genes that European populations had very low pairwise F_ST values, but because of the importance of Europe for sociocultural reasons a detailed analysis of the region was still provided in the text. Additionally, the authors had lots of European samples (non-European Caucasoids were thrown into one category for macro-group comparisons because there weren’t that many samples).

Using results from the 2015 paper Massive migration from the steppe was a source for Indo-European languages in Europe, I visualized pairwise genetic distances for European populations, ancient and modern (with Han Chinese as an outgroup), on a tree. The results illustrate two things:

Ancient populations were very distinct in Europe from modern ones.
Many modern groups are clustered close together.
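For readers who want to see how a tree like this comes out of a table of pairwise distances, here is a minimal neighbor-joining sketch in plain Python. It is not the software or the data behind the figure: the function name, the population labels, and the distance values are all invented for illustration, and in practice you would use an established phylogenetics package.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix; returns Newick text.

    Hypothetical helper for illustration. The tree is unrooted; the final
    join is written with an arbitrary root placement.
    """
    # Copy the matrix into a dict of dicts keyed by node label.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Summed distance from each node to every other current node.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair that minimizes the neighbor-joining criterion Q.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        # Distances from the new parent node to every remaining node.
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{d[a][b]:.4f},{b}:0.0000);"


# Invented pairwise distances (NOT values from the paper).
labels = ["outgroup", "ancient_1", "modern_1", "modern_2"]
toy_fst = [
    [0.000, 0.110, 0.100, 0.100],
    [0.110, 0.000, 0.030, 0.035],
    [0.100, 0.030, 0.000, 0.005],
    [0.100, 0.035, 0.005, 0.000],
]
print(neighbor_joining(labels, toy_fst))
```

The printed Newick string can be pasted into any tree viewer. Very small distances between the modern groups show up as short branches, which is the tight clustering described above.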

The bulk of the population genetic structure in modern Europe seems to have been established in the period between 3000 BCE and 2000 BCE. This is not that much time for a lot of distinctiveness to develop, especially on the geographically open North European plain. I suspect with more and more Mesolithic and early to middle Neolithic DNA we’ll see that some of the modern population structure is a ghost of ancient substrate absorption.

Many of the ethno-national categories that are very significant in recent history, and impact the cultural memories of modern people and their genealogies, have very shallow roots. This does not mean they are not “real” (I don’t know what that’s supposed to mean at all), just that many of the identities which seem so salient to us today may be relatively recent in terms of their significance to large groups of humans….