A few weeks ago I started looking at the 23andMe raw files of some of my friends and integrating them into HGDP and HapMap population data sets. One of the first things I did was remove the African populations from my total data set. The reason is that, as you can see to the left, Africans dominate the largest principal component of variation, which sets them apart from Eurasians. Once that dimension is taken up by the African vs. non-African split, the non-Africans are squeezed into the one remaining dimension, and groups like Oceanians and Amerindians show up in the strangest places. But that’s because these groups are non-African, and do not differ as much along the primary west-east axis of genetic variance which shakes out of any such analysis. Africans aren’t the only issue though. As I’ve noted before, I’ve been running ADMIXTURE, and isolated groups such as the Kalash can “monopolize” one particular color. This may be due to the Kalash being some distilled essence of an ancestral population, but I suspect that it’s more a matter of genetic drift due to isolation, which has made these sorts of groups distinctive. So I removed these outliers…though do note that other “outliers” …
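To make the filtering step concrete, here is a minimal sketch of dropping chosen populations before a principal component analysis. It is not the pipeline used for the plots in this post: the helper name `pca_without`, the population labels, and the genotype data are all hypothetical and fully synthetic, and the PCA is a plain SVD of a column-centered genotype matrix in NumPy rather than any dedicated population-genetics tool.

```python
import numpy as np

def pca_without(genotypes, labels, exclude=(), n_components=2):
    """Drop samples whose label is in `exclude`, then run a basic PCA.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    labels:    one population label per sample.
    Returns (coordinates, fraction of variance per component, kept labels).
    """
    labels = np.asarray(labels)
    keep = np.array([lab not in exclude for lab in labels])
    X = np.asarray(genotypes, dtype=float)[keep]
    # Center each SNP (column) on the retained samples only.
    X = X - X.mean(axis=0)
    # SVD of the centered matrix; U * S gives sample coordinates on the PCs.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return coords, explained[:n_components], labels[keep]

# Entirely synthetic toy data: one strongly drifted "outlier" group and two
# mildly differentiated groups. The names are placeholders, not real samples.
rng = np.random.default_rng(0)
n_snps, n_per_pop = 1000, 20
base = rng.uniform(0.3, 0.7, n_snps)
sign = rng.choice([-1.0, 1.0], n_snps)
freqs = {
    "outlier": rng.uniform(0.01, 0.99, n_snps),  # unrelated allele frequencies
    "west": base + 0.1 * sign,
    "east": base - 0.1 * sign,
}
genotypes = np.vstack(
    [rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs.values()]
)
labels = np.repeat(list(freqs), n_per_pop)

# PCA on everything, then again with the outlier group removed.
coords_all, var_all, labels_all = pca_without(genotypes, labels)
coords_sub, var_sub, labels_sub = pca_without(
    genotypes, labels, exclude={"outlier"}
)

print("PC1 share, all samples:     ", round(float(var_all[0]), 3))
print("PC1 share, outliers removed:", round(float(var_sub[0]), 3))
```

Plotting the first column of `coords_all` against the second, and then doing the same for `coords_sub`, should show the general effect described above: with the divergent group present, the leading component is mostly spent separating it from everyone else, and after removal the leading component is free to describe variation among the remaining samples.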
Visualizing “typical” Eurasians