D.I.Y. population structure inference, part 1 of many

If you’ve been reading this weblog for a while you’ve seen many images like the one above. It comes from the 2008 paper Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. The data set is from the Human Genome Diversity Project. It consists of 52 groups from around the world, curated for representativeness, but also for ethnic distinctiveness. The authors used the FRAPPE program, which, like STRUCTURE and ADMIXTURE, estimates the ancestry of individuals (and, in the aggregate, populations) as a combination of components, the number of which you specify with the parameter K. In other words, this is model-based. It works out really well when you have an intuition of the model you’re looking for. Imagine African Americans, who you can presume are a two-way admixture between two distinct ancestral populations. It works less well in other cases. For example, South Asians are modeled by 23andMe as a two-way admixture between Europeans and East Asians. Why this occurs is totally comprehensible; they have three (Chinese + Japanese = one) reference populations which are very different from South Asians. So the computer, being dumb but fast, simply slaps …
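To make the two-way case concrete, here is a minimal sketch in Python of what "estimating ancestry from a combination of components" can mean when K = 2. It is a deliberately simplified toy, not what FRAPPE, STRUCTURE, or ADMIXTURE actually run: it assumes the allele frequencies of the two ancestral populations are already known, whereas those programs have to infer frequencies and ancestry fractions together. All names here (`freqs_a`, `freqs_b`, `estimate_ancestry`, the simulated data) are hypothetical and exist only for illustration.

```python
import math
import random


def log_likelihood(q, genotypes, freqs_a, freqs_b):
    """Log-likelihood of one individual's genotypes, given a fraction q of
    ancestry from population A and (1 - q) from population B."""
    total = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        # Assumption: the individual's allele frequency at a locus is a
        # weighted blend of the two ancestral frequencies.
        f = q * pa + (1.0 - q) * pb
        # Genotype g (0, 1 or 2 copies of the allele) is treated as Binomial(2, f).
        total += (math.log(math.comb(2, g))
                  + g * math.log(f)
                  + (2 - g) * math.log(1.0 - f))
    return total


def estimate_ancestry(genotypes, freqs_a, freqs_b, steps=100):
    """Grid search for the q in [0, 1] that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freqs_a, freqs_b))


def simulate_individual(q, freqs_a, freqs_b, rng):
    """Draw genotypes for a made-up individual with true ancestry fraction q."""
    genotypes = []
    for pa, pb in zip(freqs_a, freqs_b):
        f = q * pa + (1.0 - q) * pb
        genotypes.append(sum(rng.random() < f for _ in range(2)))
    return genotypes


# Simulated data only: random allele frequencies, kept away from 0 and 1
# so the logarithms above are always defined.
rng = random.Random(0)
n_loci = 2000
freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]

true_q = 0.8
genotypes = simulate_individual(true_q, freqs_a, freqs_b, rng)
q_hat = estimate_ancestry(genotypes, freqs_a, freqs_b)
print(f"true ancestry fraction: {true_q}, estimated: {q_hat}")
```

The point of the toy is the one made above: the estimate can only be expressed in terms of the components you hand the model. An individual from a population that is neither A nor B would still get some value of q, simply because the model has nowhere else to put them.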

Razib Khan