Analyzing ancestry with ADMIXTURE, step by step

Over the past few months I've been hoping more people would start doing what Zack Ajmal, Dienekes, and David have been doing. There are public data sets and open-source software, so anyone with a nerdy inclination can explore their own questions out of curiosity. That way you can see the power and the limitations of genomics on your own desktop. I wonder if one of the biggest reasons more people haven't started doing this is formatting. It can be a pain to convert matrix-formatted files into PLINK's pedigree (.ped) format, for example. But the data gusher isn't ending; look at what's coming out (and has already come out) of the 1000 Genomes project!
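Since formatting is the main hurdle, here is a minimal Python sketch of that conversion. It assumes a hypothetical tab-delimited matrix with one SNP per row (columns snp, chr, pos, then one column per individual) and two-letter calls like AG, with -- or NN marking missing data; real files from different sources will vary, so treat this as a starting point. The output follows PLINK's .ped layout (family ID, individual ID, father, mother, sex, phenotype, then two allele columns per SNP) and .map layout (chromosome, SNP ID, genetic distance, base-pair position). The function name matrix_to_plink is made up for this example, and putting the population name in the family ID column is an illustrative choice, not a requirement.

```python
import csv
import io

# Calls treated as missing in this hypothetical input format.
MISSING_CALLS = {"--", "NN", "00", ""}


def matrix_to_plink(matrix_text, family_id="POP1"):
    """Convert a SNP-by-individual genotype matrix into PLINK .ped and .map text.

    Assumed (hypothetical) input: tab-delimited, header row
    'snp<TAB>chr<TAB>pos<TAB>ind1<TAB>ind2...', one SNP per row,
    two-letter genotype calls such as 'AG'.
    """
    reader = csv.reader(io.StringIO(matrix_text), delimiter="\t")
    header = next(reader)
    individuals = header[3:]
    map_lines = []
    # One growing list of allele tokens per individual, filled SNP by SNP.
    alleles = [[] for _ in individuals]

    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"row for {row[0]} has {len(row)} fields, expected {len(header)}")
        snp, chrom, pos = row[0], row[1], row[2]
        # .map columns: chromosome, SNP ID, genetic distance (0 = unknown), bp position
        map_lines.append(f"{chrom}\t{snp}\t0\t{pos}")
        for i, call in enumerate(row[3:]):
            call = call.strip().upper()
            if call in MISSING_CALLS or len(call) != 2:
                alleles[i].extend(["0", "0"])  # PLINK's default missing-allele code
            else:
                alleles[i].extend([call[0], call[1]])

    ped_lines = []
    for ind, ind_alleles in zip(individuals, alleles):
        # .ped leading columns: family ID, individual ID, father, mother,
        # sex (0 = unknown), phenotype (-9 = missing), then the allele pairs
        ped_lines.append(" ".join([family_id, ind, "0", "0", "0", "-9"] + ind_alleles))
    return "\n".join(ped_lines) + "\n", "\n".join(map_lines) + "\n"


# Usage with a made-up two-SNP, two-person matrix.
example = (
    "snp\tchr\tpos\tIND1\tIND2\n"
    "rs0001\t1\t752566\tAG\tGG\n"
    "rs0002\t2\t1005\t--\tCT\n"
)
ped_text, map_text = matrix_to_plink(example, family_id="Tuscans")
print(ped_text, end="")
# Tuscans IND1 0 0 0 -9 A G 0 0
# Tuscans IND2 0 0 0 -9 G G C T
print(map_text, end="")
```

Writing the two strings to files that share a prefix (say, mydata.ped and mydata.map) gives PLINK something it can read. The sketch holds everything in memory, which is fine for a few hundred individuals but not for whole-genome data.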

I've been thinking I need to write up a post that offers a "soft landing" for people, so that we can reduce the "activation energy" for this sort of thing (once you get hooked, you only go deeper). Luckily an anonymous tipster has sent me a link to a huge data set that has already been merged and converted to pedigree format. Here are the populations:

!Kung
Buryats
Hausa
Mada
Punjabi Arain
Totonac

Adygei
Cambodian
Hazara
Makrani
Pygmy
Tu

African Americans
Chinese
Hema
Malayan
Romanians
Tujia

Algeria
Chinese Americans
Hezhen
Mandenka
Russian
Tunisia

Altaians
Chukchis
Hungarians
Maya
Sahara Occ
Turks

Alur
Chuvashs
Iban
Mbuti
Sakilli
Tuscans

Ap Brahmin
Cochin Jews
Igbo
Melanesian
Samaritians
Tuvinians

Ap Madiga
Colombian
Iranian Jews
Mexicans
Samoan
Urkarah

Ap Mala
Cypriots
Iranians
Miao
San
Utahn Whites

Armenians
Dai
Iraq Jews
Mongola
San Nb
Uygur

Armenians B
Daur
Irula
Mongolians
Sandawe
Uzbekistan Jews

Ashkenazy Jews
Dogon
Italian
Moroccans
Sardinian
Uzbeks

Azerbaijan Jews
Dolgans
Japanese
Morocco Jews
Saudis
Vietnamese

Balochi
Druze
Jordanians
Morocco N
Selkups

Bambaran
Greenlanders
Kaba
Morocco …