Tea leaves and population substructure

Image credit: Wikimol

Over the past few months I’ve been encouraging people to pull down ADMIXTURE and push the public data sets through it. You can also convert your 23andMe raw file into pedigree format pretty easily and integrate it into the public data sets with PLINK. I’ve been following Zack’s Harappa Ancestry Project pretty closely, but I’ve also been running the software myself, manipulating its parameters, and seeing how things shake out. But the more I do it, the more I wonder if it isn’t like regression analysis: a technique which is just waiting to be leveraged by human biases. I began thinking about this more deeply after a conversation with a computational biologist who outlined the structural problems with how ad hoc the use of statistics is in the life sciences.
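As an aside for anyone who wants to try the conversion step mentioned above, here is a minimal sketch of turning a 23andMe raw file into PLINK's text pedigree format (one PED line plus MAP rows). It assumes the usual 23andMe layout of `#` comment lines followed by tab-separated rsid, chromosome, position, and genotype. The function names, the `FAM1`/`ME` identifiers, and the choice to treat no-calls and indel codes as missing are my own illustrative assumptions, not anything prescribed by 23andMe or PLINK.

```python
def parse_23andme(lines):
    """Yield (rsid, chrom, pos, genotype) from 23andMe raw-data lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        yield rsid, chrom, int(pos), genotype


def to_allele_pair(genotype):
    """Map a 23andMe genotype string to two PED alleles.

    Assumption: anything other than A/C/G/T calls (no-calls such as
    '--', indel codes such as 'DI') is written as missing ('0 0').
    Single-letter haploid calls are doubled so every marker has two alleles.
    """
    valid = set("ACGT")
    if len(genotype) == 1 and genotype in valid:
        return genotype, genotype
    if len(genotype) == 2 and set(genotype) <= valid:
        return genotype[0], genotype[1]
    return "0", "0"


def to_ped_map(lines, family_id="FAM1", individual_id="ME"):
    """Return (ped_row, map_rows) for a single individual.

    The PED row is: family ID, individual ID, father, mother, sex,
    phenotype, then two alleles per marker. Parents and sex are set to
    0 (unknown) and phenotype to -9 (missing).
    """
    map_rows = []
    alleles = []
    for rsid, chrom, pos, genotype in parse_23andme(lines):
        # MAP columns: chromosome, marker ID, genetic distance, bp position.
        map_rows.append(f"{chrom}\t{rsid}\t0\t{pos}")
        alleles.extend(to_allele_pair(genotype))
    ped_row = " ".join([family_id, individual_id, "0", "0", "0", "-9", *alleles])
    return ped_row, map_rows


# Tiny made-up sample, only to show the shape of the input.
sample = "# comment\nrs1\t1\t100\tAG\nrs2\tX\t200\tC\nrs3\t2\t300\t--\n"
ped_row, map_rows = to_ped_map(sample.splitlines())
print(ped_row)
print("\n".join(map_rows))
```

Written out as, say, `me.ped` and `me.map` (hypothetical file names), the result can be loaded with PLINK's `--file` option and combined with a public data set through `--merge`. Expect to deal with strand mismatches and non-overlapping markers before the merge goes through cleanly.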

These sorts of qualms are probably why I’m posting my results mostly on Facebook and passing them around to friends, rather than putting them out in the public domain. It isn’t that I think the results are going to be abused. I just don’t know what they mean a lot of …