Zack has started exploring the K’s of his merged data set for HAP. A commenter suggests that:
As you have begun interpreting the reference results, let me make a friendly warning: you have to keep in mind that most of the reference populations of ethnic groups are extremely limited in sample size (with only between 2 and 25 individuals) and from very obscure sources, and you should keep away from drawing conclusions about millions of people based on such limited number of individuals.
This seems a rather reasonable caution. But I don’t think such a vague piece of advice really adds any value. These sorts of caveats are contingent upon:
– The scope of the question being asked (i.e., how fine-grained is the variation you are attempting to measure)
– The sample size
– The representativeness
– The thickness of the marker set (10 autosomal markers vs. 500,000 SNPs)
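To make the marker-set point concrete, here is a rough toy simulation, not anything Zack or HAP actually runs. It takes a single individual (N = 1) whose ancestry is a mix of two source populations and estimates the mixing fraction from different numbers of markers. Everything in it is an assumption for illustration: the function names, the uniform allele frequencies, the independence of markers (no linkage), and above all the premise that the source populations' allele frequencies are known exactly. In practice those frequencies come from the reference samples themselves, which is where the commenter's worry about small reference panels would enter.

```python
import random
import statistics


def estimate_admixture(n_markers, q_true, rng):
    """Moment estimate of one individual's ancestry fraction from population A.

    Toy model (assumed, not from the post): markers are independent and the
    allele frequencies of both source populations are known without error.
    """
    num = 0.0
    den = 0.0
    for _ in range(n_markers):
        p_a = rng.uniform(0.05, 0.95)  # allele frequency in population A
        p_b = rng.uniform(0.05, 0.95)  # allele frequency in population B
        # Expected allele frequency for someone with fraction q_true from A.
        f = q_true * p_a + (1 - q_true) * p_b
        # Diploid genotype: two independent draws, giving 0, 1 or 2 copies.
        genotype = (rng.random() < f) + (rng.random() < f)
        d = p_a - p_b
        # Least-squares fit of genotype/2 = p_b + q * (p_a - p_b).
        num += (genotype / 2 - p_b) * d
        den += d * d
    return num / den


def spread(n_markers, q_true=0.3, replicates=100, seed=1):
    """Standard deviation of the estimate across simulated individuals."""
    rng = random.Random(seed)
    estimates = [estimate_admixture(n_markers, q_true, rng)
                 for _ in range(replicates)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for m in (10, 1000, 20000):
        print(m, "markers -> spread of estimate:", round(spread(m), 4))
```

Under these assumptions the spread of the estimate shrinks roughly with the square root of the number of markers, so one person typed at hundreds of thousands of SNPs can be placed far more precisely than one person typed at 10 markers. What the sketch does not capture is representativeness: a precise estimate for one individual says nothing, by itself, about how typical that individual is of the group.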
This isn’t a qualitative issue, easily divided into “right” and “wrong.” Sometimes an N = 1 is very insightful. That’s why the whole genome of one Bushman was very useful. In fact, the whole genome of any random Sub-Saharan African, and the whole genome of any random non-African …