The data sets in the dark

Recently I was tipped off to the appearance of a new paper, “Genome-Wide Association Study Identifies Chromosome 10q24.32 Variants Associated with Arsenic Metabolism and Toxicity Phenotypes in Bangladesh.” This is the section which caught my eye: “Using data on urinary arsenic metabolite concentrations and approximately 300,000 genome-wide single nucleotide polymorphisms (SNPs) for 1,313 arsenic-exposed Bangladeshi individuals.” 300K SNPs for 1,313 Bangladeshi individuals is a lot! I’m interested in this data set because, of the 200+ participants in the Harappa Ancestry Project, my parents remain the “unadmixed” South Asians with the highest fraction of East Asian ancestry (10-15 percent). Within South Asia, aside from groups with clear East Asian affinities, only peoples of Munda background show comparable levels. This data set could answer a lot of questions about how typical my parents are (literally within a few hours of data exploration). But this is all you get in the supplements:

 


Zack Ajmal has already sent off an email asking about this data set, so hopefully the results will be positive.

This is a medical genetics study, so all they wanted to confirm was that there wasn’t population stratification due to inbreeding. They confirmed that. It is fine if they don’t want to explore further questions in relation to ancestry, but it would be really depressing if the data set never sees the light of day for those who are interested in asking other questions.

Razib Khan