Inferring and visualizing patterns in genomic data

I’ve been playing around with ADMIXTURE and EIGENSOFT on the HapMap data set, with a few friends & family merged into it. It is interesting to see how the intuitive inferences you make from ADMIXTURE bar plots differ somewhat from those you make from PCA scatter plots. In any case, I’ve been posting some of the preliminary results on Facebook (in part because one of my friends is on Facebook and is curious about his own genetic background), and a friend who is a grad student pointed me to Structurama, which infers the best number of categories* (one can do cross-validation in ADMIXTURE). I’ve avoided STRUCTURE because it’s computationally more intensive. Any other recommendations? Specifically, something not mentioned by Dienekes or David.
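For anyone curious what the cross-validation route looks like in practice, here is a minimal sketch, not my actual pipeline. It assumes ADMIXTURE was run once per K with the `--cv` flag and that each log contains a line of the form `CV error (K=5): 0.53980`. The function names and the error values below are made up for illustration.

```python
import re

# Assumed log format: ADMIXTURE run with --cv prints a line such as
# "CV error (K=5): 0.53980" for each value of K.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


# Hypothetical log excerpts (illustrative numbers, not real results).
example_logs = """\
CV error (K=3): 0.55120
CV error (K=4): 0.54310
CV error (K=5): 0.53980
CV error (K=6): 0.54105
"""

errors = parse_cv_errors(example_logs)
print(errors)
print("Lowest CV error at K =", best_k(errors))
```

The lowest CV error is only a guide. A K with a slightly higher error can still produce clusters that are easier to interpret.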

Below the fold is a taste of the games my computer has been up to overnight: K = 5 ancestral populations in ADMIXTURE. The HapMap samples are Utah whites, Tuscans, Mexicans, and Beijing Chinese, in that order. The last 6 bars are my father, my mother, and then four individuals of European ancestry: Euro 1, Euro 2, Euro 3, and Euro 4. After merging files, pruning founders, and thinning the markers to reduce linkage disequilibrium, I was left …

Razib Khan