Recently I was looking at a 3-D PCA animation which Zack generated from the Harappa Ancestry Project data set. Click the link and come back. Notice the outlier clusters? The Burusho are straightforward: they seem to have low levels of Tibetan admixture. But what about the Gujarati cluster? Again we see what we’ve seen before: in PCA the Gujaratis split into two groups, one a tight cluster and the other relatively widely distributed. This prompted me to look more closely at the HapMap Gujarati sample. Today I was exploring the question with Plink’s identity-by-descent (IBD) feature. First I’ll start out with a smaller data set: my family (father, mother, sibling 1, sibling 2, and myself), plus an Indian (from Uttar Pradesh) and a Pakistani as unrelated individuals. I merged our 23andMe-derived genotypes and, with ~900,000 markers, calculated pairwise IBD:
./plink --bfile IBDControl --genome
Here are the relevant results:
| Individual 1 | Individual 2 | Z0 | Z1 | Z2 | PI_HAT | DST | PPC | RATIO |
|---|---|---|---|---|---|---|---|---|
| Indian | Father | 0.768 | 0.027 | 0.205 | 0.218 | 0.760 | 0.160 | 1.940 |
| Indian | Mother | 0.782 | 0.010 | 0.209 | 0.214 | 0.759 | 0.026 | 1.886 |
| Indian | Razib | 0.767 | 0.032 | 0.202 | 0.218 | 0.759 | 0.500 | 2.000 |
| Indian | Sibling1 | 0.769 | 0.025 | 0.206 | 0.219 | 0.760 | 0.198 | 1.949 |
| Indian | Sibling2 | 0.766 | 0.032 | 0.203 | 0.219 | 0.760 | 0.685 | 2.030 |
| Indian | Pakistani | 0.781 | 0.017 | 0.203 | 0.211 | 0.758 | 0.533 | 2.005 |
| Father | Mother | 0.776 | 0.018 | 0.207 | 0.215 | 0.759 | 0.284 | 1.965 |
| Father | Razib | 0.002 | 0.777 | 0.221 | 0.610 | 0.851 | 1.000 | 450.800 |
| Father | Sibling1 | 0.001 | 0.785 | 0.214 | 0.606 | 0.850 | 1.000 | 898.800 |
| Father | Sibling2 | 0.002 | 0.779 | 0.220 | 0.609 | 0.851 | 1.000 | 643.143 |
| Father | Pakistani | 0.778 | 0.019 | 0.203 | 0.213 | 0.758 | 0.201 | 1.950 |
| Mother | Razib | 0.002 | 0.788 | 0.211 | 0.605 | 0.849 | 1.000 | 639.429 |
| Mother | Sibling1 | 0.002 | 0.781 | 0.218 | 0.608 | 0.850 | 1.000 | 639.857 |
| Mother | Sibling2 | 0.002 | 0.782 | 0.216 | 0.607 | 0.850 | 1.000 | 447.900 |
| Mother | Pakistani | 0.779 | 0.020 | 0.201 | 0.211 | 0.758 | 0.052 | 1.904 |
| Razib | Sibling1 | 0.183 | 0.408 | 0.409 | 0.613 | 0.866 | 1.000 | 11.386 |
| Razib | Sibling2 | 0.194 | 0.432 | 0.374 | 0.590 | 0.858 | 1.000 | 11.491 |
| Razib | Pakistani | 0.781 | 0.016 | 0.203 | 0.211 | 0.758 | 0.933 | 2.095 |
| Sibling1 | Sibling2 | 0.236 | 0.412 | 0.351 | 0.557 | 0.849 | 1.000 | 9.413 |
| Sibling1 | Pakistani | 0.777 | 0.024 | 0.199 | 0.211 | 0.758 | 0.327 | 1.973 |
| Sibling2 | Pakistani | 0.774 | 0.024 | 0.202 | 0.214 | 0.758 | 0.443 | 1.991 |
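For readers who want to poke at these numbers themselves, here is a minimal Python sketch using a handful of rows copied from the table above. It does two things: it checks that PI_HAT matches Plink's definition (the probability of sharing two alleles IBD plus half the probability of sharing one, that is Z2 + 0.5 × Z1), and it sorts pairs into three rough groups by their Z0 and Z1 values. The function names and the cutoffs are my own illustrative assumptions, chosen only to separate the rows in this particular table; they are not Plink defaults and should not be reused elsewhere without checking.

```python
# A few rows copied from the table above:
# (individual 1, individual 2, Z0, Z1, Z2, PI_HAT)
ROWS = [
    ("Indian", "Father", 0.768, 0.027, 0.205, 0.218),
    ("Father", "Mother", 0.776, 0.018, 0.207, 0.215),
    ("Father", "Razib", 0.002, 0.777, 0.221, 0.610),
    ("Mother", "Sibling1", 0.002, 0.781, 0.218, 0.608),
    ("Razib", "Sibling1", 0.183, 0.408, 0.409, 0.613),
    ("Sibling1", "Sibling2", 0.236, 0.412, 0.351, 0.557),
]


def pi_hat_from_z(z1, z2):
    """Proportion IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def label_pair(z0, z1):
    """Rough grouping; the cutoffs are illustrative assumptions for this table only."""
    if z0 > 0.5:
        return "unrelated-like"      # mostly zero alleles shared IBD
    if z0 < 0.05 and z1 > 0.7:
        return "parent-child-like"   # almost always exactly one allele shared IBD
    return "sibling-like"            # a mix of zero, one and two alleles shared IBD


for ind1, ind2, z0, z1, z2, reported in ROWS:
    recomputed = pi_hat_from_z(z1, z2)
    print(f"{ind1:>8} - {ind2:<8} {label_pair(z0, z1):<18} "
          f"PI_HAT reported={reported:.3f} recomputed={recomputed:.4f}")
```

The recomputed PI_HAT values agree with the reported ones up to rounding in the third decimal place, which is a useful sanity check that the columns are being read in the right order.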
You can infer some things without even knowing what the columns mean. Notice that the parent-child, sibling-sibling, and unrelated comparisons each show a distinct pattern. The distance measure, DST, is essentially the same as the genome-wide comparison in 23andMe. Either the web app is running Plink, or it’s using the …