Looking for relatedness in the HapMap Gujaratis

Recently I was looking at a 3-D PCA animation which Zack generated from the Harappa Ancestry Project data set. Click the link and come back. Notice the outlier clusters? The Burusho are straightforward: they seem to have low levels of Tibetan admixture. But what about the Gujarati cluster? Again, we see what we’ve seen before: the Gujaratis split in PCA into two groups, one a tight cluster and the other relatively widely distributed. This prompted me to look more closely at the HapMap Gujarati sample. Today I was exploring the question with Plink’s identity-by-descent feature. First I’ll start out with a smaller data set: my family (father, mother, sibling 1, sibling 2, and myself), plus an Indian (from Uttar Pradesh) and a Pakistani as unrelated individuals. I merged our 23andMe-derived genotypes and, with ~900,000 markers, calculated pairwise IBD:

./plink --bfile IBDControl --genome
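For anyone who wants to poke at the output themselves: Plink 1.x writes the pairwise results to a whitespace-delimited file (plink.genome by default) whose header includes Z0, Z1, Z2, PI_HAT, DST, PPC and RATIO. Below is a minimal Python sketch of reading that layout. The sample text is a stand-in, not real output: the numeric values are copied from two of the pairs reported in this post, while the family IDs and the RT, EZ and PHE fields are placeholders assumed for illustration.

```python
import io

# Stand-in for a Plink 1.x .genome file. Z0..RATIO are copied from two pairs
# reported in this post; FID labels and the RT/EZ/PHE fields are placeholders.
SAMPLE = """\
 FID1   IID1 FID2      IID2 RT EZ    Z0    Z1    Z2 PI_HAT PHE   DST   PPC   RATIO
 FAM1 Father FAM1     Razib UN NA 0.002 0.777 0.221  0.610  -1 0.851 1.000 450.800
 FAM2 Indian FAM3 Pakistani UN NA 0.781 0.017 0.203  0.211  -1 0.758 0.533   2.005
"""

# Columns converted to floats; everything else is kept as a string.
NUMERIC = ("Z0", "Z1", "Z2", "PI_HAT", "DST", "PPC", "RATIO")


def read_genome(handle):
    """Yield one dict per pair from a whitespace-delimited .genome stream."""
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        for key in NUMERIC:
            row[key] = float(row[key])
        yield row


rows = list(read_genome(io.StringIO(SAMPLE)))
for row in rows:
    print(row["IID1"], row["IID2"], row["PI_HAT"], row["DST"])
```

To read a real file, swap the `io.StringIO(SAMPLE)` handle for `open("plink.genome")`, assuming the default output name was not changed with `--out`.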

Here are the relevant results:

| Individual 1 | Individual 2 | Z0 | Z1 | Z2 | PI_HAT | DST | PPC | RATIO |
|---|---|---|---|---|---|---|---|---|
| Indian | Father | 0.768 | 0.027 | 0.205 | 0.218 | 0.760 | 0.160 | 1.940 |
| Indian | Mother | 0.782 | 0.010 | 0.209 | 0.214 | 0.759 | 0.026 | 1.886 |
| Indian | Razib | 0.767 | 0.032 | 0.202 | 0.218 | 0.759 | 0.500 | 2.000 |
| Indian | Sibling1 | 0.769 | 0.025 | 0.206 | 0.219 | 0.760 | 0.198 | 1.949 |
| Indian | Sibling2 | 0.766 | 0.032 | 0.203 | 0.219 | 0.760 | 0.685 | 2.030 |
| Indian | Pakistani | 0.781 | 0.017 | 0.203 | 0.211 | 0.758 | 0.533 | 2.005 |
| Father | Mother | 0.776 | 0.018 | 0.207 | 0.215 | 0.759 | 0.284 | 1.965 |
| Father | Razib | 0.002 | 0.777 | 0.221 | 0.610 | 0.851 | 1.000 | 450.800 |
| Father | Sibling1 | 0.001 | 0.785 | 0.214 | 0.606 | 0.850 | 1.000 | 898.800 |
| Father | Sibling2 | 0.002 | 0.779 | 0.220 | 0.609 | 0.851 | 1.000 | 643.143 |
| Father | Pakistani | 0.778 | 0.019 | 0.203 | 0.213 | 0.758 | 0.201 | 1.950 |
| Mother | Razib | 0.002 | 0.788 | 0.211 | 0.605 | 0.849 | 1.000 | 639.429 |
| Mother | Sibling1 | 0.002 | 0.781 | 0.218 | 0.608 | 0.850 | 1.000 | 639.857 |
| Mother | Sibling2 | 0.002 | 0.782 | 0.216 | 0.607 | 0.850 | 1.000 | 447.900 |
| Mother | Pakistani | 0.779 | 0.020 | 0.201 | 0.211 | 0.758 | 0.052 | 1.904 |
| Razib | Sibling1 | 0.183 | 0.408 | 0.409 | 0.613 | 0.866 | 1.000 | 11.386 |
| Razib | Sibling2 | 0.194 | 0.432 | 0.374 | 0.590 | 0.858 | 1.000 | 11.491 |
| Razib | Pakistani | 0.781 | 0.016 | 0.203 | 0.211 | 0.758 | 0.933 | 2.095 |
| Sibling1 | Sibling2 | 0.236 | 0.412 | 0.351 | 0.557 | 0.849 | 1.000 | 9.413 |
| Sibling1 | Pakistani | 0.777 | 0.024 | 0.199 | 0.211 | 0.758 | 0.327 | 1.973 |
| Sibling2 | Pakistani | 0.774 | 0.024 | 0.202 | 0.214 | 0.758 | 0.443 | 1.991 |

You can infer some things without even knowing what the columns mean. Notice that the parent-child, sibling-sibling, and unrelated comparisons each look different. The distance measure, DST, is essentially identical to the genome-wide comparison in 23andMe. Either the web app is running Plink, or, it’s using the …
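The three patterns can be made explicit with a rough Python sketch. Z0, Z1 and Z2 are Plink's estimated probabilities that a pair shares zero, one or two alleles identical by descent, and PI_HAT is Z2 + 0.5 × Z1, which the reported numbers reproduce to within rounding. The cut-offs in `classify` are hypothetical thresholds picked by eye for this one small data set, not anything Plink itself applies. The unrelated pairs here sit at a PI_HAT of about 0.21 rather than 0, so textbook cut-offs would likely not transfer as-is.

```python
# (pair, Z0, Z1, Z2, reported PI_HAT), copied from the results above
PAIRS = [
    ("Father-Razib",      0.002, 0.777, 0.221, 0.610),
    ("Mother-Sibling1",   0.002, 0.781, 0.218, 0.608),
    ("Razib-Sibling1",    0.183, 0.408, 0.409, 0.613),
    ("Sibling1-Sibling2", 0.236, 0.412, 0.351, 0.557),
    ("Father-Mother",     0.776, 0.018, 0.207, 0.215),
    ("Indian-Pakistani",  0.781, 0.017, 0.203, 0.211),
]


def pi_hat(z1, z2):
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def classify(z0, z1, z2):
    """Label a pair using ad hoc thresholds chosen by eye for this table."""
    if z0 < 0.05 and z1 > 0.7:
        return "parent-child"    # almost never IBD=0, mostly IBD=1
    if z0 < 0.5 and z1 > 0.3 and z2 > 0.3:
        return "full siblings"   # a mix of IBD=0, 1 and 2
    if z0 > 0.7:
        return "unrelated"       # mostly IBD=0
    return "unclear"


labels = {}
for pair, z0, z1, z2, reported in PAIRS:
    labels[pair] = classify(z0, z1, z2)
    # Inputs are rounded to three decimals, so allow a small tolerance.
    agrees = abs(pi_hat(z1, z2) - reported) <= 0.0015
    print(f"{pair:18s} {labels[pair]:14s} PI_HAT={pi_hat(z1, z2):.4f} agrees={agrees}")
```

On this family the three groups are far enough apart that almost any sensible thresholds would separate them. A larger sample with more distant relatives would need more care.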

Razib Khan