The “how” of cystic fibrosis through the “why”

It’s just a fact that contemporary human evolutionary genetics has relied upon its potential insights into disease to generate funding, support and interest. I don’t think that this is much of a silver lining when set next to the suffering caused by disease, but it’s a silver lining nevertheless. Therefore findings which would be of interest in and of themselves are able to push to the front of the line because of possible medical relevance. A new paper in PLoS Genetics illustrates the relationship between what seem like esoteric evolutionary insights and diseases of importance to the medical community. It takes a look at the gene whose disruption results in the horrible illness cystic fibrosis, CFTR, and uncovers some interesting genetic patterns of possible evolutionary relevance. The paper is The CFTR Met 470 Allele Is Associated with Lower Birth Rates in Fertile Men from a Population Isolate. From the author summary:

Cystic fibrosis (CF) is the most common lethal recessive disorder in European-derived populations and is characterized by clinical heterogeneity that involves multiple organ systems. Over 1,600 disease-causing mutations have been identified in the cystic fibrosis transmembrane regulator (CFTR) gene, but our understanding of genotype–phenotype correlations is incomplete. Male infertility is a common feature in CF patients; but, curiously, CF–causing mutations are also found in infertile men who do not exhibit any other CF–related complications. In addition, three common polymorphisms in CFTR have been associated with infertility in otherwise healthy men. We studied these three polymorphisms in fertile men and show that one, called Met470Val, is associated with variation in male fertility and shows a signature of positive selection. We suggest that the Val470 allele has risen to high frequencies in European populations due to a fertility advantage but that other genetic and, possibly, environmental factors have tempered the magnitude of these effects during human evolution.

The high frequency of alleles which result in cystic fibrosis is something of a mystery. Basic population genetic theory tells us that lethal (at least in the pre-modern era) recessive traits should be extant only at very low frequencies so that most of the deleterious alleles are “masked” by normal copies. The ΔF508 mutation is found in 1 in 30 people of Northern European descent (you see somewhat different ratios, but all in the same ballpark). Treating that figure as the allele frequency and assuming random mating and Hardy-Weinberg equilibrium, a touch more than 0.1% of offspring would be ΔF508 homozygotes and exhibit the disease; treating it as the carrier frequency instead puts the allele frequency near 1 in 60 and the homozygote frequency near 0.03%. Either way this is not a trivial proportion when you consider that the fitness of these individuals converges upon zero.
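To make the Hardy-Weinberg arithmetic concrete, here is a minimal sketch. The only input is the 1-in-30 figure quoted above; the function names are my own, and the two "readings" (allele frequency versus carrier frequency) are assumptions about how to interpret that figure, not something taken from the paper.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq):
    """Solve 2q(1 - q) = carrier_freq for the rarer allele frequency q."""
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


def homozygote_freq(q):
    """Hardy-Weinberg expectation for the homozygote class: q squared."""
    return q * q


ONE_IN_THIRTY = 1.0 / 30.0  # the figure quoted in the text

# Reading 1 (assumption): 1 in 30 is the allele frequency itself.
hom_if_allele_freq = homozygote_freq(ONE_IN_THIRTY)

# Reading 2 (assumption): 1 in 30 is the carrier (heterozygote) frequency.
q_if_carrier_freq = allele_freq_from_carrier_freq(ONE_IN_THIRTY)
hom_if_carrier_freq = homozygote_freq(q_if_carrier_freq)

print(f"allele-frequency reading:  {hom_if_allele_freq:.4%} of births")
print(f"carrier-frequency reading: {hom_if_carrier_freq:.4%} of births")
```

The first reading gives 1 in 900 births; the second gives a bit under 1 in 3,400. Both are far higher than you would naively expect for an allele whose homozygotes historically left no descendants.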

In this paper they don’t get at ΔF508 and the other disease-causing alleles directly. Rather, they find that one particular SNP has a strong effect on fertility, as well as having a relationship in some contexts to disease-implicated alleles. Not too surprising considering that cystic fibrosis is associated with infertility. I presume that the overarching logic is that understanding the genetics of CFTR in its details will give us a better picture of its internal architecture and the various networks and pathways which result in its proper, or improper, function.

CFTR spans ~200,000 base pairs, but in the paper the authors focus on a few regions of interest within a sample from the American Hutterite community. In particular there is the 5-thymidine (5T) repeat allele at the 3′ splice site of intron 8, a variant which interferes with the proper splicing of exon 9. Then there is the TG repeat (TG) in intron 8 and an SNP in exon 10, rs213950. In the latter case the two alleles result in the amino acids methionine and valine respectively at the 470th position (Met470 and Val470). Both of these variants have an effect on the 5T allele, increasing its penetrance in relation to the outcome of cystic fibrosis. The molecular genetic implications of the Met470Val mutation are double-edged; Val470 results in a CFTR protein which matures more quickly, but with lower activity compared to the Met470 allele. Since 5T reduces splicing efficiency one could intuit why the presence of Val470, with its result of lower activity of the protein, might have a deleterious effect when the two are found in conjunction.

The paper approaches cystic fibrosis sideways because the focus on Met470Val means that they’re looking at a secondary variant from a medical perspective; a modifier, not the primary agent. But from an evolutionary perspective there’s a lot to dig into! First, let me jump to the discussion, where they seem to admit the modest current medical relevance of this paper:

Lastly, there has been a long-standing debate as to whether disease-causing CF mutations, such as ΔF508, confer a fertility advantage to healthy carriers…Unfortunately, the results we report here do not provide insight into this question. The most common CF causing mutations in Europeans (i.e. ΔF508, G542X, N1303K, W1282X) and the most common mutation in the Hutterites, M1101K…all reside on haplotypes carrying the ancestral, Met470 allele in exon 10…the 9T allele at the polyT locus, and (by inference) the TG10 or TG11 alleles…Therefore, any positive fertility effects of the Val470 allele would not be expected to affect the frequencies of the common CF disease-causing mutations in European populations.

A haplotype just refers to a sequence/correlation of alleles along the genome. You know that DNA consists of a string of base pairs, AGCGCTGAGCGCAA…. If there is variation at the first and last positions in the sequence above, and if the alternative variants at the two loci do not associate randomly but exhibit high correlations along a physical sequence, then there may be a haplotype of the variants. In the case of this paper the three regions of mutations combine to form the haplotypes. Tables 1 & 2 show the frequencies of alleles and haplotypes within their Hutterite sample.

Table 1 lays out the frequencies of each allele within the sample, while table 2 illustrates the frequencies of combinations of these alleles, that is, the haplotypes.
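The idea that alleles at two loci "do not associate randomly" can be put into numbers with the standard linkage disequilibrium statistics D and r². What follows is only an illustrative sketch: the haplotype counts are made up, and the labels A/a and B/b are hypothetical stand-ins for the two alleles at each of two sites, not the values from the paper's tables.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    D is the observed frequency of the A-B haplotype minus the frequency
    expected if the two loci were independent. r_squared rescales D by the
    allele frequencies so that it falls between 0 and 1.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    # r_squared is undefined if either locus is monomorphic
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical counts: A travels mostly with B, and a mostly with b.
d_linked, r2_linked = linkage_disequilibrium(45, 5, 5, 45)

# Hypothetical counts: the four combinations are equally common.
d_random, r2_random = linkage_disequilibrium(25, 25, 25, 25)

print(f"associated loci:  D = {d_linked:.2f}, r^2 = {r2_linked:.2f}")
print(f"independent loci: D = {d_random:.2f}, r^2 = {r2_random:.2f}")
```

When r² is high across a stretch of sequence, knowing the allele at one site tells you a lot about the alleles at the others, which is what makes it sensible to talk about a haplotype as a unit.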

The next two figures show the major finding, the association between Val470 and higher fertility in Hutterite men (not women). Remember that a p-value of 0.05 is the normal bar for statistical significance. The ticks in the second figure are 95% confidence intervals.

Do I need to emphasize how important it is that the alleles have a correlation with reproductive outcomes? Changes in gene frequencies are driven by variations in reproductive outcomes, whether random or systematically correlated with phenotypes. Drift or selection. Traits strongly tied to reproduction often have low heritabilities because the variation on such traits quickly disappears under selection’s homogenizing power. It is interesting that in this case they’re implying that there’s heritable variation in reproductive outcomes, even though one would expect a priori that selection should have expunged the variation, all things equal.

Here’s a starker figure which illustrates the association between haplotype and fertility in a more stepwise fashion:

OK, so how does this vary across populations? The next figure comes straight out of the HGDP browser:

The variation on Met470Val exhibits an African/non-African difference. I assume that the variation in the non-African segment (compare the Tuscans to the Russians for example) is mostly noise because of the small sizes of some of the HGDP sample groups. The 0.10 frequency in the San sample is intriguing. I’ve never heard anyone assert that the HGDP San likely had non-African admixture, so the existence of Val470 in this southern African group suggests to me that its appearance among non-Africans is not simply a random act of history (i.e., the outcome of the Out of Africa event and bottleneck). There may be common relaxations of ecological constraints on novel adaptation as one moves away from the tropics, or new selective pressures.

I wanted to highlight the nature of the haplotype variation earlier because the authors assess the possibility of natural selection driving Val470 up in frequency among non-Africans using haplotype-based tests of natural selection. In the figure below panel A shows the haplotype blocks. The short of it is that Val470 has a much longer haplotype than Met470, which stands to reason if Met470 was the ancestral state around which a lot of variation had crept in through drift. (LCT, the gene with a derived variant which confers lactase persistence, has a very long haplotype on the selected allele because it rose in frequency faster than recombination and mutation could break apart the distinctive genetic profile of the original copy.) Panel B shows extended haplotype homozygosity (EHH), while D shows iHS (integrated haplotype score). The latter is to some extent an elaboration of the former, able to detect selective sweeps which have not come as close to fixation as those best detected by EHH. Panel C has Fst between African and non-African populations. Fst is a statistic which summarizes between-population variance. It is 0.43 for Met470Val, while genome-wide it’s 0.11. Both the Fst and iHS values for the SNP are in the 5% tails of the distribution, illustrated by panel E.
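For readers who want a feel for what Fst is measuring, here is a minimal sketch using the textbook heterozygosity definition, Fst = (H_T − H_S) / H_T, for a single biallelic SNP in two equally weighted populations. This is an assumption on my part for illustration; I am not claiming it is the estimator used in the paper, and the allele frequencies below are hypothetical, not read off the HGDP figure.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations.

    H_S is the mean within-population heterozygosity; H_T is the
    heterozygosity of the pooled population.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = expected_heterozygosity(p_bar)
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # allele fixed or absent everywhere: no variance to partition
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
fst_identical = fst_two_populations(0.5, 0.5)   # same frequency in both groups
fst_divergent = fst_two_populations(0.1, 0.7)   # large frequency difference
fst_fixed_diff = fst_two_populations(0.0, 1.0)  # fixed for different alleles

print(f"identical: {fst_identical:.3f}")
print(f"divergent: {fst_divergent:.3f}")
print(f"fixed difference: {fst_fixed_diff:.3f}")
```

The statistic runs from 0 (the populations share the same allele frequency) to 1 (they are fixed for different alleles), so a SNP sitting far above the genome-wide average is one where the populations have diverged much more than is typical.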

The Fst differences, along with suggestions of homogeneity across the genetic scale for the allele, Val470, which confers reproductive fitness, strongly point to the possibility of natural selection. But the reproductive differences they found were large; why is Met470 still around? In the discussion they throw out some possibilities:

In fact, given the large fertility effects observed in the Hutterites, it is surprising that the Val470 allele has not gone to fixation in non-African populations. However, there might be several reasons why this has not occurred. First, the combined data on fertility effects of the Val470 allele indicate that this allele can be associated with both increased and decreased fertility, depending on genetic background. In the presence of the 5T allele at the intron 8 polyT locus, Val470 increases the risk of CBAVD and male infertility…In the absence of the 5T allele (as in the Hutterites), the Val470 allele is associated with increased male fertility relative to Met470. Although the mechanism of this interaction is obscure, it provides one example of counteracting variation that could increase the time to fixation of the Val470 allele. Second, as mentioned above, the Val allele could also be deleterious in certain environments, such as in the presence of specific pathogens or the 5T allele, as a result of its pleiotropic effects in other organ systems. Third, the fertility advantage we observed is restricted to males; we found no such association in Hutterite women…This would further slow the spread of the allele as there would be no selection advantage in half of all Val carriers. Lastly, this study was conducted in a population living under optimal conditions for reproductive success, including excellent nutrition and abundant food, access to modern health care, and negligible maternal mortality. Thus, estimates of fitness effects based on Hutterite fertility rates are likely inflated compared to the effects in human populations throughout most of evolutionary history, when competing selective pressures were likely more prevalent. 
Taken together, the lack of fixation of the Val470 alleles in populations outside of Africa may not be inconsistent with the fertility effects observed in the Hutterites, but rather suggestive of antagonistic effects of other genetic variations or environment factors that tempered these effects during most of human evolution.
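To see why a large fertility advantage makes the lack of fixation surprising, and why halving the advantage matters, here is a toy deterministic model of genic selection. Everything in it is an assumption for illustration: the selection coefficient is hypothetical rather than estimated from the Hutterite data, the male-only advantage is crudely approximated by halving that coefficient, and drift, dominance and the interaction with the 5T allele are all ignored.

```python
def next_freq(p, s):
    """One generation of genic selection: the favored allele has fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + p * s)


def generations_to_reach(p_start, p_target, s, max_generations=100_000):
    """Count generations for the favored allele to rise from p_start to p_target."""
    p = p_start
    generations = 0
    while p < p_target:
        if generations >= max_generations:
            raise RuntimeError("target frequency not reached")
        p = next_freq(p, s)
        generations += 1
    return generations


S_HYPOTHETICAL = 0.10  # illustrative advantage, NOT a value from the paper

# Advantage expressed in both sexes.
gens_full = generations_to_reach(0.01, 0.99, S_HYPOTHETICAL)

# Crude stand-in for a male-only advantage: average the effect over the sexes.
gens_half = generations_to_reach(0.01, 0.99, S_HYPOTHETICAL / 2.0)

print(f"s = {S_HYPOTHETICAL:.2f}: {gens_full} generations from 1% to 99%")
print(f"s = {S_HYPOTHETICAL / 2.0:.2f}: {gens_half} generations from 1% to 99%")
```

Even in this stripped-down model, restricting the advantage to one sex roughly doubles the time the sweep takes, and that is before layering on the antagonistic effects the authors describe.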

Remember that we’ve seen for a while now that loci which exhibit signatures of positive natural selection are often not fixed to 100%. Why not? There have been many explanations offered, and the ones above fall into the general categories mooted. Looking at a relatively isolated population in a snapshot form may not give us a full impression of what’s going on. On the other hand, the Hutterite genetic uniformity presumably eliminates many of the confounding signals which might otherwise obscure associations, so there are pluses and minuses to this sample. And of course evolution occurs over time, and peeking at slices tells us what it tells us, no more, no less. This is a place to start, but I bet it will make more sense once we have a better grasp of the distribution of dynamics across the genome. Scientific understanding often proceeds in a piecewise fashion, but the sum is greater than the parts as the sum often exhibits a structure of variation which allows us to squeeze more juice from the parts.

Citation: Kosova G, Pickrell JK, Kelley JL, McArdle PF, Shuldiner AR, Abney M, & Ober C (2010). The CFTR Met 470 allele is associated with lower birth rates in fertile men from a population isolate. PLoS genetics, 6 (6) PMID: 20532200