A recent paper in the American Journal of Human Genetics (XLID-Causing Mutations and Associated Genes Challenged in Light of Data From Large-Scale Human Exome Sequencing, Piton et al.) raises some important questions about identifying the genes and variants associated with X-linked intellectual disabilities (XLID). In doing so, it also raises questions about identifying causal variants in general.
The incidence of intellectual disability (ID) is 1-2% in children. Patterns of occurrence have long suggested X chromosome-linked causation, and numerous X-linked variants have been found to be associated with ID. FMR1, associated with fragile X syndrome, was one of the first found, but the number has grown since then, "from only 11 in 1992 to 43 in 2002 and over 100 genes now". That is, 100 genes on the X chromosome thought to be associated with intellectual disability.
Some XLID is syndromic, and some not -- identifying causal gene variants is easier for syndromic forms because unrelated individuals with some or all of a specific phenotype can be more easily matched, and the assumption made that it's the same mutation that causes multiple symptoms. But, as Piton et al. point out, variation in the phenotype, with milder forms of a syndrome, or incomplete penetrance, not to mention that there can be multiple genetic pathways to a given trait, can mask -- or create -- similarities, thus making it harder to match unrelated probands (people affected with the trait you're interested in explaining).
But, let's say a potentially causal mutation has been identified in one or more people or families. Now it has to be validated, and its function deciphered. Mark Wanner at Jackson Labs recently interviewed leading human geneticist Aravinda Chakravarti on determining gene function. Finding a gene is the easy part, Aravinda said.
"But while finding and localizing genes isn’t a big challenge any more, it is still very difficult to figure out what they actually do and how their malfunctions lead to disease. Clearly, one aim is to understand their normal function and another to figure out how changes in that normal genetic program leads to disease. For chronic diseases of humankind—cancer, heart disease, neurological diseases, anything—we still are largely ignorant of how genetic abnormalities lead to disease. That is still mostly a black box. That’s the part that many labs, including my own, are focusing on more closely and looking to solve. Unfortunately, this may require a disease-by-disease solution."
When the variant has been found in only one or a handful of individuals, this makes it even more difficult. And, as Chakravarti said,
"...typically a single gene has many alternative functional forms with many different functions. Understanding these functions across development and aging remains challenging since the universe of possibilities is so large. Moreover, few genes function by themselves and many functions are only evident by one gene interacting with another . . . figuring out this aspect is still in its infancy since the universe of these possibilities is even larger. One has then also to consider that the gene may have different functions in different cell types and tissues. That's why it’s hard."
Piton et al. note that even if a functional study shows that a candidate variant has an effect at the protein or cellular level, this doesn't necessarily demonstrate that this effect is responsible for the trait. This is further complicated by the observation that people may carry a supposed causal gene variant but not have the trait. Does that mean the variant is not causal after all?
Given the difficulties in validating a gene variant's involvement in a trait, Piton et al. wanted to assess the likelihood that the genes currently reported to be associated with XLID in fact are. They used the National Heart, Lung and Blood Institute's database of 6,500 sequenced exomes, collected for genetic studies of traits unrelated to ID, as a set of controls in which to search for variants in the 104 genes currently thought to be causal.
Most of the validation is done statistically, though the results can be ambiguous. For example, when a variant is present in a single affected individual, it may be a false positive or it may be a rare causal variant, and a single observation cannot distinguish the two. Multiple observations may not settle the matter either, because some people can carry the variant and not be affected, and multiple affected relatives might share the variant simply because they share ancestry.
If a variant is also found in the NHLBI database, in which perhaps 50 males might have XLID based on population prevalence coupled with the terms of inclusion (being able to sign informed consent forms, e.g.), it becomes statistically more likely that it's a false positive, though there are numerous reasons why it might not be.
So, even when the question is addressed head on, with full awareness of the issues, in many cases it's still hard if not impossible to determine function definitively. Piton et al. found, in the NHLBI control group, 22 of the many mutations that had been identified in the 104 genes associated with XLID. But if the variants were truly causal, they would have expected to find at most one or two, given the expected prevalence of XLID in the sample. So there were more than expected, and all but two were detected in males and females, which means they were indeed unlikely to be causal, given that XLID is overwhelmingly seen in males, as are most X-linked traits.
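The "more than expected" reasoning can be made concrete with a little arithmetic. The following is only a rough sketch in Python, not the method Piton et al. used. It takes the "perhaps 50 males" figure mentioned above, and assumes, purely for illustration, that any single causal variant accounts for 2% of XLID cases (`share_per_variant` is a made-up number, and the function names are ours). Under those assumptions it computes how many carriers a truly causal, fully penetrant variant should have among the controls, and how surprising it would be to see more.

```python
from math import comb


def expected_carriers(n_affected: int, share_per_variant: float) -> float:
    """Expected number of controls carrying one specific causal variant,
    if only affected individuals carry it (full penetrance assumed)."""
    return n_affected * share_per_variant


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# "Perhaps 50 males" with XLID in the control set (figure from the text).
affected_in_controls = 50
# HYPOTHETICAL: share of XLID cases attributable to any one variant.
share_per_variant = 0.02

print("expected carriers:", expected_carriers(affected_in_controls, share_per_variant))
for observed in (1, 3, 5):
    tail = prob_at_least(observed, affected_in_controls, share_per_variant)
    print(f"P(at least {observed} carriers | variant is causal) = {tail:.4f}")
```

The smaller that tail probability for the number of carriers actually seen, the harder it is to keep believing the variant is causal and fully penetrant. The caveat is the one made throughout this post: with incomplete penetrance, unaffected carriers are allowed, and the simple calculation no longer applies.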
They further determined that 10 of the genes previously considered to be associated with XLID are most likely not, and another 15 are doubtful. But their results are only suggestive. They ranked other genes on the list from 'highly questionable' to 'likely' to 'needs verification', based on what is known about the gene, the variants and traits.
The more we know about genetics, the more it seems we don't know. As with many other traits, candidate genes and variants thought to be associated with XLID keep getting added to the list of possible causal genes, but they are rarely validated. Why are they considered causal? Most often for statistical reasons, but also because of the sometimes deceptive reasoning that a variant found in a gene already on the list of genes associated with a trait must be a very likely candidate.
Cystic fibrosis is another example -- thousands of variants in the CFTR gene have been found in patients with CF, and, as with many traits, most of the rare variants are assumed rather than demonstrated to be causal, based on knowledge that other variants in the gene are causal. Similarly, Piton et al. have determined that some variants in genes known to be associated with XLID are apparently innocuous. To their surprise, they also found a mutation in unaffected males that had previously been demonstrated to affect the protein encoded by the gene and therefore was assumed to be associated with XLID.
Another cautionary tale appeared in Cell not long ago. A paper called "Exome Sequencing of Ion Channel Genes Reveals Complex Profiles Confounding Personal Risk Assessment in Epilepsy", Klassen et al., from Jeffrey Noebels' lab, reported the results of sequencing the exomes of 237 ion channel genes in people with and without sporadic idiopathic epilepsy. They found that "rare missense variation in known Mendelian disease genes is prevalent in both groups at similar complexity, revealing that even deleterious ion channel mutations confer uncertain risk to an individual depending on the other variants with which they are combined." That is, the same ion channel variants assumed to cause epilepsy are found in people without the trait.
But causation is genomic!
This all raises an important issue that is often overlooked, perhaps the important issue, as the hunt continues for genes for disease. Causation is genomic, and background and environment matter. The marginal effect of a gene may therefore often depend more on the genomic backgrounds that happen to be sampled than on the candidate gene itself.
This means that if we assume genes are causal, but that they don't always act that way, neither 'positive' nor 'negative' findings can undermine that theory. It allows us to keep our genomic causation axiom, and impedes conceptual progress. The underlying reason, consistent with the above sentiments, is that we do not have a theory for how genes affect traits. We would say that the nature of life is an evolutionary historical process that proliferates ad hoc diversity, with no prior constraint on which or how many or how genes produce which trait. In this sense, we can say that genomics is not like chemistry, in which every carbon atom is identical.
This is also why genomic risk rather than single-gene risk may be the better measure, and why we've argued that single-site GWASsy results are suspect. But even if one takes background into account, the backgrounds are so complex and varied that they really don't help much except in relatively unusual circumstances.
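The point about marginal effects depending on sampled backgrounds can be shown with a toy calculation. This is a deliberately oversimplified sketch in Python, with one candidate variant, one background "modifier" variant, and an invented penetrance table. None of the numbers come from Piton et al. or Klassen et al., and real backgrounds involve far more than one modifier. The candidate raises risk only when the modifier is also present, so its apparent effect in a study depends on how common the modifier is in whoever was sampled.

```python
# HYPOTHETICAL penetrance table: P(trait | candidate variant, modifier).
# Keys are (has_candidate, has_modifier), each 0 or 1.
penetrance = {
    (0, 0): 0.01,
    (0, 1): 0.01,
    (1, 0): 0.01,  # candidate alone does nothing in this toy model
    (1, 1): 0.81,  # candidate plus modifier: high risk
}


def marginal_effect(pen: dict, modifier_freq: float) -> float:
    """Risk difference between carriers and non-carriers of the candidate,
    averaged over a sample in which the modifier has the given frequency."""
    effect = 0.0
    for modifier, weight in ((0, 1.0 - modifier_freq), (1, modifier_freq)):
        effect += weight * (pen[(1, modifier)] - pen[(0, modifier)])
    return effect


# The same candidate variant, 'measured' in samples with different backgrounds.
for freq in (0.0, 0.05, 0.5):
    print(f"modifier frequency {freq:.2f}: marginal effect {marginal_effect(penetrance, freq):.3f}")
```

The candidate variant is identical in all three samples; only the background differs, and the estimated effect goes from nothing to large.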