Wednesday, December 19, 2012

Is it 'progress' to identify 100s of genes for a trait? If not...what is it?

What is known
Crohn's disease (CD), an inflammatory bowel disease, has a large genetic component, but specific genes, as for most such complex diseases, have been elusive.  A paper in this month's American Journal of Human Genetics, "Refinement in Localization and Identification of Gene Regions Associated with Crohn Disease," Elding et al, reports that they believe they are zeroing in on the answer.

The gene most closely associated with Crohn's disease is NOD2, which plays a role in the immune response: it codes for a protein that recognizes peptidoglycans, molecules found in bacterial cell walls, and stimulates the immune system to respond.  It makes sense that genes involved in immunity would be involved, as the disease is inflammatory in nature, perhaps due to an impaired innate immune system which leads to a chronic inflammatory response by the adaptive immune system to microbes in the gut.

A number of genomewide association studies (GWAS) of Crohn's disease have been done, but none has identified genes with large explanatory power.  A recent meta-analysis of six studies (reported here) identified 32 new loci associated with Crohn's, which, added to the 39 that had been identified in 2010, brought the total to 71.  This doesn't mean 71 single genes but instead stretches of chromosomes that generally contain multiple genes -- sometimes hundreds -- and 'gene' may mean other kinds of function than protein coding, such as regulatory, directly functional RNA, and so on.  The 2010 report explained 20% of the variation in the disease, and the additional 32 brought that total to 23.2%, which indicated that most of these loci represented genes with very small effect, and that many more loci were left to be found. And what about the 77% that is still unexplained?
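To put rough numbers on 'very small effect', here is a minimal Python sketch of the per-locus averages implied by the figures quoted above. It assumes, purely for illustration, that explained variation simply adds up across loci and can be split evenly among them, which real loci will not obey. The variable names are ours, not the papers'.

```python
# Figures quoted in the text above.
total_loci = 71
additional_loci = 32
explained_before = 20.0   # percent of variation explained before the additional loci
explained_after = 23.2    # percent of variation explained with all 71 loci

# Assumption (ours): contributions are additive and shared evenly among loci.
gain = explained_after - explained_before
avg_new = gain / additional_loci
avg_all = explained_after / total_loci
unexplained = 100.0 - explained_after

print(f"average per additional locus: {avg_new:.2f} percentage points")
print(f"average per locus overall:    {avg_all:.2f} percentage points")
print(f"still unexplained:            {unexplained:.1f}%")
```

On these assumptions the 32 additional loci average about a tenth of a percentage point each, which is the sense in which their effects are 'very small'.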

The Elding et al. paper reports use of a "mapping approach that localizes causal variants based on genetic maps in linkage disequilibrium units (LDU maps)." That means chromosome locations, but not specific to any nucleotide or functional element.  The authors confirm 66 of the previously reported 71 loci, and narrow in on "more precise location estimates" in those intervals (that is, they come closer to identifying candidate genes rather than just chromosomal intervals). They identified 78 additional regions that were statistically significant, and which provide "strong evidence for 144 genes." They also found 56 "nominally significant signals, but with more stringent and precise colocalization." So, this paper reports 200 gene regions in total associated with Crohn's disease, most of which, the authors say, unambiguously implicate single genes. The reason for that inference isn't clear, since clusters of DNA units can function together.  Again, many of these loci contain genes involved in the immune system. The authors suggest that "The precise locations and the evidence that some genes reflect phenotypic subgroups will help identify functional variants and will lead to greater insight of CD etiology."

The immune system is complex and involves many components, so a mapping 'hit' in a region that contains some immune system elements might well happen by chance if you have 200 hits.  Also, the immune system is involved in response to external threats (like viruses and bacteria) as well as to internal problems (recognizing and repairing damaged tissues), so the reason for 'immune' involvement is unclear -- and it is a challenge to determine what is responding to what.

The authors have previously demonstrated genetic heterogeneity within the NOD2 gene region -- that is, that different genes explain risk in different people. They also found independent involvement of a nearby gene, CYLD. They further demonstrated the importance of precise definition of the phenotype to identify loci that might explain risk in multiple cases.  In this new paper they use a high-resolution linkage disequilibrium map, basically meaning that their test markers are closely spaced so that implicated regions are fairly short, which helps to identify genes in the implicated region of the chromosome, and they fine-tune phenotype definition as well.  They were able to replicate 66 of the previous 71 gene locations, and identified an additional 134 signals, many of which contain genes.  One might always quibble with this or that statistical issue, but the overall conclusion is unlikely to change.  In the authors' words:
This is a major step forward in identifying the relevant genes and functional variants and thus elucidating the genetics of CD etiology. The very large numbers of genes [we identify] confirm that CD is truly polygenic and complex in nature. Many genes show functions that are compatible with involvement in immune and/or inflammatory processes as well as integrity of the intestinal epithelium and differentiation.  
In fact, do we know anything more than we did before this study?  It has long been clear that CD is polygenic and complex, with immune and/or inflammatory gene involvement.  Whether 71 genes are involved or 200, it means that this disease is another instance of many pathways leading to a complex trait.

Much money and effort by the highest quality investigators have been expended on this disease.  This means that by now we probably can't claim the complexity is an artifact of imprecise methods.  The complexity seems to be real.  Each individual with CD is affected by a different combination of variants at their two copies of these 200 genes -- and, if this is 'simple' complexity, by variants at another 600 or so unidentified genes needed to top up the current 23% of causation explained.
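The back-of-the-envelope reasoning behind 'another 600' can be sketched in a few lines of Python. This is only an illustration of 'simple' complexity: it assumes every gene region contributes an equal, additive share, and it pairs the 200 regions with the roughly 23% figure as the paragraph above does. The function name is hypothetical.

```python
def regions_needed(found_regions: int, fraction_explained: float) -> float:
    """Total regions needed to explain everything, if each region
    contributes an equal, additive share (an assumption, not a finding)."""
    return found_regions / fraction_explained

found = 200          # gene regions reported so far
explained = 0.232    # fraction of causation currently explained

total = regions_needed(found, explained)
remaining = total - found

print(f"total regions implied: {total:.0f}")
print(f"still to be found:     {remaining:.0f}")
```

Under these assumptions the total comes out in the 800s, with something over 600 regions still to find, which is the order of magnitude of the round numbers used here.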

This also doesn't include any serious identification of environmental contributions.  That is, environments like, say, diet at some point in life, may have different effects on different genotypes, so that a given genotype may have no harmful consequences in one lifestyle and be harmful in another.  And if there are complex interactions among the different contributing genomic regions, then enumerating the variants at 200 (much less 800) genome regions will not make prediction, and perhaps not even treatment, very genome-dependent.

It is likely that after all of this, some genetic variants will have predictive power or medical treatment relevance, and that of course will be genuine progress.  But it leaves the question that so many traits leave: how do we actually deal usefully with this sort of routine-level complexity?
