In the US, a rare disease is defined as one that affects fewer than 200,000 people, and in Europe as one with a prevalence of fewer than 1 in 2000. Cystic fibrosis (CF) is the most common Mendelian disease in European populations, and even it is rare, at 1 in 2500 Caucasian births. Sickle cell anemia is the most common single-gene disorder among African Americans, with 1 in 500 newborns affected.
Single-gene effects that are serious and appear in childhood are generally rare because affected people so often don't survive or reproduce, so the variants aren't transmitted to the next generation and aren't maintained at any appreciable frequency in the population. Even recessive variants are kept to limited frequencies this way (CF is an example). With late onset, a variant could be more common: while its effect may be devastatingly unpleasant, it arises only after reproduction is over, so its frequency isn't kept in check by natural selection. If the disorder involves complex environmental interactions and the like, it could have almost any frequency. Ironically, one might say that if a disorder were very common, it couldn't be too serious, and we might even consider it 'normal'.
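The point about recessive variants can be made concrete with a little Hardy-Weinberg arithmetic. This is a minimal Python sketch, assuming random mating and lumping all CF-causing alleles together as a single allele class; the function names, and the mutation rate in the last line, are hypothetical and only illustrative. The second function is the classical textbook mutation-selection balance for a fully recessive allele, which ignores drift, any heterozygote effects, and population history.

```python
import math


def recessive_allele_stats(incidence: float) -> dict:
    """Back out allele and carrier frequencies from the incidence of a recessive disorder.

    Assumes Hardy-Weinberg proportions, so incidence = q**2, where q is the
    combined frequency of all disease-causing alleles at the gene.
    """
    q = math.sqrt(incidence)          # frequency of disease-causing alleles
    p = 1.0 - q                       # frequency of all other alleles
    carrier_freq = 2 * p * q          # unaffected heterozygotes
    # Carriers hold one disease-allele copy each, affected homozygotes two,
    # so this is the share of disease-allele copies that selection on patients never sees.
    fraction_in_carriers = (2 * p * q) / (2 * p * q + 2 * q * q)
    return {
        "allele_freq": q,
        "carrier_freq": carrier_freq,
        "fraction_in_carriers": fraction_in_carriers,
    }


def recessive_equilibrium_freq(mutation_rate: float, selection: float) -> float:
    """Textbook mutation-selection balance for a fully recessive allele: q ~ sqrt(mu / s)."""
    return math.sqrt(mutation_rate / selection)


stats = recessive_allele_stats(1 / 2500)   # CF incidence quoted above
print(f"allele frequency:      {stats['allele_freq']:.3f}")
print(f"carrier frequency:     1 in {1 / stats['carrier_freq']:.1f}")
print(f"copies in carriers:    {stats['fraction_in_carriers']:.0%}")
# Hypothetical mutation rate; s = 1.0 means affected people leave no offspring.
print(f"equilibrium frequency: {recessive_equilibrium_freq(1e-6, 1.0):.4f}")
```

With an incidence of 1 in 2500, the combined allele frequency comes out at 0.02, roughly 1 person in 25 is an unaffected carrier, and about 98% of the disease-allele copies sit in carriers, where selection against affected children cannot reach them. That is why selection keeps a severe recessive disorder rare but removes its alleles only slowly.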
While each single-gene disorder may be rare, an estimated 7000 such disorders are known, and the causal gene is said to have been identified for about half of them, according to a recent paper in Nature Reviews Genetics ("Rare-disease genetics in the era of next-generation sequencing: discovery to translation," Boycott et al.). Most of these genes were found before modern, very high-throughput ('next-generation') genomic DNA sequencing methods were in use, and the work was laborious. Boycott et al. suggest that next-generation sequencing will circumvent the methodological issues that prevented gene discovery in the past, and that the genes for essentially all rare diseases will be identified by 2020. Of course, new single-gene disorders and new alleles for known disorders are always possible, since new mutations arise every generation, so the idea that all rare diseases will be explained by any given date can't be literally true.
How it's done
This post is a mix of what Boycott et al. say and our own cautions. Nothing we have to say is new or unknown to people in the field, but we think it's worth saying again anyway. Whole genome sequencing (WGS) and whole exome sequencing (WES, the protein-coding segments) have both proven useful in the search for genes causing disease. WGS covers all of the roughly 6.2 billion nucleotides in a person's two copies of the genome, while WES covers protein-coding sequence only, about 2% of the genome, so it's a lot less expensive to sequence and, in theory, easier to interpret.
Boycott et al. report that more than 180 new genes have been discovered with WES. Methods for analyzing the sequence data are standard now, and analysis has been aided by the growing number of complete genomes of healthy people that are accessible to researchers for comparative purposes. The idea is that if the genomes of an ill and a healthy person share a variant, it's unlikely to explain the disease, though this isn't always the case.
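The comparison with healthy genomes is, at its simplest, a filter. The Python sketch below is one way to express it, assuming variants are keyed by chromosome, position, reference and alternate base, and that a control panel reports an allele frequency for each variant it contains. The `Variant` type, the function name, the coordinates and the 0.001 cutoff are all hypothetical; real pipelines use dedicated tools and population databases, and tune the cutoff to the disease.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_against_controls(patient_variants, control_freqs, max_freq=0.001):
    """Keep patient variants that are absent or very rare in a panel of healthy genomes.

    control_freqs maps Variant -> allele frequency observed in the control panel.
    A variant missing from the panel is treated as frequency 0 (i.e. novel).
    """
    return [v for v in patient_variants if control_freqs.get(v, 0.0) <= max_freq]


# Hypothetical patient variants and control-panel frequencies.
patient = [Variant("7", 1000, "A", "G"),
           Variant("7", 2000, "C", "T"),
           Variant("X", 3000, "G", "A")]
controls = {Variant("7", 1000, "A", "G"): 0.12,     # common in healthy people: dropped
            Variant("7", 2000, "C", "T"): 0.0005}   # rare in healthy people: kept
print(filter_against_controls(patient, controls))   # keeps 7:2000 and X:3000
```

This is where incomplete penetrance bites: a variant that doesn't always cause disease can turn up in healthy controls, and a strict cutoff will discard the true cause.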
Figure 2 from Boycott et al.: Gene identification approaches for different categories of rare disease
But a healthy person may share a variant without sharing the disease, because the variant doesn't always cause disease (it isn't completely 'dominant', or in technical terms its penetrance is incomplete); or variants may be shared by ill and healthy people but simply be unrelated to the disease. This complicates the story. Variants that do seem to be good causal candidates (because, say, they disrupt protein structure) will be considered first, and ideally confirmed in other patients or tested in the lab in various ways, such as by introducing the variant into a laboratory mouse.
Alternatively, if multiple unrelated affected people share a variant, or have variants in the same gene, that gene may be considered a good candidate. If the causal variant arose by somatic mutation in a specific tissue, comparing sequence from affected and unaffected tissue can be informative, but that is currently usually impracticable for various technical reasons. If a child has an autosomal recessive disorder, meaning that s/he must have two causal alleles, one from each parent, comparing two affected siblings can pinpoint the alleles, because both should carry the same pair. Disorders caused by clearly X-linked variants (when, generally, only males are affected) are fairly straightforward to investigate, because variants on all other chromosomes can be ignored. These are classical approaches, nearly a century old, but new DNA technologies make them easier to pursue. The real questions, of course, are how accurate such models are and how often they will work.
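The sibling and X-linked strategies are also simple filters once variants have been called and assigned to genes. This Python sketch assumes each call is a tuple of chromosome, gene, variant ID and zygosity ('het' or 'hom'); the function names, gene names and variant IDs are hypothetical placeholders. Without parental genotypes, two heterozygous variants in one gene can only be treated as a possible compound heterozygote, since both might sit on the same copy.

```python
from collections import defaultdict

# Each call: (chromosome, gene, variant_id, zygosity), zygosity in {"het", "hom"}.


def biallelic_genes(calls):
    """Map gene -> frozenset of variant IDs for genes that may be hit on both copies.

    A gene qualifies if it carries a homozygous variant, or two or more different
    heterozygous variants (a possible compound heterozygote).
    """
    by_gene = defaultdict(set)
    has_hom = set()
    for _chrom, gene, variant, zygosity in calls:
        by_gene[gene].add(variant)
        if zygosity == "hom":
            has_hom.add(gene)
    return {
        gene: frozenset(variants)
        for gene, variants in by_gene.items()
        if gene in has_hom or len(variants) >= 2
    }


def shared_recessive_candidates(sib1_calls, sib2_calls):
    """Genes where two affected siblings carry the same biallelic set of variants.

    Siblings with the same recessive disorder should have inherited the same
    two parental alleles at the causal gene.
    """
    a, b = biallelic_genes(sib1_calls), biallelic_genes(sib2_calls)
    return sorted(gene for gene in a.keys() & b.keys() if a[gene] == b[gene])


def x_linked_only(calls):
    """Under an X-linked hypothesis, ignore every variant outside the X chromosome."""
    return [call for call in calls if call[0] == "X"]


# Placeholder data for two affected siblings.
sib1 = [("1", "GENE_A", "v1", "hom"), ("2", "GENE_B", "v2", "het"),
        ("2", "GENE_B", "v3", "het"), ("X", "GENE_C", "v4", "het")]
sib2 = [("1", "GENE_A", "v1", "hom"), ("2", "GENE_B", "v2", "het"),
        ("X", "GENE_C", "v4", "het")]
print(shared_recessive_candidates(sib1, sib2))   # ['GENE_A']
print(x_linked_only(sib1))                       # [('X', 'GENE_C', 'v4', 'het')]
```

GENE_B drops out because only one sibling could be a compound heterozygote there; in real data the filters are only as good as the inheritance model behind them.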
Boycott et al. note that finding genes for autosomal dominant disorders that affect multiple family members is challenging, because relatives share many variants across their genomes simply by being closely related. If no candidate gene is suspected, it will be difficult to narrow these variants down to likely disease-related candidates.
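A back-of-the-envelope calculation shows why sharing within a family narrows things so slowly. Suppose an affected parent carries some number of rare heterozygous variants, one of them causal. Each affected child inherits any given parental variant with probability 1/2, so a non-causal variant survives the 'shared by all affected children' filter with probability (1/2)^n. This Python sketch assumes independent transmission of each variant and complete penetrance; the function name and the starting count of 200 candidates are hypothetical. Ignoring linkage is generous: nearby variants are inherited together in blocks, so real families provide fewer independent coin flips than this suggests.

```python
def expected_noncausal_shared(n_candidates: int, n_affected_children: int) -> float:
    """Expected number of non-causal parental variants also carried by every affected child.

    Each child receives a given heterozygous parental variant with probability 1/2,
    treated here as independent across variants and children (i.e. ignoring linkage).
    """
    return n_candidates * 0.5 ** n_affected_children


# Hypothetical starting point: 200 rare candidate variants in the affected parent.
for n in range(1, 5):
    print(n, "affected children ->", expected_noncausal_shared(200, n), "non-causal variants left")
```

Even with four affected children, about a dozen non-causal candidates would be expected to remain alongside the causal one, which is why a prior candidate gene helps so much.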
These are now standard approaches to analyzing WES data, but the glaring problem with whole exome sequencing is that it cannot detect variants in the regions of the genome that regulate gene expression, and there are numerous ways in which gene regulation can go awry and cause disease. In these cases whole genome sequencing is necessary, but it is not currently possible to easily discern causal variants in the much larger sea of variants that WGS yields. Indeed, Boycott et al. state that this class of variants will be the one that hinders completion of the atlas of genetic causes of rare disease. Whole genome sequencing sounds like a savior, but so many variants will be found, and so little is currently known about what non-coding DNA is doing, that this is an uphill battle, with much more promise than delivery so far.
And identifying disease-causing genes isn't necessarily much progress toward understanding the disease or finding treatments because, in Boycott et al.'s words,
In general, we are hampered by our incomplete understanding of the biological function of most genes and proteins; the linking of a poorly characterized gene to a human disease does not necessarily make the protein function clearer. Although in vitro analysis with cell lines from patients can considerably contribute to our understanding of protein function, often a more comprehensive investigation at the tissue, organ or whole-organism level is required. Thus, there is a need for coordinated model-organism research platforms to put disease-causing genes into a biological context.
That is, there's a lot of work yet to do. And, as the authors also point out, progress on identifying genes causing rare diseases does not equate to progress in therapeutic approaches, and that's largely because there is little financial incentive to develop drugs for rare diseases that will bring few customers.
The Mendelian assumption: outdated thinking
The Boycott et al. paper is a review of the state of the art; that is, it reflects widespread current thinking about genetic disease and how to identify genes that may be involved in causation. However, current thinking is too often classical thinking, still derived from Mendel and based on what is found in the easy cases, almost what one might call the genetic reductio ad absurdum. As we've posted and written often before, Mendel carefully chose single-gene, wholly deterministic traits in peas to simplify the inheritance problem he was interested in (improving agricultural plants by hybridization). He believed in, and wanted to find, single causal 'atoms' that would apply to biology the way the atoms chemists were finding applied to chemistry.
This led to the idea of genes 'for' disease, that is, genes named after a disease because a mutation in them is a sufficient or even necessary cause of the disease. We still use Mendel's terms, dominant and recessive, for this kind of causation, identified traditionally by parent-to-offspring transmission. While this has gotten the field far, as we've said before, we're still prisoners of Mendel in some ways; this has stymied progress, and surely will continue to do so.
A prominent geneticist recently reported a widespread search of the human gene-coding regions (exons) for inherited disease-causing variants in offspring. These were referred to, erroneously but commonly, as 'mutations'; erroneously, because the parents also had the variants. The study reported what was, to its authors, the surprising finding that many diseases were due to variation in two genes, not just one. Indeed, this was called a 'paradigm shift'.
The use of such a phrase for what is in fact not at all a surprising finding reflects the trivialization of the important concept of true revolutions in thinking for which the term 'paradigm shift' was coined by Thomas Kuhn (we've posted on this before, too). Why trivial? Why not new? Isn't disease Mendelian?
The answer is that genes are transmitted in Mendelian fashion, but traits, including disease, only manifest when genes go into action. Traits are not transmitted. And the study of single-variant traits like Mendel's peas, characterized by AA, Aa and aa genotypes, showed decades ago that the story was much more complex than big-A-sick, little-a-well kinds of concepts. Instead, even for a single disease-affecting gene, we know there are tens, hundreds, or even thousands of different variants, seen in different combinations in different patients, with different genotype-related quantitative effects on the nature or severity of the trait. And rarely are all cases of a disease due to a single gene.
Documenting multiple-gene causation is an important task of human genetics. But to cling to old terms and concepts out of ignorance or habit, and to express surprise when the concepts are shown to be wrong, is actually quite revealing of the mental prison in which, despite decades of corrective knowledge, the history of science encages its present.
There are many diseases whose individual cause is primarily the genotype at a single gene. These are very valuable to know about, diagnostically at least, if not therapeutically. The phenomenon of Mendelian disease, properly understood, is certainly real. But it is the easy exception that lures us into thinking everything works roughly the same way, and hence into being surprised when we find it doesn't.
If we view every modification of our ideas as a 'paradigm shift' rather than just a modification or expansion of current working models or theory, then the term loses any real meaning. That has happened, because it's a kind of self-flattering act to liken the findings of one's research to paradigm shifts. Kuhn was talking about Copernicus vs Ptolemy, Einstein vs Newton, Darwin vs Genesis, and unifying discoveries like continental drift, as the paradigmatic sorts of eye-opening changes.
It is very difficult to shake old concepts, and we like to think that we're making not just 'findings' but 'discoveries', to use subjective terms. Still, that aspect of the sociology of science should not, in this instance or any other, blind us to the real progress that is made, even if the ideas of paradigms and revolutions are turned into slogans.
Good needles to find in the haystack
Finding genes for 7000 rare diseases is not the same as explaining disease in the millions of people who have one of them. The more than 2000 alleles in different parts of the CFTR gene, found in different people and assumed to cause cystic fibrosis because they're seen in patients, are just one example of the complexity of 'simple' genetic diseases. They show that terms like 'dominant' and 'recessive', which we got from Mendel, are conceptual dinosaurs and really should be dropped (CF is said to be a 'recessive' disease, but most patients do not have two copies of the same defective variant). And many rare diseases, even if caused by a single gene in each affected person, can be caused by variants in different genes in different people, sometimes many genes. So we should drop the idea of a gene 'for' a disease, too. But old concepts are hard to root out, and it's probably fair to say that the easy diseases and disorders have largely been done.
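The counting behind the CF point is simple. If we take the roughly 2000 CFTR alleles at face value and assume, purely for illustration, that any two of them can combine in a patient, then with n alleles there are n homozygous genotypes and n(n-1)/2 distinct compound heterozygous ones. A minimal Python sketch (the function name is hypothetical):

```python
def genotype_counts(n_alleles: int) -> tuple[int, int]:
    """Count the possible two-allele genotypes built from n disease alleles at one gene.

    Returns (all unordered genotypes, homozygous genotypes): n homozygotes plus
    n * (n - 1) / 2 distinct compound heterozygous pairs.
    """
    homozygous = n_alleles
    compound_heterozygous = n_alleles * (n_alleles - 1) // 2
    return homozygous + compound_heterozygous, homozygous


total, homozygous = genotype_counts(2000)
print(f"{total:,} possible genotypes, of which {homozygous:,} are homozygous")
```

Homozygotes are about 0.1% of the possible combinations, so a label like 'recessive', which evokes a single aa genotype, hides most of the real variety. Real patient genotypes are not spread evenly across these combinations, since a few alleles are far more common than the rest.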
Still, we fully recognize that identifying genes for rare diseases is enormously important for people who suffer from these diseases, for multiple reasons. Indeed, the search will go on despite the many obstacles, and there will be successes. And deep gratitude when there are.