We are desperate to find a genetic cause, or at least a tractably small number of genetic causes, for every trait we want to study. It's understandable that we want this kind of answer. Unfortunately, GWAS has not accounted for much of the heritability (estimated genetic fraction of causation) of most diseases and other similarly complex traits that have been studied, as we've often pointed out.
Rather than abandon the game, and especially since the dream of common variants having major effects on common disease (so they would be a usefully large market for Pharma to invest in targeting) is diminishing, we have had to become contortionists to try to find how or why genomics is still the way to approach such traits.
Our approach to this question is statistical, and hence depends on observing enough instances of a variant for it to achieve statistical 'significance' in our data, whether or not that makes its effect of enough 'importance' for countermeasures targeted against it to be developed. But this means that, in any practical sense, we can't get large enough samples to detect the effects of very rare variants. We need other approaches.
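To give a rough feel for why sample size becomes prohibitive, here is a minimal sketch using the textbook normal-approximation formula for comparing two proportions (carrier frequency in cases versus controls). Everything numeric in it is our assumption for illustration, not a value from any study: the carrier frequencies, the odds ratio of 1.5, the significance threshold of 5e-8 often used in GWAS, and the 80% power target. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_frequency(control_freq, odds_ratio):
    """Carrier frequency in cases implied by a control frequency and an odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def samples_per_group(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (and of controls) needed to detect the
    frequency difference with a two-sided two-proportion z-test.
    alpha and power are assumed defaults, not values from the text."""
    p0 = control_freq
    p1 = case_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Illustrative frequencies only: common, uncommon, and very rare variants.
for freq in (0.2, 0.01, 0.0001):
    print(f"carrier frequency {freq}: ~{samples_per_group(freq, 1.5)} per group")
```

With the effect size held fixed, the required sample grows steeply as the variant becomes rarer, which is the practical barrier described above. A real design would also have to consider allele counts rather than carriers, multiple testing, and population structure, none of which this sketch attempts.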
One is to track variants in key gene regions among family members, looking for correlations between the presence of the variant and that of disease. How effective this will be depends on whether we know enough about the trait or about genes to find variants that track with disease in this way. If we find enough different variants in the same gene this way in different families, that's strong evidence.
There are two ways, however, for rare variants to work. One, as just described, is for the variant to have a major effect all on its own. That could be detectable. But if combinations of many different rare variants are required, the variants coming from a number of genes and many different combinations having similar effects, this method may not work well. Unfortunately, there are theoretical reasons to think this will likely be the case.
Recently there has been a story in Science News about the number of rare variants that we each carry around. The story summarizes various recent papers, citing the authors. The following graph shows the results of various studies of genome sequencing of different individuals:
Obviously, in a huge population, almost any site will vary, and if lots of sites can potentially contribute to a disease, there will be lots of instances of 'causal' variants per gene, but these won't be detectable by group studies where one needs statistical association between the variant and the outcome (again, because statistical significance can't be achieved with very rare observations in this kind of study design).
The story was about disease hunting, which seems to be the international obsession (or, more accurately perhaps, rationale for funding the work). However, sites contributing to a trait's variation today will also potentially contribute to its evolution. Thus, in a subtle way too complex to go into here, hunting for the genes or variants that are responsible for the evolution of a trait is going to be very challenging, to say the least. It's hard enough to explain the selective (or chance) reasons for the trait's presence, much less what genes were responsible for its evolution.
The story also cites work by Andy Clark and Alon Keinan, who have pointed out that the very rapid expansion of the human species in the 10,000 years since agriculture has generated a massive number of rare, very rare, and very, very rare variants. In a statistical sense, each gene lineage present at the beginning of that time has a million descendants today. This is not new or speculative theory but simply the consequence of the sequence nature of genes: long strings of nucleotides mean many places where a nucleotide can change. Even if any given change is very rare, the genome and the number of people born each generation are large. The variation is being found now that we can sequence at a high scale, as reports such as those mentioned here clearly show.
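The arithmetic behind "rare per site, but many sites and many births" can be sketched in a few lines. The three constants below are rough, assumed orders of magnitude that we have chosen only to illustrate the point; they are not figures from the story or from the papers it cites, and the function names are hypothetical.

```python
# All three values are assumed orders of magnitude, for illustration only.
MUTATION_RATE = 1e-8          # new mutations per nucleotide site per generation (assumed)
HAPLOID_GENOME_SITES = 3e9    # nucleotide sites in one copy of the genome (assumed)
BIRTHS_PER_GENERATION = 1e9   # people born per generation (assumed)


def new_mutations_per_person(rate=MUTATION_RATE, sites=HAPLOID_GENOME_SITES):
    """Expected new mutations in one newborn, who carries two genome copies."""
    return 2 * sites * rate


def expected_hits_per_site(rate=MUTATION_RATE, births=BIRTHS_PER_GENERATION):
    """Expected number of times a given site mutates somewhere in the
    population in a single generation (two copies per newborn)."""
    return 2 * births * rate


print("new mutations per newborn:", new_mutations_per_person())
print("new mutations per site per generation:", expected_hits_per_site())
```

Under these assumed values, each newborn carries on the order of tens of new mutations, and any given site is expected to mutate in more than one person somewhere in the population every generation. Different assumptions shift the numbers, but not the conclusion that nearly unique variants must be abundant.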
One sobering implication is that if we are concerned with the ways one can get a common disease or trait (behavior, morphology, or whatever, normal or not), then we face trying to work out this sea of nearly unique variation. It could be a hopeless task!
However, comparing close species with and without the trait in a sense aggregates the results of countless variants and genes and individuals over countless generations. In a subtle and statistically detectable way this could point to responsible genes, because the sample size that generated the result over time might leave enough evidence. Some methods to find such evidence are available (one is called the McDonald-Kreitman test, after its developers), though so far they are statistically rather weak for close species. Perhaps creative thinking will lead to new and better ideas.
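For readers who haven't met it, the test named above compares two ratios for a single gene: nonsynonymous to synonymous differences that are fixed between species, and the same ratio among sites that are polymorphic within a species. Under neutrality the two ratios should be similar, and a 2x2 contingency test asks whether they differ. Below is a minimal sketch using only the standard library; the four counts are hypothetical numbers we made up for illustration, and the helper names are ours.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]],
    summing hypergeometric probabilities no larger than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = pmf(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    total = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= observed * (1 + 1e-9))
    return min(1.0, total)


def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds); values below 1 indicate an excess of fixed
    nonsynonymous differences relative to polymorphism."""
    return (pn / ps) / (dn / ds)


# Hypothetical counts for one gene (not real data):
dn, ds = 12, 20   # fixed differences between species: nonsynonymous, synonymous
pn, ps = 3, 40    # polymorphic sites within species: nonsynonymous, synonymous

print("neutrality index:", neutrality_index(dn, ds, pn, ps))
print("Fisher exact p-value:", fisher_exact_two_sided(dn, ds, pn, ps))
```

The weakness for close species is visible in the table itself: few generations of divergence mean few fixed differences, so the top row holds small counts and the test has little power.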
Whether approaching disease in this way is the best thing to do is a separate question from how it can be done in practice, if that is what people decide they need to do, as they currently are.