Regulatory sites are parts of the genome, usually containing short sequences, that control when genes are used. They can help a gene to be expressed, repress its expression, or prevent (insulate) one gene from being expressed when some other nearby gene has already been turned on. Regulatory sites for a given gene are usually numerous, and can be upstream of, internal to, or downstream of the gene.
As currently known, regulatory sites are usually, but not always, found close to the gene they regulate. Knowledge of regulatory sites and their locations is still very fragmentary, but it's clear that they follow no general rule. Additionally, many GWAS mapping studies have found 'hits', that is, DNA regions whose variation is associated with some trait, where the regions are not in or near any actual gene (that is, protein-coding region). Further, many if not most confirmed GWAS hits for complex disease affect regulation rather than the protein code itself. This makes sense, since most genes have multiple functions, and changing the coded protein's structure could affect many different traits and be quite damaging. Altering a gene's regulation will typically affect only one or some of its uses, and hence will typically be less detrimental. That's the idea, at least.
Regulatory sites and their associated gene's transcription start sites (where RNA begins to be read off the DNA) are usually close together (or brought close together) because a complex of proteins is required to assemble at the start site and start the transcription process. But how 'close'?
Various techniques have been developed to identify stretches of chromosomes that are physically close to each other in the nucleus of cells, usually from some cell culture or other source. In shorthand, these are called Hi-C assays (there are various ways to do these). The juxtaposed bits of chromosome are isolated from the cells and sequenced, and the sequences are aligned to the human genome reference sequence to see where they are. The analysis thus shows what parts of DNA are physically close in a given cellular context or cell type. Remember that chromosomes inside cells are 3-dimensional structures, not just linear stretches of DNA.
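To make the idea concrete, here is a minimal sketch of the counting step: once both ends of each captured fragment pair have been aligned to the reference, the genome is chopped into bins and we tally how often two bins turn up together. This is my own toy illustration, not the pipeline used in any particular study. The function name, the 100 kb bin size, and the read positions are all made up for the example.

```python
from collections import Counter

BIN_SIZE = 100_000  # assumed bin width in base pairs (illustrative only)

def contact_counts(read_pairs, bin_size=BIN_SIZE):
    """Count how often two genomic bins are captured together.

    read_pairs: iterable of ((chrom_a, pos_a), (chrom_b, pos_b)) tuples,
    i.e. the aligned positions of the two ends of one sequenced pair.
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in read_pairs:
        bin_a = (chrom_a, pos_a // bin_size)
        bin_b = (chrom_b, pos_b // bin_size)
        # Store each pair in a fixed order so A-B and B-A are counted together.
        key = tuple(sorted((bin_a, bin_b)))
        counts[key] += 1
    return counts

# Invented toy data: two pairs link bins 1 and 24 of chr1, one pair is local.
pairs = [
    (("chr1", 150_000), ("chr1", 2_450_000)),
    (("chr1", 2_410_000), ("chr1", 120_000)),
    (("chr1", 150_000), ("chr1", 180_000)),
]
counts = contact_counts(pairs)
```

Bin pairs that are far apart on the linear chromosome but have high counts are the candidates for being physically close in the nucleus. Real analyses also have to correct for technical biases and for the fact that nearby DNA is captured together more often simply by being nearby, which this sketch ignores.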
The new paper by Mifsud et al. looks at this issue. It uses a technique to identify the parts of DNA where transcription starts (called 'promoter' sites) and regulatory sites ('enhancers', or other terms). With this information, functional analysis can be done. Here is a figure from the paper that shows some of the points (I labeled some features for you):
Long-range regulation can even skip over active genes. From the Mifsud et al. paper (modified to show features).
There are several particularly interesting points here. First, regulatory sites need not be near a gene, but can be almost anywhere (or, at least, quite distant), so that we can't know a priori where the important sites are. Second, as mentioned above, many if not most confirmed GWAS 'hits' have been in regulatory sites. Third, regulatory contacts between DNA bits can span actively used genes lying between the sites; this raises the question of how those sites' enhancers and promoters are juxtaposed and/or stay open for business as spanning DNA parts are brought together in the nucleus. Fourth, finding a mapping 'hit' in a non-coding region may tell us that some gene's activity is being affected and contributing to the measured trait (e.g., diabetes, stature, or whatever).
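The fourth point can be sketched as a simple lookup: given a table of which DNA fragments contact which promoters, ask which genes are contacted by the fragment that contains a mapping hit. Everything below is hypothetical, including the gene names, the coordinates, and the table layout. It is only meant to show the logic, including the fact that one fragment can contact more than one promoter and that the contacted gene need not be the nearest one.

```python
# Hypothetical promoter contacts on one chromosome:
# (fragment_start, fragment_end, gene_whose_promoter_is_contacted).
# All coordinates and gene names are invented for illustration.
CONTACTS = [
    (1_000_000, 1_005_000, "GENE_A"),
    (1_000_000, 1_005_000, "GENE_C"),  # same fragment, a second promoter
    (3_200_000, 3_204_000, "GENE_B"),
]

def genes_contacting(position, contacts=CONTACTS):
    """Return the genes whose promoters contact the fragment containing position."""
    return sorted({gene for start, end, gene in contacts if start <= position < end})

print(genes_contacting(1_002_500))  # hit inside the first fragment
print(genes_contacting(2_000_000))  # hit with no recorded contact
```

An empty result does not mean the hit is unimportant. It may only mean that the contact does not occur, or was not detected, in the cell type that was assayed.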
In a given cell, thousands of genes (not to mention other regions that are transcribed into other sorts of RNA) are expressed differently in different contexts (e.g., when the cell divides, when it is doing its normal business, when it responds to environmental changes). And of course, each cell type will be using different combinations of genes. This raises the question of how the chromosomes all knot up in the orderly-appearing way that Hi-C methods identify, and then re-knot as these or those genes go 'on' and 'off' (or to 'higher' or 'lower' levels of transcription). This would seem to be an intriguing 4-dimensional (space and time) geometric problem. This analysis does not include trans connections, between enhancers on one chromosome and promoters on another (I thank senior author Cameron Osborne for clarifying this to me), which are a much larger kettle of fish, as yet mainly unexplored. So this is possibly, or probably, only the tip of the nuclear-interaction iceberg.
Genomic Skype calls
Regulation that spans very large distances effectively and rapidly is like making a complex multi-person international Skype call: instant communication from afar. This is remarkable, even if it confirms what we have suspected! The finding raises the related question of how the conjoined parts of DNA 'find' each other. Some data of this sort aggregate millions of cells at one go, but new methods have been applied to single cells, and they have found that there is stochastic variation among cells of the same type in the same culture at the same time. This paper seems to have used aggregate data from many cells, so we don't know the role of variation among cells in the 'same' state, if that is really what can be said of the cell-source of these data. So there are other issues yet to be understood (the authors don't claim otherwise!).
Genomic Skype calls may not just be across the country or across the ocean, but maybe far out into space, figuratively speaking. If the current limited technology is but the first opening of this sort of knowledge, then one can only wonder what far-reaching sorts of communication are going on within and maybe even among us.
It's of course one thing to document long-distance regulation in cell culture, and another to understand or even identify the related pairs from data on whole organisms, such as to find the relevant contributing genes to diabetes or some other trait. Sometimes, experimental assays will be able to find the gene affected by a non-coding GWAS or other association-study 'hit'. Other times, perhaps the vast majority, this won't really be possible or practicable. And if hundreds of different genes are contributing, identifying them more accurately from mapping results won't necessarily simplify things. But it will help confirm those complex results, and will be interesting, potentially very important, new knowledge in its own right.