Friday, June 5, 2015

Making a genomic Skype call

There is an interesting and, if we understand it adequately, important paper by Mifsud and others, in the June issue of Nature Genetics.  It relates to how genes are used by a cell, and how the functions of different parts of DNA are coordinated.

Regulatory sites are parts of the genome, usually containing short sequences, that are used to control when genes are used.  They can help genes to be expressed, repress their expression, or insulate one gene from being expressed when some other nearby gene has already been turned on.  Regulatory sites for a given gene are usually numerous, and can be upstream, internal to, or downstream of the gene.

As currently known, regulatory sites are usually found close to the gene they regulate, but not always.  There is currently only very fragmentary knowledge of regulatory sites and their location, but it's clear that they follow no general rule.  Additionally, many GWAS mapping studies have found 'hits', that is, DNA regions whose variation is associated with some trait, where the regions are not in or near any actual gene (that is, protein-coding region).  Further, many if not most GWAS hits for complex disease that have been confirmed affect regulation rather than the protein code itself.  This makes sense since most genes have multiple functions, and changing the coded protein's structure could affect many different traits and be quite damaging.  Altering its regulation will typically only affect one or some of its uses, and hence typically will be less detrimental.  That's the idea, at least.

Regulatory sites and their associated gene's transcription start sites (where RNA begins to be read off the DNA) are usually close together (or brought close together) because a complex of proteins is required to assemble at the start site and start the transcription process.  But how 'close'?

Various techniques have been developed to identify stretches of chromosomes that are physically close to each other in the nucleus of cells, usually from some cell culture or other source.  In short-hand, these are called Hi-C assays (there are various ways to do these).  The juxtaposed bits of chromosome are isolated from the cells and sequenced, and the sequences are aligned to the human genome reference sequence to see where they are.  The analysis thus shows what parts of DNA are physically close in a given cellular context or cell type.  Remember that chromosomes inside cells are three-dimensional structures, not just linear stretches of DNA.
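To make the last step concrete, here is a minimal Python sketch of how aligned pairs might be tallied into contacts between regions of a chromosome, and how the long-range ones might be picked out.  It is a toy illustration only, not the method used in the paper: the coordinates, the bin width, the distance cutoff, and the function names are all assumptions invented for the example.

```python
from collections import Counter

# Hypothetical toy data: each tuple is one sequenced pair of juxtaposed
# fragments after alignment, as (chromosome, position_a, position_b) in
# base pairs. These coordinates are invented for illustration only.
ALIGNED_PAIRS = [
    ("chr1", 10_500, 12_000),
    ("chr1", 11_000, 950_000),
    ("chr1", 955_000, 10_200),
    ("chr1", 400_000, 410_000),
]

BIN_SIZE = 100_000  # assumed bin width in base pairs


def contact_counts(pairs, bin_size=BIN_SIZE):
    """Count how often each pair of genomic bins is seen together."""
    counts = Counter()
    for chrom, pos_a, pos_b in pairs:
        # Sort the two bins so that (a, b) and (b, a) count as one contact.
        bin_a, bin_b = sorted((pos_a // bin_size, pos_b // bin_size))
        counts[(chrom, bin_a, bin_b)] += 1
    return counts


def long_range(counts, bin_size=BIN_SIZE, min_distance=500_000):
    """Keep bin pairs whose linear separation is at least min_distance."""
    return {
        key: n
        for key, n in counts.items()
        if (key[2] - key[1]) * bin_size >= min_distance
    }


counts = contact_counts(ALIGNED_PAIRS)
distant = long_range(counts)
print(distant)
```

In this toy data set, two of the four pairs join the first 100 kb of the chromosome to a region roughly 900 kb away, so that bin pair is the only one reported as long-range.  Real analyses also have to correct for biases and for the fact that nearby DNA is in contact far more often by chance, which this sketch ignores.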

The new paper by Mifsud et al. uses a technique to identify the parts of DNA where transcription starts (called 'promoter' sites) and regulatory sites ('enhancers', or other terms).  With this information, functional analysis can be done.  Here is a figure from the paper that shows some of the points (I labeled some features for you):
Long range regulation can even skip over active genes.  From Mifsud paper (modified to show features)
The authors use criteria based on chemical modifications of the histones that package DNA (modifications that are specifically informative for promoter or enhancer sites) to identify regulatory and RNA transcription start sites (enhancers and promoters), and find that most regulatory sites are, as expected, near to the gene they thus appear to be regulating in these cells.  The figure also shows that genes actively being used may lie in between the enhancer and promoter.

There are several particularly interesting points here. First, regulatory sites need not be near to a gene, but can be almost anywhere (or, at least, quite distant), so that we can't know a priori where the important sites are.  Second, as mentioned above, most GWAS 'hits' have been in regulatory sites.  Third, regulatory contacts between DNA bits can span actively used genes in between the sites; this raises the question of how those sites' enhancers and promoters are juxtaposed and/or stay open for business as spanning DNA parts are brought together in the nucleus.  Fourth, finding a mapping 'hit' in a non-coding region may tell us that some gene's activity is being affected and contributing to the measured trait (e.g., diabetes, stature, or whatever).

In a given cell, thousands of genes (not to mention other regions that are transcribed into other sorts of RNA) are expressed differently in different contexts (e.g., when the cell divides, when it is doing its normal business, when it responds to environmental changes).  And of course, each cell type will be using different combinations of genes.  This raises the question as to how the chromosomes all knot up in the orderly-appearing way that Hi-C methods identify, and then can re-knot as these or those genes go 'on' and 'off' (or to 'higher' or 'lower' levels of transcription).  This would seem to be an intriguing 4-dimensional (space and time) geometric problem.  This analysis does not include trans connections, between enhancers on one chromosome and promoters on another (I thank senior author Cameron Osborne for clarifying this to me), which are a much larger kettle of fish, as yet mainly unexplored.  So this is possibly, or probably, only the tip of the nuclear-interaction iceberg.

Genomic Skype calls
Regulation spanning very large distances effectively and rapidly is like making a complex multi-person international Skype call: instant communication from afar.  This is remarkable, even if it confirms what we have suspected!  The finding raises the related question of how the conjoined parts of DNA 'find' each other.  Some data of this sort aggregate millions of cells at one go, but new methods have been applied to single cells, and they have found that there is stochastic variation among cells of the same type in the same culture at the same time.  This paper seems to have been based on aggregate data from many cells, so we don't know the role of variation among cells in the 'same' state, if that is really what can be said of the cell-source of these data.  So there are other issues yet to be understood (the authors don't claim otherwise!).

Genomic Skype calls may not just be across the country or across an ocean, but maybe far out into space, figuratively speaking.  If the current limited technology is but the first opening of this sort of knowledge, then one can only wonder what far-reaching sorts of communication are going on within and maybe even among us.

It's of course one thing to document long-distance regulation in cell culture, and another to understand or even identify the related pairs from data on whole organisms, for example to find the genes that contribute to diabetes or some other trait.  Sometimes, experimental assays will be able to find the gene affected by a non-coding GWAS or other association-study 'hit'.  Other times, perhaps the vast majority, this won't really be possible or practicable.  And if hundreds of different genes are contributing, identifying them more accurately from mapping results won't necessarily simplify things.  But it will help confirm those complex results, and will be interesting, potentially very important, new knowledge in its own right.

1 comment:

Anne Buchanan said...

A friend (Anonymous) has commented on this post by email, and Ken has replied. Copied and pasted below (with Anonymous's permission):

Anonymous: Your blog Friday lays out the challenge that I have been arguing is the most important question in genetics. I.e. what is the molasses that controls the balance between the order and disorder that defines life in general and the behavior of the genome in particular? Is it material that we know and have measurements in hand, but we don’t yet know how to model its role? Or is it something like the dark energy/dark matter that we have yet to measure? Is it the same molasses for the genome, metabolism and cognition? How do you suggest that we go about finding it and measuring it? How would we know if there is really no molasses?
Just how much more evidence do we need to convince the molecular reductionists among us that the behavior of the whole cannot be explained by the sum of the behaviors of the individual parts? Even if one accepts that it is the interaction between parts that is causal, what controls the interactions? And, what can we offer as an alternative research strategy that acknowledges the whole is more than the sum of the parts? But first, what tools are needed to test the hypothesis that there is dark matter/dark energy that coordinates the order and disorder that characterizes function of the genome?
It seems to me that most research in genetics is expanding the number of unanswered questions and making understanding more distant. This is one reason why I have my doubts as to whether we have the capability as humans to fully understand the etiology of life.

Ken: Of course I agree with what you say. I think we're not asking the right question in some sense. For example, what is an 'interaction'? We all talk about it, and we may explain how we document it (e.g., non-additive effects in samples, molecules binding with each other....). But is an interaction a thing or a phenomenon, and if the latter, of what kind? To me, it is useful (conceptually at least) to think of particles and fields in physics. A 'field' is in a sense an 'interaction' when you have particles. It has properties in itself, not just that are identified by samples. So are there such things in biology that can usefully be conceived, and that are not just physics envy?

A: Our good friend Brian Goodwin imposed field thinking (the growing points in plants, e.g.) in his explanations of development (see How the Leopard Changed its Spots and many of his other scholarly pieces), long before the molecular revolution took priority over how we think in science.

K: Yes, I know that work and his book Form and Transformation. But I'd go back to Bateson and then Turing, and it was these concepts that led to my working on developmental mechanisms in tooth formation (a nested modular trait). Reaction-diffusion types of models, and some recent experimental data, use waveform analysis. That's among cells. Population ecologists use models like Lotka-Volterra, which also are similar in spirit. But can this be extended much farther, to encompass other sorts of interaction (like, for example, the formation of Hi-C associations)?