Ewan Birney – On Genetics as a whole, and PRSs, and Robert Plomin’s book
Background
Twitter. I have a love-hate affair with it, but when it comes to science I #loveit. It is great for keeping track of some influencers, like Mark McCarthy, Cecile Janssens, Eric Topol, Cecilia Lindgren, or Sek Kathiresan. I learn a lot from their work, their views on new science, etc. Likewise, I follow Ewan Birney, director at EMBL-EBI. He has an excellent habit of taking a subject and producing a nice summarising thread. Recently he has done just that on the subject of genetics as a whole, with a view on PRSs and the controversy surrounding the new book from Robert Plomin. But it’s Twitter…not always readable. So here goes, the whole thread in blog form, the unabridged version (except for some small corrections).
Ewan Birney – On Genetics as a whole, and PRSs, and Robert Plomin’s book
In the rumbling discussion started by Robert Plomin’s book on genetics and intelligence and social traits, and touching on other, long-running discussion points, such as the validity of PRSs, and then genetics as a whole, I want to get some perspective. The first thing to say is that I know that in a robust “defence” of genetics as (for example) being strong evidence for causal relationships in observational studies, geneticists such as me can come over as a bit … haughty, a little bit snobby, as if genetics is unassailable. So, let me start with my own - a card-carrying genomicist + geneticist - critique of genetics: what it tells us and what it can’t, and where its strengths and weaknesses are. If I have time, I will loop back to genetics with respect to intelligence.
The first thing to say is that genetics is always done “in some environment” - whether it is a lab (drosophila, medaka fish), or fields/countryside (farm animals) or human environments (human genetics). Environments play two roles in genetics. The first is as a constant context. In practical terms: if you only cycle light in a summer duration in Medaka fish tanks, you will never see winter behaviour (doh!); if you don’t feed cows grass, but rather maize, you will never see the grass-specific digestive behaviour of cows. In human terms: if the environment around humans has low accessibility of calories, satiety control (leading to Type 2 Diabetes) won’t manifest; if children don’t have access to schools, their maths ability will be limited. Obvious, but worth stating.
This “constant” environment effect can be described as “moving the mean of the distribution”; for example, height has been increasing in human populations over the last 100 years mainly due to calories and good nutrition, e.g., vitamins. But it is more profound than moving means. It means everything we tackle in genetics is in some context, and very often we will affect things by “changing the context”. This includes (in human settings): mandating vitamins in certain foodstuffs; providing cheap glasses for people with myopia; providing good schools. The second way that environment plays a role is through some combination of genetics with an environment: “GxE” effects. These happen but are less common than you might think. In general, it looks like it pays off for most organisms to be adaptive, not fixed (note: models below).
The big critique here of genetics is that certain important things in how an organism turns out, manifests, or behaves can be surprisingly silent. Now - an exciting part of modern genetics is that at the limit of insane numbers of individuals, nearly every gene is changed. This means that there should be rare events where we get an insight on how the constant context does play through. In laboratory animals we can trigger these rare events deliberately (so-called forward genetic screens). In observational human settings these are rare diseases, and there is a long history of using rare mutations giving rise to unique features in human metabolism, development and many other processes. So - they are visible to “genetics” but rare. The second major trade-off of genetics is that it is “dumb” about mechanism. Literally anything is allowed, and there are some really convoluted routes from variation in the genome to the final effect on an organism. Here is a neat one that blends two of these features of genetics. If one does a GWAS for lung cancer, a non-synonymous variant in the nicotinic receptor is a strong hit. The mechanism looks pretty clear cut: Variant in Nicotinic receptor -> more addictive properties of nicotine -> when someone smokes it is hard to quit smoking -> if someone smokes, more likely to keep smoking -> lung cancer. Note - if there are no tobacco smoking products in the environment, then this variant is genuinely silent with respect to lung cancer (seen in some studies). It is a great bonus to genetics that one does not need to know about the mechanism to discover the variants. But … it is a big downside that genetics tells us next to nothing about mechanism (unless you are super lucky and a variant unambiguously hits a gene we understand).
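The nicotinic receptor example can be made concrete with a toy calculation. The sketch below is only an illustration of the shape of the argument, not a fitted model: the function name `cancer_risk` and every probability in it are hypothetical numbers chosen for the example. It encodes the chain "variant -> harder to quit -> more long-term smoking -> lung cancer" and shows that the variant has an effect only when tobacco is present.

```python
def cancer_risk(carrier: bool, tobacco_available: bool) -> float:
    """Lung cancer probability under a toy causal chain (all numbers hypothetical)."""
    baseline_risk = 0.01  # assumed risk for someone who is not a long-term smoker
    smoker_risk = 0.10    # assumed risk for a long-term smoker
    # Nobody can start smoking if there is no tobacco in the environment.
    p_start = 0.30 if tobacco_available else 0.0
    # The variant only acts here: carriers find it harder to quit once they smoke.
    p_keep_smoking = 0.70 if carrier else 0.50
    p_long_term_smoker = p_start * p_keep_smoking
    return (p_long_term_smoker * smoker_risk
            + (1.0 - p_long_term_smoker) * baseline_risk)


# Print the risk for each combination of environment and genotype.
for tobacco in (True, False):
    for carrier in (True, False):
        risk = cancer_risk(carrier, tobacco)
        print(f"tobacco={tobacco!s:5} carrier={carrier!s:5} risk={risk:.4f}")
```

In the tobacco-free rows the carrier and non-carrier risks are identical by construction, which is the sense in which the variant is "silent" in that environment.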
The final critique is wrapped up in the purpose of a genetic study. First some background: In populations with natural polymorphisms the vast majority of variants are rare, and the variants which become more common “feel” selection over evolutionary time more. This means that variants which go to high frequency by and large can’t be under selection (there are exceptions - these are interesting - but they are exceptions). This combination - most variation is rare, whatever it does, and variation with a phenotypic effect is less likely to be common - means that common variation usually has small effects - large effects only happen when rare (and really large effects - very rare - a.k.a. rare disease). Back to what a scientist wants to achieve with a genetic study; if you want to understand “more about phenomenon X”, genetics’ upside is you don’t have to posit any mechanism or understanding of X beyond “it has some genetic components”. Here the scientist wants to gain some foothold into the molecular mechanism behind the phenomenon. He or she doesn’t mind how weak the effects are - there are all sorts of subtle changes to proteins or regulatory regions - rather s/he is interested in gleaning some understanding. There is some cosmic pact here; the scientist gave up the ability to get clear-cut mechanistic understanding, but gains the ability to find all mechanisms with polymorphisms. However, if the scientist wants to predict “phenomenon X” then they have to (a) understand or randomise the context/environment (or make that assumption), and (b) crunch the numbers to create a model of the genetic component of this.
To run these models (again, note on models below) one often needs to transform phenotypes into something plain and normal - sometimes quite a transformation - and a side effect of these models is that one gets out a number for the variation explained by genetics (h2) and the variation that is not (“environment”). These numbers are usually quoted with too much precision (often as a %); they are processing- and context-dependent, and can almost never be exactly 0 or 1; mentally I think of 3 buckets: “not much impact from genetics”; “substantial but not dominant genetics”; “predominantly genetic”. Note, even these 3 buckets are in some context - remember the Medaka fish in summer or winter, or grass-fed or maize-fed cows, or humans with an accessible school or not. Now, the big thing when you use these models for prediction is that one has to make assumptions about context (some models need more assumptions, such as PRS vs GRS about population, but the big one is context). When you get it right, it works very well (lots of farm breeding!).
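To see why h2 is context-dependent, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: genetic values are drawn from a standard normal, the phenotype is simply genetic value plus environment, and the three "environments" and the helper name `simulate` are hypothetical. Unlike a real study, the simulation knows each individual's genetic value directly, so h2 can be computed as a plain ratio of variances.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the sketch is reproducible

N = 5000
# Assumption: each individual's genetic value is a draw from a standard normal.
genetic = [random.gauss(0.0, 1.0) for _ in range(N)]


def simulate(env_mean: float, env_sd: float):
    """Return (mean phenotype, h2) for one hypothetical environment."""
    env = [random.gauss(env_mean, env_sd) for _ in range(N)]
    phenotype = [g + e for g, e in zip(genetic, env)]
    # Share of phenotypic variance attributable to the (known) genetic values.
    h2 = pvariance(genetic) / pvariance(phenotype)
    return mean(phenotype), h2


mean_a, h2_a = simulate(env_mean=0.0, env_sd=0.5)  # fairly uniform environment
mean_b, h2_b = simulate(env_mean=5.0, env_sd=0.5)  # same spread, better average
mean_c, h2_c = simulate(env_mean=0.0, env_sd=2.0)  # very unequal environment

for label, m, h2 in [("A", mean_a, h2_a), ("B", mean_b, h2_b), ("C", mean_c, h2_c)]:
    print(f"environment {label}: mean phenotype={m:.2f}, h2={h2:.2f}")
```

The same genetic values are used in all three runs. Shifting the whole environment (A to B) moves the mean but leaves h2 roughly unchanged, while making the environment more unequal (A to C) lowers h2 without any change in the genetics.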
Now some of the more nitty-gritty secrets. The statistics behind all of this are relatively straightforward linear models (the complexity is about correlation between genetic polymorphisms - linkage disequilibrium, LD - and multiple testing), but linear models with rare categories. The rarity is due principally to the polymorphisms in the population, and this means (a) one needs large sample sizes, and (b) model sophistication is really hard. Basically, moving away from linear models is hard to do. Furthermore, when people have moved away from linear models there has been little benefit to complex traits (common disease). The most obvious non-linear aspect is the one that started it all off with Mendel - recessive and dominant - and even that is surprisingly … not useful. It’s an open question when we go up the next notch of samples (perhaps with sequencing, so we really know every single polymorphism), whether non-linear models (recessive-dominance) or even more elusive epistasis will matter. At the moment, it does not matter much in human common traits.
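As a rough sketch of what "linear" means here, the snippet below scores one person under an additive model, where each copy of an allele adds the same amount, and then under a recessive recoding, where a variant only counts when both copies carry it. The function names, the weights and the genotype are all hypothetical, and the sketch deliberately ignores LD and multiple testing, which is where the real complexity sits.

```python
from typing import List, Sequence


def additive_score(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Linear (additive) score: each copy of the effect allele adds the same amount."""
    if len(dosages) != len(weights):
        raise ValueError("need one weight per variant")
    return sum(w * d for w, d in zip(weights, dosages))


def recessive_coding(dosages: Sequence[int]) -> List[int]:
    """Non-linear alternative: a variant only counts when both copies carry it."""
    return [1 if d == 2 else 0 for d in dosages]


# Hypothetical per-variant effect sizes and one person's allele counts (0, 1 or 2).
weights = [0.10, -0.20, 0.05]
person = [0, 1, 2]

print(additive_score(person, weights))                    # additive model
print(additive_score(recessive_coding(person), weights))  # recessive recoding
```

The additive version is a weighted sum of allele counts, which is the basic form of a polygenic score; the recessive version reuses the same sum after recoding the genotypes.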
Now - back to intelligence+genetics. When one runs genetic analysis on virtually anything sensible associated with “intellect” - number of years in education (easiest one to get), or IQ tests, or “g” - in the context at least of a western state (UK, Denmark, Sweden), one gets signal. Note the context; virtually everyone in these countries goes to school. The genetic signal detected is in this context - no matter how good a child’s genetics is, if they don’t go to school they are very, very unlikely to go to university. In my “3 bucket” view of genetics, the genetics of intelligence falls into the middle one - “substantial but not dominant”. The precise %s will be dependent on how you normalise the numbers and throw out outliers. Don’t read much into the precise figure. Back to what to do next. Firstly, you can look at the genome locations of GWAS hits to glean something about what the genetics is pointing to. Reassuringly there are some neuronal hits/enrichment, but it is a slippery, complex thing (just as most complex traits are). This is most interesting to me. You can also switch perspective to being predictive. One has to make the assumption that this prediction has the “same context” - and one can test this by using predictors built in the UK and testing them in Denmark/Sweden (or vice versa). It does work. But it’s not so clear what these predictions are useful for. It gives a number, but this number is not destiny - remember the context, so one has to have the context of being sent to school, the context of pushy parents, the context of peers around you in the school classroom. When you spell it out like this, the idea that these contexts are ‘average’ for everyone is laughable, but notice the cross-study result (UK to Sweden), which suggests there is at least a commonality between these systems. But the deeper question is “what’s next” for this prediction.
Here I flounder - we might be able to use these numbers in other contexts, for example prediction of a heart attack because we (a) know that heart attacks are bad, and (b) we have interventions (statins) to prevent this. In education we can’t have the same certainty that “staying to university education” is somehow an ideal for everyone, and furthermore we don’t have interventions that we know help. There is some sci-fi world of “precision education” coming after presumably extensive deployment of genetics in “precision medicine”; it’s fine to point out the possibilities, but this is definitely “over the horizon” to me. But that intelligence/educational attainment/g/whatever you want to measure around intellect has a “substantial component of genetics” and a deep root in the biology of ourselves is clearly true - and this biology varies for genetic reasons. This latter fact has consequences in education+policy; one can’t easily talk about equality of outcomes of education without taking it into account; in many societies we differentiate between (say) severe genetic delay vs. normal; we have to accept “normal” has inherent variation. Here this thread ends. I hope it’s useful (I appreciate this should have been a blog post. Hmmm). I note I’ve not tackled ‘race’ / ethnicity and genetics mainly because that deserves its own thread and I need to get to bed!