Epigenetic Scores for the Proteome


The proteome is the collection of proteins that are present within a specific cell, tissue, or system within the body. Our circulating proteome refers to the proteins circulating in our bloodstream and is made up of proteins that are either produced in the circulatory system, or proteins that enter the bloodstream from other organs and tissues in the body.

Why do we care about this?

As we know, proteins are extremely important! They are effector molecules that maintain our health, and they’re also often the mediators of disease. Furthermore, protein biomarkers have been identified across many age-related morbidities. As proteins are the primary effectors of disease, connecting the epigenome, proteome, and time to disease onset may help to create new, predictive biological signatures.

I’ve been lucky enough to know Dr. Terrie Moffitt through my company, TruDiagnostic, as we have the exclusive license to the DunedinPACE in all verticals. Dr. Moffitt’s uplifting attitude and outlook of being “cautiously optimistic” when working with the Dunedin cohort and other researchers using the DunedinPACE makes for a fun and interesting conversation.

DNA methylation (DNAm) has been linked to the levels of proteins in our blood and the risk people have of developing chronic diseases. DNAm reflects the body’s exposure to chronic stress and inflammation and while this process is dynamic, DNAm may be more stable than protein measures, which can be variable across multiple time points. DNAm scores for proteins could therefore be used to identify individuals with high-risk biological signatures, many years prior to disease diagnosis.

In this week’s Everything Epigenetics podcast, Danni and I chat about the circulating proteome, how machine learning can be used to create epigenetic scores, and how information from the blood can be used to stratify risk of disease. We focus on the results from a study Danni published last year that integrated epigenetic and protein measures from the blood to develop new biomarkers for disease prediction. Danni’s work integrates these blood-based markers with the medical records of thousands of individuals to model disease onset.

Danni is in the final year of her PhD, on the Wellcome Trust Translational Neuroscience programme at the University of Edinburgh.

 

In this podcast you’ll hear:

– Danni’s neuroscience background and what got her interested in the field
– Why the Wellcome Trust Translational Neuroscience programme at the University of Edinburgh was a perfect fit for her
– A walkthrough of the central dogma
– A review of DNA methylation
– What the circulating proteome is and why it’s important
– The importance of proteins as biomarkers
– The definition and importance of an EpiScore, a term that Danni coined
– The strongest methylation signature we’ve seen to-date
– Why using DNA methylation to predict protein levels may be useful
– Considerations on using blood when investigating these markers
– The definition and importance of protein quantitative trait loci (pQTL)
– The cohorts Danni investigated in her paper “Epigenetic scores for the circulating proteome as tools for disease prediction”
– How Danni applied machine learning methods in the context of cohort studies
– How Danni created EpiScores for protein levels (methylation levels as the input, and protein levels as the output)
– The value of using protein EpiScores for disease prediction and risk stratification
– Inflammation as an important indicator of health
– How EpiScores compare with polygenic risk scores
– The importance of these risk scores in the context of age-related chronic disease
– The challenges and future directions in Danni’s work (focusing on machine learning methods and omics datasets)
– How people can be involved in large-scale cohort studies
– What’s next for Danni

 

 

Transcript:

hannah_went:
Welcome to the Everything Epigenetics podcast, Danni. I’m super excited to have you today.

danni:
Yeah, it’s great to be here. Thank you for having me to talk about science.

hannah_went:
Absolutely. I want to start off just hearing a little bit about yourself. Can you tell us about your journey? How did you become interested in DNA regulation? You’re pretty unique in the fact that you’re currently getting your PhD and what led to that decision? I’d love to hear a little more.

danni:
Yeah, so I come from a neuroscience background. I studied neuroscience at university. And this is perhaps not the typical kind of response to this question, but I think that decision was in part shaped by this: when I was growing up, my nan had a diagnosis of a terminal brain tumor. And it was in her cerebellum. So that’s like the movement control center of the brain. That whole experience firsthand of witnessing her losing the things that we take for granted that we can do when we’re healthy, that started the ball rolling in terms of questioning what happens when the brain goes wrong and why we get sick. And it really led me towards neuroscience, I think.

So at university, I studied the kind of construction of our nervous system. And I worked actually in a lab starting out. I modeled a phenomenon called synaptic plasticity. So this is the way that neurons communicate with each other in the brain through kind of electrical impulses. I then became interested in genetics, whether that was on the level of a single cell, where all of our cells have the genetic instructions that dictate their function and what they do, or in terms of our genetic risk profile for a given disease.

I just started becoming more and more interested because it seemed like this was where the root cause of a lot of things lay. So in my master’s, I learned coding and statistics, because that’s really what you need in the area of computational genetics. It was during that time that I was accepted onto the Wellcome Trust Translational Neuroscience Programme at the University of Edinburgh.

I think it was probably the best possible program for me, because during the first year, you go around and you shadow clinicians and you meet patients, individuals who are living with the diseases that you’re studying. And I think that’s super important to get us out of this kind of bubble that we can sometimes exist in in research and actually understand what’s going on at the point of care for people. That was when I started working with epigenetic data, and I met Professor Riccardo Marioni, who is now my fantastic PhD supervisor.

As soon as I learned about this intuitive idea that our epigenetics is this interface between our lifestyles and our environment and our genetic regulation and disease risk, that was really what I decided on. In terms of the brain, I wanted to study that interface and how it looked.

hannah_went:
Yeah, amazing backstory. You know, you have

danni:
Hahaha!

hannah_went:
that personal touch to it and that drive and that motivation. I think that’s great that you are also able to get some hands-on experience as well to realize this is actually what I want to do. We need more of that to make sure we’re in the right field and it’s really applicable in terms of what we thought we may be doing. So that’s great to hear.

danni:
Yeah.

hannah_went:
Now. The main conversation we’re going to have today is what you’re currently studying now. So I’d love for you to explain, jump into that a little bit further, your PhD, what you’re researching. I know your work looks at the DNA methylation signatures from blood to predict certain outcomes such as protein levels, lifestyle traits, and diseases. So to everyone listening, I’d like to back up. Some of you may be familiar. Could you also give us a reminder of what DNA methylation is and the terms and how you’re looking at it?

danni:
Yeah, of course. To start with, I think we need to go to the genetics first off. So if you were to go inside a cell in your body, you’d go to the nucleus and you’d find a copy of your entire genome. So the genome is made up of 23 pairs of chromosomes. And I think of them a bit like how we’d imagine a supercomputer, right? So they store all of the information and the genetic code that is the complete instructions to make us, essentially. If we unwound all of those chromosomes and put them end to end, that forms our DNA. You can think of it as a ladder, I think.

If we spread out this ladder of DNA, there’d be something like three billion different rungs on the ladder. And at each of those rungs, we’ve got different letters known as bases. So you can have either C, G, A, or T bases. And it’s different combinations of these bases that form our genes. And when a gene gets read out, it creates a protein. So what happens in terms of the epigenetics here is that tiny chemical groups can come along and they can attach to points in the DNA ladder. We call them epigenetic modifications. This is because they’re not changing the underlying genetic code, but they are changing how it’s read out.

An analogy for this is you can think of it like a dimmer switch on a gene. For example, if you’ve got methylation bound, it’s not going to change the protein that the gene produces, but it can act like a dimmer to turn down the expression of that gene. So the epigenetic modification that I study is called DNA methylation. That happens when a methyl group, so that’s one carbon and then three hydrogens, comes along and binds to areas of the DNA rich in C and G letters.

What we can do with DNA methylation is look across the cells in a blood sample from an individual. We can see whether a particular site in the genome is generally unmethylated or methylated. And we can do that simultaneously across up to, you know, like 800,000 different methylation sites at once. So this is the kind of data that we work with.
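As a rough illustration of what "generally unmethylated or methylated" can mean numerically: array-based studies commonly summarise each site as a value between 0 and 1, the share of methylated signal at that site across the cells in the sample. The sketch below is an assumption about how such a value might be computed, not something described in the conversation; the function name, the toy intensities, and the stabilising offset of 100 are all illustrative.

```python
def beta_value(methylated_signal, unmethylated_signal, offset=100.0):
    """Share of methylated signal at one site, from 0 (unmethylated) to 1 (methylated).

    The offset is an assumed stabilising constant that keeps the ratio
    well-behaved when both signals are very low.
    """
    return methylated_signal / (methylated_signal + unmethylated_signal + offset)


# Hypothetical signal intensities for three sites in one blood sample.
sites = {
    "site_1": (900.0, 100.0),  # mostly methylated across cells
    "site_2": (100.0, 900.0),  # mostly unmethylated across cells
    "site_3": (500.0, 500.0),  # methylated in roughly half of the cells
}

betas = {name: beta_value(m, u) for name, (m, u) in sites.items()}
```

A real dataset would hold one such value per site per person, which is the kind of table that the models discussed later take as input.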

hannah_went:
Yeah, absolutely. And epigenetics is important for a plethora of reasons, right? I like how you explain that, like that dimmer, that knob, right? Kind of controlling the levels of different things being produced. So that was probably the best explanation I’ve ever heard leading up through the central dogma. I appreciate you giving a general overview of that. You mentioned the epigenetic modification you’re looking at is going to be that DNA methylation as you just defined. But you’re going to look at those signatures of the circulating proteome. So you’re looking at all of these different proteins in the incidence of disease. So what is the proteome? Can you talk more about that and why you’re looking at that area of interest in the body?

danni:
The proteome represents the collection of proteins that could be within a cell, a tissue or a system within the body. And our circulating proteome refers to all of those proteins that are circulating around our body in our bloodstream. This is made up of proteins that can be produced in the bloodstream, but it can also be an instance where a protein is made in an organ or a tissue of the body and then enters the bloodstream. So when we look at the blood proteome, that’s what we’re looking at, that whole collection of proteins.

hannah_went:
Gotcha, yeah. And why would we use that as a biomarker, right? A biomarker just being something we can look at that tells us maybe the outcome of something else. What’s the importance there?

danni:
Proteins are really interesting as a feature to look at in terms of our health. So they are the kind of effector molecules that maintain our health first and foremost within the body, but they’re also often the mediators of disease. So we know that there are a whole bunch of different protein markers and proteins within different tissues that drive different diseases. This means that when people develop therapeutics, proteins are often actually the targets that we’re looking to try and understand and modify in some way to treat your disease. So if we take a disease like dementia, for example, we know that there’s a component of dementia that’s down to our genetics that dictates risk.

And we also know that there is a whole bunch of lifestyle and environmental factors that can impact and influence our risk on top of our genetics. And we know that there are these blood proteins that, in some instances, can help us to predict the onset of dementia. So by joining up all of these different data types, especially in my PhD, the protein and epigenetic signatures from the blood, we’re hoping that it will give us a much more complete profile of a disease and be able to stratify risk in individuals more than using, say, genetics alone, for example.

So in the blood proteome, the current technologies that we have measure up to around 5,000 different proteins. In the study that we’re going to be talking about today, we looked at around 900 different proteins. This is because in the machine learning approach that we took, you need to have a couple of different cohorts that have those protein data available. And that’s really important for training and testing the models that we work on.

hannah_went:
Sure, sure. And yeah, I can’t agree more. Proteins are a great biomarker. I think it’s becoming more prevalent with the healthcare providers that I talk to today, right? They’re gathering all these biomarkers from their patients and they’re starting to include more of those proteomic values too, which is always great to see. The more layers you have from that multi-omic approach, the better you can actually assess the health of someone in those specific outcomes or risk modules. So in that paper you mentioned that we’ll discuss, you’re looking at all of these proteins and it seems like you’re training and testing these epigenetic scores for these proteins, and you talk about this value that’s called an epi score. It’s just such a good name. It sounds really cool and hip. So can you explain what an epi score is?

danni:
Yes, I can. So the group that I work in has been training DNA methylation scores for loads of different outcomes. And we suddenly had the idea, you know, when I was writing up my results, and we just coined this term epi score, which I think has a bit more of a ring to it. So an epi score literally just stands for epigenetic score. And the epigenetic scores that we train are all reliant on hundreds of thousands of DNA methylation sites that get fed into the models. And then the methylation sites that are predictive of a given outcome are selected in order to create the scores. Now, before I came along, people had been training scores for things like lifestyle traits, like smoking.

And in terms of DNA methylation, smoking is perhaps the strongest signature that we currently know of in terms of how it changes DNA methylation across the epigenome. So for the scores that have been trained for smoking, there’s actually some evidence to suggest that they can be more accurate than people’s self-report. So you know, when you come to a study and you’re filling out, you know, on the questionnaire, whether you’re a smoker or not, maybe that’s to do with people fibbing. I don’t know. But we definitely see that the smoking scores can differentiate really well whether you’re a smoker or not. So what I then came along and did in my PhD is said, okay, we’re gonna create epigenetic scores, but we’re gonna have proteins as the outcome.

And there were a couple of reasons for doing this. So one is this whole idea of combining the different types of omics markers from the blood, and this idea that adding proteins into the model, creating these epigenetic scores for proteins, is going to give us a layer of extra information from the protein biology over and above what we could generate with methylation alone. But there’s also an interesting thought around this. What we are thinking at the moment is that these protein epi scores might be more stable than some proteins. Because if you think about it, if you came for your blood sample at the study baseline and you had a cold brewing that week and your inflammatory proteins were all over the place, because we know that they can vary when you’re fighting off an infection, then that creates variability within the protein measurements.

But because methylation, although it is dynamic, it’s thought to represent this kind of longer term signature of exposures and your environment across your lifespan. We think that by training these methylation scores for protein levels, they might in some cases be a bit more stable and reflective of the general state of inflammation, for example, in the body. So this is what we did in the study. And I don’t think anyone before had scanned that many proteins to test for epi scores. And no one had certainly, I think, modeled protein epi scores with the onset of disease.

hannah_went:
Yeah, I find that just fascinating.

danni:
Okay.

hannah_went:
A couple points there. Actually, right before I got on this recording with you, I was talking to someone about the smoking signature that you see through methylation on that AHRR gene, how you want more methylation there. Because if not, you can tell people were a smoker or a past smoker. So I know that signature is just extremely, extremely strong.

There’s a lot of literature out there about that. Again, I love the multi-omic approach because just looking at one marker, that could be varied from, like you said, a sickness, a cold. So kind of adding in those other factors and almost leveling the playing field to get a true reading rather than hearing other noise from outside variables I think is a huge need in the space, just so we can identify what’s actually happening in the body rather than maybe an influx number of some sort. So, really great to hear that you’re doing this work.

danni:
Yeah, so on the AHRR smoking signature, my colleague Daniel actually did a really cool study a couple of years ago where they looked at the DNA methylation profiles at that specific site you mentioned, and they looked at the classification of people who had just recently stopped smoking. So to try and see, you know, how far does the DNA methylation signature change?

And this was over the course of I think about five years, and they saw that it did, which is reassuring for people who are quitting smoking. The number of people who were classified as smokers from their methylation, I think, dropped down from about 80% right down to about 20% or something in the study. So the cool thing about this is there are dynamic changes that happen, and in that case, over the course of like five years in terms of the smoking signature.

hannah_went:
Oh, wow. Very cool. Yeah. I’ll have

danni:
Yeah.

hannah_went:
to have Daniel on sometime, because that gives people hope, that gives them a reason to, you know, hopefully stop smoking, and helps them maybe change their lifestyle habits. So very cool. Really great insight there. Now going back to your study, you’re actually testing epigenetic signatures in the blood. This is so important. This is something I stress to almost every single person I talk to about these DNA methylation markers. So can you just remind our listeners of the importance of using blood as the sample type over other tissues when investigating these markers?

danni:
Yeah, so blood first and foremost is accessible. So it means that we can study omic signatures in the earliest kind of stages of disease, because you won’t miss a blood sample if I take it from you now in the same way that you’ll miss a brain sample or a liver sample. So there’s a big difference there in terms of when we can actually take blood: we can take it at multiple points across the life course and track your health status. But blood is also circulating around the body. So it’s a good place to start if we want to try and uncover the general health state of a person.

But the complicating factor in all of this with methylation is that, unlike our genetics, where the genetic code is largely the same if you went into the different cells across the body, the epigenetic signature is not. So there’s a lot of tissue specificity and blood specificity in terms of DNA methylation. And I actually published a study a couple of years ago where we projected epi scores for things like smoking, alcohol consumption and BMI that were trained in the blood into DNA methylation from the blood and the brain.

And what we saw is a huge amount of variability in these scores between the blood and the brain, although smoking came out on top as the thing that was most correlated, as expected. But there was also variability within the different brain regions in terms of the methylation. So if you’re looking to kind of get the precise mechanisms, in my case, of a specific disease in a specific cell of the brain, the DNA methylation from the blood is not going to be the best place for you to look. But it doesn’t mean that the information contained in the blood is not of value, because for disease prediction, which is the name of the game that I work on, there are still really important factors circulating in the bloodstream.

So inflammation, metabolite changes, and all of these things have been associated with poorer brain health and the risk of disease still. There’s also instances, I mentioned proteins leaking into the bloodstream earlier, and in a lot of different neurodegenerative diseases, for example, there’s a breakdown of the blood brain barrier. So this is a very tightly controlled membrane that separates your blood from the brain. And what happens in the early stages of neurodegenerative disease is it becomes very leaky and proteins can come into the bloodstream and basically act as warning signatures that something is not quite right.

You know, that’s typically what blood can do. There are limitations, and blood is also the thing most commonly measured across the cohort studies that we work with. A certain part of this is that, as a researcher, you kind of have to just work with the data that’s available, and in a lot of cases blood was the most accessible thing that people could study. You’re also looking for samples in cohort populations where, if you have more individuals, you can detect the associations that you’re looking for better. So blood is a great tool for that as well.

hannah_went:
Sure, sure. Blood is a great surrogate biomarker, right? I’ve heard

danni:
Mm-hmm.

hannah_went:
a lot of researchers say that. And I think that the research will go there eventually when it starts to get really specified based on cell type, based on tissue type. There’s plenty of work going on in that field right now. But again, blood also just being super accessible and what a lot of these disease risks are based on, it acts as a great tool and one that we can use. So talk to me about where you got those samples, Danni, from your study. I know you used two different cohorts. And just while we’re on this subject as well, how did you apply your machine learning methods to create those outcomes of interest in the context of both of these cohort studies?

danni:
Yeah, so cohort studies will typically measure thousands of individuals who will come for a blood sample. The two cohorts that we were working with in the study in question today were, firstly, the German KORA cohort, and we also used proteins from the Lothian Birth Cohort 1936, which is a Scottish-based study. We trained the scores using 953 proteins in total from these two studies.

We had two independent cohorts that we then did further testing of the scores in, because we have to keep our training and testing samples separate in machine learning. It’s the golden rule that you absolutely cannot mix them. So the way that the model works in a nutshell is you feed in hundreds of thousands of measurements of DNA methylation across the genome for, in this study, about a thousand people. What it will do is scan through all of those sites in your population and, for each protein, select those DNA methylation sites that are most predictive of the protein level.

And it applies this shrinkage parameter. So you get the biologically relevant sites selected, but in the tightest and neatest package of, okay, these are my sites that are gonna be predictive. What that means is each of your methylation sites gets assigned what’s called a weighting coefficient, and that allows you, in another population with DNA methylation, to use those weightings to project your score for whatever measure, so in this case the protein level, into that new population and test it.
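The train-then-project idea described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study’s actual pipeline: it uses a lasso-style penalty fitted by coordinate descent as a stand-in for "a shrinkage parameter", the data are simulated, the values are assumed to be centred already, and every name (`train_weights`, `project_score`, the penalty value) is hypothetical.

```python
import random


def soft_threshold(value, penalty):
    """Shrink a value toward zero; anything within +/- penalty becomes exactly zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def train_weights(methylation, protein, penalty, n_sweeps=100):
    """Lasso-style coordinate descent (an assumed stand-in for the study's model).

    methylation: one row per person, one column per methylation site.
    protein: one measured protein level per person.
    Returns one weighting coefficient per site; uninformative sites shrink to zero.
    """
    n_people, n_sites = len(methylation), len(methylation[0])
    weights = [0.0] * n_sites
    residual = list(protein)  # protein level minus the current prediction
    for _ in range(n_sweeps):
        for j in range(n_sites):
            column = [row[j] for row in methylation]
            scale = sum(x * x for x in column) / n_people
            if scale == 0.0:
                continue
            # Association of site j with what the other sites have not explained.
            rho = sum(x * (r + weights[j] * x) for x, r in zip(column, residual)) / n_people
            new_weight = soft_threshold(rho, penalty) / scale
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, column)]
                weights[j] = new_weight
    return weights


def project_score(weights, methylation_row):
    """Project the score into a new person: a weighted sum of their methylation values."""
    return sum(w * x for w, x in zip(weights, methylation_row))


# Simulated training cohort: 100 people, 10 sites, only sites 0 and 1 matter.
random.seed(0)
n_people, n_sites = 100, 10
methylation = [[random.gauss(0.0, 1.0) for _ in range(n_sites)] for _ in range(n_people)]
protein = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0.0, 0.1) for row in methylation]

weights = train_weights(methylation, protein, penalty=0.1)

# A person from a separate (test) population, with methylation data only.
new_person = [1.0] + [0.0] * (n_sites - 1)
new_score = project_score(weights, new_person)
```

The weights are learned once in the training cohorts and then reused unchanged elsewhere, which is what allows a score to be computed in a population that has methylation data but no protein measurements.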

hannah_went:
Sure. And is that what you meant by the epi score? You’re testing those epi scores?

danni:
Yeah, so this whole machine learning method, that’s what is used to create the epi score. So you feed in all of your methylation as the inputs, and then you have the protein level as the outputs. And that’s the process that we create the epi score.

hannah_went:
Perfect. And how many, you know, talking about this study that you published, how many epi scores did you find or what did you look at there? You know, if you saw that correlation greater than 0.1, what did that mean?

danni:
Yeah, so in the testing populations, the two other cohorts that I mentioned, we projected the protein epi scores in using the DNA methylation. The reason those cohorts were selected for testing is because they had the measured proteins available. What we did then was correlate.

So every individual is going to have a protein epi score and a protein measure in that population, and you literally plot them on a graph, or you run a correlation, and you say, how similar is an individual’s protein epi score to their protein measurement? We set quite a low threshold, but if we saw a correlation of greater than 0.1, we felt that there was some aspect of protein biology being captured by the score.

And out of those 953 different proteins, there were 109 protein epi scores that were selected. We decided to take those forward to look at disease prediction. And I think maybe, you know, this is an interesting point actually, that there were only these 109 as the subset. And it’s possibly because some proteins just don’t have a strong methylation signature. Or it could be that we need larger sample sizes in order to detect some of the relationships between DNA methylation and proteins that exist. But because we only had a thousand individuals in this case, we might not have been able to detect them.
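The selection step described above, correlating each projected score with the measured protein in the test cohorts and keeping those above 0.1, might look like the following. This is a sketch with made-up numbers; the function names and the two toy proteins are hypothetical, and only the 0.1 threshold comes from the conversation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def retained_scores(scores_by_protein, measured_by_protein, threshold=0.1):
    """Keep proteins whose projected score correlates with the measured level above the threshold."""
    return [
        name
        for name in scores_by_protein
        if pearson_r(scores_by_protein[name], measured_by_protein[name]) > threshold
    ]


# Toy test cohort of five people and two hypothetical proteins.
scores_by_protein = {
    "protein_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "protein_b": [1.0, 2.0, 3.0, 4.0, 5.0],
}
measured_by_protein = {
    "protein_a": [1.1, 1.9, 3.2, 3.8, 5.1],  # tracks the score closely
    "protein_b": [3.0, 1.0, 3.0, 1.0, 3.0],  # unrelated to the score
}

kept = retained_scores(scores_by_protein, measured_by_protein)
```

In this toy example only `protein_a` passes; in the study the same kind of filter is what reduced 953 candidate proteins to 109 scores.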

hannah_went:
Sure, sure. So you have about these 900 protein levels of interest you’re looking at with these two different cohorts. You’re then mapping those protein levels with their epi scores or those epigenetic scores. Those that have that higher correlation are those 109 proteins and you’re pulling them out and you’re looking at how those may affect disease outcomes. Is that a good summary?

danni:
Yes, yes.

hannah_went:
Perfect, perfect. And you talk about… protein quantitative trait loci as well in your paper. So how does that fit into the picture and what is that? What does that mean?

danni:
Yeah, so they’re called pQTLs, protein quantitative trait loci, and it’s a very long name for something

hannah_went:
I’m sorry.

danni:
that is actually quite simple. So all that really means is we can identify parts of the DNA, so changes in our genetic code that very closely associate with the regulation of a protein level in the blood. And those had been identified in previous studies and mapped for each of the proteins that we were looking at. So because we wanted to create these scores that were primarily non-genetic and capturing the environmental and lifestyle exposures on the DNA methylation associated with protein levels, we didn’t really want to include those genetic effects on the proteins.

So what we did is just pre-adjust the protein levels to say, we don’t want to worry about the genetic effects that we know regulate this protein. We want to capture the DNA methylation and the environmental, lifestyle and non-genetic signature of that protein. So that’s how they came into the picture.
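One common way to "pre-adjust" a measurement for a known effect is to regress it on that effect and keep the residuals. The sketch below assumes a single genetic variant coded as 0, 1 or 2 copies and an ordinary least-squares fit; the conversation does not specify the adjustment model, so treat the function name, the coding and the toy numbers as illustrative only.

```python
def residualise(protein, dosage):
    """Remove the linear effect of one genetic variant from protein levels.

    protein: measured protein level per person.
    dosage: number of copies (0, 1 or 2) of the variant per person (assumed coding).
    Returns the part of each protein level that the variant does not explain.
    """
    n = len(protein)
    mean_p = sum(protein) / n
    mean_d = sum(dosage) / n
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    slope = sum((d - mean_d) * (p - mean_p) for d, p in zip(dosage, protein)) / var_d
    intercept = mean_p - slope * mean_d
    return [p - (intercept + slope * d) for p, d in zip(protein, dosage)]


# Toy cohort: each copy of the variant adds 2 units of protein,
# on top of a non-genetic component that differs between people.
dosage = [0, 1, 2, 0, 1, 2]
non_genetic = [0.5, -0.5, 0.0, -0.5, 0.5, 0.0]
protein = [10.0 + 2.0 * d + e for d, e in zip(dosage, non_genetic)]

adjusted = residualise(protein, dosage)
```

Here `adjusted` recovers the non-genetic component, which is the quantity a methylation-based score would then be trained to predict.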

hannah_went:
Yeah, it sounds super difficult, but very,

danni:
Hehehe

hannah_went:
very simple. You’re removing or accounting for and taking out the pQTLs, which are genetic factors that are maybe responsible for part of that protein level, so you can just focus on the true DNA methylation signature. And how can these epi scores that you created for protein levels be a valuable resource for disease prediction and risk stratification? I know you integrated these epi scores with a ton of medical records from thousands of individuals, and you have all of these different connections. So how do you take what we’ve learned from your study and make that applicable?

danni:
Yeah, so the disease prediction part, so we spoke about the training and testing just now in four different cohort populations. And once we had that set of 109 scores that we were like, right, okay, we’re going to test whether these are actually useful in disease prediction, we then projected them into another study. So there was a fifth one called Generation Scotland. And actually Generation Scotland is the study that I’ve worked most with during my PhD.

So it’s very close to my heart. And it’s a study of over 20,000 people living in Scotland. So people were recruited around 2006 at baseline, where they gave their blood sample. And individuals have consented for us to link through to their NHS records across the UK. So we can track through GP and clinical sort of hospital appointments, whether those individuals went on to get certain diseases. And that’s such a valuable resource across the population.

So what we were able to do, because Generation Scotland doesn’t have protein levels, but they do have DNA methylation, we could project in our protein epi scores at that baseline blood sample and then test to see whether they associated with the onset of 12 different diseases that we looked at. And as you said, we projected them in and we found 137 different associations.

So what was really interesting and what kind of emerged for us was a distinct profile in terms of type 2 diabetes. So not only were the associations for type 2 diabetes the strongest in terms of statistical significance, but the EpiScore signature of type 2 diabetes also replicated previous protein studies looking at proteins and diabetes onset. About two thirds of our EpiScore associations with type 2 diabetes replicated those previous biomarker associations.

So what it really told us was that the protein EpiScores, although they don’t correlate in the test set 100% with the protein, are actually capturing a disease-relevant signal that is the same as what was carried through the protein in terms of disease prediction. And if you think about it, these protein EpiScores are just based on a collection of methylation sites, so it’s actually really cool that you can do that and you can replicate a protein biomarker association for type 2 diabetes.

hannah_went:
Yeah, very, I would say prevalent disease, right,

danni:
Yeah.

hannah_went:
in today’s day and age. So super fascinating and everyone listening probably, I could almost guarantee maybe knows someone with type 2 diabetes, right? So super applicable. And furthermore, it’s amazing what your study revealed over just about 137 connections between those epi scores for proteins and future diagnosis of common adverse health outcomes. You touched on that type two diabetes score, but also things like stroke, depression, Alzheimer’s, dementia, different cancers, some inflammatory conditions such as rheumatoid arthritis and inflammatory bowel disease. So, you know, we talk about inflammation in particular being an important part of, an important factor I should say in the blood. So did you see this coming through in the results? Can you talk about what you found more in that inflammation realm?

danni:
Yeah, so this potentially wasn’t a surprise to us because we know from previous studies that there is tight sort of interrelatedness between inflammation and inflammatory proteins and the epigenome. We just know that it exists and it’s very prevalent. So out of the 953 proteins that we tested, first off the 109 that were selected as protein epi scores, they actually, the proteins that we were training on, had a very, very enriched profile of inflammation as it was to begin with, but we also saw really strong associations for those inflammatory protein epi scores in the study too.

So there’s one epi score that springs to mind. I wouldn’t necessarily say it’s my favorite. There’s always a thing, I think with biologists, where you always have a favorite protein or a favourite DNA methylation site on the go. Maybe this is my favourite protein epi score, I don’t know. But complement five, the protein epi score for complement five was actually associated in our study with the onset of five different diseases over the follow-up period of 14 years that we looked at. So I think it was type 2 diabetes, stroke, rheumatoid arthritis, heart disease, and chronic obstructive pulmonary disease.

So those are five major diseases that affect different areas of the body. But our signature of methylation associated with complement five was kind of predictive of all of them. So from a perspective of disease prevention, if you can identify these early signatures of inflammation that are upstream of people going on to develop, you know, all sorts of different diseases, that might be the place where you need to start targeting in order to kind of prevent the chronic inflammation triggering maybe the onset of different disease pathologies in the body. So this was a really interesting thing that emerged. Obviously inflammation isn’t totally bad. We need inflammation to keep us alive. But if we have too much chronic inflammation, then it’s definitely one of these warning signatures.

hannah_went:
Sure, sure. Taking more of a preventative look so we can mitigate the onset of those diseases, thus extending our health span, having those healthy years more hopefully often than not as we become older chronologically. So I have a couple of follow-up questions to your response. Number one, can you just define complement five? What does it do for us in the body?

danni:
Oh, yeah, sorry, I didn’t I didn’t mention

hannah_went:
It works.

danni:
that I just talked about it was my favourite protein episcore.

hannah_went:
And then that’s my second question. Is that your favorite? Now we need to know.

danni:
So complement five is part of the innate immune system. So it generally circulates around in the blood and acts as a response to any kind of foreign invaders or things that your body needs to fight off basically. [inaudible]

hannah_went:
I just have to know. Very cool. And maybe something our listeners are a little bit more familiar with. How do these scores, these epi scores you’re actually creating, compare with polygenic risk scores? And I’ll back up even further there. For those who aren’t familiar with polygenic risk scores, could you define that as well?

danni:
Yeah, so a polygenic risk score is a predictive score, kind of like what we’ve been talking about with the epi scores. But instead of DNA methylation sites, it relies on changes in the actual genetic code itself across different individuals. And it sums those up. So lots of different variations in your genetic code in order to calculate a single genetic risk score for, say, diabetes or dementia or whatever it is. It very much works in the same way that these methylation risk scores work, in that we have a collection of DNA methylation sites, or we have a collection of changes across your genome.

But what these scores capture, so if you had a polygenic risk score for, say, type 2 diabetes and a methylation risk score for type 2 diabetes, they capture different components of your disease risk. So one is going to be predominantly the genetic, and then the other one is going to be… So actually by integrating these types of scores together, there’s evidence to suggest that that could actually be a really useful way to combine the information that you can kind of get from either source of omics data and that that could sit quite nicely in terms of eventually translating to clinical practice.

So there are ethical considerations that I think have to be thought about and discussed around this because, you know, what we can do is project these scores into a population and identify, say, the top 5% of individuals on a distribution and the bottom 5%. And they’re your high and low risk individuals. But just because you fall into that group based on your genetic scores or your epigenetic scores, it doesn’t mean that you’re guaranteed to get a disease. So if you tell someone this information of, oh, you’re in the high risk group, you know, there’s… there’s a lot of stress and anxiety that can come from that. I know I definitely would worry about it. So.
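The top 5% and bottom 5% split described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratify`, the cohort, and the scores are all hypothetical, and a label of "high" is a position on a distribution, not a diagnosis.

```python
def stratify(scores, tail_percent=5):
    """Label the top and bottom `tail_percent` of people by score.

    `scores` maps a person id to a numeric risk score. Everyone outside
    the two tails is labelled "middle".
    """
    ranked = sorted(scores, key=scores.get)          # ids, lowest score first
    k = max(1, len(ranked) * tail_percent // 100)    # people per tail, at least one
    labels = {pid: "middle" for pid in ranked}
    for pid in ranked[:k]:
        labels[pid] = "low"
    for pid in ranked[-k:]:
        labels[pid] = "high"
    return labels


# Toy cohort of 40 people with made-up scores 0.0 to 39.0.
toy_scores = {f"p{i:02d}": float(i) for i in range(40)}
groups = stratify(toy_scores)
```

With 40 people and 5% tails, two people land in each tail and the remaining 36 stay in the middle group.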

hannah_went:
Oh, yeah, me as well.

danni:
Yeah, so there’s definitely a tailored approach that needs to happen where what you’re comfortable with as an individual needs to come into this, like, you know, because some people just won’t want to know this information, perhaps, and other people will find it really empowering. So it has to be personalised. And that’s the only thing I would say.

hannah_went:
Definitely. I’m the type of person where I want to know everything, but even still knowing it and wanting to know it, if I do have a risk, I mean, that would definitely still cause stress and anxiety. So I think there has to be an approachable way to deal with sensitive information or, like you said, a personalized, tailored approach to how you deal with that info about the person. So we know these epi scores are going to have a relationship with all of these different types of diseases, but what about the importance of these scores in the context of age-related chronic disease, right? You hear longevity, anti-aging, preventative medicine, those are the big buzzwords right now. How do those two correlate or how are they connected?

danni:
Yes, so one of the strongest kind of indicators we have actually of age is the changes to the epigenome. There’s things that are known as biological aging clocks that track your kind of age profile and how fast it’s accelerating on a biological level across the lifespan. We know there’s a deep interconnectivity there and also aging in and of itself is associated with an increased risk of the onset of so many different chronic conditions as we age.

So our lifespan is just how long we live, and we know that that’s been increasing, but our health span actually refers to the number of years that we remain healthy and disease-free. What we really want to do is maximize that in individuals. And what the kind of early predictive markers that we’re identifying from the blood, up to 14 years before somebody is diagnosed with the disease, are able to do is perhaps stratify who’s going to be most at risk and who might actually benefit from preventative interventions and health monitoring.

So that ultimately could have a knock-on effect to prevent people developing these chronic conditions kind of as they age. And the thing with chronic age-related diseases is they often don’t exist in isolation. So there’s an example that always comes into my mind. I mean, you could say, in this particular study we’ve been talking about, “Danni, you’re interested in the brain, but you’re going on about type 2 diabetes,” right?

hannah_went:
I’m sorry.

danni:
In the instance of type 2 diabetes, even if we can prevent that occurring through these early markers, it’s gonna have a huge knock-on effect on the brain as we grow older. So one of the top risk factors for dementia is actually type 2 diabetes. So there’s this interconnectedness. Yeah, there was a report last year that came out and among sort of lifestyle factors and other things that were the kind of part of the top list, type 2 diabetes was there.

And I think it was the only disease, it was kind of the disease that’s on the pathway to dementia. So it’s super interesting. And I think we have to remember that the body is not existing in isolation. Like the brain and the rest of your body, there is a relationship between the two. In terms of prevention, for me as a neuroscientist, if we can help people to live longer and better, that’s a great outcome. So I think this is why these kind of podcasts are so good and talking about epigenetics, because we have a certain amount of control over this, right? So you know that part of your risk is genetically predetermined and out of your control.

But you also know that because of epigenetics and this ability for our epigenome to be dynamic, there is a lot that we can control in terms of our health and our well-being. So this example earlier of the smoking study where people stopped smoking, and then we saw a reversal in the DNA methylation, it is quite empowering to me that if we want to, we can make these changes and potentially alter our epigenome as a result to keep us healthy.

hannah_went:
Sure. It’s extremely powerful that you can shift those methylation markers in your favor and you always hear things like, you know, food is medicine, your DNA is not your destiny, all proving back to the point that, again, those epigenetic markers are dynamic.

danni:
Yeah, you actually, if you mentioned, you know, food and stuff, the one thing, I think it’s quite doom and gloom in this field. But the one thing to remember as well is that, like, there are positive associations with DNA methylation changes. So exercising, diet, all of these positive lifestyle traits, they also have associations that have been tracked with DNA methylation changes. So that’s really nice and quite optimistic. Because we often talk about disease and smoking and death in this field.

hannah_went:
Yes, but you can control some of your outcomes, your epigenetics, by simple lifestyle factors. So absolutely. And super interesting point about the diabetes leading to that increased dementia risk. I mean, you imagine everything’s interconnected in the body, right? But I definitely didn’t know there was that really strong correlation there. Good to know, and something that I learned. Very cool. So, Danni, what are the challenges in this field that should be addressed in future work? You kind of gave little hints to them, I think, throughout our discussion today, but there are a lot of challenges when applying these machine learning methods to a bunch of omic datasets. So could you discuss that, talk about some of those points?

danni:
Yeah, so I, there’s, there’s a long list of things actually that could be improved or, or things to discuss, but I’ve just picked my top ones to mention today. So as a kind of researcher, I feel like we’re always looking at the limitations of what we do and there’s always avenues to improve it or explore it further. So one of the key things in the field of computational genetics that has been really positive over the past five, 10 years is people have woken up to the fact that having diverse data sets is really important in terms of people’s genetic ancestry.

You’ll notice in the study that we’ve been talking about today, all of the populations were either Scottish or German in their ancestry. So this was the data that’s available, but actually there’s huge efforts to try and increase the diversity in data sets that we can use. And I think one of the major things that we need to do is test whether these types of scores can actually translate across different people of different backgrounds. Because you want to get to the point where if you rolled this out in the clinic, you could make sure that whoever walked in, that these scores would be able to, um, to work for everyone. So that’s, that’s a major area, I think.

And there are studies in more the genetic risk score type area where they’ve shown that scores trained on European ancestries, although they translate to some extent to non-European ancestries, actually, the performance is not as good. So that’s definitely an area. There’s also, I think, challenges around capturing the complexity of human experience across the life course. So in the study that we talked about today, we went from individual protein epi scores to individual diseases, whereas actually… you know, there’s things like multimorbidity, which is the presence of lots of different diseases within one individual. And there’s things like polypharmacy, where you might be taking a whole bunch of different medications.

In the study we’ve talked about today, the statistical models did not adjust for that and did not correct for that. So modeling that and trying to embrace the complexity within our statistics is, I think, another major area for sure. So in terms of the machine learning itself as well, the very final thing I’ll say is that we have to come up with ways of dealing with such a large amount of data.

I think data has kind of exploded and we have all these measurements across the epigenome now, hundreds of thousands of sites. And what it means is that this is computationally really intensive to work with. So there are more advanced models like neural networks and random forest. If you shoved in all of these hundreds of thousands of points of methylation, it simply couldn’t run. Like you would just break the model. So that’s

hannah_went:
Yes, not good.

danni:
not good, but there’s feature reduction is an area where people are trying to condense down the information and come up with ways that we can model it slightly cleverer and not have to use, you know, hundreds of thousands of different sites, you know, maybe we can just select down a few and condense the information. So.
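One simple flavour of feature reduction is to keep only the methylation sites that vary most across people before any modelling. The Python sketch below shows that idea on toy data. Variance filtering is an assumption chosen for illustration, not a method named in this conversation, and the function name `top_variable_sites` and the site ids are hypothetical.

```python
from statistics import pvariance


def top_variable_sites(matrix, n_keep):
    """Keep the n_keep CpG sites whose values vary most across people.

    `matrix` maps a site id to a list of methylation values, one per person.
    Sites that barely change between people carry little signal for a model.
    """
    ranked = sorted(matrix, key=lambda site: pvariance(matrix[site]), reverse=True)
    return ranked[:n_keep]


# Three made-up sites measured in four people.
toy = {
    "cg_a": [0.10, 0.90, 0.50, 0.20],  # varies a lot
    "cg_b": [0.50, 0.50, 0.50, 0.50],  # constant, so uninformative
    "cg_c": [0.40, 0.60, 0.45, 0.55],  # varies a little
}
kept = top_variable_sites(toy, n_keep=2)
```

On real arrays with hundreds of thousands of sites, the same idea shrinks the input to a size that heavier models can handle, at the cost of possibly discarding low-variance sites that still matter.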

hannah_went:
Yeah, all very valid points.

danni:
Yeah.

hannah_went:
I know that everything you noted there is something that our bioinformaticists here at TruDiagnostic talk about all the time. Their computers are constantly running and trying to put all of this data to work. How can people get involved in these large cohort studies? How can people become involved?

danni:
Generation Scotland, the study that I spoke about today, they’re actually recruiting for more individuals. So the study is open to anyone that’s living in Scotland. So it’s inclusive of any kind of genetic ancestry, as I’ve mentioned earlier, that’s a key thing. Anyone over the age of 12 can sign up to the study. If you’re a teenager, you need parental sort of guardian sign off to do so. You just need to log on and you can fill out a questionnaire online.

It takes about 10 minutes. Then actually you get sent through a sample through the post to do a saliva swab. I’ve been talking about blood and we’ve discussed the benefits of blood for this entire episode. Now I’m going to introduce saliva because the reason we’ve gone for saliva in this instance is because we’re trying to make the study, the next waves of Generation Scotland, as accessible as possible. And I know personally, I’d rather get a saliva swab through the post to study my methylation

hannah_went:
I’m sorry.

danni:
than someone putting a needle in my arm because I’m terrified of needles as it is. So

hannah_went:
Hahaha

danni:
we’re hoping that as many people as possible can sign up as a result. We’re really interested in families joining. This is because then we can kind of map the way that genetics is passed down across generations. So if you’re in Scotland and you’re listening, then do… do consider signing up. That would be great. We can put a link in the description probably as well with the podcast.

hannah_went:
Absolutely. Yeah, I’ll go ahead and put that in the show notes. So yes, anyone listening in Scotland definitely get involved. This is really how, you know, the research is possible. We couldn’t do it without these large cohort studies. So they’re super important to, you know, Danni’s future work. So we’re getting down to the end here, Danni. You know, what’s next for you? What are you interested in?

danni:
So that’s really what I’m up to at the moment. And I feel really lucky to have had this experience of working with the data that I do. And the whole experience has just been great. So I always have this thing of if you wake up in the morning and you want to open the laptop and analyze data, then it’s probably a good sign that you’re in the right position, right?

hannah_went:
I love that. I can just tell how passionate you are about your work. So my very last question before we wrap everything up, this is more of a curveball question. Danni, if you could be any animal in the world, what would you be and why?

danni:
I know my answer immediately.

hannah_went:
I like that. No one has had that response so far. Most people are like, what?

danni:
So this is a long standing thing. I don’t know if you already have a pre-prepared answer to this question in your head at all times.

hannah_went:
I do. No one’s asking me questions back to me, but I do.

danni:
So mine is an English badger. And I really, I think they’re so wonderful as creatures. They’re really gentle. They live in the forest. They really like peanut butter apparently for some reason, which is kind of cool. I just think they’re the coolest. One of the coolest animals. They come out at night and they’re kind of gentle unless you provoke them. And then apparently they can get quite violent. But, so that.

hannah_went:
Well, I’m going to Google an English Badger after we wrap up here. I love that you, yeah, knew your answer, super confident. That’s awesome.

danni:
They’re very cute. If you Google “English badger cute”, it’ll probably come up with some nice pictures.

hannah_went:
Awesome. Well, you know, Danni, we’ve come to the end of this amazing podcast interview. I can’t thank you enough for your time and giving us some insider information about your work. For the listeners who want to connect with you, where can they find you? And I’ll put all of this in the show notes as well.

danni:
I’m on Twitter and we’ll also put a link to my University of Edinburgh page and a couple of YouTube videos where I’ve recorded some explanations of the study that we’ve been talking about today and also some other studies that I’ve published as well. So we’ll stick all the links in the description and feel free to get in touch because we’re always looking for collaborations and to talk to people with lots of diverse experiences and get as many inputs as possible.

hannah_went:
Yeah. Well, very cool. You know, thanks again, Danni. To everyone listening, thank you for joining the Everything Epigenetics podcast. And remember you have control over your epigenetics. So tune in next time to learn more. Thanks so much, Danni.

danni:
Thanks, Hannah. It’s been fantastic.