
Assessing the Reproducibility and Integrity of DNA Methylation


The reliability of testing epigenetic DNA methylation using Illumina beadchips is of paramount importance due to the specific intricacies of this technology.

Illumina beadchips are widely used platforms for high-throughput epigenetic analysis, employing thousands of probes to measure DNA methylation levels at specific genomic loci.

In this week’s Everything Epigenetics podcast, Dr. Karen Sugden and I talk about how the reliability of these probes directly impacts the accuracy and validity of the results obtained.

Keep in mind that in the context of Illumina beadchips, reliability refers to the consistent and accurate performance of each individual probe across multiple samples and experimental replicates. Each probe is designed to target a specific CpG site, and the methylation signal it generates must be dependable and reproducible.

We discuss how reliable probes ensure the accuracy of DNA methylation measurements and how the reliability of probes becomes crucial for reproducibility when conducting large-scale studies using Illumina beadchips, such as epigenome-wide association studies (EWAS).

Dr. Sugden and I also discuss how the reliability of probes on Illumina beadchips has implications for cross-study comparisons. For example, if the probes exhibit inconsistent behavior across different experiments or cohorts, it becomes challenging to compare results and draw meaningful insights from combined analyses.

Furthermore, we chat about how the efficient use of resources is linked to probe reliability. Unreliable probes might necessitate repeating experiments or allocating additional resources to validate results, potentially delaying research progress and increasing costs.

In the context of epigenetic research, where subtle changes in DNA methylation can hold profound biological significance, the accuracy and consistency of data generated by Illumina beadchips are pivotal.

Lastly, we explore Dr. Sugden’s current research, which includes how epigenetic clocks are associated with cognitive impairment, dementia, and marijuana use.

In this podcast you’ll learn about:

– Dr. Karen Sugden’s career
– Reliability and why it matters
– How unreliability arises in epigenetic research
– The process of measuring DNA methylation on Illumina beadchips (or microarrays)
– Technical errors that could arise when looking at DNA methylation
– Karen’s paper titled “Patterns of Reliability: Assessing the Reproducibility and Integrity of DNA Methylation Measurement”
– How to untangle data from different beadchips (27K vs. 450K vs. EPIC 850K)
– What constitutes a reliable probe vs. an unreliable probe
– How to handle unreliable probes
– Who is at fault for unreliable probes
– If reliability is the same for every beadchip
– How unreliability impacts epigenetic research
– How we can deal with unreliability
– The value of repeated data
– Creating a “gold standard” workflow for processing epigenetic data
– How epigenetic clocks associate with cognitive impairment and dementia
– The connection between epigenetic clocks and marijuana
– Dr. Sugden’s current research investigations


Karen Sugden

Hannah Went (00:01.115)
All right, welcome to the Everything Epigenetics podcast. Dr. Sugden, super excited to have you here today. Thanks for joining.

karen sugden (00:10.151)
Oh, it’s great to be here, thank you. That’s very enthusiastic and I like it.

Hannah Went (00:12.955)
Yeah. Well, you know, we haven’t stopped giggling since we just started and chatted beforehand, but you know, this is the first time we’ve actually ever had a conversation online. I’m very familiar with your work, but would love to start off by just learning more about you. So, you know, where did you get started in your career, and where are you up to at this point?

karen sugden (00:39.25)
I guess we can start right at the very beginning, a long time ago. I’ve always kind of been interested in genetics. So that’s kind of my field of study. I was interested in genetics even as a little kid. My parents wouldn’t like to know that I had a breeding program for gerbils in my bedroom. And I had recordings of all the color combinations that I was getting. So I didn’t even know I liked genetics as a child, but apparently I did.

karen sugden (01:09.31)
But actually, when I left school, I started working in a pathology lab, where we were doing histology and cytology, which was very, very interesting. But, you know, even then we kind of had a little bit of genetics involved in that. And I kind of got even more interested in it. And I decided that it was time to go and take this a bit more seriously. So I embarked on a degree in genetics.

karen sugden (01:39.57)
that, that I actually got interested in genetics and behaviour. It was actually to do with animals at that point, but you know, I mean, I thought it was so cool that genetics could be related to behaviour. And once I got my degree, I wanted to take that further. And actually I was interested in humans because humans are a lot more complicated than animals. And I thought that would be cool. So I moved to London to the Social, Genetic and Developmental Psychiatry Centre,

which is part of King’s College London, to start a PhD in the genetics of mental health and human behavior. And that’s where I started working with Terrie Moffitt and Avshalom Caspi. And they introduced me to the Dunedin study at that point. First I’d heard of it, but I was just blown away by it. So we did some work together and that kind of never stopped and I’m still doing work with them now. That’s been going on now for a couple of decades,

karen sugden (02:39.75)
Very, very cool. As part of that, we actually ended up moving to the US, so we moved from London to Duke University in the US. And a couple of things have actually happened since then. I’ve always been kind of a lab-based person, but the way that research goes has moved more towards using bioinformatics-style tools. So my work has kind of moved more towards genome-wide types of approaches and doing

more bioinformatic work, so I do much less work in a lab these days. But also, as part of working with the Dunedin study, we’ve sort of watched the members get older. I’ve been with them for quite a long time, so now we’ve sort of moved our focus more towards aging as we’re watching them age, as we all do. So that’s a couple of things that have kind of happened in my work in these last few years.

Hannah Went (03:35.715)
Yeah, beautiful, beautiful background. I love the story of you in your room when you were a young girl and kind of studying this phenomenon even though you didn’t really know what it was. So it does seem like that interest was there at a very young age. So just super interesting. And then obviously all the work that you’ve done in conjunction with Drs. Caspi and Moffitt on that Dunedin cohort and study is absolutely incredible. I love to see how these insights and kind of the questions we’re asking change over time

Hannah Went (04:05.675)
maybe the cohorts do as well. So I appreciate you giving that background. Now, a lot of your work, and definitely what we’re gonna focus on today (I haven’t discussed this with anyone, that’s why I was excited to have you on), is going to be on patterns of reliability. And I think that’s obviously very, very important in this epigenetic methylation world, or DNA methylation space. So can you just talk about what reliability is, and why it matters in the scope of what we’re focusing on?

karen sugden (04:36.59)
Sure, of course. So reliability is just simply the extent to which something can be measured more than once and to actually give the same answer when you do it a second time. So this applies to any measure. It applies to anything across all branches of science, engineering, medicine. This is not an epigenetic specific thing. I mean, it even applies to you measuring out stuff when you’re baking a cake. I mean, if you can’t use your measuring cup and give you the same answer, your cake’s

So, you know, I mean, reliability is really important for everyone. So, I mean, to give sort of a very silly example of what it might actually be. Imagine you’re going to measure the height of someone. You grab your tape measure, you stand them up against a wall, you measure them, they’re 5’5″. Great. Okay, they’re 5’5″. And you ask them to go and stand up against the wall again. This time they’re 5’6″. What does that mean? You know, are they 5’5″? Are they 5’6″?

karen sugden (05:37.01)
So you can’t rely on that measure. It’s not reliable. You get out a second tape measure, you do it again. They’re 5’5″, both times. You know, you’ve got a lot more confidence that that person’s 5’5″. You know, that second tape measure is giving you a lot more validity in their measure. So this is really what, in a nutshell, reliability is. It’s just consistency across repeated measurements. So why does this matter? Well, it matters because it’s fundamental

research. If you can’t get the same value when you measure something twice, you can’t be sure of any findings, any conclusions, or anything that you make using that measure are valid. So if we went back to our height example again, silly example, but imagine you didn’t know your tape measure was unreliable. You didn’t know that it wasn’t giving you the right measures. And then you went and you used it to measure a bunch of people. You got everybody’s heights.

And it turns out you’re interested in what these people like to eat as a snack. And then when you look at everybody’s preference, it turns out everybody over five foot six, well, they like chips. And everybody under five foot six, they like nuts. Great. OK, you take this to a marketer. A marketer says, great, we don’t have to spend money trying to sell our chips to short people. They like nuts. So I mean, this sounds like it is perfect.

karen sugden (07:06.55)
that your measure is unreliable, it turns out you’ve got a bunch of short people in the tall group, you’ve got a bunch of tall people in the short group, and you know what? That relationship between preference and height just isn’t there. And so, this is, if you extend this and apply this to scientific research, you can see now why this is a problem. You know? I mean, that’s a silly example, but this really is a problem. If we make discoveries based on measures that just aren’t reliable,

karen sugden (07:36.55)
and you know, it might just be random error or whatever, but it means our conclusions aren’t valid and we’re not able to actually replicate these findings. So, you know, this is not something we want if, for example, we’re trying to find treatments for disease or we’ve got drug interventions or trials or things like that because, you know, say if we were trialling a drug that was based on an incorrect or unreliable measure, then we’re going to put people’s lives at risk, you know? We can harm people. Plus on the other side, you know,

is really, really expensive. It’s really expensive, you know. And if we’re going to spend time and money trying to replicate things that were never really there in the first place, you know, this is not a good way to spend our money and our time. But also, I mean, on the flip side as well, we might end up missing all this important stuff because we can’t find the relationships that actually really are there. You know, we might, we might

karen sugden (08:36.55)
have an unreliable measurement testing a drug, we might miss the fact that actually there is a treatment option there, and we might just throw it out into the trash. So you know, I mean, reliability is really, really important.

Hannah Went (08:50.495)
Yeah, seems like if you don’t have that part of reliability, and I love the example with the height and the chips and nuts, I think that’s good. It’s a waste of time and resources, right? And it’s almost like you’re kind of, yeah, maybe going down those rabbit holes, which are the wrong rabbit holes, or maybe not the correct ones in which you actually think there may be a specific finding. So I just think this is super prevalent. And you had your specific paper, I

Hannah Went (09:20.355)
2020 titled “Patterns of Reliability: Assessing the Reproducibility and Integrity of DNA Methylation Measurement.” So that’s the one paper I was like, yes, I need to talk to her about this. So yeah, relating, I guess, more to epigenetics, how does this unreliability factor arise in epigenetics?

karen sugden (09:41.73)
Mm-hmm. Well, I think the best way to kind of explain it is to perhaps go right back to basics and think about how we measure epigenetics. So the most popular and the most accessible way that we can assess epigenetics is looking at DNA methylation. And I’m sure all your viewers will be more than familiar with DNA methylation. But the way we commonly do it is we

karen sugden (10:12.05)
And they’re just little glass slides, but they have thousands of tiny little short DNA sequences. Actually, I have one. I have one here. See if I can get it up on the screen. Here’s one. Oh. Ha ha ha.

Hannah Went (10:20.357)
Oh my god, yeah. I love how they look. They’re, like, iridescent.

karen sugden (10:28.63)
Yeah, yeah, yeah, yeah. So I just have one lying around. So there you go, a treat. So they’re just these little glass slides, and they just have thousands of these really tiny short DNA sequences stuck to them. They sit on beads, and they wave their little tails in the air, just looking for pieces of DNA that might be floating past. So each one of these little tiny strings tags a region in the genome that you’re interested in. So when we’re interested in DNA methylation,

in regions of the genome that we know can be differentially methylated. So there’s really two things about this little DNA sequence. We call them probes, you know, because they basically probe the DNA sequences flying past them. It’s complementary to the DNA that we’re interested in because DNA really likes to stick to its opposite. So when it sees its

karen sugden (11:28.41)
probing for that opposite bit. And also, there are two versions on these arrays. So we’ve got a probe that will specifically grab the methylated bits, and we’ve got a probe that will specifically grab the unmethylated bits. And

one place where unreliability might come up is that some probes are actually a bit better at being selective than others. So some probes might not do quite as good a job at grabbing what they should. They might grab slightly too much of the wrong stuff. You know, this is what we call technical error. So, you know, I mean, this happens. So this could be one way in which unreliability might pop up.

karen sugden (12:13.63)
We are sampling our DNA, whether it’s from blood, saliva, or whatever. We’re actually sampling. We’re taking thousands, if not millions, of cells. And each of these cells has got its own copy of the genome inside, where for any particular point of the genome we’re interested in, it’s either going to be methylated or unmethylated. So when we’re measuring DNA methylation, we’re actually measuring the proportion of methylated genome copies in all the DNA samples

got within that particular sample that we’ve put on the array. And that can range from 0% to 100% methylated, or unmethylated if you wanted to do it the other way. But we usually talk in terms of methylated. So as I mentioned, I’m bringing this up because it’s just worth remembering that some probes are just a bit better at picking out the right bit than others. Some are really good and some aren’t so good. And that’s why I wanted to just go over the background
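As a rough illustration of the proportion described here: array methylation levels are commonly summarised as a "beta value", the methylated probe signal divided by the total (methylated plus unmethylated) signal. The sketch below assumes that convention; the offset of 100 is the constant conventionally used to stabilise the ratio at low intensities, and the probe names and intensities are made up for the example.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Return the methylated share of total signal at one probe (0 to 1).

    `offset` is a small stabilising constant (100 is the conventional choice);
    it keeps the ratio defined when both intensities are near zero.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities for three probes: (methylated, unmethylated).
example_probes = {
    "mostly_methylated": (9000.0, 1000.0),
    "half_methylated": (5000.0, 5000.0),
    "mostly_unmethylated": (500.0, 9500.0),
}

betas = {name: beta_value(m, u) for name, (m, u) in example_probes.items()}
for name, beta in betas.items():
    print(f"{name}: {beta:.1%} methylated")
```

Because a sample contains many cells, a beta value near 0.5 does not mean any single genome copy is "half methylated"; it means roughly half of the copies in the sample carry the methyl mark at that site.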

karen sugden (13:14.53)
Well, also I just wanted to give a brief background about the microarrays that we’re using, because this is actually pertinent to what we’ll be talking about in a bit. So they contain hundreds of thousands of probes, each one measuring different parts of the genome. And the most popular ones are the ones that I actually just held up for you. Those are made by a company called Illumina. And their microarrays have gone through a series of improvements over time, where their content

karen sugden (13:44.27)
or BeadChips as they actually call them, and they’ve actually expanded them and changed them ever so slightly in every iteration. So the first one started out having 28,000 probes, 28,000 regions of the genome, and then research developed and we learned more about DNA methylation and we learned there were more regions that were interesting. So they expanded this to 450,000 probe sites, and that was known as the 450k array.

And more recently, they’ve actually developed it again. We’ve now got 850-ish thousand probes, and we call that the EPIC BeadChip. So at each iteration of this, the older chip kind of had new content put with it, and it got bigger. And they took out some of the probes that they knew didn’t work quite so well, so that each BeadChip is essentially a bigger version of its previous one. But also, another thing to bear in mind

microarrays is that they’re actually generic, that even though they’re human specific, they’re generic in so much as they’re designed to work for all different types of human tissue. So we, they’re not, the arrays we use, we generally deal in blood DNA. Most people do, it’s a very easily accessible tissue, so people like to use blood. It gives you lots and lots of DNA. It’s great. However, these arrays that we use don’t just target the profile that you might expect,

from a blood DNA sample. They also have to include targets that you might expect from other tissues, which won’t always look the same as blood DNA, because each tissue will have its own specificity, it’ll have its own pattern. That’s part of the way that tissue gets regulated, so that tissue can become what it’s meant to be instead of becoming something it isn’t. So you know, they have to be generic. So what it means is that when we look at these microarrays and all the data that we get on it, not every single probe is going to be

karen sugden (15:43.75)
to our study because if we’ve got blood, we won’t need to worry about probes that aren’t supposed to be different in blood, but are supposed to be different in, let’s say, kidney or something like that. So that was a long discussion of how things can actually become unreliable, in so much as, you know, the setup of what the arrays look like and how probes work. So

Hannah Went (16:09.095)
Yeah, that’s great. I don’t think anyone has described that entire methylation BeadChip array-based technology from Illumina. So that’s great. I do have just one question, if anyone else is wondering it, too. So why do you think that Illumina, or maybe some other company, doesn’t build separate arrays based on tissue type? Is it just a cost factor, probably? I mean, obviously, that’s not cheap or easy to do.

karen sugden (16:30.192)
Yeah, I mean, to be perfectly honest, you’d have to ask Illumina’s board about that, whether they would be worth doing, or any other company or whatever. I think there are application-specific arrays. Yeah, so I think some possibly exist. I think it’s just because it’s also a field where we can’t find things out until we look

karen sugden (17:04.33)
them. We kind of want to have everything because you don’t know until you look, right? So if we try and cut it down, we might miss something because we didn’t know when we made the array that it’s actually really important; it’s just that we haven’t spotted it before. So in some ways, it makes sense to have as comprehensive a platform as you can, just in case there is stuff hidden away that you hadn’t decided to look at before. So does that answer your question?

Hannah Went (17:29.615)
Yeah, no, that’s a great point. I mean, that makes, you know, we don’t know what we don’t know, right? And then of course, Illumina is coming out. I don’t know what they’re calling it. I think it’s the Epic V2, right? I think it’s gonna be like the next step up where it’s like 1.2 million positions somewhere. Yeah. So that’ll be exciting. But again, we’re getting more access, right? Because what, there’s like 28 million different methylation markers in every single cell type. So we’re, you know, creeping up on that number slowly,

karen sugden (17:39.15)
Correct. Yep.

karen sugden (17:45.152)
It’s going to be a lot. Ha ha ha.

karen sugden (17:59.61)
Right, right. Correct.

Hannah Went (17:59.835)
to understand what these markers mean. So yeah, that was a great answer. And yeah, I don’t know if you want to go into more of your paper or you want to talk a little bit about how we deal with these unreliable probes. I know it’s kind of all in our work together. So yeah, we’d love to hear your thoughts there.

karen sugden (18:18.63)
Yeah, so I mean, I think in a way we, I might just talk about the background, why we decided to look at this, because on the surface it might not seem that exciting, but it was integral to our whole research program. So yeah, yeah. So what actually happened is it was, this paper started as just kind of like an exercise for our team on what we do with data, how we process data, how we deal with data.

Hannah Went (18:34.155)
Please do, please do.

karen sugden (18:49.41)
So just as a bit of background, we were about to collect DNA methylation data from the Dunedin study participants at age 45. Now I think we’ve talked about Dunedin study previously with other people who’ve joined you in this, but briefly it’s a longitudinal study. Everybody born in a town in New Zealand, Dunedin, 1972, 1973, they’ve been followed up progressively

Well, when they were 26 and 38 years old, we were able to collect DNA and we profiled their DNA methylation from DNA that we collected at those two ages. We did that using the 450k array, the 450k BeadChip that we talked about earlier. So when we came to do the same again at age 45, well, guess what? The 450k BeadChip didn’t exist anymore. It had been superseded by the EPIC.

So we sat down and we said, okay, so are we going to be okay combining all our data together? They’re different BeadChips. We don’t know. Are we going to be able to do this? Because one of our key aims is to have longitudinal data that we can use over time. But we can’t use longitudinal data over time if it’s not the same data over time. Because you kind of need to make sure that you’re not introducing error. So we said, okay, let’s just go to the literature. Let’s go and find out whether we’re all right to do this.

And there was a little bit there, but not so much, telling us that, you know, there is some similarity between the two, but you know, some of it isn’t, some of it is. And we were like, well, okay. So what does that mean for us? Because it’s all right documenting what this similarity is, but if it has consequences for the kind of research we want to do with these data, then we’ve got, you know, issues, and we couldn’t really find anything that told us the answer.

Let’s see what happens. So that’s where this paper came from. It came from us wanting to know whether we can use the data in the way that we want to use the data. So we were actually quite lucky because we had some BeadChip data, some 450k and EPIC BeadChip data from twins that are part of another one of our studies, the E-Risk study.

karen sugden (21:18.71)
measured twice. So basically we had a reliability study ready to go where we could test the reliability of the 450k array data versus the EPIC array data. So we got this data, we subset it to just the probes that are shared between those two arrays, and that ended up being about 440,000-ish, I think. And then we just asked, well, how similar are they across the arrays? In other words, how

karen sugden (21:48.55)
reliably could the methylation level be determined across these arrays? And actually, we found on average that probe values were pretty unreliable. And in fact, only about 7% of probes, when we looked at them, could be deemed to have excellent reliability. I mean, it’s all a scale, but we have a sort of a standard scale on what we deem excellent. So out of the 440,000, it’s not that many.
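One way to picture this comparison in code: the sketch below keeps only the probe IDs present on both arrays, then scores each shared probe with an intraclass correlation coefficient (ICC) across the paired measurements. The two-way, absolute-agreement form of the ICC, the cut-offs in `label` (a commonly used poor/fair/good/excellent scale), the probe IDs, and the beta values are all illustrative assumptions; the conversation does not specify the exact statistic or thresholds used in the paper.

```python
import math


def icc_2_1(pairs):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    `pairs` holds one (value_on_array_a, value_on_array_b) tuple per sample.
    """
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two samples")
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    # Mean squares for samples (rows), arrays (columns), and residual error.
    ms_rows = k * sum((r - grand) ** 2 for r in row_means) / (n - 1)
    ms_cols = n * sum((c - grand) ** 2 for c in col_means) / (k - 1)
    ss_err = sum(
        (p[j] - row_means[i] - col_means[j] + grand) ** 2
        for i, p in enumerate(pairs)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return (ms_rows - ms_err) / denom if denom != 0 else math.nan


def label(icc):
    """Map an ICC onto an assumed four-level scale (NaN falls through to 'poor')."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"


# Hypothetical beta values for the same four samples run on each array.
beta_450k = {
    "cg_a": [0.10, 0.50, 0.90, 0.30],
    "cg_b": [0.80, 0.82, 0.81, 0.79],
    "cg_only_450k": [0.20, 0.40, 0.60, 0.80],
}
beta_epic = {
    "cg_a": [0.11, 0.49, 0.91, 0.30],
    "cg_b": [0.82, 0.79, 0.80, 0.81],
    "cg_only_epic": [0.25, 0.35, 0.65, 0.75],
}

# Restrict to probes present on both arrays, then score each one.
shared = sorted(set(beta_450k) & set(beta_epic))
reliability = {
    probe: icc_2_1(list(zip(beta_450k[probe], beta_epic[probe])))
    for probe in shared
}
for probe, icc in reliability.items():
    print(probe, round(icc, 3), label(icc))
```

In this toy data, `cg_a` varies a lot between samples and agrees closely across arrays, so it scores well; `cg_b` barely varies between samples, so small measurement noise dominates and it scores poorly.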

karen sugden (22:19.33)
But it’s actually what we expected, to be honest. And the reason for that was that we already knew that some probes were going to work better than others. I mean, it’s just a fact of any molecular assay, you know, you’re going to have error that creeps in. Some things just don’t work as well in the chemical environment that you’re subjecting them to as other ones do. It’s just par for the course. So we knew that, you know, that was all going to happen.

karen sugden (22:48.55)
arrays weren’t specific to our blood samples. So we’re going to have a bunch of probes on there that shouldn’t be delivering us any data, we think, we don’t know. But we also knew that a couple of other people had reported in the literature that actually, this is what happens and it looked very similar to the results we were getting. So we weren’t surprised, but it still didn’t answer why should we bother? What are the consequences? Should we even bother?

karen sugden (23:18.65)
be a problem, you know, who knows.

Hannah Went (23:19.935)
Mm-hmm, right. Yeah, I mean, yeah, that does sound super concerning. But like you said, you were expecting that. And now I think I have more questions for Illumina than anyone else. But that’s kind of what we’re going through right now with the 850k Epic V1 to the V2. It’s like, hey, here are the kind of sites that are going to be missing. And some of those sites are included in algorithms, which we use, which is super concerning.

Hannah Went (23:50.135)
It all makes sense. And yeah, I’m just curious as to what the cutoff is, why certain probes are maybe left out. I’m sure they’re doing internal studies and kind of looking at the comparison. But like you said, considering that the tissue type may be different and then you’re excluding some probes altogether, that accounts for probably a large amount as well. So yeah, super interesting. Do you think you’re going to do the same thing with the Epic V1 versus V2?

another type of reliability study there.

karen sugden (24:23.07)
I think from this point onwards, given what we found in this paper, where we found that reliability really does impact the work that we do, it can have big ramifications, we will always, going forward, perform a reliability test on any new DNA methylation data that we generate for that reason. And that’s not to denigrate Illumina’s products; the Illumina products are great.

Hannah Went (24:46.895)
Yeah, I think it.

Hannah Went (24:50.902)
Oh no.

karen sugden (24:53.655)
that we want the answer for ourselves.

Hannah Went (24:55.895)
Yeah, I think you have to though, to make sure that your data is reliable. Especially with the Dunedin cohort, how you’ve studied it for 50-plus years, right? You have to make sure that you’re able to compare that data. So no, I think that’s great. And I think that’s something that this field, the researchers are super interested in learning more about and understanding. So what do you do? What do you do with that information, Dr. Sugden? What’s next? How do you take that and say,

karen sugden (25:19.45)
Ha ha ha.

Hannah Went (25:25.976)
and you know this way or whatnot.

karen sugden (25:30.09)
So, I mean, there is no set answer to this. So, you know, I can give advice to people when they contact me and ask me what to do about this. I can tell them what my preference is. I mean, it may be right or it may be wrong, depending on who you are. But I mean, the kinds of values that we have generated

karen sugden (26:00.15)
study, they’re very similar to those that other people have generated doing similar studies. They’re not identical, because each study is going to have its own technical variation that impacts measurements ever so slightly. But they’re very similar. So one thing that people could do is they could take all these values that we’ve generated and just say, OK, I’m going to assume that probe number one has a reliability value of 0.8.

what it says on this list and go forth with that and then decide to perhaps just use the probes that meet a certain threshold of whatever they deem to be good reliability. Ideally, what people might want to be doing, and this only really is going to work for people who are generating data, is to actually factor in replicability, factor in repeated measures as

karen sugden (27:00.11)
just measuring the, say, you know, like 200 samples that you’ve got, measure 220. And then you’ve got for 20 samples, you’ve got a, you know, a pair that you can test reliability on. So basically just do exactly what we did and just take these samples and say how similar are the two values when you run this twice, then you would have a reliability list that’s specific for your sample.
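The "measure 220 instead of 200" design can be sketched as a run sheet: pick 20 of the 200 samples to run a second time, so that each of those 20 yields a pair for a study-specific reliability check. Choosing the duplicates at random, the fixed seed, and the sample naming below are assumptions made for illustration; the conversation only specifies the counts.

```python
import random


def choose_replicates(sample_ids, n_replicates=20, seed=0):
    """Pick which samples get a second, duplicate measurement."""
    rng = random.Random(seed)  # fixed seed so the run sheet is reproducible
    return sorted(rng.sample(list(sample_ids), n_replicates))


# Hypothetical IDs for a 200-sample study.
sample_ids = [f"sample_{i:03d}" for i in range(1, 201)]
replicates = choose_replicates(sample_ids)

# Every sample is run once; the chosen 20 are run a second time.
run_sheet = sample_ids + [f"{s}_rep" for s in replicates]
print(len(run_sheet), "array positions,", len(replicates), "duplicate pairs")
```

Each duplicate pair can then be scored probe by probe, in the same way as the cross-array comparison, to produce a reliability list specific to that dataset.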

Hannah Went (27:27.955)
Yeah, I mean that makes sense. The latter makes sense. It’s just unfortunately a little bit more expensive, but if you had unlimited funds, you might as well. The first point you made about saying, you have all these probes, you look at the reliability, you assign a certain weight to that probe and then make a threshold, this is included, this is not included. Are you talking more about the pre-processing type of analysis maybe in terms of kind of giving those probes different weights?

karen sugden (27:57.05)
Not necessarily. I mean, if you imagine we’ve got a list of, say, 440,000-ish probes that we have reliability values assigned to, you know, that we’ve determined reliability for, essentially. You could literally take that list and say, I’ve got a dataset, I’ve got 200 individuals that I measured on the EPIC or a 450K array.

karen sugden (28:27.11)
with this list and then I’m going to filter out anything that doesn’t reach a threshold that I predetermine, whether that’s 0.5, 0.8, whatever. I mean, it is up to individual researchers to decide what’s appropriate and what’s not. So then you might cut that down to 120,000 probes instead of 400 and something thousand probes because you might kick out a bunch of unreliable stuff. So

karen sugden (28:56.95)
easiest way for a researcher to do it, to be perfectly honest, because it doesn’t really require any sort of further statistical manipulation of your analysis. Yeah, yeah, yeah.
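The cross-referencing approach described here can be sketched in a few lines of Python. Everything below is illustrative: the probe names, the reliability values, and the choice to drop probes that are missing from the published list are assumptions, and the threshold is whatever the researcher decides is appropriate.

```python
def filter_reliable_probes(methylation, reliability, threshold=0.8):
    """Keep only probes whose listed reliability meets the chosen threshold.

    Probes absent from the reliability list are dropped (an assumption).
    """
    return {
        probe: values
        for probe, values in methylation.items()
        if reliability.get(probe, float("-inf")) >= threshold
    }


# Hypothetical published reliability list and a hypothetical dataset.
published_reliability = {"probe_A": 0.92, "probe_B": 0.41, "probe_C": 0.80}
my_dataset = {
    "probe_A": [0.11, 0.52, 0.88],
    "probe_B": [0.25, 0.31, 0.29],
    "probe_C": [0.70, 0.65, 0.72],
    "probe_D": [0.05, 0.04, 0.06],  # not on the published list
}

kept = filter_reliable_probes(my_dataset, published_reliability, threshold=0.8)
print(sorted(kept))
```

No further statistical modelling is involved; the downstream analysis simply runs on the smaller set of probes.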

Hannah Went (29:07.515)
Gotcha, just cross-referencing. Okay, yeah, that makes sense. And this might be the wrong question to ask as well, but those probes you said, they’re grabbing a particular part of the genome, right? Like a specific CpG, is that correct to say? So what if, as you’re going through these probes, you say, oh, this one’s unreliable, but there are 10 that are in the DunedinPACE algorithm, right? How do you account for that?

karen sugden (29:20.45)
Mm-hmm, mm-hmm, yeah.

karen sugden (29:35.21)
Well, the short answer to that is when it comes to DunedinPACE, we only have reliable probes in the algorithm, because of one of the consequences of this work that we were doing. As I say, this came before DunedinPACE development, and I was a member of the team that did that. One of our prerequisites, after we discovered that, you know, unreliability does have some consequences for a number of different facets of research that we’re interested in,

would only use probes that we deemed reliable based on our data when we were developing the algorithm. So even if we applied this filter to our data, we shouldn’t kick out any of our DunedinPACE probes, because unreliable probes were never in the algorithm in the first place.

Hannah Went (30:21.875)
And I’m going to add that to the list of reasons why I love the DunedinPACE algorithm so much: you don’t even have to deal with that. No, that’s great, Dr. Sugden. And then in terms of who’s at fault here, no one’s really at fault, right? This is just part of using a generic array technology. And guess what? It’s going to be the same for every type of technology. You’re going to run into some type of unreliability, but then create different pathways

Hannah Went (30:51.915)
to be able to improve and again go through those kind of processes for that particular technology. So in terms of it being the same for every chip, or BeadChip I’ll say, I’m going to guess it’s going to be different, right? What about the unreliability if you’re going from one 850K chip to another 850K chip? How do the probes compare there?

karen sugden (31:16.99)
So the short answer is, who knows, until we test it. I wish I could be a bit more concrete than that. All I do know is that for the data that we had, the 450K versus the EPIC, for the 350 twins that we looked at, these are what the reliability came out to look like. Is it going to be the same for every comparison that you do? Probably not. It’s going to be very, very similar, because we’ve seen that against other people’s

Hannah Went (31:20.115)
Yeah. Right, right.

karen sugden (31:47.01)
work. However, it’s not going to be identical. And the only way we’re going to find out is actually run those experiments. So, you know, when we have the opportunity to do the comparisons, we’ll do it and we’ll find out the answer at that point.

Hannah Went (32:01.675)
Sure, sure. Absolutely. And do you have anything else to add? I know we talked about this a little bit earlier too. Do you have any other reasons as to how unreliability can impact research? Is there anything else you want to add there?

karen sugden (32:15.37)
So I mean, I think epigenetic data is getting used a lot. It’s really popular. We have just acres and tons of data available for public use. It’s great. It’s fabulous. This is one of the goals of generating these data, public availability, get people to use them. What this does mean is that not everybody’s going to be familiar with the actual data itself, the nitty gritty, how it was made

what it looks like, or even be familiar with all these factors that we’re talking about today. They’re not going to be aware that unreliability exists. They’re not going to be aware that all the probes aren’t necessarily made equal. They’re not going to be familiar with all those things. So I think…

karen sugden (33:07.09)
probably going a little bit off topic now, but I think one of the things that we want to do is try to make sure that people who aren’t familiar with these data become aware of this. Because what happens is people start using them and start reporting things, and they’re not aware that it’s not set in stone what this value is, because they’re not aware that some unreliability might be connected to it. I think one example where this is

karen sugden (33:37.05)
actually kind of highlighted quite well is actually in the paper. We were interested in association studies, which are like bread and butter for these large cohorts, you know, people love to do association studies. They’re great. And they give you some really interesting insights. So, briefly, all we’re doing when we do an association study is testing for the strength of a relationship between an exposure and an outcome, where, you know, the exposure or the outcome is, in this case, DNA methylation.

karen sugden (34:08.831)
So you then see whether that reaches any sort of predetermined statistical threshold for significance, and if that strength of association between DNA methylation probe and your outcome is strong, then you’re like, oh, it’s associated, great. So when we do these kinds of tests using epigenetic data of the type that we’re talking about here, we’re doing thousands of association tests. We’re doing like 400,000 or 850,000 of these tests.

And we’re looking to see which of those 850,000 or 450,000 probes actually pass this threshold. So what we wanted to know, because these are so popular, they’re super popular to run, and you see them everywhere, was whether this unreliability had any consequences for this kind of research. So we decided to look at smoking, cigarette smoking. And the reason for that is it’s one of the most strongly replicated

karen sugden (35:07.17)
finding when it comes to DNA methylation. You know, cigarette smoking really does mess up your methylome, so, you know, it’s one of those things that you can kind of rely on finding something with. So we went into the literature and we found all these genome-wide association tests where, you know, they’ve taken all these bead array data and looked at cigarette smoking and looked at which probes were associated. We found at that time there were about 22, I think it was. And then we took all the probes that they’d reported

Hannah Went (35:15.755)
I’m sorry. I’m sorry.

karen sugden (35:37.15)
being significantly associated with smoking. And all we did is we just counted how many times that probe popped up in a list across all those 22 studies. And it was kind of like a sort of a lazy way to check replicability of a probe. It’s like if a probe is replicable, it’ll keep coming up time and time again. So you should have a high number for that count. So what we found is that when we did that, those probes that had a high number,

karen sugden (36:07.05)
again, those are the most reliable probes from our definition. So these are the ones that you could measure reliably time and time again. We could also find regularly when we look for the same thing. And then those probes that didn’t really pop up very often, those were the least reliable. So this is like a huge impact, because if people are reporting findings from association studies, but they’re actually finding results with unreliable probes, they could potentially

Hannah Went (36:09.917)
Mm-hmm. Mm-hmm.

karen sugden (36:37.891)
and it sends people, like you say, down rabbit holes. It sends people looking for things that can’t replicate because they were never really there in the first place.
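The counting exercise described above, tallying how often each probe is reported across published studies and setting that next to its reliability, can be sketched as follows. The three "studies", the probe names, and the reliability values are all hypothetical, and only serve to show the bookkeeping.

```python
from collections import Counter


def replication_counts(study_hits):
    """Count in how many studies each probe was reported as significant."""
    counts = Counter()
    for probes in study_hits:
        counts.update(set(probes))  # set(): count a probe once per study
    return counts


# Hypothetical lists of significant probes, one list per published study.
studies = [
    ["probe_A", "probe_B"],
    ["probe_A", "probe_C"],
    ["probe_A", "probe_B", "probe_A"],
]
# Hypothetical reliability values for the same probes.
reliability = {"probe_A": 0.95, "probe_B": 0.70, "probe_C": 0.20}

counts = replication_counts(studies)
for probe, n in counts.most_common():
    print(probe, n, reliability[probe])
```

Sorting by count and reading the reliability column alongside it is the "lazy" replicability check: probes that keep turning up should tend to be the reliable ones.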

Hannah Went (36:46.835)
Yeah, yeah, and epigenetics, biological age. I mean, it’s thrown in your face every single day. It’s this hot, sexy topic, and very trendy right now. But I always tell people, it’s a trend that needs to stay. We need to measure reliable information over a longitudinal analysis to know what’s really being affected. And I think what we need to go back to, and exactly why I’m having this conversation with you today, is about education.

karen sugden (36:50.23)
Right. Right. Right.

Hannah Went (37:17.975)
little I think we even know about the methylome alone, let alone people doing all of these studies. We just really need to kind of have the community come together and work on this and talk about these different issues. So, no, I mean, that makes perfect sense. And I think as I’m going through now, reading literature and interviewing people, I’m going to have this in the back of my mind, kind of looking to see if they mention reliability or have a paragraph about that in their methods or whatever section of the paper.

No, this is great, Dr. Sugden. Now, you did mention the two ways that we can actually deal with that unreliability. Anything else you can think of, or again, like going through papers and making sure we’re cross-checking?

karen sugden (38:02.01)
Yeah, I mean…

karen sugden (38:05.95)
The repeated measures on the actual data, especially, I think it’s the most important thing. We are in a huge data production period. We’re in a huge data production era. We’re being encouraged by grant funding agencies to create these data, share them, get them used, and that’s great because they’re expensive, but they’re valuable.

karen sugden (38:37.751)
What would be great is if we also had some sort of prerequisite for doing some sort of reliability checking on all these data. And don’t get me wrong, people are, you know, I mean, I’m not trying to suggest nobody’s doing this. People do do this. But if we made it standard, you know, we wouldn’t have to write these papers. You know, it would already be inherent in everything. So, you know, I think that that is the gold standard.

Hannah Went (38:57.216)
Yeah, yeah.

karen sugden (39:06.41)
with these metrics that is specific for that particular sample set. I mean, it is expensive, it is time consuming, and it’s difficult to do when you do longitudinal measures because you kind of have to do it every time you collect new data, which gets very time consuming and expensive. It’s important, but it is a time suck. So that would be the gold standard, yeah. But I think…

Hannah Went (39:29.855)
Well, definitely. Yeah, yeah.

karen sugden (39:35.59)
Sorry, go on, go ahead. No, I mean, in lieu of all I was going to say in lieu of that, just some recognition, I think, that this exists. I think this is one of the hardest things, getting the message out, because it’s kind of like people who’ve worked with these data for a long time already inherently know it. You know, they may not have tested it, but they know that this is a thing. You know, I mean, it’s not, nobody talks about it because everybody knows it. But then the people who don’t use these data from,

Hannah Went (39:36.056)
Oh no, go ahead. I, uh, go ahead.

karen sugden (40:05.69)
to research probably aren’t as familiar with this being a concept. So, you know, they go into this naively thinking that a probe is a probe is a probe and it’s not necessarily the case. So it would be, you know, one of the easiest ways to get around this would be for a request that whenever somebody publishes, maybe an association study, for example, with specific DNA methylation probes.

karen sugden (40:35.65)
that they put the reliability metric right next to the probe so that we can say, oh, do we believe this or not? You know, it might just cut down on some of that, like, chasing that people have to do.

Hannah Went (40:44.715)
Yeah, yeah. Now I love that, I think you’re exactly right, it should be standard, you know, and then you’re wondering if the people who are editing these journals know about the reliability, right? So I think it’s just again about education, you know, the people who are already in the field hopefully educating those who are just now entering the field and maybe trying to learn a little bit more. So this has been a beautiful conversation, I’m so excited that we brought light to it.

Hannah Went (41:14.875)
I have a couple studies I want to ask you about that you’ve published and you’ve worked on. So we’ve been talking about the Dunedin longitudinal study. Huge fans of everything that you’ve done for that. But you do have a paper titled Association of the Pace of Aging Measured by Blood-Based DNA Methylation with Age-Related Cognitive Impairment and Dementia. And I can picture the graph from this paper. I know exactly what it looks like and what this paper tells us.

Hannah Went (41:44.675)
Can you describe what you did and what you found here?

karen sugden (41:47.85)
Sure. So in this paper we set out, we were interested in assessing the extent of association between the epigenetic clocks. Again, I’m sure your viewers are more than familiar with epigenetic clocks. In this case, we were interested in looking at the Horvath clock, the Hannum clock, PhenoAge, GrimAge, and DunedinPACE. So we were interested in what they looked like when we associated them with various measures of cognitive impairment,

like screening tests that people take when they are being assessed for dementia, and cognitive tests where you, for instance, have to recall a certain number of words after a certain length of time. This is in a bunch of older people; it’s not actually a Dunedin study at all, but it’s in a cohort called ADNI, which is the Alzheimer’s Disease Neuroimaging Initiative.

karen sugden (42:48.21)
monitoring their older age people and they’re interested in watching them as they move towards having a greater risk for dementia. So we were fortunate, we got access to this resource, so there’s all these cognitive measures and diagnoses of dementia, cognitive impairment and things like that. But they also have DNA methylation data, so that’s why we were able to calculate these clocks for the individuals in this.

wanted to answer one specific question, which is, is faster ageing related to greater cognitive impairment? Because, you know, this is something that we really needed to answer. People will want to know this. We want to know, can we take measures of faster ageing, some of which are developed in younger people, and apply them to a group of older people who are experiencing

karen sugden (43:48.43)
So no matter what test we looked at, whether we looked at a clinical diagnosis of dementia or mild cognitive impairment, whether we looked at those screening tests, or whether we looked at those cognitive tests, you know, like the word recall and things like that, those who were aging the fastest, as measured by DunedinPACE specifically, these were the guys who were scoring the poorest in all those domains.

karen sugden (44:19.15)
So why is this important? Well, what it does mean is that we can use information from epigenetic pace of aging measures like DunedinPACE to track cognitive impairment, potentially before it even starts. And this is one of the goals of intervention work, to have some sort of tool that you can use

karen sugden (44:47.93)
to intervene on. So, you know, this is why this was actually very exciting that we can, you know, we can measure things in older people related to cognitive impairment.

Hannah Went (45:00.835)
Sure, yeah, I think that’s fascinating. Yeah, I have an APOE4 variant, very interested in Alzheimer’s, my grandma, dementia, Alzheimer’s definitely runs in the family. So I think it’s something people are very familiar with and they can definitely relate to it as well. So I just had to ask you about that one because as I was going through your papers, that one caught my attention. And for those listeners who are like, show me everything, I wanna read everything, I’ll link everything in all of the studies,

Hannah Went (45:30.795)
can go ahead and look more into your work. Another study, haven’t talked about this yet on the show, but more recently you published one. This is titled Lifetime Marijuana Use and Epigenetic Age Acceleration: A 17-Year Prospective Examination. So tell us about this one, Dr. Sugden. Haven’t talked about marijuana on this show quite yet, so I think people would be interested here.

karen sugden (45:55.05)
Sure. Well, this paper, we published this with my wonderful colleagues at the University of Virginia, they’re great. And it kind of seems a bit timely, you know, marijuana has been in the news quite a bit recently, so kind of interesting, the timing on this one. But what we actually did for this is we were looking at the effect of marijuana

karen sugden (46:25.25)
and actually the precursor to DunedinPACE, DunedinPoAm, because we’ve been at this work for quite a bit. And we were looking in a group of middle-aged individuals that Joe Allen, who is the first author on this paper, has been tracking for an awful long time. So we know a lot about them. Now, there were a number of really interesting things that we found in this paper.

karen sugden (46:55.61)
Marijuana use, lifetime marijuana use, increases epigenetic aging. So those people who are using marijuana are aging faster than those who aren’t using it so much. Which, you know, that’s an interesting point, number one. And that’s actually true even when you take into account a person’s own cigarette smoking. So if you kind of set everybody’s cigarette smoking value to zero, you still see that association.

karen sugden (47:25.29)
this association because oftentimes smoking will be associated with faster ageing as I think everybody common sense will tell you it probably is. So second we found that there’s this dose-response relationship between faster ageing and marijuana use. So the more you use the faster you’re ageing. So you know you can see this dose-response moves up over time. The

karen sugden (47:55.65)
was that the fastest ageing was observed in those people who’d used it most recently. So it’s like this proximal effect, so that you age faster as you’ve just used it, as opposed to people who haven’t used it for ages. I don’t know what that means, whether it’s some sort of recovery effect. We can’t speculate on that from these data, but that’s an interesting observation. The fourth observation was, and I think very interesting,

karen sugden (48:25.05)
into account the DNA methylation profile of one particular probe, when we’re looking at this relationship between faster aging and marijuana use, if you set that to zero across everybody, so make everybody equal for that, this relationship actually breaks down, which I think is fascinating because it kind of suggests, now this particular

karen sugden (48:55.09)
cigarette smoking, tobacco smoking. It’s actually the one that would replicate 22 times in our reliability study, because whenever anybody looks at tobacco smoking, they find this probe. And biologically, it’s related to hydrocarbons that would normally be inhaled in smoking. So this finding kind of suggests that some of that relationship between faster aging and marijuana use has been mediated

karen sugden (49:25.27)
some sort of inhalation of perhaps hydrocarbons or something else. Now, what was really interesting was this effect was even seen in people who didn’t smoke, said they’d never touched a cigarette. So this isn’t anything to do with them perhaps smoking cigarettes and marijuana, because even people who don’t smoke cigarettes were seeing this. So it’s kind of an interesting suggestion. It’s only a suggestion that perhaps this effect between faster aging and marijuana

Hannah Went (49:45.419)
Mm-hmm. Mm-hmm.

karen sugden (49:57.235)
So it kind of threw up a few interesting avenues for future research, I think.

Hannah Went (50:02.035)
Yeah, yeah, definitely. I don’t know if like secondhand smoke or like anything would maybe have to do with that. But yeah, I know there’s a lot of questions and a lot of studies done in obviously, you know, smoking and epigenetic methylation markers. So I appreciate you giving a review of those two papers there. We’re almost at the end here. A couple more questions for you, Dr. Sugden. What are you currently studying? What’s, you know, what’s your time being spent on currently?

karen sugden (50:33.15)
Well, at the moment, currently, we have a paper in review that is looking at the link between education and genetics and pace of ageing. And we’re super excited about it. So I really love this paper. So the premise here is that we know there’s a link. It’s long been known there’s a link between education and health. People who get more education generally have better health outcomes, and people who don’t,

or even cross-sectionally, they just seem to have better health. There’s also a pretty substantial amount of literature around the genetics of educational attainment, and in particular, polygenic scores of education. Now, a polygenic score, I don’t know, you may or may not have heard of this term, but it’s essentially where you look at a person’s genome, you take all the parts that have been associated with,

case it would be education. And then you just add them all up. Okay. So it’s just like a sum of how much education related genetics you have in your genome. And this is, this is a genetic variation level, not an epigenetic variation level. So then each person will get their own score based on how many education associated genetic variants they actually have. And these scores do predict

karen sugden (52:03.19)
This has caused some concern for some people because it sounds a little bit like genetic determinism, you know, but it’s a very active field of study. Now, however, in this paper, what we were interested in, what we wanted to know was, does this positive effect of education on health? So in this paper, it’s specifically looking at the pace of ageing as measured by DunedinPACE.

karen sugden (52:33.15)
of this education polygenic score. So in other words, can people with genetic risk for lower education still benefit from getting an education, you know, get those benefits of education that lead to slower aging? And the answer is yes. And that’s really great news. So the message is education is good for all, despite, in spite of your genetics, doesn’t matter. Well, it doesn’t matter.

Hannah Went (52:43.455)
Mm. Mm-hmm. Mm-hmm.

karen sugden (53:05.918)
But education is good, which I think is a fairly positive message.
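The polygenic score described a few turns earlier amounts to a sum over education-associated variants. A minimal sketch of that arithmetic is below; the variant names and allele counts are hypothetical, and the optional per-variant weights are an assumption added for illustration (the plain "add them all up" version corresponds to leaving the weights out).

```python
def polygenic_score(allele_counts, weights=None):
    """Sum a person's trait-associated allele counts, optionally weighted."""
    weights = weights or {}
    return sum(
        count * weights.get(variant, 1.0)  # unweighted variants count as 1.0
        for variant, count in allele_counts.items()
    )


# Hypothetical person: 0, 1, or 2 copies of each education-associated variant.
person = {"variant_1": 2, "variant_2": 0, "variant_3": 1}

print(polygenic_score(person))                      # plain sum of allele counts
print(polygenic_score(person, {"variant_1": 0.5}))  # with one assumed weight
```

Each person gets one number, which is then used alongside education and pace of ageing in the analysis.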

Hannah Went (53:09.955)
I think so too. I think that’s great to end on. And yeah, I will make sure to look out for that study. So yeah, and just keeping up with all of the great work that you’re doing. This next question is one, that’s a curve ball. I always say, I don’t put it on the agenda, but Dr. Sugden, if you could be any animal in the world, what would you be and why?

karen sugden (53:35.611)
Oh my goodness. So I don’t think I’ve met an animal that I don’t like yet, which is slightly a problem for me and for anyone else associated with me. But I think it would be so cool to be a tardigrade, you know, the little water bears. Yeah, they’re little bugs that float around, but they’re basically they’re indestructible.

Hannah Went (53:56.255)
I don’t know if I know those.

Hannah Went (54:04.075)
Indestructible bug. That’s what you want to be.

karen sugden (54:04.251)
They’re so cool! Oh no, that doesn’t come across very well, does it?

Hannah Went (54:11.575)
No, I like it. I like it. You’re strong. You’re water. Are they the ones that walk on water?

karen sugden (54:19.31)
No, I think they still float around. I know that when they were in one of the Marvel movies, they were in there with Ant-Man, I think. I have not, no. There you go. As long as it’s tardigrades on it, I’ll be fine.

Hannah Went (54:26.315)
Well, then I love Ant-Man. Have you seen the new one?

Okay, well you should go watch it because it’s really good. There you go. Well, we’ve come to the end of this amazing podcast. For listeners who want to connect or see your work, where can they find you?

karen sugden (54:47.091)
Email is a great way. So just my name, Karen.Sugden at Drop me a line over anything. I will try and get back to you as soon as I can. And other than that, you can go to the Moffitt Caspi website, which is, I believe, or just Google Moffitt Caspi, you’ll find them, no problem. And you’ll find not only me, but all my lovely team members

Hannah Went (55:11.555)
Thank you. Thank you.

karen sugden (55:15.57)
closely with and an overview of all our research that we do as a team.

Hannah Went (55:20.035)
Definitely. Yeah, I’ll put that out there for everyone to view. And yeah, you know what? Thank you so much, everyone, for joining and listening here at the Everything Epigenetics podcast. Remember, you have control over your epigenetics, so tune in next time to learn more. Thanks so much, Dr. Sugden.

karen sugden (55:36.091)
Thank you.


About this Guest Expert

Dr. Karen Sugden
Karen Sugden, PhD, specializes in molecular and bioinformatics approaches to understand biological mechanisms related to behavior and aging, with a focus on epigenetics and its role in quantifying the impact of life experiences on aging.
