Hank: Hello, this is Hank Green, for a special episode of SciShow where we're gonna be interviewing Carl Zimmer, who is a science writer of great renown. Uh, we're really excited to have him here. And he has been working on a project where he got his entire genome sequenced - which is different from what you might hear about with 23andMe where you spit into a tube. That is a more limited version of what Carl has done - not only getting his entire genome sequenced but also getting it delivered to him whole so that he could have it for himself, and he could have himself on his hard drive, and then working with a bunch of scientists to tell him exactly what it all means (well I mean not exactly) - some of what it all means. So I'm really pleased to have Carl Zimmer, winner of the 2016 Stephen J. Gould Prize, congratulations on that by the way. Hello Carl!
Hank: Uh, what's it like to win the Stephen J. Gould Prize?
Carl: Oh, it's a, that's a very big honor. Yeah, I mean, I grew up reading Stephen J. Gould essays, so being able to get an award in his name means a lot to me now that I'm a science writer.
Hank: Yeah. That's fantastic. Congratulations. Um, what is it like to get you delivered to you on a hard drive?
Carl: Um, it's - it's pretty disconcerting. I mean, you know, I like literally have this thing on my desk, just - that's it. That is - that is me. That is my genome, and you know, it showed up one day, and I plugged it into my computer and we were off to the races. So it's uh- [words spoken over by Hank]
Hank: I mean, they delivered it to you on like a monogrammed, embossed, hard drive. That's a beautiful thing. It should have had your name on it.
Carl: Yeah. It should have, it should have. But, uh, I stuck it, I stuck my name on it with a little label, I don't know if you can see it, and uh.
Hank: Okay, good.
Carl: Yeah, anyway. The reason they have to do this is it's like 70 gigabytes of data, so, uh, they couldn't just, you know, send me an e-mail or something like that.
Hank: Is a, is a genome about 70 gigabytes, or does that include some extra information?
Carl: Yeah, so this is actually a lot more than a, than a genome sequence itself. That's because the way that a company like Illumina, which sequenced my DNA, the way they get at your genome is they actually, um, make lots of copies of fragments of your DNA, so you know, thirty times over, really, and then, um, what you can then do with all those fragments is you can kind of stack them up and try to figure out from that what at each point your genome is. So your genome is only three - three and a half billion base pairs, which you could fit on a much smaller file, if you wanted. So this is really just a raw-
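A minimal Python sketch of the "stack them up" step Carl describes, assuming each fragment's starting position in the genome is already known (real pipelines must work that out first) and using a simple majority vote at each position. The function name and toy reads are hypothetical. Real variant callers use statistical models, and they must also allow for the fact that we carry two copies of each chromosome, so a position where about half the reads disagree can be a genuine difference between those copies rather than an error.

```python
from collections import Counter


def consensus(reads: list[tuple[int, str]], genome_length: int) -> str:
    """Stack reads at their known start positions and keep the most common
    base in each column; 'N' marks positions no read covers."""
    columns = [Counter() for _ in range(genome_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < genome_length:  # ignore bases hanging off the end
                columns[pos][base] += 1
    # An empty Counter is falsy, so uncovered columns become 'N'.
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )


# Hypothetical reads of the sequence ACGTACGT, given as (start, bases).
# The second read has an error: G instead of T at genome position 3.
reads = [(0, "ACGTAC"), (2, "GGACGT"), (2, "GTACGT")]
print(consensus(reads, 8))  # -> ACGTACGT
```

The two correct reads outvote the faulty one at position 3, which is the basic reason reading every stretch many times over helps.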
Hank: So they got you everything. And the reason they do that fragmentation is just so that they can do the smaller segments faster?
Carl: Yeah, I mean, this particular way of sequencing DNA is kind of like parallel processing, you know. You take, you know, billions of little fragments, each 300 bases long, and you can read each one all at the same time on a little slide. And so you can get it done pretty quickly. Um, you may make some mistakes along the way, but because you're making so many copies, there's sort of an error correction built into it. There are a lot of new methods coming online that, you know, try to read in longer fragments, and those may be more accurate and they may be able to read things that Illumina can't, but, you know, it's a work in progress.
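The "error correction built into it" can be put in rough numbers. This Python sketch assumes each read independently misreads a given base with a hypothetical 1% probability and, pessimistically, that every wrong read reports the same wrong base; it then computes the chance that a majority vote across the reads covering that position comes out wrong. Both assumptions are simplifications for illustration, not Illumina's actual error model.

```python
from math import comb


def majority_error(coverage: int, per_read_error: float) -> float:
    """Chance that more than half of `coverage` independent reads are wrong at
    one position, assuming every wrong read reports the same wrong base."""
    # Sum the binomial probabilities of k wrong reads for every k that
    # outnumbers the correct reads.
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(coverage // 2 + 1, coverage + 1)
    )


ASSUMED_ERROR_RATE = 0.01  # hypothetical per-read, per-base error rate

for n in (1, 3, 15, 31):
    print(f"{n:>2} reads: {majority_error(n, ASSUMED_ERROR_RATE):.2e}")
```

Odd read counts are used so the vote can never tie; the wrong-call probability shrinks rapidly as coverage grows.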
Hank: Yeah, I mean, it's, it's remarkable that it's accessible at all.
Hank: I mean, we're talking- how long ago was it that it cost- like, the first human genome was sequenced and it was in the billions of dollars.
Carl: That's right, yeah, so once upon a time there was just one human genome, and it took hundreds of people many years to read it, and it wasn't even a very good version; there were a lot of errors in it, and it cost maybe around 3 billion bucks.
Carl: Then a few years later, in the early 2000s, Craig Venter got his own genome sequenced, and I believe that was in the neighborhood of 100 million dollars? Which, you know, that's down a lot but that's still- a little steep. Uh, but now we're down to the few thousands of dollars, you know, in some cases maybe just a thousand dollars, some companies are saying. It's crashed.
Hank: So, how much did it cost to get- did you pay, by the way? Like how did you uh, how did you cash in on this opportunity?
Carl: So, the way that this came about was that this company Illumina runs meetings called "Understanding Your Genome". Which are really sort of scientific conferences, but um people who go to the meetings, if they wanna pay extra, can get their genome sequenced. Well, I went to my editor [???] and said, "Okay, here's an expense I'd like to file." (laughter) "You know, roughly 3 thousand dollars, I'll get my genome sequenced, and I'll write the hell out of it, and uh, it'll be worth your while. I promise." Um, but the thing was that, you know, what I knew was that I was gonna have to pay a little bit extra, not just to get it sequenced, but to then actually have them convert it onto a hard drive and send that to me. That was like an extra step, and actually not a step that was very easy at all to get done. So I don't know how many people can kinda wave around a genome on a hard drive, um, it definitely took me a lot of uh, a lot of shenanigans to get it.
Hank: (laughs) Uh, so, and then of course it's not just getting the genome.
Hank: Uh, it's just a bunch of letters and numbers, I imagine, that don't mean a lot just to look at, at least at first. How did you go about getting help deciphering what is going on inside of you?
Carl: Yeah, I mean the fact is that if I unlock that disk, and if I, you know, look at the raw data, with you know some browser tools, really it's like a horrible spreadsheet. I mean it's just, you know you have like a 300 letter long piece of gibberish and then a little note about where Illumina thinks it is in my genome. And then the next one, and the next and the next one, for like, over a billion lines.
Carl: So, yeah. I mean, that's not really gonna say much to me. Um, so I started getting in touch with scientists and saying, "I'm working on this project and I would like to write about what it's like to study genomes and what you can learn from genomes by having you help me understand mine." And so I went to places like the Broad Institute, or Yale, or Weill Cornell Medical, and people would just show me what they do. They'd say, "Okay, let's take your data, here's what we do first. First we uh, we're gonna use our own methods to figure out where in the genome all these fragments belong. Then we're gonna look for special kinds of mistakes. Um, then we're going to-" You know, 'cause the problem is that we're all a little bit different. And so if you come across a particular fragment, then you don't necessarily know where it belongs in your genome. So, they use all sorts of really clever, almost cryptographic, cryptological techniques to uh, to figure this all out. And only then can they actually analyze it. So it's an amazing process.
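Working out "where in the genome all these fragments belong" is called read alignment. Below is a toy Python sketch of one simple idea, seed-and-check: index every short substring (k-mer) of a reference sequence, look up a read's first k bases, then accept candidate positions where the whole read matches with at most one mismatch. The reference, reads, and function names are made up for illustration. Production aligners use compressed indexes and scoring schemes, and this sketch would miss any read with an error inside its seed. The first query shows how a fragment can fit equally well in more than one place, one reason placement is hard.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def place_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 1) -> list[int]:
    """Seed with the read's first k bases, then keep candidate positions where
    the full read differs from the reference in few enough letters."""
    hits = []
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if len(window) < len(read):  # read would run off the reference
            continue
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


reference = "TTGACCGTAGGCTAACGTAGGCTT"  # hypothetical reference with a repeat
k = 4
index = build_index(reference, k)

print(place_read("CGTAGGCT", reference, index, k))  # -> [5, 15]
print(place_read("GACCGAAG", reference, index, k))  # -> [2]
```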
Hank: And uh, first question here. Give a guess: so it's thirty-- three thousand dollars-ish, 3600 dollars to get your genome sequenced.
But give a guess: if you had paid all of the scientists, all of the people who put time into this, their market rate, the billable hours that went into this project, how much do you think this level of analysis would cost if it were just a rich person trying to learn everything about their genome?
C: Well, I would hope that the scientists would charge an arm and a leg for this sort of stuff, because these are amazing insights that they were able to provide, and they were using tools that they had just invented. You know, they'll say, like, oh, here's some software that I just put together recently, let's use it on your genome. So, you know, who knows, I haven't done the math. A hundred thousand dollars, I'd guess, I dunno. I mean, I know that I ended up with a couple dozen scientists saying, hey, this is cool, yeah, I'll definitely help you with it.
H: Couple dozen?
C: And um, and a lot of times what that meant was that you know, we got them the data on their computer, they fired up their programs and they said, okay, let's talk in two weeks because their computer was just gonna grind away at it for two weeks.
H: Yeah, yeah.
C: And you know, and then they could say, like, aha! Here is your list of Neanderthal genes or aha! Here is the list of pieces of DNA that are duplicated in the genome, or aha! Here are the like, 50 genes that, in your genome, that are broken, that just don't work.
C: It takes a long time to grind through all that data.
H: What are the weird, fascinating things you found out about yourself?
C: One, you know, one thing that's interesting is that we're always really nervous about genome sequencing and it's supposed to be this great, horrible terror, you know, because we're gonna find out something awful.
And I'm not denying that, you know, people can have some pretty scary mutations, there's no doubt about it, but it's easy to forget that a lot of the, you know, most high-profile mutations, for diseases like Huntington's disease, are not that common, and so chances are, if you go in and, you know, ask to find out if you have, say, Huntington's disease, you will not have Huntington's disease. That's just the nature of the beast. On the other hand, you may actually find that you have a mutation that protects you from diseases, and this is actually a new area of research in which scientists are trying to find examples of mutations that actually are, you know, good for you. We don't necessarily all have them, but I turned out to have one, and that was really interesting. I have a mutation on a gene for a protein that sits on immune cells, and it actually makes me much less likely to get Crohn's disease or certain other autoimmune disorders, and so, you know, it was interesting to sort of learn about, well, how does that work? And so it turns out that this mutation means that I sort of dial down my immune system so that I don't get into kind of runaway feedback loops and get a lot of inflammation.
C: And little did I know that this had all been worked out a few years ago, and that some drug companies took that basic biology and turned it into drugs for conditions like psoriasis that are now just coming out on the market and are making lots of money. I didn't get any of that money, but that's okay. I didn't even know I had this protection until, you know, last month basically.
H: That's great.
C: So, the, you know, those sorts of unexpected surprises come up left and right.
H: So what's the difference between, like, if I was gonna pay to get my genome, you know, to get a little bit of information about my genome for--with 23andMe versus what you did?
This is a very different process, but can you talk a little bit about that?
C: Sure, yeah, so 23andMe, what they will do is they will take your spit that they get in the mail and they will pull out the DNA and then they will actually just put together a file for you with information about certain genetic markers. So in other words, they'll kind of take a sort of a survey across your genome, they're not gonna give you your whole genome, but they can zero in on, say, a million different locations and say, okay, you have this particular letter in your genome here, and so you know, the human genome, each of our genomes is over 3 billion base pairs long, this is--I think there're up to a million or so of these markers, so you're talking about, you know, a fraction of a tenth of one percent of the genome is what you're getting from 23andMe. You can still learn a whole lot from that, you can learn, you know, potentially whether you're a carrier for certain diseases, you can use some of that variation to get an idea about your ancestry. There's a lot that you can do with that, and I don't mean to diminish that at all. The thing is though, that there's, you know, literally hundreds or thousands of times more information if you get your whole genome sequenced, and that takes--that's a totally different process and the way you handle that data is a totally different process, and so you--and then you can address big questions that you can't if you just go to 23andMe. You can see, you know, do you have gigantic chunks of DNA missing from your genome?
C: 23andMe won't be able to tell you that.
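Carl's "fraction of a tenth of one percent" can be checked with quick arithmetic. This Python sketch assumes a haploid human genome of about 3.1 billion bases and takes his round figure of a million markers; both numbers are approximations, not 23andMe's exact specifications.

```python
GENOME_BASES = 3_100_000_000  # assumed approximate haploid human genome length
ARRAY_MARKERS = 1_000_000     # "up to a million or so" markers, per the interview

# Share of genome positions that a marker-based test reads directly.
fraction = ARRAY_MARKERS / GENOME_BASES
# How many times more positions whole-genome sequencing covers.
ratio = GENOME_BASES / ARRAY_MARKERS

print(f"markers cover about {fraction:.3%} of positions")
print(f"whole-genome sequencing reads about {ratio:,.0f} times as many")
```

That works out to roughly 0.03% of positions and a ratio in the low thousands, in line with Carl's "hundreds or thousands of times more information."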
H: So is this something that is happening--I mean, obviously it's something that is happening more, why are people, like, what are the circumstances under which people are getting their whole genome sequenced right now?
C: So most of the time, people are getting whole genome sequencing just purely for research.
H: Do they just like, reach out to a bunch of people and they're like, we wanna pull from a bunch of different populations, or is it people who have specific diseases or people who like, ask to be a part of a process or--?
C: There are a lot of programs going on right now where people are basically donating their DNA to scientific research.
C: So researchers--so Craig Venter, for example, has a company called Human Longevity and they just published a paper where they did whole genome sequencing on ten thousand people at once. Like, it's an amazing study, huge amounts of data, very high quality, and basically what they wanted to do is they just wanted to like, compare 10,000 people's genomes and see what kind of patterns they could find, like, you know, where in the genome do people tend to like, have more mutations than others? It's much rarer for people to get their genome sequenced just for some sort of medical reason. I mean, it's still kind of an extreme kind of thing to do and, you know, so really, like, people will only get their genome sequenced once they've exhausted every other avenue.
C: Because it's still expensive and it still takes a lot of work to understand all that data, you know, you get three billion base pairs and then you're left to figure out, well, which one of these mutations that I have actually matters, you know? Which one is making me sick. That is hard to do. So, I'd s--my guess would be like, it's still in the thousands, in terms of people who have been--had their whole genome sequenced for some sort of--to answer some sort of medical question.
H: And then the rest of it is just like, pulling from the population to just increase our overall knowledge of where we're at, overall knowledge of what our bodies are made of and like, you know, the variation and where mutations happen and--
C: Another thing that people will do is they will get a whole bunch of people to volunteer to have their genome sequenced, and then they will look at medical information. Sometimes you can sort of tie people's genomes to their medical records, and then you can start to ask questions like, well, you know, does this group of people, these ten thousand people, all of whom have diabetes, is there anything that they share in common in their genome that you don't find in 10,000 people who've never developed diabetes? And if you can look at the whole genome in all those people, and if you can get that many people, that's incredibly powerful, and you can start to discover mutations that play a role that we just didn't know about before.
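The diabetes comparison Carl describes is essentially a case-control association test. Here is a minimal Python sketch, assuming a single genome position with two possible letters and using a Pearson chi-square statistic on a 2x2 table of letter counts (one common choice, not necessarily what any particular study used). The counts are invented: 10,000 people per group contribute 20,000 chromosome copies each.

```python
def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for a 2x2 table of allele counts, cases vs. controls."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the letter were equally common in both groups.
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts: the "alt" letter is on 3,000 of 20,000 case chromosomes
# and 2,600 of 20,000 control chromosomes.
print(round(allelic_chi_square(3000, 17000, 2600, 17400), 2))  # -> 33.22
```

A larger statistic means the letter's frequency differs between the groups more than chance alone would suggest. Real studies repeat this at millions of positions, so they demand much stronger evidence (a commonly used genome-wide threshold is p < 5 × 10^-8) and adjust for ancestry, which is one reason the makeup of the sample matters.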
H: So I have a weird concern and it's that we're gonna get a lot of good data on the genomes of people who live in the developed world, in America and Europe and Australia, and we will be left thinking that this is a representative sample when in fact we've only sampled kind of like 10% of the world. Is that, like, a legitimate thought to have?
C: It's definitely something that needs to be avoided at all costs, because if you just have a bunch of genomes of white people and maybe just a bunch of genomes of rich white people, then you have a really unrepresentative sample of human diversity, and you're not gonna get to the bottom of a lot of disease.
You know, there's a lot of genetic variation all over the world and you're not gonna understand it unless you really are sampling lots and lots of people, and you know, it's really kind of shocking that even just a few years ago, something like 95% or 96% of all these what are called genome-wide association studies, where they're trying to look at the genome and tie mutations to diseases, were done on people of European descent. That's just ridiculous, and so there's been a big move among a number of scientists to get a real global representation of human genomes, you know, and there are places like, for example, New Guinea, where scientists have been particularly interested in going there and just going from village to village and trying to sequence genomes, because people in New Guinea, they showed up there maybe 40, 50,000 years ago, and have been there ever since, and so they have a kind of genetic variation there that you don't find anywhere else, and if you find, for example, that you have some really unusual mutation that is linked with a particular disease, that could tell you a lot about the disease in general.
C: So the more genomes from people not like me, the better.
H: And it can tell us more than medical information, too. We're also starting to use and have been using, you know, genome studies to learn more about the history of humanity, about where we've been, what we've done, how much sex we had with Neanderthals, all these wonderful things. Did you find out anything about your, not just your recent heritage, but your sort of historical heritage?
C: Yeah, yeah, I mean, I'm really fascinated with human history, and you know, our genomes carry lots of information about that past and you know, we've sort of accumulated it and our ancestors accumulated it over literally millions of years, so you know, you can see for example things that you share in common with chimpanzees 'cause we have a common ancestor that lived seven million years ago or so. You know, Neanderthal DNA is really fascinating because it looks as if, you know, maybe between 100,000 and 50,000 years ago, something like that, humans and Neanderthals interbred numerous times and some of their DNA ended up in our genome, so this seems to have happened after humans expanded out of Africa, and that means that non-Africans all have maybe a couple percent Neanderthal DNA. So a place like 23andMe will actually tell you, like, your precise percentage of Neanderthal DNA, which is fascinating, but I would, you know, I would wonder though, what part is Neanderthal. My Neanderthal genes are probably different than yours, because they've just been lost over tens of thousands of years and each of us holds on to a different set, so I was able to go to some scientists who said, like, well, here you go, here are your Neanderthal genes, 613 of them. A lot of them we don't know what they do, but some of them have actually been tied to medical conditions, so I'm actually protected slightly from depression by my Neanderthal genes. I'm also slightly at risk of getting nosebleeds because of my Neanderthal genes. Sometimes these things are just--you just sort of think, like, what does that really mean? But for scientists, like, there are sort of deep evolutionary insights that you can get.
For example, a lot of my Neanderthal genes turned out to be, like other people's, having to do with the immune system, so it may be that Neanderthals had immune systems that were equipped to deal with all of the infections that they faced out in Europe and Asia, and when, you know, our African ancestors came out of Africa, they just didn't have that equipment. So if there are these Neanderthal genes floating around the human gene pool that give you protection against these pathogens, that's gonna mean you're more likely to survive. Those genes may have hung around and a lot of the other ones didn't.
H: Fascinating. So cool. Um, so you are writing a whole series on your experience with your genome and parsing it, and the scientists who worked on figuring out, you know, health, recent ancestry, historical ancestry. Tell me more about how this is going down and where it's going to be, 'cause I've read a bit of it and I'm excited to see more.
C: Um, so I wrote this for Stat, which is a publication about medicine and life sciences, and it appears in a three-part series and it's called Game of Genomes, and so if you go to Stat and you look for Game of Genomes, you'll be able to find this three-part series and for people who are real data junkies, some scientists and I actually set up a parallel website where they put all of their analysis and all their data, I threw my genome up there, it's all there for people to plow into if they wanna really kinda see how scientists actually take a genome and make sense of it.
H: Game of Genomes. Thank you so much, Carl Zimmer, for joining us here on SciShow, it's a fascinating conversation, and great work.
Really cool and I'm glad that you are not dying of any horrible genetic diseases.
C: Not yet.
H: All right, thank you.
C: Thanks a lot, thanks for having me.
H: Today, we have the new host of CrashCourse: Physics, Dr. Shini Somara.
H: I'm so excited to have you here.
S: It's so nice to be on these couches.
H: Do you like this couch?