


The human genome is 3.2 billion base pairs long and contains around 20,000 genes, but how much of that is garbage?

Hosted by: Hank Green

Dooblydoo thanks go to the following Patreon supporters: Lazarus G, Sam Lutfi, D.A. Noe, سلطان الخليفي, Piya Shedden, KatieMarie Magnone, Scott Satovsky Jr, Charles Southerland, Patrick D. Ashmore, Tim Curwick, charles george, Kevin Bealer, Chris Peters

 (00:00) to (02:00)


Hank: Once a year or so, a study makes the rounds that supposedly ends the idea of junk DNA, like, we thought most of our genome was garbage, but this time, we're sure that it's not.  See, the human genome is 3.2 billion base pairs long and contains around 20,000 genes, stretches of DNA that code for proteins, but those genes only make up about 1-2% of our DNA.  The other 98-99% of our genome is non-coding, and the question of whether it's useless or not has been hotly debated by biologists since the term "junk DNA" was coined in the 1960s.  So is there really a bunch of DNA that doesn't do much of anything or does it all do something and we just haven't figured out what?

It seems like our genomes shouldn't be mostly junk.  It takes energy to copy and maintain large amounts of DNA, so it seems like we should get rid of any extra baggage, and while it's pretty easy for biologists to say that coding DNA does something, it's tricky to make the call about different types of non-coding DNA.  Protein-coding DNA has a clear and elegant function: a stretch of DNA is transcribed to RNA in a cell's nucleus and then translated into a protein.  Proteins do things that you can pretty easily study, like helping with chemical reactions or building a cellular structure, and humans are made up of thousands of types of cells which each need different kinds and amounts of proteins to do their jobs, so every cell needs a way of handing out instructions for how much to make of what stuff.

Enter transcription factors, proteins that increase or decrease the chance that a gene will be transcribed into RNA.  They can be switched on in response to all kinds of things like chemical signals from outside the cell or a certain stage in the organism's development and this is where a couple important types of non-coding DNA come into play with a clear purpose.  Transcription factors bind to DNA but not directly to genes.

 (02:00) to (04:00)

Instead, they bind to stretches of DNA that don't code for anything but have a direct effect on genes.  For instance, promoters are right in front of genes.  They're where the machinery that transcribes RNA actually comes together and gets to work.  When a transcription factor binds to a promoter, it helps everything get going. 

Now, naming all of the different kinds of non-coding DNA and what they do would be a little much, but let's talk about a couple you might have heard of before, like there are introns: gaps in coding regions that seem kind of like junk but actually, a cell can cut them out to mix and match segments of coding regions and create variants of a protein from a single gene.

There are also bits of non-coding DNA that get transcribed into RNA that never gets translated into a protein, collectively called noncoding RNAs.  The structures that directly assemble proteins, called ribosomes, are partially made up of noncoding RNAs, and transfer RNAs shuttle around the building blocks of proteins.  Some others, like micro RNAs and long noncoding RNAs, act more like transcription factors.  They change the expression of a certain gene or region of the genome in indirect ways.

So molecular biologists are interested in studying things like all the nitty gritty functions of transcription factor binding sites and noncoding RNAs, and by some definitions, any DNA that binds to a transcription factor or gets transcribed into RNA is considered functional, which seems like a victory for the anti-junk way of thinking.  

This idea peaked in 2012 thanks to the publication of a lot of research by an international consortium known as ENCODE.  It's short for Encyclopedia of DNA Elements.  They made a remarkable claim: that 80% of the human genome is functional in some way.  ENCODE used a variety of molecular biology techniques to support this idea.  Most of them had something to do with whether a transcription factor binds to a given piece of DNA.

For example, ChIP-seq is a way to isolate segments of DNA where a particular transcription factor binds, which scientists can then sequence.

 (04:00) to (06:00)

The idea behind ENCODE is really cool.  It was meant to help anyone who wanted to study specific stretches of DNA, like, a database of what binds there and what non-coding RNAs are produced, but there was immediate pushback against the idea that 80% of the human genome is functional.  It wasn't that ENCODE's research was bad.  This was a huge international group of reputable scientists.  Their definition of 'functional' was just way too generous, as even some of the authors admit.  Like, great, you found places where transcription factors stick to DNA, but some regions of DNA can be kinda sticky.  Certain proteins might tend to bind there purely by chance and that doesn't necessarily mean anything is happening there other than stickiness.  That's only one criticism of many, but you get the idea.  Saying 80% of our genome is functional is probably too generous.

Now, different fields of biology work with different scientific tools.  Molecular biologists research very small biochemical reactions and the like, as their name suggests.  Meanwhile, evolutionary biologists are studying genome function through a broader lens, like trying to figure out why different organisms have vastly different genome sizes.  

For instance, onions and African lungfish have way more DNA than humans do.  Meanwhile, pufferfish seem to have weirdly little.  No one's claiming that humans are better than onions and that we somehow deserve to have more DNA.  It's just that being a human and being an onion are two very different things and it shouldn't take five times more DNA to be an onion, even if they are very tasty and have all those layers.  Like ogres.

The simplest answer to this puzzle is that a lot of DNA just isn't doing anything.  It's junk.  Even though maintaining extra DNA might be a little more energetically costly, the process of natural selection might not stop it from piling up, unless it seriously affects survival, and turns out, having extra DNA could also maybe help in some cases?  Consider pseudogenes, which led to the coining of the term 'junk DNA'.  

 (06:00) to (08:00)

Pseudogenes are what can happen when a gene gets duplicated.  Initially, both copies probably work.  If one of them picks up a beneficial mutation, ta da, you got a brand new gene, thanks evolution.  But if one picks up a detrimental mutation, the duplicated gene stops working while the original chugs along normally.  The messed up version doesn't get pruned from the genome, though.  It hangs around as a non-functional bit of DNA that still mostly looks like a gene, hence pseudogene, a junky leftover.

Our genome is also littered with viral sequences from old infections, plus there are transposons, which encode proteins that let that segment of DNA cut itself out of the genome and squeeze in somewhere else.  That's all they seem to do.  They just jump around, 'cause they listened to House of Pain too much.  

Every now and then, a paper assigns a function to some viral gene or transposon, but those seem to be in the vast minority, and there are a handful of other types of junk that don't seem to do anything for us.  It's hard to estimate how much of the genome is non-functional, because these elements are pretty repetitive.  Repetition tends to throw off our sequencing methods because those methods rely on looking at a bunch of copies of the genome at once.  It's like throwing 600 copies of Hamlet into a blender.  The unique parts like "Alas, poor Yorick," they're fairly easy to spot, but how many times does Shakespeare use the word 'the'?  If you just get a 'the', you don't know where to put it.
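The Hamlet-in-a-blender problem can be sketched with a toy example in Python.  This is only an illustration under made-up assumptions: the "genome" string, the repeat element, and the fragment length of 6 are all invented here, and real sequencing and assembly software is far more sophisticated.  The sketch just shows that a short fragment from a unique region matches one location, while a fragment from a repeated element matches several, so you can't tell where it came from.

```python
def fragment_positions(sequence, k):
    """Map each length-k fragment to every position where it occurs."""
    positions = {}
    for i in range(len(sequence) - k + 1):
        positions.setdefault(sequence[i:i + k], []).append(i)
    return positions

# Hypothetical toy genome: unique flanking stretches plus one element repeated three times.
REPEAT = "ACGTACGT"
genome = "TTGACCA" + REPEAT + "GGCTA" + REPEAT + "CATTG" + REPEAT

fragments = fragment_positions(genome, k=6)

# Fragments that occur once can be placed; fragments that occur more than once cannot.
unique = {frag: pos for frag, pos in fragments.items() if len(pos) == 1}
ambiguous = {frag: pos for frag, pos in fragments.items() if len(pos) > 1}

print("placeable fragments:", len(unique))
print("ambiguous fragments:", len(ambiguous))
for frag, pos in sorted(ambiguous.items()):
    print(f"  {frag} could go at any of positions {pos}")
```

The fragment "TTGACC" is like "Alas, poor Yorick": it has exactly one home.  The fragment "ACGTAC" is like the word "the": it fits in three places, so the number of copies of the repeat is hard to pin down from fragments alone.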

Overall, a lot of the data in support of junk DNA comes from population genetics, which involves a lot of math.  Take one paper published in 2017 in the journal Genome Biology and Evolution.  Instead of looking at where transcription factors bind, their definition of "functional" hinged on whether a DNA sequence could be acted on by natural selection in a positive or negative way.  

Most mutations that are subject to natural selection are bad, because they break some cellular system and make survival less likely.  The study's author estimated the rates that harmful mutations build up in humans compared to our rate of reproduction and effective population size.  

 (08:00) to (10:00)

That's a statistical term that refers to how much of the population is finding mates and making babies.  Basically, natural selection can only eliminate harmful mutations as fast as a species can breed or they would slowly die out, but mutations in junky regions wouldn't have bad consequences, so they would stick around.  

This paper tried to calculate what it would take to maintain a human genome that's 80% functional.  Even using a conservative estimate of mutation rate, it concluded that every human couple would have to have 15 children, 13 of which would survive and have children of their own, which is super unrealistic.

An earlier paper published by UK researchers in 2014 in PLOS Genetics used a similar definition of "functional".  They compared the rates of change of mammalian genomes to one another.  Genes come and go across evolutionary time, but by comparing related groups side by side, researchers can figure out the most important regions subject to natural selection.  They found that protein-coding genes and certain non-coding elements were pretty stable.  For example, promoters were less stable than coding regions, but mostly stayed put, but other noncoding stretches like transposons weren't really preserved across species, and therefore, they might not do much.

Interestingly, both of these studies independently came up with similar estimates for how much of the genome is functional.  The 2017 study suggested the absolute upper limit for how much of the genome has a function is around 25% but favored a more conservative figure of 10-15%, and the 2014 study landed at 8.2%.  That's obviously way less than ENCODE's sweeping 80% claim and those estimates still leave room for functional non-coding elements on top of the coding stuff.
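To get a feel for how far apart these estimates are, here is a rough Python sketch that converts each percentage quoted in this episode into base pairs, using the 3.2 billion figure from the start of the video.  The labels and the helper name are just illustrative choices, and the percentages are stored in tenths of a percent so the arithmetic stays in whole numbers.

```python
GENOME_BP = 3_200_000_000  # genome length quoted in the episode

# Each estimate in tenths of a percent (82 means 8.2%).
ESTIMATES_PER_MILLE = {
    "protein-coding genes, low end (1%)": 10,
    "protein-coding genes, high end (2%)": 20,
    "2014 PLOS Genetics study (8.2%)": 82,
    "2017 study, favored low end (10%)": 100,
    "2017 study, favored high end (15%)": 150,
    "2017 study, upper limit (25%)": 250,
    "ENCODE 2012 claim (80%)": 800,
}

def functional_base_pairs(per_mille, genome_bp=GENOME_BP):
    """Number of base pairs covered by a fraction given in tenths of a percent."""
    return genome_bp * per_mille // 1000

functional_bp = {
    label: functional_base_pairs(value)
    for label, value in ESTIMATES_PER_MILLE.items()
}

for label, bp in functional_bp.items():
    print(f"{label}: about {bp / 1_000_000:,.0f} million base pairs")
```

On these numbers, the gap between the 2014 estimate and the ENCODE claim is more than two billion base pairs, which is most of the genome.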

So we still don't have a clear answer for what's junk DNA and what's not because how scientists define "functional" varies a lot, and we're still trying to understand everything our genomes can do.  There's a lot of complexity in those 3.2 billion base pairs, and we'd be foolish to think we know every trick, but it's also foolish to think that evolution is so elegant that there's barely any room for messiness.

 (10:00) to (10:56)

Evolution has no end goal and no sense of aesthetics.  It's a whole lot of probability, so sometimes stuff piles up, like DNA that doesn't do anything.  In the end, we may have to meet in the middle ground to reconcile both of those ideas, and of course, keep learning.

Thank you for watching this episode of SciShow.  I thought it was really good.  I liked it.  If you wanna learn more about DNA and really all kinds of science, you can subscribe.  There is also, I think, a little button right under this video.