crashcourse
Bioinformatics: How Data Saves Lives: Crash Course Biology #40
YouTube: | https://youtube.com/watch?v=o-WFU5ovaTc |
Statistics
View count: | 40,584 |
Likes: | 1,614 |
Comments: | 39 |
Duration: | 11:27 |
Uploaded: | 2024-04-23 |
Last sync: | 2025-01-16 23:30 |
Citation
Citation formatting is not guaranteed to be accurate. | |
MLA Full: | "Bioinformatics: How Data Saves Lives: Crash Course Biology #40." YouTube, uploaded by CrashCourse, 23 April 2024, www.youtube.com/watch?v=o-WFU5ovaTc. |
MLA Inline: | (CrashCourse, 2024) |
APA Full: | CrashCourse. (2024, April 23). Bioinformatics: How Data Saves Lives: Crash Course Biology #40 [Video]. YouTube. https://youtube.com/watch?v=o-WFU5ovaTc |
APA Inline: | (CrashCourse, 2024) |
Chicago Full: | CrashCourse, "Bioinformatics: How Data Saves Lives: Crash Course Biology #40.", April 23, 2024, YouTube, 11:27, https://youtube.com/watch?v=o-WFU5ovaTc. |
On its own, a huge DNA sequence is a meaningless pile of data — so, how do biologists figure out what it means? They turn to the power of bioinformatics! In this episode, we’ll learn what bioinformatics is, how it works, and how scientists have used it to better understand everything from evolution to a viral epidemic.
Introduction: Pizza Data 00:00
Bioinformatics 1:20
Algorithms 2:33
The Human Genetic Code 3:28
The BRCA1 Gene 5:07
Transcriptomes 6:14
The Zika Virus 7:22
Bioinformatics & Programming 8:57
Review & Credits 10:07
This series was produced in collaboration with HHMI BioInteractive, committed to empowering educators and inspiring students with engaging, accessible, and quality classroom resources. Visit https://BioInteractive.org/CrashCourse for more information.
Are you an educator looking for what NGSS Standards are covered in this episode? Check out our Educator Standards Database for Biology here: https://www.thecrashcourse.com/biologystandards
Check out our Biology playlist here: https://www.youtube.com/playlist?list=PL8dPuuaLjXtPW_ofbxdHNciuLoTRLPMgB
Watch this series in Spanish on our Crash Course en Español channel here: https://www.youtube.com/playlist?list=PLkcbA0DkuFjWQZzjwF6w_gUrE_5_d3vd3
Sources: https://docs.google.com/document/d/1GLDtAXE6ekg4Chk2qN3TYbNt0pJbyaHqTqRd6QY8pd4/edit?usp=sharing
***
Crash Course is on Patreon! You can support us directly by signing up at http://www.patreon.com/crashcourse
Thanks to the following patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:
Leah H., David Fanska, Andrew Woods, DL Singfield, Ken Davidian, Stephen Akuffo, Toni Miles, Steve Segreto, Kyle & Katherine Callahan, Laurel Stevens, Burt Humburg, Perry Joyce, Scott Harrison, Mark & Susan Billian, Alan Bridgeman, Breanna Bosso, Matt Curls, Jennifer Killen, Jon Allen, Sarah & Nathan Catchings, team dorsey, Bernardo Garza, Trevin Beattie, Eric Koslow, Indija-ka Siriwardena, Jason Rostoker, Siobhán, Ken Penttinen, Nathan Taylor, Barrett & Laura Nuzum, Les Aker, William McGraw, Vaso, ClareG, Rizwan Kassim, Constance Urist, Alex Hackman, Pineapples of Solidarity, Katie Dean, Stephen McCandless, Wai Jack Sin, Ian Dundore, Caleb Weeks
__
Want to find Crash Course elsewhere on the internet?
Instagram - https://www.instagram.com/thecrashcourse/
Facebook - http://www.facebook.com/YouTubeCrashCourse
Twitter - http://www.twitter.com/TheCrashCourse
CC Kids: http://www.youtube.com/crashcoursekids
Imagine you’re conducting a survey in the school cafeteria, asking questions and taking detailed notes about how people like their pizzas.
We’re talking favorite toppings, what kind of crust, and the ultimate question: do you eat it folded or flat? Personally, I prefer it on a bagel, ’cause then you can eat pizza anytime.
By the end of this survey, you’ll probably have gotten some weird looks, but at least you’ll have a big pile of data. On its own, your notebook of pizza preferences and tally marks isn’t gonna tell you much. To make sense of those numbers, you’ll have to analyze them. This could be as simple as counting up your numbers and learning that more people are rockin’ thin crust than deep dish. But what if you wanted to answer a more complicated question, like how someone’s stress levels affect how much or how little they eat? Or, what if you wanted to take your survey further and analyze the pizza habits of the whole school district?
You’re gonna need more than tally marks in a notebook. Biologists, and really most scientists, use computers to analyze data all the time – and for way more than planning pizza parties.
Hi, I’m Dr. Sammy, your friendly neighborhood entomologist, and this is Crash Course Biology. Hey Callie, could you serve me up a pizza that theme music? (THEME MUSIC) Bioinformatics can help scientists sort through data about everything from DNA to weather conditions, to the number of organisms on a beach three Tuesdays ago – all information that can be useful for different areas of research and conservation. You start with a question or problem. It can be as simple as pizza preferences, or more complex, like, are two different species of fish experiencing different amounts of stress?
Then, you get a collection of numbers or a dataset. It could be an enormous one showing, say, the stress hormones of every species of fish in the African Great Lakes. Or, it could be a much smaller dataset about the five jellies in your home aquarium. How ya feeling, buddies? Oh, they’re not…they’re not real. We got ten episodes left and now y’all tell me.
Why have I been feeding them?! Ahem, anyway. So, the data could be something you and your team personally collected, or it could be from a larger, public dataset. The bioinformatics community is pretty open; they typically love to share information.
In fact, scientists are often supported by data that ordinary people have collected through community science projects to help fill in gaps. All that collected data can be analyzed to help answer the original question. To find those answers, you take that data, and you use a specific set of instructions to analyze it, called an algorithm. In many cases, you communicate those instructions to a computer, which uses the algorithm to solve a problem.
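To make "a specific set of instructions" a bit more concrete, here is a minimal sketch of about the simplest algorithm you could run on the pizza survey: count each answer and report the most common one. The responses and the function name `most_popular_crust` are made up for illustration; they are not from the episode.

```python
from collections import Counter

# Hypothetical survey answers, invented for this example.
responses = ["thin", "deep dish", "thin", "thin", "deep dish", "stuffed"]

def most_popular_crust(answers):
    """Tally every answer and return the most common one with its count."""
    tallies = Counter(answers)
    crust, count = tallies.most_common(1)[0]
    return crust, count

print(most_popular_crust(responses))  # ('thin', 3)
```

The same steps work whether the list holds six answers or six million, which is roughly why handing the instructions to a computer pays off.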
You’ve probably run into your fair share of algorithms around the internet, like on TikTok or on YouTube. Like, if this video appeared on your YouTube homepage, it wasn’t a random action. It’s because an algorithm analyzed data about other videos you’ve watched and predicted that you might like this one, too.
In this case, the “instructions” are to learn your viewing habits, and the “problem” is, well, how to keep you there as long as it can. So be sure to smash that subscribe button! Sorry...we're obligated to say it that way at least once per series. Could you –uh–could you also click the little bell too?
But algorithms can be fed all kinds of instructions that help us learn some pretty amazing information about living organisms, the environment, or even the patterns of disease spread. And bioinformatics can help us in nearly every stage of the scientific process. Like, let’s start with acquiring the data itself. For example, if we want to know the entire genetic sequence of an organism, we can’t just read it from one end to the other. Because it’s so long, we can only get it in small, overlapping chunks.
But we can then use computers to organize those chunks and piece them together in the correct order, like assembling a jigsaw puzzle. But in the end, you get a complete dataset – and unlike a puzzle, your cat can’t knock that off a table. Using that method, in 2022, researchers officially completed one of the biggest scientific undertakings in human history. For the first time, we’d assembled an almost complete list of every letter in human DNA. They’d been working on this for decades, and it was like they’d finally unlocked the instruction manual that made humans tick.
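The jigsaw idea of piecing overlapping chunks together can be sketched in a few lines. This is a toy greedy approach, assuming error-free chunks and a made-up sequence; it is not the method the researchers used, and real assembly software is far more sophisticated. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b.

    Returns 0 if the best match is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the two fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left overlaps; leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_size:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Made-up chunks, deliberately out of order.
chunks = ["CGTTAGC", "ATGGCGTA", "CGTACGTT"]
print(greedy_assemble(chunks))  # ['ATGGCGTACGTTAGC']
```

With three short chunks this is instant. With the billions of letters in a genome, the number of comparisons explodes, which is why assembly is a job for computers and cleverer algorithms.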
The possibilities for medicine seemed endless! Except, there was a catch. No one was fluent in the language this instruction manual was written in. Turns out, it’s one thing to know an organism’s DNA sequence and a whole other thing to figure out what those building blocks do.
And this isn’t exactly a code you can crack on paper. Because if you were to print out the roughly 3 billion letters stored in your DNA, they would fill hundreds of thousands of pages. And cracking the code requires comparing the DNA sequences of many individuals. We’re talking an absolute ocean of data.
Enter bioinformatics, again! Scientists can use computers to compare a bunch of DNA sequences until a pattern emerges that lets us predict how certain genes are working in an organism’s DNA. And these patterns can help us understand and treat diseases, including cancer. For example, there’s this gene called BRCA1, which codes for a protein that keeps cells growing normally, suppressing tumors and potentially cancerous masses. Which is a pretty big job! In fact, if even a single letter in someone’s BRCA1 gene is off, it can lead to an increased risk of breast or ovarian cancer.
Overall, variations in this gene are linked to a chance of developing one of these cancers by age seventy. But not every variation carries the same risk. So, some scientists have turned to bioinformatics to try to understand how dangerous each of the gene’s mutations is. They compared more than 3,600 BRCA1 variants – trying to figure out what the gene variants look like, what they do to the tumor-suppressing protein, and what overall effects that might have in the body.
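Spotting "a single letter that is off" comes down to comparing a sample against a reference, position by position. Here is a minimal sketch, assuming the two sequences are already lined up and the same length. The sequences are invented and are not real BRCA1 data, and the function name is hypothetical.

```python
def find_single_letter_differences(reference, sample):
    """Return (position, reference_letter, sample_letter) for each mismatch.

    Assumes both sequences are already aligned and equal in length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]

# Made-up sequences for illustration only.
reference = "ATGCCGTAAGCT"
sample = "ATGCCGTTAGCT"
print(find_single_letter_differences(reference, sample))  # [(7, 'A', 'T')]
```

Finding the difference is the easy part. Working out what each variant does to the protein, and how risky it is, takes the kind of large-scale comparison described above.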
Scientists then compiled all of this annotated data into a bioinformatics tool for others to use, and the information it contains helps doctors make more informed treatment plans, which could mean better outcomes for the patient overall. Bioinformatics can also help us sort through which genes are being expressed, or turned on, in an organism. You see, while some genes may be present, only the ones that are expressed result in protein production that impacts the organism.
To figure out which genes are being expressed, we can look at all the RNA molecules in an organism’s cells at any given time, called the transcriptome. This is an indicator of which genes are being transcribed and used to make proteins. As with DNA, it would be impossible for a human to analyze and compare organisms' transcriptomes manually because they’re so complex. But with bioinformatics, we can figure out which genes are being expressed to make certain traits happen. And if those traits are beneficial — like say, disease resistance in crops — we can learn how to use genetic modification techniques to make other organisms similarly resistant.
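One common way to compare expression between two samples is to count RNA reads per gene and look at the ratio. The sketch below assumes that approach; the gene names and counts are invented, the function `log2_fold_changes` is hypothetical, and a real analysis would also normalize the counts and test for statistical significance.

```python
import math

def log2_fold_changes(counts_a, counts_b, pseudocount=1):
    """log2 ratio of sample B to sample A for every gene found in both.

    The pseudocount avoids dividing by zero when a gene has no reads.
    """
    changes = {}
    for gene in counts_a.keys() & counts_b.keys():
        ratio = (counts_b[gene] + pseudocount) / (counts_a[gene] + pseudocount)
        changes[gene] = math.log2(ratio)
    return changes

# Made-up read counts for two hypothetical plants.
susceptible = {"geneA": 7, "geneB": 15, "geneC": 3}
resistant = {"geneA": 31, "geneB": 15, "geneC": 0}

for gene, change in sorted(log2_fold_changes(susceptible, resistant).items()):
    print(gene, change)
```

A positive value means the gene is more active in the second sample, a negative value means less active, and zero means no change. In this toy data, `geneA` would be a candidate to look at more closely.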
Bioinformatics has super broad applications, in nearly all areas of biology. For example, it can help us gain a better understanding of evolution. By tracking how genes differ among closely-related organisms — like different species of birds or primates — we can gain insight into how these organisms might have evolved. And, we can study random changes in organisms’ DNA, called mutations. Which not only helps us understand species evolution better; it can also help us see how viruses spread through a population. And that can be lifesaving information.
For instance, in 2016, the mosquito-borne Zika virus swept through the Brazilian state of Minas Gerais. And although Zika virus disease usually has mild or nonexistent symptoms, it can lead to problems with brain development in newborns or neurological diseases, like Guillain-Barré syndrome, where the body attacks its own nerves, in adults. So, an international team came together to use bioinformatics to try to understand how this virus got to Minas Gerais and how it evolved over the course of the epidemic.
The researchers took samples from patients with Zika virus disease. And then, they did something pretty amazing: Right on the spot, they were able to analyze the DNA in those samples using handheld nanopore technologies. Which is about as sci-fi as it sounds. These are essentially tiny computers that let scientists sequence – or figure out the precise makeup of – DNA and RNA, wherever they are.
There’s no need to ship anything to the lab – you just feed the device a few drops of sample and hit go. Then, those sequences were uploaded to a larger database, and compared with each other to build a picture of how they were related, a little like a virus family tree. Ultimately, this DNA analysis told the team that the virus had been circulating in Minas Gerais for at least sixteen months before it was confirmed in the lab. And by gaining a deeper understanding of the outbreak’s journey and timeline, researchers were better equipped to slow the disease’s spread.
No matter what you’re studying, with bioinformatics, the computer is doing a lot of the math. So while it’s important to understand what an algorithm is trying to do, you don’t necessarily have to be an expert in math or programming to use one. That said, despite being able to sort through piles of data faster than you can say “bioinformatics,” algorithms aren’t all-knowing overlords.
You see, algorithms work precisely because programmers give them limits and assumptions, which are set values that the system assumes are true. For instance, if we were to create one for our pizza survey, we might give it the assumption that anyone who didn't like their pizza folded ate it flat. This helps it make calculations more quickly because now it knows that any answer that doesn’t fit the “folded” value must be “flat.” Limits and assumptions are really important, and if you don’t know what they are, your program might give you misleading results, or might not work at all. Like, say I was sorting through that pizza database, and I asked a computer to give me all the data about Deep Dish pizza. If it was programmed to limit results to just data about topping preferences, it might throw me an error. “Sorry, Dr. Sammy. Deep dish pizza doesn’t exist.” But that’s not true, I’m just looking in the wrong data set.
So, just like a lab coat, a clipboard, and a microscope – computers and algorithms are tools in the biologist’s utility belt.
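The folded-or-flat assumption and the deep dish mix-up can both be sketched in code. Everything here is hypothetical: the function names, the topping numbers, and the error message are invented to illustrate the idea, not taken from any real program.

```python
def eating_style(answer):
    """Built-in assumption: any answer that is not 'folded' counts as 'flat'."""
    return "folded" if answer.strip().lower() == "folded" else "flat"

# Hypothetical dataset that is limited to topping preferences.
TOPPING_DATA = {"pepperoni": 12, "mushroom": 5, "pineapple": 3}

def lookup(dataset, key):
    """Return the entry for key, or explain that this dataset does not hold it."""
    if key not in dataset:
        raise KeyError(f"{key!r} is not in this dataset; try a different one")
    return dataset[key]

# The assumption keeps things fast, but it can mislead:
print(eating_style("folded"))        # folded
print(eating_style("on a bagel"))    # flat, which is not really true

# The limit produces an error for a perfectly reasonable question:
try:
    lookup(TOPPING_DATA, "deep dish")
except KeyError as err:
    print("Error:", err)
```

Neither result means the program is broken. Each one follows directly from a limit or assumption that someone wrote in, which is why it helps to know what those are before trusting the output.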
They can be used to analyze huge sets of information that would take us humans many lifetimes to work through on paper. Thanks to bioinformatics, the fields of biology have advanced by leaps and bounds, and continue to grow as engineers develop better computers, and programmers build better algorithms that allow biologists to ask more and more complex and fascinating questions. And speaking of complex questions, next time we’re going to answer a weird one: why aren't we made up of just one big cell? I’ll see ya then!
Deuces! This series was produced in collaboration with HHMI BioInteractive. If you’re an educator, visit BioInteractive.org/CrashCourse for classroom resources and professional development related to the topics covered in this course. Thanks for watching this episode of Crash Course Biology which was filmed at our studio in Indianapolis, Indiana, and was made with the help of all these nice people. If you want to help keep Crash Course free for everyone, forever, you can join our community on Patreon.