


Welcome to Crash Course Statistics! In this series we're going to take a look at the important role statistics play in our everyday lives, because statistics are everywhere! Statistics help us better understand the world and make decisions from what you'll wear tomorrow to government policy. But in the wrong hands, statistics can be used to misinform. So we're going to try to do two things in this series. Help show you the usefulness of statistics, but also help you become a more informed consumer of statistics. From probabilities, paradoxes, and p-values there's a lot to cover in this series, and there will be some math, but we promise only when it's most important. But first, we should talk about what statistics actually are, and what we can do with them. Statistics are tools, but they can't give us all the answers.

Episode Notes:

On Tea Tasting:
"The Lady Tasting Tea" by David Salsburg

On Chain Saw Injuries:

Crash Course is on Patreon! You can support us directly by signing up at

Thanks to the following Patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:

Mark Brouwer, Nickie Miskell Jr., Jessica Wode, Eric Prestemon, Kathrin Benoit, Tom Trval, Jason Saslow, Nathan Taylor, Divonne Holmes à Court, Brian Thomas Gossett, Khaled El Shalakany, Indika Siriwardena, Robert Kunz, SR Foxley, Sam Ferguson, Yasenia Cruz, Daniel Baulig, Eric Koslow, Caleb Weeks, Tim Curwick, Evren Türkmenoğlu, Alexander Tamas, Justin Zingsheim, D.A. Noe, Shawn Arnold, mark austin, Ruth Perez, Malcolm Callis, Ken Penttinen, Advait Shinde, Cody Carpenter, Annamaria Herrera, William McGraw, Bader AlGhamdi, Vaso, Melissa Briski, Joey Quek, Andrei Krishkevich, Rachel Bright, Alex S, Mayumi Maeda, Kathy & Tim Philip, Montather, Jirat, Eric Kitchen, Moritz Schmidt, Ian Dundore, Chris Peters, Sandra Aft, Steve Marshall

Hi, I'm Adriene Hill and this is Crash Course Statistics.

Welcome to a world of probabilities, paradoxes, and p-values. There will be games, and thought experiments, and coin-flipping. A lot of coin-flipping. Statisticians love to talk about coin-flipping.

By the time we finish the course, you'll know why we use statistics, and how. And what questions you ought to be asking when you run across statistics in the world. Which is all the time.

Statistics can help you make a guess whether or not you're going to be accepted to Harvard. Marketers use them to sell us gold lamé pants. Netflix uses stats to predict what show we might wanna watch next.

You use statistics when you look at the weather forecast and decide what to wear - dress or jeans. Policy makers use them to decide whether or not to invest in more early childhood education, whether or not to spend more on mental health services.

Statistics is all about making sense of data, and figuring out how to put that information to use. Today, we are going to answer the question, "What is statistics?" 

[Opening music]

The legend says that during a late 1920s English tea at Cambridge, a woman claimed that a cup of tea with milk added last tasted different than the tea when the milk was added first. The brilliant minds of the day immediately began to think of ways to test her claim. They organized eight cups of tea in all sorts of patterns to see if she could really tell the difference between the milk-first and the tea-first cups.

But even after they had seen her guesses, how could they really decide? Because she'd get about half the cups right just by randomly guessing either milk-first or tea-first. And even if she really could tell the difference, it's completely possible that she would miss a cup or two. So how could you tell if this woman was actually a tea savant? What's the line between lucky tea guesser and tea super-taster?
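To put a rough number on "lucky tea guesser," here is a minimal sketch in Python. It assumes a simplified setup that the story doesn't spell out: each of the eight cups is an independent 50/50 guess. The function name `prob_at_least` is made up for illustration.

```python
from math import comb

def prob_at_least(k, n=8, p=0.5):
    """Chance of getting at least k of n cups right by pure guessing.

    Assumes every cup is an independent guess with success chance p.
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for k in (4, 6, 8):
    print(f"at least {k} of 8 right by luck: {prob_at_least(k):.4f}")
```

Under that assumption, a perfect score by luck alone happens about 4 times in 1,000, while six or more right happens about 14% of the time. Where you draw the line between those is still your call.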

As fate would have it, future super-statistician and part-time potato scientist Ronald A. Fisher was in attendance. During his lifetime, Fisher began work that set the stage for a large portion of statistics, which is the focus of this series.

These statistics can help us make decisions in uncertain situations, tea taste tests and beyond. Fisher's insights into experimental design helped turn statistics into its own scientific discipline. And although Fisher didn't publish results of the tea test, the story has it the woman sorted all the tea cups correctly, just in case you were curious.

At this point, it's worth mentioning that there are two related, but separate meanings of the word statistics. We can refer to the field of statistics, which is the study and practice of collecting and analyzing data, and we can talk about statistics as in facts about, or summaries of data. 

To answer the question, "What is statistics?," we should first ask the question, "What can statistics do?" Let's say you wake up at your desk after a long evening studying for finals with a cheeseburger wrapper stuck to your face.

And you wonder, "Why do I eat this stuff? Is fast food controlling my life?" But then you tell yourself, "No, it's just super convenient."

But you're worried. You're thinking about how great it is that McDonald's has breakfast all day, right now. But maybe that's normal. I mean, finals are this week. So you google the question "fast food consumption" and you find the results of a fast food survey.

The first thing you might do is start asking questions that interest you. For example, you can ask, "Why do people eat fast food? Do people eat more fast food on the weekends than on weekdays? Does eating fast food stress me out?"

Now that we have some interesting questions, we need to ask ourselves an even more important one - can questions like these be answered by statistics? Like I mentioned earlier, statistics are tools for us to use, but they can't do all the heavy lifting.

To answer the question about why people eat fast food, you can ask them to fill out a questionnaire. But you can't know whether their answers truly represent what they are thinking. Maybe they answered dishonestly because they don't want to admit that they scarf McDonald's because they're too tired to cook dinner. Or because they're ashamed to admit they think Del Taco is delicious. Or because none of the given answers represented their reasons. Or they may not really know why they eat fast food. 

Armed with the results of the survey, you could tell that the most common reason people reported eating fast food was convenience or that the average number of meals that they eat out in a week was five. But you're not truly measuring why people eat so much fast food. 

You're measuring what we call a proxy: something that is related to what we want to measure, but isn't exactly what we want to measure. To answer whether people eat more fast food on the weekends, or whether eating it more than twice a week increases stress, we'd not only need to know how much people are eating fast food, which our questionnaire asked, but also which days they eat it. And we'd need an additional measure of stress.

You can use statistics to give a good answer about whether you are going through a drive-thru more on the weekend, but even the question of whether eating fast food is associated with higher levels of stress is hard to answer directly. What is stress, and how can we measure it? And are people eating fast food because they're stressed, or does eating all those calories make them stressed?

It's often the case that some of the most interesting questions are the ones that can't be answered by statistics, like why people eat fast food. Instead, we find questions that we can answer, like whether people who eat fast food often work more than eighty hours a week.

The tools we use to answer these questions are statistics plural, and there are two main types - descriptive and inferential. Descriptive statistics describe what the data showed. Descriptive statistics usually include things like where the middle of the data is - what statisticians call measures of central tendency - and measures of how spread out the data are.

They take huge amounts of information that may not make much intuitive sense to us, and compress and summarize them to hopefully give us more useful information. 

Let's go to the Thought Bubble.

You've been working for two years in the local waffle factory. Day in and day out you create the golden brown-i-est, tastiest frozen waffles ever created. The holes are perfectly spaced, screaming for syrup, and now you want a raise. You deserve a raise. No one can make a waffle as well as you can.

But how much do you ask for? An extra $1000, an extra $5000? You know you're valuable, but have no idea what other waffle-makers get paid. So you dig around online and find that there's an entire subreddit devoted to waffle-makers. And someone, username waffleleaks, has posted a spreadsheet of waffle-makers' salaries.

Now with a quick glance at this huge list of numbers, you can see whether the woman who works at a similar job at the rival frozen waffle company makes more than you. You can see how much more you're making than the new guy, who's just now learning to mix batter.

But you still don't know much about the paychecks of your waffle company as a whole. Or the industry, cause it turns out there are thousands of waffle-makers out there, and all you see is a list with data points, not patterns that can help you learn about how much you might be able to convince the boss to pay you.

Here is where descriptive statistics come in. You could calculate the average salary at your company, as well as how spread out everyone's salaries are around that average. You'd be able to see whether the CEO's paycheck is relatively close to the entry-level batter makers, or incredibly far away. And how your salary compares to both of their salaries. You could calculate the average salary of everyone in the industry with your job title and see the high and low end of that pay.

And then, armed with those descriptive statistics, you could confidently walk into the waffle boss' office and demand to be paid for your talents.

Thanks, Thought Bubble.
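If you wanted to try the waffle-salary summary yourself, it might look something like this sketch, which uses Python's built-in `statistics` module. The salary figures are invented purely for illustration, including the one very large paycheck standing in for the CEO.

```python
import statistics

# Hypothetical salaries in dollars, made up for this example only.
# The last entry plays the role of the CEO's paycheck.
salaries = [28000, 31000, 33000, 35000, 36000, 38000, 41000, 250000]

summary = {
    "mean": statistics.mean(salaries),      # the usual "average"
    "median": statistics.median(salaries),  # the middle of the sorted list
    "stdev": statistics.stdev(salaries),    # how spread out salaries are
    "low": min(salaries),
    "high": max(salaries),
}

for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

With these made-up numbers, one huge paycheck pulls the mean (61,500) well above the median (35,500), which is a good reason to look at spread and not just the average before you walk into the boss's office.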

While descriptive statistics can be great, they only tell us the basics. Inferential statistics allow us to make inferences. Clever namers, those statisticians. Inferential statistics allow us to make conclusions that extend beyond the data we have in hand.

Imagine you have a candy barrel full of saltwater taffy. Some pink, some white, some yellow. If you wanted to know how many of each color you have, you could count them - one by one by one. That'd give you a set of descriptive statistics. But who has time for all that candy counting?

Or you could grab a giant handful of taffy and count just those you pulled out, which would be using descriptive statistics. If your candy was in fact mixed pretty evenly throughout the barrel and you got a big enough handful, you could use inferential statistics on that sample to estimate the content of the entire taffy stash.
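Here is a small Python sketch of the taffy idea. The barrel's true contents (500 pink, 300 white, 200 yellow) and the handful size of 100 are assumptions made up for the example. In real life you wouldn't know the true counts, which is the whole point.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical barrel: these true counts are invented for illustration.
barrel = ["pink"] * 500 + ["white"] * 300 + ["yellow"] * 200
random.shuffle(barrel)  # "mixed pretty evenly"

handful = random.sample(barrel, 100)  # one big handful, no taffy picked twice
counts = Counter(handful)             # descriptive: what is in the handful

# Inferential step: scale the handful's proportions up to the whole barrel.
estimate = {
    color: counts[color] / len(handful) * len(barrel)
    for color in ("pink", "white", "yellow")
}

print("in the handful:", dict(counts))
print("estimated barrel:", estimate)
```

The estimate won't match the true counts exactly, and a different handful would give a slightly different answer. A bigger, well-mixed handful tends to land closer.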

We ask inferential statistics to do all sorts of much more complicated work for us. Inferential statistics let us test an idea or hypothesis. Like answering whether people in the US under the age of thirty eat more fast food than people over thirty. We don't survey every person to answer that question.

Let's say someone tells you that their new brain vitamin Smartie-Vite improves your IQ. Do you rush out and buy it? What if I told you the average IQ increase for Group A, 20 people who took Smartie-Vite for a month, was 2 IQ points. And the average IQ increase for Group B, 20 people who took nothing, was 1 IQ point. How about now? Still not sure? It's a pretty small difference, right?

Inferential statistics give you the ability to test how likely it is that the two populations we sampled actually have different IQ increases. However, it's up to you as an individual to decide whether that's convincing or not.
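One way to sketch that kind of test in Python is a permutation (shuffle) test. This is just one common approach, not necessarily the one this series has in mind. The individual IQ changes below are invented so that the group averages come out to 2 and 1, and names like `group_a` and `p_value` are made up for the example.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical IQ changes, invented so the means are 2 and 1 points.
group_a = [5, -3, 2, 8, 0, 1, -4, 6, 3, 2, -1, 4, 7, -2, 0, 3, 5, -5, 6, 3]
group_b = [4, -2, 1, 6, -1, 0, -5, 5, 2, 1, -3, 3, 7, -4, 0, 2, 4, -6, 5, 1]

def mean(values):
    return sum(values) / len(values)

observed = mean(group_a) - mean(group_b)  # the 1-point gap we actually saw

# If the vitamin did nothing, the group labels are arbitrary. So shuffle
# everyone together many times and see how often luck alone produces a
# gap at least as big as the observed one (in either direction).
pooled = group_a + group_b
n_shuffles = 10_000
as_extreme = 0
for _ in range(n_shuffles):
    random.shuffle(pooled)
    diff = mean(pooled[:20]) - mean(pooled[20:])
    if abs(diff) >= abs(observed) - 1e-9:  # tiny tolerance for float rounding
        as_extreme += 1

p_value = as_extreme / n_shuffles
print(f"observed gap: {observed:.1f} points")
print(f"share of shuffles with a gap that big: {p_value:.3f}")
```

If the printed share is large, a 1-point gap is the kind of thing shuffling produces on its own. If it is tiny, luck becomes a less convincing explanation.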

And don't be alarmed if the bar you set isn't the same in every situation. It's entirely okay to have different standards for the questions, "Does my cat like Fancy Feast more than Meow Mix?" versus "Does this drug cure lung cancer?" It might take more evidence to convince you to take a new, supposedly cancer-curing drug than to switch cat food brands. It should take more evidence to convince you to take a new, supposedly cancer-curing drug than to switch cat food brands.

With inferential tests, there will always be some degree of uncertainty, since they can only tell you how likely something is or is not. Your job is to take that information and use it to make a decision despite that uncertainty.

If statistics were a superhero, its batcall would be uncertainty, and its tagline would be, "When you don't know for sure, but doing nothing isn't an option."

Statistics are tools. Statistics help us make sense of the vast amount of information in the world. Just like our eyes and ears filter out unnecessary stimuli just to give us the best, most useful stuff, statistics help us filter the loads of data that come at us every day.

Descriptive statistics make the data we get more digestible, even though we lose information about individual data points. Inferential statistics can help us make decisions about data when there's uncertainty, like whether Smartie-Vite will actually increase your IQ.

But statistics can't do all the work. They're here to help us reason, not to reason for us. They can help us see through uncertainty, but they don't get rid of that uncertainty.

To push our tool analogy a step further, statistics are pretty useless, even dangerous, without understanding how they work. We need to know how to use them, and how not to use them. As we'll see in later episodes, statistics done poorly can lead us to some pretty silly conclusions.

And chainsawing done poorly leads to about 36,000 injuries in the US each year, 81% of which are lacerations. Did you know almost no one dies because of chainsaw injuries? Once in a while, but it's very rare. 95% of the people who are hurt by chainsaws are male. This does not necessarily tell us that males are significantly worse chainsawers.  

Statistics can help us plan the vacation to Bali in December, they can help us optimize our chances of winning our fantasy football league, they can help us budget our meal card at college. Statistics can help us decide whether that additional insurance the guy at Best Buy is trying to sell us on our new blender is actually worth it.

Statistics can also help us decide whether or not to go ahead with an invasive heart surgery. Statistics can help NGOs optimize the amount of food aid they send to refugee camps. They can help policy makers decide if they should spend more or less money on helping students pay back their school loans. And can help you decide how much money you should be comfortable borrowing for college in the first place. 

There's a lot statistics can help us with, but some things statistics can't do. Thinking statistically means knowing the difference. So when your brother says he used statistics to prove that your mom loves him more, you can rest easy knowing the only question he answered is whether she gives him slightly more ice cream each night. And you've got data suggesting she gives you extra sprinkles.

Thanks for watching, I'll see you next time. 

Crash Course Statistics is filmed in the Chad and Stacy Emigholz Studio here in Indianapolis, Indiana, and it's made by all of these lovely people. Our animation team is Thought Cafe.

If you'd like to keep Crash Course free for everyone forever, you can support the series at Patreon, a crowdfunding platform that allows you to support the content you love. Thank you to all patrons for your continued support. 

Crash Course is a production of Complexly. If you like content designed to get you thinking, check out some of our other channels at 

Thanks for watching.