crashcourse
Degrees of Freedom and Effect Sizes: Crash Course Statistics #28
YouTube: | https://youtube.com/watch?v=Cm0vFoGVMB8 |
Categories
Statistics
View count: | 171,325 |
Likes: | 3,378 |
Comments: | 75 |
Duration: | 13:30 |
Uploaded: | 2018-08-22 |
Last sync: | 2024-03-23 12:00 |
Citation
Citation formatting is not guaranteed to be accurate. | |
MLA Full: | "Degrees of Freedom and Effect Sizes: Crash Course Statistics #28." YouTube, uploaded by CrashCourse, 22 August 2018, www.youtube.com/watch?v=Cm0vFoGVMB8. |
MLA Inline: | (CrashCourse, 2018) |
APA Full: | CrashCourse. (2018, August 22). Degrees of Freedom and Effect Sizes: Crash Course Statistics #28 [Video]. YouTube. https://youtube.com/watch?v=Cm0vFoGVMB8 |
APA Inline: | (CrashCourse, 2018) |
Chicago Full: | CrashCourse, "Degrees of Freedom and Effect Sizes: Crash Course Statistics #28," August 22, 2018, YouTube, 13:30, https://youtube.com/watch?v=Cm0vFoGVMB8. |
Today we're going to talk about degrees of freedom - which are the number of independent pieces of information that make up our models. More degrees of freedom typically mean more concrete results. But something that is statistically significant isn't always practically significant. And to measure that, we'll introduce another new concept - effect size.
Crash Course is on Patreon! You can support us directly by signing up at http://www.patreon.com/crashcourse
Thanks to the following Patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:
Mark Brouwer, Erika & Alexa Saur, Glenn Elliott, Justin Zingsheim, Jessica Wode, Eric Prestemon, Kathrin Benoit, Tom Trval, Jason Saslow, Nathan Taylor, Divonne Holmes à Court, Brian Thomas Gossett, Khaled El Shalakany, Indika Siriwardena, SR Foxley, Sam Ferguson, Yasenia Cruz, Eric Koslow, Caleb Weeks, Tim Curwick, D.A. Noe, Shawn Arnold, Ruth Perez, Malcolm Callis, Ken Penttinen, Advait Shinde, William McGraw, Andrei Krishkevich, Rachel Bright, Mayumi Maeda, Kathy & Tim Philip, Jirat, Eric Kitchen, Ian Dundore, Chris Peters
--
Want to find Crash Course elsewhere on the internet?
Facebook - http://www.facebook.com/YouTubeCrashCourse
Twitter - http://www.twitter.com/TheCrashCourse
Tumblr - http://thecrashcourse.tumblr.com
Support Crash Course on Patreon: http://patreon.com/crashcourse
CC Kids: http://www.youtube.com/crashcoursekids
Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics.
It’s great to have a lot of choices. But sometimes we limit our choices in order to do something productive or meaningful.
Like being on a team project that needs a writer, director, host, camera person, and boom mic holder. If we have 5 different people who can be on that team, then after assigning positions to 4 of them...the last person doesn’t have any freedom to choose hers. It has effectively been assigned.
If she’s willing to give up the freedom to have a choice of positions and take on the great feat of upper body strength that is holding a boom mic, then they have a team that can complete their project. This can happen in statistics, too. Occasionally we have to give up some freedom--degrees of freedom--in order to do something useful with our data.
Degrees of freedom are the number of independent pieces of information we have, and they’re an important part of many of the models that we use. We’ve also been leaving out another important component of the t-test: effect size. Knowing what degrees of freedom and effect size are, and why they matter, will help give our t-tests better context.
INTRO
In the last few episodes we’ve covered the general formula for test statistics. And we’ve gotten pretty good at calculating t-statistics for all sorts of situations: means, proportions, one sample, two sample, paired, and unpaired. But every time we’ve needed a p-value, we’ve let the computer do the work, which is what we’ll continue to do.
But it’s important to know that we’re not using the same t-distribution every single time. As we’ve previously discussed, the t-distribution is like the z-distribution, but it has fatter tails, meaning that extreme t-values tend to be slightly more likely. And that’s because we don’t know the population standard deviation when we calculate a t-statistic, so we estimate it using the sample standard deviation.
This little bit of uncertainty means that we don’t have a perfect normal--or z--distribution. Instead we have our fat tailed friend. But with bigger sample sizes, we’re better able to estimate population parameters like the mean and standard deviation, so our t-distribution changes its shape to reflect that.
As n--our sample size--gets bigger, we’re less and less uncertain about our estimate, and the t-distribution will get closer and closer to z. More information usually means we have a more accurate estimate. Degrees of freedom can help us measure that accuracy.
We choose our t-distribution based on the number of degrees of freedom that we have. Degrees of freedom are the number of pieces of independent information in our data. Let’s go to the thought bubble.
After dinner with 2 friends, you all pull out your credit cards to split the bill. Your friend Carmen, who’s a bit of a math savant and a bit of a showoff, notices that if you treat each of your credit card numbers as a single 16-digit number, the mean of your three credit card numbers is 4551-9681-7590-9146. She says this really loudly, and you’re a little nervous that an identity thief might have been lurking nearby and overheard Carmen’s very public declaration.
But there’s nothing to worry about! Even though a potential thief has the mean of your credit card numbers, they won’t be able to figure out what any of your individual numbers are. In other words, there’s a lot of “freedom” around what those numbers could be.
And actually, you’d even be okay if the thief found out Carmen’s credit card number. At that point, they could figure out the sum or mean of your card and your other friend Eli’s card, but they still couldn’t tell what your exact number was. There’s still freedom for your credit card number to take on different values.
It could be any of a huge range of numbers. BUT as soon as someone knows the mean of all three cards, Carmen’s number, and Eli’s number, they’ll know exactly what your credit card number is. It’s no longer “free” to take on different values: once Carmen’s and Eli’s numbers are known, the mean leaves exactly one possibility for yours. So you should probably make sure that Eli keeps his number under wraps.
Just to be safe. Thanks, Thought Bubble. In that example, the three credit card numbers already existed before we started doing any math.
And they are three independent pieces of information. Eli’s credit card number has no effect on your credit card number, which has no effect on Carmen’s, and so on. But, as soon as Carmen calculated the mean, she used up one of those independent pieces of information.
Once the thief knows the mean, they only need TWO pieces of independent information (that is, n-1 pieces). In this case, once they know any two of the credit card numbers--and the mean--they know all three. So when they learn Carmen’s number and Eli’s number, SUDDENLY those numbers can reveal yours.
The thief can figure out your exact credit card number, since it’s no longer independent of the others. To bring it back to our t-tests: when we calculate a mean, we’re using up one degree of freedom--or one piece of independent information.
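Here is a minimal Python sketch of the thief’s arithmetic: the mean fixes the total, so once n-1 of the values are known, the last one has no freedom left. The card numbers below are made up (the ones shown on screen aren’t in the transcript), and `recover_last_value` is a hypothetical helper. `Fraction` keeps the mean exact, since the mean of real 16-digit numbers is rarely a whole number.

```python
from fractions import Fraction


def recover_last_value(mean: Fraction, known_values: list[int], n: int) -> int:
    """Return the one missing value, given the exact mean of all n values.

    Only works with exactly n - 1 known values: the mean "uses up" one
    degree of freedom, so n - 1 values plus the mean pin down the last one.
    """
    if len(known_values) != n - 1:
        raise ValueError("need exactly n - 1 known values")
    total = mean * n                    # the mean fixes the total
    last = total - sum(known_values)    # whatever is left over must be the last value
    if last.denominator != 1:
        raise ValueError("inconsistent inputs: recovered value is not a whole number")
    return int(last)


# Made-up 16-digit "card numbers", for illustration only.
carmen = 4000_1234_5678_9012
eli = 5100_0000_0000_0004
yours = 4555_8811_8094_8421

# What Carmen announces, kept as an exact fraction.
mean = Fraction(carmen + eli + yours, 3)

# With the mean and any n - 1 = 2 of the numbers, the third is determined.
recovered = recover_last_value(mean, [carmen, eli], n=3)
print(recovered == yours)  # True
```

With only the mean and one card number, the helper refuses (two values are still free), which mirrors the point that Carmen’s number alone doesn’t expose yours.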
The amount of information that we originally have depends on our sample size--n--which is why you’ll often see it in the formulas to calculate degrees of freedom. The more data you have, the more independent information that you have. But every time you make a calculation like a mean, you’re using up one piece of independent information.
So, for example, we have data from 100 randomly sampled square miles of avocado orchard, and we’ve painstakingly counted the number of bees spotted in each sampled square mile over the course of a week. The bee population is declining! We need to be sure avocados are getting pollinated!
The owner of one avocado orchard says that she usually sees 15,000 bees per square mile. So, you set out to analyze your data to see whether you think the bee population has changed. You have 100 pieces of independent data--one measure from each square mile--so, when you calculate the mean number of bees from all 100 square miles, you’re using up 1 degree of freedom.
Now that we know the mean number of bees is 16,838, you only need 99 of the bee counts to figure out what the count for the 100th square mile would bee. With a quick one-sample t-test, we get our p-value from a t-distribution with 99 degrees of freedom (the black line). If we had fewer data points, say 6, we’d only have 5 degrees of freedom, which would give us a slightly different t-distribution with fatter tails (the blue line), and therefore a different p-value.
Our p-value of 0.001 tells us that we reject the null that the mean number of bees per square mile is 15,000. And we couldn’t find that p-value without knowing our degrees of freedom, because as we mentioned in a previous episode, t-distributions get more and more like a normal distribution as we get more and more independent information...aka degrees of freedom. In fact, it looks like the number of bees may be higher than it was previously.
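To see how the degrees of freedom feed into the p-value, here is a standard-library-only Python sketch of the bee one-sample t-test, built from the summary numbers quoted above (n = 100, sample mean 16,838, null value 15,000, standard deviation 5,420). The t-distribution tail is computed by numerically integrating its density with Simpson’s rule. That is an illustrative stand-in for what statistical software does more precisely, and the helper names are our own. The df = 5 line reuses the same t-value purely to show the effect of fatter tails; with only 6 real data points, the t-value itself would change too.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 20_000) -> float:
    """P(|T| >= |t|): integrate the density from 0 to |t| (Simpson's rule,
    even number of steps), then subtract both central halves from 1."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3        # area between 0 and |t|
    return 1 - 2 * central_half


def normal_two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal (the z-distribution)."""
    return math.erfc(abs(z) / math.sqrt(2))


# Summary numbers quoted in the episode.
n, sample_mean, sd, mu0 = 100, 16_838, 5_420, 15_000
se = sd / math.sqrt(n)                  # standard error: 542.0
t_stat = (sample_mean - mu0) / se       # about 3.39
df = n - 1                              # one degree of freedom spent on the mean

p_99 = two_sided_p(t_stat, df)
p_5 = two_sided_p(t_stat, 5)            # same t, fatter-tailed distribution
p_z = normal_two_sided_p(t_stat)

print(f"t = {t_stat:.3f}, df = {df}")
print(f"p (df = 99): {p_99:.4f}")
print(f"p (df = 5):  {p_5:.4f}")
print(f"p (normal):  {p_z:.4f}")
```

With these inputs, t comes out near 3.39 and the df = 99 p-value lands near the episode’s 0.001. The df = 5 curve gives a p-value of roughly 0.02, and the normal curve gives one slightly below the df = 99 value, which is the “closer and closer to z” behavior described earlier.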
Go bees! One thing to note, though: the 1,838-bee increase is statistically significant, but that just means that if the true mean bee count per square mile were 15,000, it would be unlikely that we’d get a sample mean as far from it as 16,838. It doesn’t mean that this difference is practically significant, or all that useful.
An increase of 1,838 bees isn’t really that big compared to the standard deviation of 5,420. If we expect bee counts to typically vary by about 5,420 bees from the mean, then a change of 1,838 may not be that important to us. For example, say that we treated half the orchard with a bee pheromone, which bees love and which is thought to encourage them to come back.
Our statistical test on the difference between a group of bees exposed to the pheromone and a group not exposed revealed a statistically significant difference of 3,297 bees per square mile between the pheromone and non-pheromone groups. But we still need to ask whether a difference of 3,297 bees is useful to the orchard owner. Those pheromones are pricey.
And she wants to make sure that they’re worth it. That 3,297-bee-per-square-mile difference is an increase of about 0.6 standard deviations. Remember that almost ALL of the data is within 2 standard deviations of the mean (for roughly normal data, about 95% of it).
So a difference of a little more than half a standard deviation is a big deal. Maybe those pheromones are worth it. Sometimes statistical significance doesn’t give us the whole picture.
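Expressing both bee differences in standard-deviation units is one line of arithmetic. This sketch assumes that the 5,420-bee standard deviation quoted for the survey is also the right yardstick for the pheromone comparison. The transcript implies this but doesn’t state it outright.

```python
SURVEY_SD = 5_420  # bees per square mile, as quoted in the episode


def in_sd_units(difference: float, sd: float) -> float:
    """Express a raw difference as a multiple of the standard deviation."""
    return difference / sd


differences = {
    "survey mean vs. 15,000 baseline": 16_838 - 15_000,  # 1,838 bees
    "pheromone vs. no pheromone": 3_297,                 # assumes the same SD applies
}
for label, diff in differences.items():
    print(f"{label}: {diff:,} bees = {in_sd_units(diff, SURVEY_SD):.2f} SD")
```

That puts the survey difference at about a third of a standard deviation and the pheromone difference at about 0.61, the “about 0.6” quoted above.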
You probably already use this kind of reasoning in your real life. Like when you're scrolling through your Instagram feed and see a former Bachelor contestant promoting a hair vitamin. A little Googling tells you that yes, this vitamin does cause a statistically significant increase in hair growth, but only by a few nanometers.
Your hair normally grows about 12.7 millimeters a month plus or minus a millimeter. So, this vitamin has what we call a small effect size. Effect size tells us how big the effect we observed was, compared to random variation.
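The transcript doesn’t spell out the effect-size formula shown on screen. One common choice that matches the description “compared to random variation” is Cohen’s d: the difference in means divided by a pooled standard deviation. Treat the formula below as that standard definition, not a transcription of the episode. For the hair vitamin, taking “a few nanometers” as a hypothetical 3 nm against the 1 mm standard deviation gives:

```latex
$$ d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad
   s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}} $$

$$ d_{\text{hair}} \approx \frac{3\ \text{nm}}{1\ \text{mm}}
   = \frac{3 \times 10^{-9}\ \text{m}}{10^{-3}\ \text{m}}
   = 3 \times 10^{-6} $$
```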
It’s really important to pair our p-values with effect sizes, because sometimes, we can get statistically significant effects, but effect sizes that are so small, they don’t really matter to us. Let’s look at an educational supplement called WOWZERBRAIN!. The creators of WOWZERBRAIN! do an experiment.
They bring 90 kids into their center and randomly assign half of them to get the WOWZERBRAIN! supplemental materials and the other half to a control group. The control group reads age-appropriate books for the same amount of time that it takes to go through a WOWZERBRAIN! lesson. Once the data is collected, the WOWZERBRAIN! creators take a look and find that the kids who took part in the WOWZERBRAIN! intervention had a mean reading score improvement of about 1.329 points, while the control group improved an average of 1.265 points.
The first thing the WOWZERBRAIN! researchers do is perform a two-sample t-test, and they find a t-value of -0.21 and a p-value of 0.8, calculated using a t-distribution with 88 degrees of freedom. So they aren’t able to reject the null.
Their effect size, substituted into our equation, is only about 0.044, which is pretty small. That means that the kids who got WOWZERBRAIN! materials only had scores that were higher by about 1/23rd of the amount we expect students to vary just by chance. But despite the null result of their t-test, the WOWZERBRAIN! creators look at the raw numbers and see that the kids who got WOWZERBRAIN! did score numerically higher, even though the difference wasn’t statistically significant.
So they, like many researchers and scientists, think to themselves that maybe the t-test wasn’t significant because they ran an underpowered experiment with too small a sample size. Since the standard error is the standard deviation divided by the square root of n, then--all else being equal--the larger our sample size, the smaller our standard error and the larger our t-statistic will be. So the researchers wonder whether they could detect an effect if they tested 10,000 children.
And sure enough, with 10,000 kids, they got a t-value of -2.218, with a p-value of 0.02886. Which is small enough to reject the null hypothesis! But notice that their effect size is still the same...about 0.044.
So the intensive WOWZERBRAIN! intervention still only improved average reading scores by 0.064 points. P-values, as you can see, aren’t everything. They should always be looked at in the context of other measures, like effect sizes.
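The WOWZERBRAIN! numbers can be checked by hand. Assuming the effect size is the mean difference divided by the pooled standard deviation, the quoted 0.044 implies a standard deviation of roughly 1.45 points. That figure is back-calculated here, not reported in the episode. And 1/0.044 is about 22.7, hence “about 1/23rd.” A pooled two-sample t-test has n_1 + n_2 - 2 degrees of freedom, one lost for each group mean:

```latex
$$ \bar{x}_{\text{WB}} - \bar{x}_{\text{ctrl}} = 1.329 - 1.265 = 0.064 $$

$$ d = \frac{0.064}{s_p} \approx 0.044
   \;\Rightarrow\; s_p \approx \frac{0.064}{0.044} \approx 1.45,
   \qquad \frac{1}{0.044} \approx 22.7 $$

$$ \text{df} = n_1 + n_2 - 2 = 45 + 45 - 2 = 88 $$
```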
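Why does the same effect size become significant with more children? For two equal groups of m kids each, the pooled t-statistic is d times the square root of m/2, so t grows with the square root of the sample size while d stays put. The sketch below assumes an even split (45 per group, then 5,000 per group) and uses a normal approximation for the two-sided p-value. Its figures therefore land close to the episode’s -0.21 and -2.218, and near its 0.8 and 0.02886, but not exactly on them. The sign only reflects which group is subtracted from which, and d is rounded to 0.044.

```python
import math

EFFECT_SIZE = 0.044  # effect size quoted in the episode, held fixed


def pooled_t_from_d(d: float, per_group: int) -> float:
    """Pooled two-sample t for two equal groups of size m: t = d * sqrt(m / 2)."""
    return d * math.sqrt(per_group / 2)


def normal_two_sided_p(t: float) -> float:
    """Two-sided p-value from the normal approximation (reasonable for large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


results = {}
for total_kids in (90, 10_000):
    per_group = total_kids // 2          # assumes an even split between groups
    t = pooled_t_from_d(EFFECT_SIZE, per_group)
    df = total_kids - 2                  # one degree of freedom per group mean
    p = normal_two_sided_p(t)
    results[total_kids] = (t, df, p)
    print(f"{total_kids:>6} kids: |t| = {t:.2f}, df = {df}, p ~ {p:.3f}")
```

The effect size never changes between the two runs; only the sample size does, and that alone pushes the p-value below 0.05.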
P-values tell us whether it’s likely something happened by chance alone. Effect sizes help us figure out whether observed effects are practically significant to us. In this case, though the WOWZERBRAIN! creators achieved statistical significance, for many people they may have failed to achieve practical significance.
Parents are unlikely to pay for a year-round educational program that only improves test scores by 0.064 points. We talk a lot about p-values, and that’s because lots of people use them to do really important things. But they can’t stand alone.
P-values are PART of the whole picture and should be paired with other information, like an effect size. It’s like trying to buy an apartment based on cost per square foot alone. Sure, maybe you find something for 75 cents per square foot... but it turns out it’s right next to the city dump, so maybe you’ll pass on that one. And we need degrees of freedom to understand why smaller differences between means can be significant if you have a larger sample size.
The more information you have, the more accurate your estimates are. It’s why we might not bat an eye at two people from two different countries having a height difference of 1 foot, but we’d be very surprised if those two countries had an average height difference of 1 foot. And that’s about 0.3 meters for you people using the metric system.
Having more accurate information changes the threshold for what’s surprising or significant to us. Thanks for watching. I'll see you next time.