YouTube: https://youtube.com/watch?v=uJFdLKkuYc4
Previous: Introduction to Crash Course Navigating Digital Information #1
Next: The Future of Clean Energy: Crash Course Engineering #31

Statistics

View count: 61,399
Likes: 1,547
Comments: 91
Duration: 11:20
Uploaded: 2019-01-09
Last sync: 2024-11-14 08:15

Citation

Citation formatting is not guaranteed to be accurate.
MLA Full: "When Predictions Succeed: Crash Course Statistics #44." YouTube, uploaded by CrashCourse, 9 January 2019, www.youtube.com/watch?v=uJFdLKkuYc4.
MLA Inline: (CrashCourse, 2019)
APA Full: CrashCourse. (2019, January 9). When Predictions Succeed: Crash Course Statistics #44 [Video]. YouTube. https://youtube.com/watch?v=uJFdLKkuYc4
APA Inline: (CrashCourse, 2019)
Chicago Full: CrashCourse, "When Predictions Succeed: Crash Course Statistics #44.", January 9, 2019, YouTube, 11:20, https://youtube.com/watch?v=uJFdLKkuYc4.
In our series finale, we're going to take a look at some of the times we've used statistics to gaze into our crystal ball, and actually got it right! We'll talk about how stores know what we want to buy (which can sometimes be a good thing), how baseball was changed forever when Paul DePodesta created a record-winning Oakland A's baseball team, and how statistics keeps us safe with the incredible strides we've made in weather forecasting. Statistics are everywhere, and even if you don't remember all the formulae and graphs we've thrown at you in this series, we hope you take with you a better appreciation of the many ways statistics impacts your life, and hopefully we've given you a more math-y perspective on how the world works. Thanks so much for watching. DFTBAQ!

Crash Course is on Patreon! You can support us directly by signing up at http://www.patreon.com/crashcourse

Thanks to the following Patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:

Eric Prestemon, Sam Buck, Mark Brouwer, Naman Goel, Patrick Wiener II, Nathan Catchings, Efrain R. Pedroza, Brandon Westmoreland, dorsey, Indika Siriwardena, James Hughes, Kenneth F Penttinen, Trevin Beattie, Satya Ridhima Parvathaneni, Erika & Alexa Saur, Glenn Elliott, Justin Zingsheim, Jessica Wode, Kathrin Benoit, Tom Trval, Jason Saslow, Nathan Taylor, Brian Thomas Gossett, Khaled El Shalakany, SR Foxley, Yasenia Cruz, Eric Koslow, Caleb Weeks, Tim Curwick, D.A. Noe, Shawn Arnold, Malcolm Callis, Advait Shinde, William McGraw, Andrei Krishkevich, Rachel Bright, Jirat, Ian Dundore
--

Want to find Crash Course elsewhere on the internet?
Facebook - http://www.facebook.com/YouTubeCrashCourse
Twitter - http://www.twitter.com/TheCrashCourse
Tumblr - http://thecrashcourse.tumblr.com
Support Crash Course on Patreon: http://patreon.com/crashcourse

CC Kids: http://www.youtube.com/crashcoursekids
Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics. In the last episode, we talked about some areas in which we still struggle to make consistently accurate predictions.

But there are also many areas where we have done really well. Companies have increasingly improved their use of both customer and outside data to make sure they have the right items in stock. And our ability to predict the weather has improved a ton since the days when people believed deities used the weather to punish us. Statistics has also transformed sports, from football, where fans use state-of-the-art analytics to come out on top of their fantasy football leagues, to soccer, where players shooting penalty kicks have figured out where to aim the ball for the highest chance of scoring. Baseball even has a name for its analytic field: sabermetrics.

Pretty much everything we've done in this series, from data visualization to chi-square tests, to machine learning and Bayesian hypothesis testing, has led up to this moment, this last episode. Whether we're doing inferential tests, or creating predictive models, we want to make informed decisions, from which medication to take, to which colleges to apply to. And statistics allows us to use inference and prediction to make those decisions.

[Intro]

Let's start with how prediction helps companies and their customers. Walmart has accumulated data on customer demand for different items, and their team discovered some surprising trends, like the fact that wind conditions may have an impact on whether or not customers want to eat berries. They found that people like to eat berries when it's cooler than 80 degrees Fahrenheit or 26.7 degrees Celsius, and there's very little wind. So, they advertise berries more at times like that, when demand is high. They also know that if it's warm and not raining, people are likely to buy steaks. But, if it gets hot, over 90 degrees Fahrenheit or 32.2 degrees Celsius, people buy hamburger.

Big and small stores alike all want to predict exactly when people will want to buy things. If they can get it right, then they can save money by not having unwanted merchandise taking up warehouse space, and make money by selling stuff. They also won't lose money because they didn't have enough stock of a popular item. And customers are happy if there are New York Strip steaks available when they want to eat them.

One company that has shared a bit about its algorithms is Stitch Fix. It's a style subscription service that sends you clothes to try on and potentially buy. Stitch Fix uses data and statistics in order to make sure that they choose clothes you're more likely to want to keep. And, their model has a lot of moving parts. It uses algorithms not just to stock its warehouses or match me with a blouse, but also to help design clothes. Each dress or pair of pants has a set of attributes (gold, lamé, flared). Stitch Fix also has data on what subscribers like (gold, lamé). To create new styles, they recombine the attributes of existing styles and alter them slightly. Then, they bring the human designers to help out. At least for now.
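
The "recombine and alter slightly" idea described above resembles the crossover and mutation steps of a genetic algorithm. The following is only a minimal sketch of that general idea, not Stitch Fix's actual method: the attribute names, the option lists, and the `recombine` and `mutate` helpers are all hypothetical and exist purely for illustration.

```python
import random

# Hypothetical attribute options; not taken from any real catalog.
ATTRIBUTE_OPTIONS = {
    "color": ["gold", "black", "navy"],
    "fabric": ["lamé", "denim", "cotton"],
    "cut": ["flared", "straight", "skinny"],
}


def recombine(parent_a: dict, parent_b: dict, rng: random.Random) -> dict:
    """Build a new style by taking each attribute from one of two existing styles."""
    return {key: rng.choice([parent_a[key], parent_b[key]]) for key in parent_a}


def mutate(style: dict, options: dict, rng: random.Random, rate: float = 0.1) -> dict:
    """Alter a style slightly: each attribute is re-drawn with probability `rate`."""
    mutated = dict(style)
    for key, choices in options.items():
        if rng.random() < rate:
            mutated[key] = rng.choice(choices)
    return mutated


# Two made-up existing styles.
style_a = {"color": "gold", "fabric": "lamé", "cut": "flared"}
style_b = {"color": "navy", "fabric": "denim", "cut": "straight"}

rng = random.Random(0)  # seeded so the sketch is repeatable
candidate = mutate(recombine(style_a, style_b, rng), ATTRIBUTE_OPTIONS, rng)
print(candidate)
```

In a setup like this, the generated candidates would only be suggestions; as the episode says, human designers still step in afterward.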

Alright, gold lamé pants probably aren't the best example of successful use of statistics and algorithms, but the success of statistics and analytics in baseball will not come as a surprise to anyone who has seen or read Moneyball. Stats like batting average (which is number of hits divided by number of times at bat) have been around for a long time. But, many of these simpler stats were missing a lot of information about what makes a really good baseball player. In Moneyball, Michael Lewis writes about Bill James, the father of sabermetrics, who believed "the statistics were not merely inadequate; they lied. And the lies they told led people who ran major league baseball teams to misjudge their players, and mismanage their games."

So, in 2001, when the Oakland A's lost 3 of their best players and found themselves with a lack of funds to replace them, general manager Billy Beane decided to use statistics to find the best players for the team. Beane and his assistant, the stats-savvy Paul DePodesta, looked at how adding individual players to the team could increase the probability of winning games. They calculated more complicated statistics, such as how many walks players had and their on-base average (which is a measure of how often a player reaches a base, whether from a hit, a walk, or by being hit by the pitch). They used data that other teams weren't paying attention to, and as a result, they recruited players that other teams had overlooked.
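
To make the difference between the two statistics concrete, here is a small sketch. Batting average follows the definition given in the episode (hits divided by at-bats). The on-base calculation uses the commonly cited on-base percentage formula, which also counts sacrifice flies in the denominator; that detail is not mentioned in the episode. Both players and all of their numbers are made up for illustration.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int = 0) -> float:
    """Times on base (hit, walk, hit by pitch) divided by the plate
    appearances that count toward the statistic."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / denominator


# Two hypothetical players with the same number of at-bats.
player_a = {"hits": 150, "walks": 20, "hit_by_pitch": 2, "at_bats": 500}
player_b = {"hits": 135, "walks": 90, "hit_by_pitch": 5, "at_bats": 500}

for name, p in (("A", player_a), ("B", player_b)):
    avg = batting_average(p["hits"], p["at_bats"])
    obp = on_base_percentage(p["hits"], p["walks"], p["hit_by_pitch"], p["at_bats"])
    print(f"Player {name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

In this made-up comparison, player B has the lower batting average but reaches base more often because of walks, which is the kind of player a team looking only at batting average could overlook.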

Beane's attention to statistical details paid off. In the 2002 season, the A's won 20 straight games, a record at the time for their league. This spurred on the popularity of sabermetrics, which is the statistical analysis of players and gameplay in baseball. Sabermetricians use statistics to figure out who to hire, who to trade, and when to pull pitchers from the mound.

Major League Baseball teams use high-def cameras and radar to measure pitch release and velocity. They track a baseball's spin rate. They gather data on the angle of the ball when it leaves the bat after it's been hit. And, data shows that a ball hit a little higher is more likely to become a hit or home run. So, baseball players are now trying to hit the ball higher in the air.

According to the Washington Post, the average launch angle went up from 10.5 degrees in 2015 to 11.5 degrees in 2016. Or as Dodger Justin Turner put it: "You can't slug by hitting balls on the ground. You have to get the ball in the air if you want to slug, and guys who slug stick around, and guys who don't, don't."

Managers sometimes use statistics when they're deciding when and where players should stand on defense, kinda like when I was at-bat as a kid and everyone ran in five steps. It was embarrassing, whatever.

Since managers have access to data on every player, they can gauge where a ball hit by an opposing batter is most likely to go. Traditionally, the baseball players stand about here, but managers can move them based on the past behavior of the batter. If a player has a tendency to hit the ball to the left side of the field, like data from the Cubs third baseman Kris Bryant showed in 2017 and 2018, managers can move their fielders so they're more concentrated in that area.

This gives the team on defense a better chance of getting the out. And it turns out defenses shift against Bryant specifically over half the time he's at bat. A lot of teams do this. Defensive shifting has gone up 5% in the last year. The Houston Astros and the Kansas City Royals shift more than most. The Astros shifted their defense about 37% of the time in 2018, and the Royals shifted 27% of the time, which meant they shifted 1304 more times than they did in 2017.

Sabermetricians aren't the only ones predicting what's gonna happen on the field. Meteorologists are using statistics to predict the weather so they can have that big tarp ready when it rains. I love that big tarp. [the noise of a tarp being pulled] Weather has historically seemed unpredictable to humans. In ancient Greek mythology, Zeus controlled the sky, as well as thunder, rain, and lightning, but we've come a long way since then.

In 1870, President Ulysses S. Grant established the Weather Bureau in the United States, now called the National Weather Service. At first, forecasts were filled with vague uncertainty and had very little precision compared to the hour-by-hour forecasts we have today. They were also limited in their reach, perhaps only forecasting a day or two, compared to today's ten-day forecasts. Over the years our predictive abilities have improved.

According to Nate Silver: "In 1972, the [National Weather Service's] high-temperature forecast missed by an average of six degrees when made three days in advance. Now it's down to three degrees." Silver also cites the current odds of an American being killed by lightning: 1 in 11,000,000 compared to those odds in 1940: 1 in 400,000. Some of that not being struck by lightning can be attributed to better weather prediction.
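
One common way to read "missed by an average of six degrees" is as a mean absolute error: the average size of the gap between forecast and observed temperatures. The sketch below assumes that reading. The temperature lists are made up for illustration; only the lightning odds come from the figures quoted above.

```python
def mean_absolute_error(forecasts, observations):
    """Average absolute gap between forecast and observed values."""
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    total = sum(abs(f - o) for f, o in zip(forecasts, observations))
    return total / len(forecasts)


# Made-up three-day-ahead high-temperature forecasts and what actually happened.
forecast_highs = [70, 65, 80, 75]
observed_highs = [73, 62, 83, 72]
print(mean_absolute_error(forecast_highs, observed_highs))  # 3.0 degrees

# Odds quoted in the episode: 1 in 400,000 (1940) versus 1 in 11,000,000 (now).
odds_1940 = 1 / 400_000
odds_now = 1 / 11_000_000
print(odds_1940 / odds_now)  # how many times more likely in 1940
```

With the quoted odds, the ratio works out to 27.5, so a death by lightning was about 27.5 times more likely in 1940. As the episode notes, only some of that drop can be attributed to better forecasting.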

Today, U.S. meteorologists and weather researchers use a combination of Doppler radar, satellites around the planet and facing the sun, radiosondes in weather balloons in the upper stratosphere, and regular old weather stations. Then they crunch all that data with NOAA's weather and climate operational supercomputer system, which is six million times faster than your or my computer. And that allows them to more accurately predict weather events, like rainfall, drought, and hurricane paths.

About 25 years ago, hurricane path predictions would be off by about 563 km (350 miles). Now we're off only about 161 km (100 miles) and scientists likely will keep improving on that. Nate Silver notes in his book, "The Signal and the Noise", that the advance notice we had that Hurricane Katrina was gonna hit New Orleans likely saved a lot of people, even though Katrina was still devastating. A few decades ago we might not have known to evacuate as many people as we did.

With better weather prediction, we also have more time to get out of the way of tornadoes and flash floods and severe thunderstorms. We can avoid getting stuck in extreme heat, or extreme cold, and stay off icy roads. It's important that we have continued improvement on a global scale. Being able to predict rainfall and get that data to the right people will be crucial, particularly as temperatures change and the climate shifts.

In recent years, climate scientists have been more able to accurately forecast rainfall in Sub-Saharan Africa, which impacts food from farms that use rain as a water source. But for weather predictions to be useful to as many people as possible, experts recommend that investments are made in data management systems, satellites, and the means to distribute the information to the right people, including rural farmers.

The complexity of the weather data can make it hard to create a best model by hand. Some researchers have begun to use machine learning to help handle all that data. One team at Chapman University used a recurrent neural network to predict droughts in California. They predicted how severe droughts would be, and their model did pretty well. 

Weather is an incredibly noisy phenomenon. There are many factors that affect the temperature, humidity, and other weather events. And the more complex a phenomenon is, the more data we need to accurately predict it. As we've discussed before, neural networks are often better than humans at figuring out patterns in huge amounts of data.

Statistics helps us see how the world works, and hints at how the world could work. It helps us see through uncertainty, but doesn't get rid of that uncertainty. It can show us our biases, but it can also paper over them. Statistics helps us update our beliefs and come up with new ones. 

Even if you don't come away from this series remembering what ANOVA stands for, we hope you take away that the world isn't binary, that it's complicated, sometimes requiring complicated solutions. If you don't remember specifics about p-values, take away the importance of reading further. Anytime you see a study that you might base a life decision on, see if it makes sense to you. And remember improbable things are likely to happen, just not to you, or to me. Most of us are right in the middle of most of the curves that describe us, and that's okay.

Statistics can show us where we're outliers too. Thanks for watching! DFTBA-Q, Don't forget to be asking questions.

Crash Course Statistics is filmed in The Chad and Stacy Emigholz Studio in Indianapolis, Indiana, and it's made with the help of all these nice people. Our animation team is Thought Cafe. If you'd like to keep Crash Course free for everyone forever, you can support the series at Patreon, a crowdfunding platform that allows you to support the content you love. Thank you to all our patrons for your continued support. Crash Course is a production of Complexly. If you like content designed to get you thinking, check out some of our other channels at Complexly.com. Thanks for watching.