


Thanks to IBM Z for sponsoring this episode. To learn more about the annual student contest, go to: Find out more about IBM Z Machine Learning capabilities here:

Whether you're picking a place to eat or something to watch, machine learning helps us make smarter decisions in our daily lives.

Hosted by: Hank Green

SciShow has a spinoff podcast! It's called SciShow Tangents. Check it out at
Support SciShow by becoming a patron on Patreon:
Huge thanks go to the following Patreon supporters for helping us keep SciShow free for everyone forever:

Kevin Carpentier, Eric Jensen, Matt Curls, Sam Buck, Christopher R Boucher, Avi Yashchin, Adam Brainard, Greg, Alex Hackman, Sam Lutfi, D.A. Noe, Piya Shedden, Scott Satovsky Jr, Charles Southerland, Patrick D. Ashmore, charles george, Kevin Bealer, Chris Peters
This episode of SciShow is brought to you by IBM Z.  Learn more about the mainframe, including its performance, scalability, and security capabilities at  


It's no secret that machine learning is all the rage.  It's the idea that we can use data to train a computer to perform a task without having to specifically program in all the steps, and it's a large contributor to artificial intelligence.  Many of the flashiest applications of AI, like voice assistants or self-driving cars, are famous for using neural networks.  These are powerful computer programs designed to loosely mimic the structure of the human brain, but they're not the only important part of machine learning.  

A class of algorithms called decision trees has also quietly lurked behind the scenes, helping scientists and companies sort through data for decades and helping you in ways you might not expect.  Like neural networks, decision trees also emulate humans, but instead of copying the layout of our brains, they mimic the way we reason.  They classify items or situations by checking how their attributes compare to past examples, kind of like playing a guessing game. 

For instance, if you were trying to guess what kind of vehicle I'm thinking of, your first question, if you're any good at this, would probably be something that would eliminate whole categories of answers before moving on to more specific details.  So if you asked "does it have wheels?" and I said "yes", you might follow up with "how many?" and if I said "two", the question, "well, how fast does it go?" would help you pick between, like, a bike and a motorcycle.

Decision trees work the same way.  What makes them so flexible is the wide variety of data they can handle.  Your three questions, for example, included three kinds of data.  "Does it have wheels?" was checking categorical data, not something represented by a number.  Vehicles can only have a whole number of wheels, so "how many?" was asking about discrete data, while top speed can be any number and is therefore continuous data.  
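The guessing game above can be written out as a tiny hand-built decision tree.  This is only a minimal sketch, not something from the episode: the field names, the "boat" and "car" answers, and the 60 km/h cutoff are all made-up assumptions for illustration.  A real decision tree would learn its questions and thresholds from past examples instead of having them typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    has_wheels: bool      # categorical: yes or no, not a quantity
    wheel_count: int      # discrete: whole numbers only
    top_speed_kmh: float  # continuous: any non-negative number


def guess_vehicle(v: Vehicle) -> str:
    """Walk the tree from the broadest question to the most specific one."""
    # Question 1 (categorical): does it have wheels?
    if not v.has_wheels:
        return "boat"  # hypothetical answer for the no-wheels branch
    # Question 2 (discrete): how many wheels?
    if v.wheel_count == 2:
        # Question 3 (continuous): how fast does it go?
        # The 60 km/h threshold is an arbitrary illustrative cutoff.
        if v.top_speed_kmh > 60.0:
            return "motorcycle"
        return "bicycle"
    return "car"  # hypothetical catch-all for other wheel counts


print(guess_vehicle(Vehicle(has_wheels=True, wheel_count=2, top_speed_kmh=25.0)))
```

Each `if` is one question in the game, and each `return` is a leaf of the tree where the guessing stops.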

In day-to-day life, we deal with that variety of kinds of information really easily, and decision trees are designed to do the same thing.  They can also efficiently handle huge amounts of information, which makes them useful in complex situations.  One not-so-surprising application of decision trees you probably run into all the time is the recommendation systems for things like streaming services and restaurant booking apps.  When picking a place to eat, you might ask a friend how much they want to spend or what sort of food they're craving, the exact style of questioning decision trees work with, and their data flexibility means that the system can take those answers and combine them with other stuff the app knows about you, like the fact that you invariably choose trash over something that's probably gonna be good for you.  Sorry.  Speaking from personal experience here.

Recommender systems also have to run a lot.  Just think about how often you fiddle with the little filters while searching to, like, limit the travel distance or max price.  Because decision trees rely on such simple logic, they can run faster than other machine learning algorithms with similar capabilities.  Of course, they are not perfect.  As the number of choices grows, the decision tree can get bogged down, a fact that we humans know all too well when scrolling endlessly for just the right movie to go with that takeout, but programmers can build in flexibility, like returning a list of recommendations instead of a single one when the trees get too complex.  That way, they can work fast enough to keep us happy.

At some point in your life, you've probably also had the experience of getting your credit card mysteriously declined while trying to buy something mundane.  The bank was likely trying to stop what they thought was fraud, but if it was really you, it probably like, really slowed down your day.  That's the case 80% of the time and it stinks for the buyer and the bank.  You can't buy the thing you need, or at least not with the card you wanted to use, and your bank, first of all, misses out on making some money from your transaction and second, you might even get annoyed and stop using that card, a situation one 2015 study estimated could cost banks 100 billion dollars a year.  

Part of the problem is that in the past, fraud detection has relied on a set of one-size-fits-all rules.  Like, I'm going to make up a random example.  They might decide that if you make a purchase in two different states within an hour, something's probably wrong, but maybe you were on a road trip or you just live near the border.  This very broad approach is kind of silly, especially since your bank knows an extraordinary amount about you from how often you shop online to what stores you frequent and when you go.

So in a 2018 study, a group of MIT researchers took more than 200 kinds of data collected by credit card companies and created a decision tree to predict fraud.  The algorithm turned all that data into tailored predictors, like how often you travel, so instead of freaking out, the software would just realize, hey, this behavior is actually pretty normal for this person.
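To make the contrast concrete, here is a rough sketch of a one-size-fits-all rule next to a small tree that also asks a question tailored to the cardholder.  It is not the researchers' actual algorithm: the class names, the fields, the one-hour window, and the "four out-of-state purchases a month" cutoff are all hypothetical, chosen only to mirror the made-up two-states example.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    different_state: bool               # differs from the previous purchase's state
    minutes_since_last_purchase: float


@dataclass
class CardholderHistory:
    # Hypothetical tailored predictor: how often this person shops out of state.
    out_of_state_purchases_per_month: float


def blanket_rule(tx: Transaction) -> bool:
    """One-size-fits-all rule: two states within an hour is always flagged."""
    return tx.different_state and tx.minutes_since_last_purchase < 60


def tailored_tree(tx: Transaction, history: CardholderHistory) -> bool:
    """Same first two questions, plus one about this cardholder's habits."""
    if not tx.different_state:
        return False
    if tx.minutes_since_last_purchase >= 60:
        return False
    # Illustrative cutoff: frequent travelers are treated as behaving normally.
    if history.out_of_state_purchases_per_month >= 4:
        return False
    return True


road_trip = Transaction(different_state=True, minutes_since_last_purchase=45)
frequent_traveler = CardholderHistory(out_of_state_purchases_per_month=9)
print(blanket_rule(road_trip), tailored_tree(road_trip, frequent_traveler))
```

For the same purchase, the blanket rule flags the road-tripper while the tailored tree lets it through, which is the kind of false alarm the personalized predictors are meant to cut down.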

When tested on 1.8 million new transactions, the decision tree reduced the error rate of fraud prediction by 54% and as these techniques are integrated on a wider scale, that will mean less fraud for banks to deal with and fewer mysterious problems for the rest of us.  Whether it's finding the right Thai place or protecting your private information, a decision tree is a great option for getting the job done.  They reason like we do and they can do it fast.  Like, think about how many movies are on Netflix, but there's always a recommendation just for you, and yet, they are still simple enough to understand for us mere human beings.

Thanks for watching this episode of SciShow, which was brought to you by IBM Z.  IBM Z is an enterprise computing system built for high speed transaction processing, availability, security, and data privacy capabilities.  You can find out more about the mainframe's reliability, who uses them, and why they're a big deal at and if you're interested in learning even more, you can check out IBM's Master the Mainframe.  It's an annual academic contest and yearlong learning program.  Go to to find out how you can participate.  It's a fun way to experience hands-on mainframe technology with no prior knowledge required.