




Citation formatting is not guaranteed to be accurate.
MLA Full: "How to make an AI read your handwriting (LAB) : Crash Course Ai #5." YouTube, uploaded by CrashCourse, 6 September 2019,
MLA Inline: (CrashCourse, 2019)
APA Full: CrashCourse. (2019, September 6). How to make an AI read your handwriting (LAB) : Crash Course Ai #5 [Video]. YouTube.
APA Inline: (CrashCourse, 2019)
Chicago Full: CrashCourse, "How to make an AI read your handwriting (LAB) : Crash Course Ai #5.", September 6, 2019, YouTube, 17:17,
Follow along:
John Green Bot wrote his first novel! Today, in our first ever Lab, we’re going to program a neural network to recognize handwritten letters and convert the first part of John Green Bot’s novel into typed text. To do this, we’re going to import a labeled dataset called EMNIST, use a pre-written library called SKLearn to build the network, train and tweak our code until it’s accurate (enough), and then use our newly trained network to convert John Green Bot’s handwritten pages.

We created this project so that you don’t have to install anything on your computer. The only thing you’ll need to get started is a Google account and a sense of adventure! To run the Colaboratory file (link at the top), you’ll have to click “open in playground” at the top of the page OR open the File Menu and click “Save a Copy to Drive.” From there you can change, tweak, and edit the code as you wish. We also left text around and within the code to help you along the way. If you use this code in your own project, let us know in the comments!

If you want the raw data we used for the project you can download our files from GitHub here:

EMNIST paper:

Crash Course AI is produced in association with PBS Digital Studios

#CrashCourse #ArtificialIntelligence #MachineLearning

Crash Course is on Patreon! You can support us directly by signing up at

Thanks to the following patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:

Eric Prestemon, Sam Buck, Mark Brouwer, Indika Siriwardena, Avi Yashchin, Timothy J Kwist, Brian Thomas Gossett, Haixiang N/A Liu, Jonathan Zbikowski, Siobhan Sabino, Zach Van Stanley, Jennifer Killen, Nathan Catchings, Brandon Westmoreland, dorsey, Kenneth F Penttinen, Trevin Beattie, Erika & Alexa Saur, Justin Zingsheim, Jessica Wode, Tom Trval, Jason Saslow, Nathan Taylor, Khaled El Shalakany, SR Foxley, Yasenia Cruz, Eric Koslow, Caleb Weeks, Tim Curwick, David Noe, Shawn Arnold, Andrei Krishkevich, Rachel Bright, Jirat, Ian Dundore


 (00:00) to (02:00)

Jabril: Oh, this is it.  Perfect.  I think these extra layers are gonna make it so much better.  Oh yeah, increasing the size of this layer was a really good idea.  Alright, okay, I can't wait any longer.  It's time to test it.  

John Green-bot: Jabril, Jabril, I wrote a novel.  

J: Whoa, John Green-bot, you did what?

JGB: I wrote a novel.  

J: A novel?  Oh.  Let me see this.  Wow, John Green-bot, this is pretty sloppy.  We need to work on your handwriting.  Hold up, hold up, you wrote one letter per page?  This is impossible to read.  John Green-bot, we've got to get your novel an audience, so let's digitize this using machine learning, but first, there's something else we have to test.  


Welcome back to Crash Course AI.  I'm your host, Jabril, and today we'll be doing something a little different.  This is the first time we're trying a hands-on lab on Crash Course, so we'll tackle a project together and program a neural network to recognize handwritten letters.  Alright, John Green-bot, we'll get back to you when we've got something.  

We'll be writing all of our code using a language called Python and a tool called Google Colaboratory.  You can see the code we're about to go over in your browser from the link we put in the description, and you can follow along with me in this video.  In these Colaboratory files, there's some regular text explaining what we're trying to do, and pieces of code that we can run by pushing the play button.  These pieces of code build on each other, so keep in mind that we have to run them in order from top to bottom.  Otherwise, we might get an error.  To actually run the code and experiment with changing it, you have to either click open in playground at the top of the page, or open the file menu and click save a copy to drive, and just FYI, you'll need a Google account for this.

 (02:00) to (04:00)

Remember: our goal is to program a neural network to recognize handwritten letters and convert them to typed text.  Even though this stack of papers is unreadable to me, we can work with it, and it could actually make our project a little easier.  Usually with a project like this, we'd have to write code to figure out where one letter ends and another begins, because handwriting can be messy and uneven.  That's called the segmentation problem.  

But because John Green-bot wrote his novel like this, the letters are already segmented and we can just focus on recognizing the letter on each page.  By the way, avoiding the segmentation problem is also why official forms sometimes have little boxes for each letter instead of just a line for writing your name.  Even though we don't have to worry about segmentation, recognizing handwritten letters and converting them to typed text is still tricky.  Every handwritten 'J' looks a little different, so we need to program our neural network to recognize a pattern instead of memorizing a specific shape, but before we do this, let's think about what we need to get there.

Neural networks need a lot of labeled data to learn what each letter generally looks like, so step one is find or create a labeled dataset to train our neural network, and this involves splitting our dataset into the training set and the testing set.  The training set is used to train the neural network, and the testing set has data that's kept hidden from the neural network during training, so it can be used to check the network's accuracy.  

Next is step two: create a neural network.  We'll actually need to configure an AI with an input layer, some number of hidden layers, and the ability to output a number corresponding to its letter prediction.  In step three, we'll train, test, and tweak our code until we feel that it's accurate enough, and finally, in step four, we'll scan John Green-bot's handwritten pages and use our newly trained neural network to convert them into typed text.  

 (04:00) to (06:00)

Alright, let's get started.  Step 1: creating a labeled dataset can be a huge and expensive challenge, especially if I have to handwrite and label thousands of images of letters by myself.  Luckily, there's already a dataset that we can use: the Extended Modified National Institute of Standards and Technology dataset, or EMNIST for short.  This dataset has tens of thousands of labeled images of handwritten letters and numbers, generated from US census forms.  Some of the handwriting is relatively neat and some, not so much.  We're gonna use the EMNIST letters chunk of the dataset, which has 145,600 images of letters, because we're only recognizing letters in John Green-bot's book, not numbers.

This code here will give our program access to this dataset, also called importing it.  So now, we need to make sure to keep our training and testing datasets separate, so that when we test for accuracy, our AI has never seen the testing images before.  So now in our code at step 1.2, let's call the first 60,000 labeled images 'train' and the next 10,000 labeled images 'test'.  These images of letters are 28x28 pixels, and each pixel is a greyscale value between 0 and 255.  To normalize each pixel value and make them easier for the neural network to process, we'll divide each value by 255.  That will give us a number between 0 and 1 for each pixel in each image.
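As a rough illustration of step 1.2, here is a minimal sketch of the split and the divide-by-255 normalization. The function name `split_and_normalize` and the random stand-in images are assumptions made for this example (the lab itself uses the real EMNIST arrays), and the sizes are scaled down from 60,000 and 10,000 so it runs quickly.

```python
import numpy as np

def split_and_normalize(images, labels, n_train, n_test):
    """Take the first n_train items for training and the next n_test for
    testing, scaling pixel values from the 0..255 range down to 0..1."""
    scaled = images.astype(np.float32) / 255.0
    X_train, y_train = scaled[:n_train], labels[:n_train]
    X_test = scaled[n_train:n_train + n_test]
    y_test = labels[n_train:n_train + n_test]
    return X_train, y_train, X_test, y_test

# Random placeholder data, NOT real handwriting: 700 greyscale 28x28 images
# with labels 0..25 standing in for the 26 letters.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(700, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 26, size=700)

X_train, y_train, X_test, y_test = split_and_normalize(images, labels, 600, 100)
print(X_train.shape, X_test.shape)
print(X_train.min(), X_train.max())
```

Because the two slices never overlap, none of the testing images can leak into training.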

Performing a transformation like this to make the data easier to process is a machine learning method called preprocessing.  By the way, we'll need different preprocessing steps for different types of data.  Alright, it may take a few seconds to download and process all the images, so while that's happening, I want to clarify that EMNIST is a luxury.  There aren't many already-existing datasets where you have this much labeled data to use.  In general, if we try and solve other problems, we have to think hard about how to collect and label data for training and testing our networks.

Data collection is a very important step to training a good neural network.  In this case, though, we've got plenty to use in both sets.  

 (06:00) to (08:00)

Okay, let's write a little piece of code to make sure that we imported our dataset correctly.  This line lets us display an image and will also display the label using the print command.  See, this letter is labeled as a 'Y'.  We can display a different example by changing this index number, which tells our program which letter image in the EMNIST dataset to pull.  Let's look at the image index at 1200.  This is labeled as a 'W'.  These are already labeled images.  There's no neural network making any decisions yet, but this is a labeled dataset, so we're done with the first step.

Step two, now that we have our dataset, we need to actually build a neural network, but we don't need to reinvent the wheel here.  We're going to stick to a multi-layer perceptron neural network, or MLP for short, which is the kind we focused on in the neural networks and deep learning episodes.  There are already some tools in Python called Libraries that we can use to make the network.  We're going to use a library called SKLearn which is short for SciKit Learn.  We'll import that so we have access to it.  SKLearn includes a bunch of different machine learning algorithms and we'll be using its multilayer perceptron algorithm in this lab.

So, our neural network is gonna have images of handwritten letters as inputs.  Each image from EMNIST is 28x28 pixels and each of these pixels will be represented by a single input neuron.  So, we'll have 784 input neurons in total.  Depending on how dark a particular pixel is, it will have a greyscale value between 0 and 1, thanks to the processing we did earlier.  The size of our output layer depends on the number of label types that we want our neural network to guess.  Since we're trying to guess letters and there are 26 letters in the English alphabet, we'll have 26 output neurons.  We don't actually have to tell the network this, though.  It will figure this out on its own from the labels in the training set.
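To make the 784 figure concrete, this small sketch flattens a batch of 28x28 images into rows of 784 values, one value per input neuron. The placeholder batch and variable names are assumptions for illustration only.

```python
import numpy as np

batch = np.zeros((3, 28, 28), dtype=np.float32)  # three blank placeholder images
batch[0, 2, 5] = 1.0                             # mark one pixel: row 2, column 5

# One row of 28 * 28 = 784 values per image, read off row by row.
flat = batch.reshape(len(batch), -1)
print(flat.shape)

# The pixel at (row, column) ends up at position row * 28 + column.
print(flat[0, 2 * 28 + 5])
```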

 (08:00) to (10:00)

For the structure of the hidden layers, we'll just start experimenting to see what works.  We can always change it later.  So we'll try a single hidden layer containing 50 neurons.  Over the span of one epoch of training this neural network, each of the 60,000 images in the training dataset will be processed by the input neurons.  The hidden layer neurons will randomly pick some aspect of each image to focus on, and the output neurons will hold the best guess as to whether each image is a particular letter. 

You'll see that the code in our Colab notebook calls this an iteration.  In the specific algorithm we're using, an iteration and an epoch are the same thing.

After each of the 60,000 images is processed, the network will compare its guess to the actual label and update weights and biases to give a better guess for the next image, and after multiple epochs of the same training dataset, the neural network's prediction should keep getting better, thanks to those updated weights and biases.  We'll just go with 20 epochs for now.

We've captured all that in a single line of code in step 2.1, which creates a neural network with a single hidden layer with 50 neurons that will be trained over 20 epochs.  This is why libraries can be so useful.  We're accessing decades of research with just one line of code, but keep in mind, there are cons to using libraries like this as well.  We don't have a lot of control over what's happening under the hood here.  When solving most problems, we'll want to do a mix of using the existing libraries and writing our own AI algorithms.  
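For reference, a line like the one described in step 2.1 might look roughly like this with scikit-learn's `MLPClassifier`. This is a sketch, not necessarily the exact line in the notebook, which may pass additional arguments; `random_state` is an assumption added here only to make runs repeatable.

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer of 50 neurons, trained for at most 20 epochs.
# For scikit-learn's stochastic solvers, max_iter counts epochs.
# verbose=True asks the library to print the loss during training.
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20,
                    verbose=True, random_state=1)
print(mlp.hidden_layer_sizes, mlp.max_iter)
```

Note that neither the 784 inputs nor the 26 outputs appear anywhere in the call. The library works both out from the data when training starts.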

So, we would need a lot more than just one line of code.  For this lab, though, step 2 is done.

Step 3.  Next, we want to actually train our network over those 20 epochs and see how well it guesses the letters in the training and testing datasets, with this one line of code in step 3.1.  For every epoch, our program prints a number called the error of the loss function.  This basically represents how wrong the network was overall.  

 (10:00) to (12:00)

We want to see this number going down with each epoch.  The number that we really care about is how well the network does on the testing dataset, which shows how good our network is at dealing with data it's never seen before, and we have 84% correct.  Now, that's not bad, considering we only trained for 20 epochs, but we still want to improve it.  To see where the network made most of its mistakes, we can create a confusion matrix, which we made in step 3.2.  
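The train-then-score loop can be tried without downloading anything. The sketch below swaps in scikit-learn's small bundled digits dataset (8x8 images of the digits 0 to 9) as a stand-in for EMNIST, so the accuracy it prints is not comparable to the 84% in the video. The split point of 1,500 is an arbitrary assumption.

```python
import warnings
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X = digits.data / 16.0   # pixel values in this dataset range from 0 to 16
y = digits.target
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]   # held back, never shown during training

mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20, random_state=1)
with warnings.catch_warnings():
    # Stopping after only 20 epochs usually triggers a "not converged" warning.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    mlp.fit(X_train, y_train)

train_acc = mlp.score(X_train, y_train)
test_acc = mlp.score(X_test, y_test)
print("loss per epoch:", [round(float(v), 3) for v in mlp.loss_curve_])
print("training accuracy:", train_acc)
print("testing accuracy:", test_acc)
print("classes inferred from the labels:", mlp.classes_)
```

The `loss_curve_` list holds one loss value per epoch, which is the number the lab watches going down.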

The color of each cell in the confusion matrix represents the number of elements in that cell, and a brighter color means more elements.  The rows are the correct values, and the columns are the predicted values and the numbers on the axes represent the 26 letters in the alphabet, so 0 is A and 1 is B, etc, etc.  So cell (0,0) represents the number of times that our network correctly predicted that an A is an A.  It's good to see a bright diagonal line, because those are all the correct values, but other bright cells are mislabeled, so we should check if there are any patterns.  For example, I and L may be easy to confuse, so let's look at some cases where that happened.
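A confusion matrix is simple enough to build by hand. This is a minimal sketch with made-up predictions (not results from the lab), using the same convention of rows for correct labels and columns for predicted labels. The helper names are assumptions for this example.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count (correct, predicted) pairs: rows are correct, columns are predicted."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix

def letter(index):
    """Map 0 -> 'A', 1 -> 'B', and so on, matching the matrix axes."""
    return chr(ord("A") + int(index))

# Made-up labels for illustration only: 8 is I, 11 is L, 20 is U, 21 is V.
true = [8, 8, 8, 11, 11, 20, 20, 21]
pred = [8, 11, 11, 11, 8, 21, 20, 21]
cm = confusion_matrix(true, pred, 26)

# Zero the diagonal (correct guesses) to find the most common mistake.
mistakes = cm.copy()
np.fill_diagonal(mistakes, 0)
t, p = np.unravel_index(mistakes.argmax(), mistakes.shape)
print(f"most common mistake: {letter(t)} read as {letter(p)}, "
      f"{mistakes[t, p]} times")
```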

We can also look at other types of errors, like every time our network guesses that a U is a V, which happened 37 times.  To see if we can improve our accuracy, we can program a slightly different neural network.  More epochs, more hidden layers, and more neurons in the hidden layers could all help, but the tradeoff is that things will be a bit slower.  We can play around with the structure here to see what happens.  For now, let's try creating a neural network that has five hidden layers of 100 neurons each and we'll train it over 50 epochs.  It'll take a few minutes to run.  

Now we've got better accuracy rates on our testing dataset.  We got 88% correct instead of 84%, and that's an improvement.  Over time, we can develop an intuition about how to construct neural networks to achieve better results.

 (12:00) to (14:00)

See if you can create a network that has a higher accuracy than ours on the testing dataset, but for now, we're gonna move forward with this trained network.  Step 4.  This final step is our moment of truth.  We're gonna use our trained neural network to try and read John Green-bot's novel.  So let's dig into this stack of papers.  First, we gotta get our data in the right format by scanning all of these papers.  

And done.  And because we're using Google Colab, we need to get them online.  We're storing them in a GitHub repository, which we coded to import into our Colaboratory notebook, but as you can see, those scanned images are huge, so we've also done a bit of processing on them to avoid having to download and compute over so much data.  We've changed the size of every image to 128x128 pixels.  The other thing you may notice is that the EMNIST dataset uses a dark background with light strokes, but our original scans have a white background with dark strokes, so we also went ahead and inverted the colors to be consistent with EMNIST.  
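Inverting a greyscale scan is a one-line operation on the pixel array. The sketch below assumes 8-bit pixels (0 is black, 255 is white) and uses a tiny made-up "page"; the resizing step is left out because it would need an image library.

```python
import numpy as np

def invert(scan):
    """Turn dark ink on a light page into light ink on a dark page."""
    return 255 - scan

page = np.full((4, 4), 255, dtype=np.uint8)  # tiny all-white placeholder page
page[1:3, 1:3] = 0                           # a black square of "ink"

inverted = invert(page)
print(inverted)
```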

Alright.  So now, back to the Colab notebook.  So this code right here in step 4.1 will pull the modified letters from GitHub.  Now it'll read them into an array and display one of them, just to make sure we're able to import them correctly.  This looks pretty good.  Clearer than the EMNIST data, actually, but back to the point of why we're doing this in the first place.  Let's see if we can process John Green-bot's story now.  

Uhh, this is not making any sense, so we're doing something wrong.  First off, John Green-bot's story had some empty spaces between words.  We never actually trained our model on empty spaces, just the 26 letters, so it wouldn't be able to detect these, but blank pages should be easy to detect.

 (14:00) to (16:00)

After all, unlike handwritten letters, all blank images should be exactly the same.  So, we'll just check each image and see if it's a blank space and if it is, we'll add a space to our story.  This looks better.  There are separate words and I can tell that the first word is 'the', but not much beyond that.  Something else isn't going right here.  Well, even though the letters on the pages that were scanned look clear to our human eyes, the images were really big compared to the handwritten samples that were used to train EMNIST.  We re-sized them, but that doesn't seem to be enough.  
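One way the blank-page check could work is sketched below. It assumes the images have already been inverted so that a blank page is all dark pixels, and it allows a small tolerance because real scans are rarely perfectly uniform. The names `is_blank` and `decode`, the threshold values, and the stand-in classifier are all hypothetical.

```python
import numpy as np

def is_blank(image, ink_threshold=50, max_ink_pixels=0):
    """Treat a page as blank if (almost) no pixel is brighter than the threshold."""
    return int((image > ink_threshold).sum()) <= max_ink_pixels

def decode(images, classify):
    """Build the story text, adding a space for every blank page."""
    chars = []
    for image in images:
        chars.append(" " if is_blank(image) else classify(image))
    return "".join(chars)

blank = np.zeros((28, 28), dtype=np.uint8)
inked = blank.copy()
inked[10:18, 10:18] = 255

# Stand-in classifier that always answers "A"; the lab would call the
# trained neural network here instead.
story = decode([inked, blank, inked], classify=lambda image: "A")
print(repr(story))
```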

To help our neural network digitize these letters, we should try processing these images in the same way that EMNIST did.  Let's do a little detective work to figure out how the EMNIST dataset was processed, so our images are more similar to the training dataset and our program's accuracy will hopefully get better.  Hmm.  Further information on the dataset contents and conversion process can be found in the paper.  

We're not gonna go through the paper, but we'll link it in the description if you wanna learn more.  Basically, I made the following additions to the code.  We're applying some filters to the image to soften the letter edges, centering each letter in the square image, and re-sizing each one to be 28x28 pixels.  As part of this code, we're also displaying one letter from these extra processed images to do another check.  
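Of the three additions, the centering step is the easiest to sketch without an image library. The function below is an assumption about how centering could be done (crop to the ink's bounding box, then paste it in the middle of a square canvas), not the lab's actual code. The softening filter and the resize to 28x28 are omitted.

```python
import numpy as np

def center_in_square(image, threshold=0):
    """Crop to the bounding box of the ink and center it on a blank square."""
    rows = np.any(image > threshold, axis=1)
    cols = np.any(image > threshold, axis=0)
    if not rows.any():
        return image.copy()  # nothing to center on a blank page
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    glyph = image[top:bottom + 1, left:right + 1]

    side = max(glyph.shape)
    canvas = np.zeros((side, side), dtype=image.dtype)
    y0 = (side - glyph.shape[0]) // 2
    x0 = (side - glyph.shape[1]) // 2
    canvas[y0:y0 + glyph.shape[0], x0:x0 + glyph.shape[1]] = glyph
    return canvas

# A 2x4 block of "ink" pushed into the top-right corner of a 10x10 image.
image = np.zeros((10, 10), dtype=np.uint8)
image[0:2, 6:10] = 255

centered = center_in_square(image)
print(centered)
```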

Even though to my eyes, the letters look less clear now, they do look much more similar to the letters in the EMNIST dataset, which is good for a neural network.  The edges of the letters are kind of fuzzy and they're centered in the square.  So let's try processing this story one more time.  Keep in mind, though, that with an 88% accurate model, we expect to get about 1 in 8 letters wrong in the story.
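The expected error rate follows directly from the accuracy. As a quick worked check, assuming mistakes are spread evenly across letters: an 88% accurate model gets 12% of letters wrong, which is roughly 1 letter in 8. The 100-letter passage is a made-up length for illustration.

```python
accuracy = 0.88
error_rate = 1 - accuracy            # fraction of letters expected to be wrong
letters_per_error = 1 / error_rate   # on average, one mistake every N letters

n_letters = 100                      # hypothetical passage length
expected_errors = n_letters * error_rate

print(f"about 1 in {letters_per_error:.1f} letters wrong")
print(f"about {expected_errors:.0f} mistakes in {n_letters} letters")
```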

John Green-bot, are you ready?  Alright, let's see what you're talking about.  

The Fault in Our Power Supplies.  I fell in love the way your battery dies, slowly and then all at once.  

 (16:00) to (17:17)

Quite poetic, John Green-bot.  Okay, it's not perfect, but it was pretty easy to figure out with context and by knowing which letters might be mistaken for each other.  Regardless, thanks, John Green-bot, for giving us a little taste of your first novel, and thank you for following along in our first Crash Course Lab.  Let us know in the comments how you think you could improve the code and tell us if you use it in any of your own projects.

Now, this kind of supervised machine learning is a big component of the AI revolution, but it's not the only one.  In later videos, we'll be looking at other types of machine learning including unsupervised and reinforcement learning, to see what we can do even without a giant labeled dataset.  See you then.

Crash Course AI is produced in association with PBS Digital Studios.  If you want to help keep Crash Course free for everyone forever, you can join our community on Patreon, and if you want to learn more about the basics of programming in any language, check out this video from Crash Course Computer Science.  

(PBS Digital Logo)