As we near the end of the series, we're going to look at how statistics impacts our lives. Today, we're going to discuss how statistics is often used and misused in the courtroom. We're going to focus on three stories in which three huge statistical errors were made: the handwriting analysis of French officer Alfred Dreyfus in 1894, the murder charges against mother Sally Clark in 1998, and the expulsion of student Jonathan Dorfman from UC San Diego in 2011.

Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics.

And sadly, we’re nearing the end of this course. We’ve covered a LOT of topics.

From probability to t-tests to Machine Learning to Bayesian statistics. Today, we’re going to go more “Real World.” We’re going to talk about how statistics is used in the courtroom to make pretty important decisions. In particular, we’re going to look at 3 individuals whose lives were changed by the use of statistics: Alfred Dreyfus, Sally Clark, and Jonathan Dorfman.

[Crash Course theme]

In 1894, Alfred Dreyfus, a Jewish officer in the French army, was convicted of treason, setting off a political scandal in France that would last until 1906. His conviction hinged on an unsigned letter, referred to as the bordereau. The letter offered French military secrets for sale, and the courts decided that it was Dreyfus who had written it.

Dreyfus was convicted, and for a time, sent to live in a prison cell at Devil’s Island. Which is about as pleasant as the name suggests.

This story is complicated. Many books have been written about it. But we’re going to focus in particular on a handwriting analysis done by Alphonse Bertillon--the founder of the first crime laboratory for police in France.

Bertillon alleged that Dreyfus purposely made the bordereau look like a forgery of his own handwriting. That way, if he ever got caught, he could claim someone had tried to frame him. Bertillon theorized that Dreyfus created the bordereau by tracing words and letters from various sources, including samples of his own handwriting and his family’s handwriting. That would make the document look more like a forgery than something he’d actually written.

One of the words Bertillon believed got traced over and over was “intérêt” in Dreyfus’s brother’s handwriting. He’d found a letter written by the brother and thought that the word in that letter looked similar to some words in the bordereau. The book "Math on Trial: How Numbers Get Used and Abused in the Courtroom" breaks down his process from here.

“Intérêt” contains five of the most commonly used letters in the French language -- e, n, r, i, and t. So it would have been a logical one to pick for repeated tracing. Bertillon created a key to test his theory. He traced “intérêt” in the brother’s handwriting over and over with no spaces on a line. This is how he imagined Dreyfus would have created a key of his own.

Conveniently, the bordereau was slightly transparent, so Bertillon could place his key underneath it, then look for places where letters overlapped. And yes, there was some overlap, but the majority of letters didn’t. Then, he moved his key over a bit and saw some new overlaps. (And again, a lot of non-overlap).

But he got excited by the overlapping letters that he HAD seen, so he decided to make two keys and put them both underneath the bordereau with a little distance between them. Then he counted up how many times the letters e, n, r, and t overlapped. And Bertillon concluded that the letters lined up MUCH more than should be expected by chance. Which...makes some sense. I mean, he had two keys.

Bertillon used the frequencies of these letters in the bordereau to come up with the expected frequencies of the overlaps. So, the bordereau contained about 800 letters, and there were about 60 r’s. If his key were just a bunch of r’s, then every r in the bordereau would line up with an r on the key. But since only 1 out of every 7 letters in the key was an r, he expected 1 out of every 7 r’s in the bordereau to overlap with the key by chance. So, he expected the letter r to line up about 9 times (60 ÷ 7 ≈ 8.6). But it actually lined up 20 times. All the other letters, with the exception of ê, matched up more times than he expected.

But this logic was suspect. A group of famous French mathematicians was asked to inspect Bertillon’s “analysis.” It didn’t hold up. They went as far as to call it “completely unfortunate”...which is about as mean as French mathematicians got back then.

The expected frequencies that Bertillon found may have been relatively accurate, but only when he overlapped the key and the bordereau ONE time. Bertillon had overlapped them TWICE, which means he counted the overlapping letters BOTH in the regular position (red) AND the offset position (green), roughly doubling the number of overlaps you’d expect by chance. And it was the rebuttal by three French mathematicians that helped finally free Dreyfus.
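Bertillon’s arithmetic, and the double-counting problem, can be sketched in a few lines of Python. This is a simplified model, not Bertillon’s or the mathematicians’ actual calculation: it assumes that at each key position, every r in the bordereau has a 1-in-7 chance of landing on an r in the key, and that overlaps are tallied separately at every position. The function name `expected_overlaps` is hypothetical.

```python
import unicodedata

def expected_overlaps(letter_count: int, share_in_key: float, positions: int = 1) -> float:
    """Expected number of chance overlaps for one letter.

    Simplified model (an assumption): at each key position, every occurrence
    of the letter in the bordereau lands on the same letter in the key with
    probability `share_in_key`, and overlaps are tallied separately per position.
    """
    return letter_count * share_in_key * positions

# Normalize so each accented letter counts as a single character.
key = unicodedata.normalize("NFC", "intérêt")
share_r = key.count("r") / len(key)  # 1 of the 7 letters in the key is an r

r_count = 60  # approximate number of r's in the ~800-letter bordereau
one_key = expected_overlaps(r_count, share_r)                # the baseline Bertillon used
two_keys = expected_overlaps(r_count, share_r, positions=2)  # what counting twice implies
print(f"expected with one key: {one_key:.1f}")
print(f"expected with two keys: {two_keys:.1f}")
print("observed: 20")
```

Under these assumptions, counting at two positions raises the chance expectation from about 8.6 to about 17.1, so the observed 20 is being compared against the wrong baseline when set beside 9.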

Our next story starts in England in 1996 when Sally Clark’s firstborn son Christopher had the sniffles. Doctors told Sally that it was a cold. But later that month, she found Christopher in distress and called an ambulance. He later died.

In 1998, Sally’s second son, Harry, died in a similar way. A doctor’s examination of Harry found things that could be indicative of abuse, such as retinal hemorrhaging. So, an investigation was conducted.

Steve and Sally Clark were arrested for murdering their sons, but Steve was exonerated. During the trial, Dr. Roy Meadow, a well-respected pediatrician, said the probability that one of Sally’s children would die from Sudden Infant Death Syndrome, or SIDS, was 1 in 8,543. He then made an incredibly consequential probability mistake: he squared that number and declared that the probability of TWO of these deaths happening in the same family was 1 in 73 million (8,543 × 8,543 ≈ 73 million).

Sally’s lawyer questioned Dr. Meadow about whether the probability of Harry’s death was still 1 in 8,543, even though his brother had earlier died with the same diagnosis. Essentially, he was asking whether these two events were independent, or whether having one SIDS-related death in a family increases the probability of having another. And he was right to ask.

SIDS-related deaths in a family may not be independent of each other. We don’t know the cause of SIDS, but if it’s even in part genetic or environmental, it could be possible for babies in the same family to have similar risks. If Meadow had done the calculation with this in mind, he would most likely have come up with a probability MUCH higher than 1 in 73 million.
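Here is a minimal Python sketch of the calculation, using exact fractions. With independence, the probability of two deaths is just the single-death figure squared, which reproduces Meadow’s number. The `dependence` multiplier is a hypothetical stand-in for how much more likely a second SIDS death might be in a family that has already had one; the values tried below are illustrative assumptions, not estimates from data.

```python
from fractions import Fraction

def two_death_probability(p_first: Fraction, dependence: int = 1) -> Fraction:
    """P(two SIDS deaths) = P(first death) * P(second death | first death).

    `dependence` is a hypothetical multiplier: how many times more likely a
    second SIDS death is in a family that has already had one. A value of 1
    means the deaths are independent, which is what Meadow assumed.
    """
    p_second_given_first = p_first * dependence
    return p_first * p_second_given_first

p_sids = Fraction(1, 8543)  # Meadow's figure for a single SIDS death

# Meadow's calculation: with independence, just square the single-death figure.
print(f"independent: 1 in {round(1 / two_death_probability(p_sids)):,}")

# Illustrative only: these multipliers are assumptions, not measured values.
for k in (10, 100):
    odds = round(1 / two_death_probability(p_sids, k))
    print(f"second death {k}x more likely: 1 in {odds:,}")
```

The first line prints 1 in 72,982,849, the “1 in 73 million” quoted at trial. Every factor by which the second death becomes more likely divides that figure by the same factor.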

But that is not the only statistics error that affected the Clark trial. Whether the lawyers suggested it or the jury assumed it, the unspoken assumption was that there was only one other option: murder. And that since SIDS was so unlikely, murder MUST be more likely. Clark was convicted and sentenced to life in prison for murdering her sons.

But there was another possibility...the deaths could have been caused by naturally occurring disease or circumstances. And that was found to be the case in Harry’s death. In 2003, Clark’s conviction was overturned by the Court of Appeal because Harry’s medical records indicated that he could have been suffering from a staph infection, evidence that a pathologist hadn’t disclosed during the original trial.

When we do statistics, we often only consider one, or possibly two, different hypotheses. But it’s important to remember that there are other OPTIONS out there. When possible, we should try to consider those other options. Otherwise, we can accidentally commit what’s known as the prosecutor’s fallacy: reasoning that because the evidence would be SO unusual or unlikely if the defendant were innocent, the defendant is therefore unlikely to be innocent. The related lesson is that the fact that one hypothesis is unlikely does not mean that another must be more likely.
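To see why a tiny probability on its own proves nothing, here is a hedged Python sketch of comparing explanations. Once we know two deaths occurred, what matters is how the possible explanations compare with each other, not how rare any single one is. The function name and the murder rate below are hypothetical, chosen only to show the mechanics; they are not estimates for the Clark case.

```python
def share_of_explanations(probabilities: dict[str, float]) -> dict[str, float]:
    """Given how probable each competing explanation is *before* we know the
    outcome, return how probable each one is *given* that the outcome happened.

    Assumes the explanations are mutually exclusive and that the list covers
    every possibility under consideration; leaving one out distorts the rest.
    """
    total = sum(probabilities.values())
    return {name: p / total for name, p in probabilities.items()}

# Hypothetical inputs: 1 in 73 million is Meadow's (flawed) trial figure;
# the second rate is an invented placeholder, not a real statistic.
explanations = {
    "two SIDS deaths": 1 / 73_000_000,
    "two murders (made-up rate)": 1 / 1_000_000_000,
}
for name, share in share_of_explanations(explanations).items():
    print(f"{name}: {share:.1%}")
```

With these made-up inputs, the 1-in-73-million explanation still receives most of the probability, because the only alternative considered is rarer still. Adding an entry for other natural causes, such as an infection, would shift the shares again, which is why leaving options out is so dangerous.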

Fast forward to 2011, when a student at the University of California, San Diego, Jonathan Dorfman, was accused of cheating on a midterm by his chemistry professor. After two Academic Integrity Review Board hearings, Dorfman was expelled from UCSD because he had a previous incident on his record. He then filed a lawsuit against the school.

After the midterm, the professor noticed that Dorfman had changed the test version on his answer sheet. There were four versions of the test: A, B, C, and D. Students were told that if their test version wasn’t the same as the letter on their answer sheet, they should tell their test proctor. Dorfman HAD changed the letter on his answer sheet from D to A. But he said that he arrived late, so he didn’t hear those instructions. He just saw that his test booklet had a different version than his answer sheet, and changed the answer sheet to match the test.

After looking at all the exams, the professor also noticed that Dorfman’s test matched that of another student, referred to as Student X, who had test version A, the same one Dorfman claimed to have. 24 of the 26 answers matched between the two exams: 8 of the 10 incorrectly answered questions and all 16 of the correct ones.

The professor took this as further proof that Dorfman was cheating, and even went as far as to get a statistician to say that the probability of those same eight wrong answers matching by chance was one in a billion. Though the court documents do not reveal the exact math that the statistician used, it seems possible that, like Dr. Meadow in Sally Clark’s case, they assumed independence where no such assumption should be made. The wrong answers that students choose aren’t always totally random. On multiple-choice tests, there’s often one answer that looks good, even if it’s slightly inaccurate.

And since these students were all taking the same course and reading the same textbook, their incorrect answers aren’t independent of each other. Their misconceptions about the material were VERY likely to depend on their shared learning environment, and therefore to be related. Dorfman’s lawyer demonstrated this during the second review board hearing by showing that 44 of the 618 students who took this test (in all its versions) had 23 to 25 answers matching Dorfman’s.
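Since the statistician’s actual method isn’t in the court documents, the following Python sketch only illustrates why the “random wrong answers” assumption matters. It assumes (our assumption, not a detail from the case) that each question has four options, so a student who misses it picks one of three wrong answers. If every wrong answer were equally tempting, two students who both miss a question would pick the same one a third of the time; if one distractor is far more tempting, they match much more often. The distractor shares are hypothetical.

```python
def same_wrong_answer_probability(distractor_shares: list[float]) -> float:
    """Chance that two students who both miss a question pick the same wrong
    option, if each independently picks option i with probability
    distractor_shares[i]."""
    return sum(p * p for p in distractor_shares)

uniform = [1 / 3, 1 / 3, 1 / 3]  # every wrong option equally tempting
one_tempting = [0.8, 0.1, 0.1]   # one "looks good" distractor (hypothetical shares)

for label, shares in (("uniform", uniform), ("one tempting", one_tempting)):
    per_question = same_wrong_answer_probability(shares)
    print(f"{label}: per question {per_question:.3f}, all 8 questions {per_question ** 8:.2e}")

# From the hearing: 44 of the 618 test-takers had 23-25 answers matching Dorfman's.
print(f"share of test-takers with 23-25 matches: {44 / 618:.1%}")
```

Even this model treats each question separately; a shared misconception that affects several related questions at once would make long runs of matching wrong answers more common still.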

So the one-in-a-billion statistic presented to the court was misleading. To most people, the stat seems believable at first glance, and much of the university’s initial argument relied on it. So much so that the university refused to identify Student X, the student Dorfman had allegedly copied from. The evidence, they claimed, was so strong that further information about Student X wasn’t required for the case to move forward.

The court determined that by not identifying Student X, the university had put Dorfman at an unfair disadvantage in his hearings. Without knowing who Student X was, there was no way to prove that the two students weren’t cooperating, or that they weren’t sitting next to each other. More information about Student X would allow us to update our 1 in a billion chance, much like we update beliefs in Bayesian statistics. Taken alone, the probabilities can be misleading. The court ruled in favor of Dorfman, saying that if UCSD ever wanted to bring more charges against him, it would need to identify Student X.
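The Bayesian idea in the paragraph above can be sketched in Python with Bayes’ rule in odds form: multiply the prior odds by a likelihood ratio for each new piece of evidence. Every number below, including the prior and both likelihood ratios, is hypothetical. In a real case, information about Student X could push the odds up (say, if the two sat side by side) just as easily as down.

```python
def update_odds(prior_odds: float, likelihood_ratios: list[float]) -> float:
    """Bayes' rule in odds form.

    Each likelihood ratio is P(evidence | cheated) / P(evidence | didn't cheat).
    Multiplying them together assumes the pieces of evidence are
    conditionally independent, which is itself an assumption to check.
    """
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds

def odds_to_probability(odds: float) -> float:
    return odds / (1 + odds)

# All hypothetical numbers, chosen only to show the mechanics.
prior_odds = 1 / 99       # a 1% prior probability of cheating
matching_answers = 20.0   # matches are likelier if copying, but common anyway
seated_far_apart = 0.01   # copying from across the room would be hard

after_answers = update_odds(prior_odds, [matching_answers])
after_seating = update_odds(prior_odds, [matching_answers, seated_far_apart])
print(f"after matching answers: {odds_to_probability(after_answers):.1%}")
print(f"after learning seats were far apart: {odds_to_probability(after_seating):.1%}")
```

Working in odds keeps each update a single multiplication, which makes it easy to see how much any one piece of evidence, like where Student X sat, can move the conclusion.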

Many cases are influenced by poor probability calculations. Understanding the basic rules of probability and statistics, and being skeptical of the probabilities you hear, can have a huge impact on whether or not you come to the right conclusions. Thanks for watching, I’ll see you next time.

[Crash Course credits music]

Crash Course Statistics is filmed in the Chad and Stacy Emigholz Studio in Indianapolis, Indiana, and it's made with the help of all these nice people. Our animation team is Thought Café. If you'd like to keep Crash Course free for everyone forever, you can support the series at Patreon: a crowdfunding platform that allows you to support the content you love. Thank you to all our patrons for your continuing support.

Crash Course is a production of Complexly. If you like content designed to get you thinking, check out some of our other channels at Thanks for watching.