healthcare triage
Test Characteristics: How Accurate was that Test?
YouTube: | https://youtube.com/watch?v=UF1T7KzRnrs |
Categories
Statistics
Duration: | 06:49 |
Uploaded: | 2014-04-07 |
The Vermont Study: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3611728/
John Green -- Executive Producer
Stan Muller -- Director, Producer
Aaron Carroll -- Writer
Mark Olsen -- Graphics
http://www.twitter.com/aaronecarroll
http://www.twitter.com/crashcoursestan
http://www.twitter.com/realjohngreen
http://www.twitter.com/olsenvideo
People assume that when their doctor orders a test for them that the results are easy to interpret. The reality is that this is far from true. Every test – be it radiological or laboratory – is imperfect. Sometimes the tests miss something that’s concerning. Sometimes they pick up stuff that isn’t. So, what makes a good test or a bad test? That’s the topic of this week’s Healthcare Triage.
Let’s say there’s a disease out there called Fakitis, but there’s no test for it. So, an enterprising researcher wants to figure one out. He thinks that Fakitis will raise your white blood cell count, which is really just how many white blood cells you can see under a microscope set at a certain power. So, he gathers a thousand people - with and without Fakitis - and he draws their blood.
As you can imagine, there is a range of white blood cell counts in this population. So he has to set an amount that he will call a positive result - let’s say it’s fifteen. Anyone with a white blood cell count of fifteen or higher is positive, and anyone below that is negative.
What this researcher, or any researcher, would do is set up a two-by-two table, with disease status on one axis and test result on the other. Every participant falls into one of its four boxes. Patients in box A have Fakitis, and also have a positive test. Patients in box B do not have Fakitis, but also have a positive test. Patients in box C have Fakitis, but have a negative test, and patients in box D do not have Fakitis, and also have a negative test.
Box A contains what we call the “true positives.” Box D contains the “true negatives.” These are both good results, and in an ideal world, all of the patients would fall under boxes A and D. But that almost never happens.
Some patients wind up in box B, which we call “false positives.” They don’t have the disease, but they do have a positive test. And some patients wind up in box C, which we call “false negatives.” They have the disease, but the test doesn’t pick it up.
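The four boxes can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the four sample participants are made up for the example, and the threshold of fifteen is the one chosen above.

```python
def classify(wbc_count, has_fakitis, threshold=15):
    """Return the box letter (A, B, C, or D) for one participant."""
    test_positive = wbc_count >= threshold  # fifteen or higher counts as positive
    if test_positive and has_fakitis:
        return "A"  # true positive
    if test_positive and not has_fakitis:
        return "B"  # false positive
    if has_fakitis:
        return "C"  # false negative
    return "D"      # true negative


# Hypothetical participants: (white blood cell count, has Fakitis)
participants = [(17, True), (16, False), (12, True), (9, False)]
boxes = [classify(wbc, sick) for wbc, sick in participants]
print(boxes)
```

Counting how many participants land in each box gives the A, B, C, and D totals.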
We use these ideas to calculate two test characteristics: sensitivity and specificity.
Sensitivity is the proportion of people who have a disease who have a positive test result. It’s the ratio of true positives to true positives plus false negatives. If you follow our diagram, sensitivity equals A over A plus C.
Specificity is the proportion of patients who don’t have the disease who have a negative test. It’s the ratio of true negatives to true negatives plus false positives. If you refer to our diagram, specificity equals D over D plus B.
Let’s fill in the diagram with some real numbers. The researcher gathered a thousand people. It turns out that a hundred of them have Fakitis. Of these people, 90 have a positive test and 10 have a negative test. There are 900 people without the disease, and of them, 750 have a negative test and 150 have a positive test.
Sensitivity, remember, is the proportion of people who have Fakitis who have a positive test. A hundred people total have Fakitis. 90 of them have a positive test. So the sensitivity is ninety over a hundred, or 90 percent.
Specificity, remember, is the proportion of people who don’t have Fakitis who have a negative test. Nine hundred people don’t have Fakitis, and seven hundred fifty of them have a negative test, so the specificity is seven fifty over nine hundred, or about 83 percent.
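The arithmetic can be checked with a short Python sketch. The function names are illustrative, and the variables a, b, c, and d follow the box letters from the diagram.

```python
def sensitivity(a, c):
    """True positives over everyone who has the disease: A / (A + C)."""
    return a / (a + c)


def specificity(d, b):
    """True negatives over everyone without the disease: D / (D + B)."""
    return d / (d + b)


# Box counts from the Fakitis example with the threshold at fifteen
a, b, c, d = 90, 150, 10, 750

print(f"sensitivity = {sensitivity(a, c):.1%}")  # 90.0%
print(f"specificity = {specificity(d, b):.1%}")  # 83.3%
```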
In an ideal world, both of these would be a hundred percent - the higher the better. That almost never, ever happens.
So, setting the threshold for a positive result at a white blood cell count of fifteen led to a sensitivity of 90% and a specificity of 83%. Is that good? Well, it depends what you want to get out of the test.
Let’s say Fakitis is a really, really bad disease for which we have a treatment that’s pretty easy to tolerate. If that’s the case, we’d much rather make sure we don’t miss any real cases of Fakitis. Looking back at the diagram, we want to minimize the number of false negatives. We want everyone who is disease-positive to be test-positive. So, we could drop the threshold of the white blood cell count to, perhaps, twelve instead of fifteen. That might change the results in this way.
Now our sensitivity is ninety-nine over a hundred, or 99%. Our specificity is five hundred over nine hundred, or about 56%. That’s because as you increase sensitivity by lowering the threshold, you’re going to decrease specificity. Another way of putting it is that as you make a test more sensitive, or more able to pick up disease if it’s there, you make it less specific, or less able to prove that a positive result is really real. But, if you have a disease you want to rule out because it’s bad and you don’t want to miss it, you want to maximize sensitivity.
Sometimes a positive diagnosis is a big deal and you really don’t want to get a false one. Think about a pregnancy test: you’re going to figure it out sooner or later anyway, and you really don’t want to freak out tons of women. Or men. So, if Fakitis were like that, we’d actually want to raise the threshold of the white blood cell count to lower the number of false positives. Maybe we’d set it at eighteen instead of fifteen. How might that change things?
Now our sensitivity is sixty over a hundred, or 60%. But our specificity is eight hundred eighty over nine hundred, or about 98%. As we made the test more specific, or more able to prove a positive result is real, we made it less sensitive, or less able to detect disease if it’s there.
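Putting the three thresholds side by side makes the trade-off easier to see. In this sketch, the true positive and true negative counts come from the examples above. The false negative and false positive counts for twelve and eighteen are not stated directly; they are filled in from the totals of 100 people with Fakitis and 900 without.

```python
# threshold -> (true positives, false negatives, true negatives, false positives)
counts_by_threshold = {
    12: (99, 1, 500, 400),
    15: (90, 10, 750, 150),
    18: (60, 40, 880, 20),
}

results = {}
for threshold, (tp, fn, tn, fp) in counts_by_threshold.items():
    sens = tp / (tp + fn)  # share of sick people the test catches
    spec = tn / (tn + fp)  # share of healthy people the test clears
    results[threshold] = (sens, spec)
    print(f"threshold {threshold}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

As the threshold goes up, sensitivity falls and specificity rises.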
Tests are rarely very sensitive and very specific. Usually it’s a trade-off, and we need to consider how much real disease we are missing and how many positive tests are overdiagnosis. Here’s a real-world situation I pulled from the medical literature, from a manuscript published in 2012: The study involved 141,284 people. Of them, 728 were disease-positive. The test had a sensitivity of 83.8% and a specificity of 90.6%. Sounds decent, right?
This was a study of mammography in Vermont. And mammography, remember, is considered really important by a lot of people. But mammograms missed 118 of the 728 women with cancer. Is that sensitive enough?
Before you say no, remember that increasing the sensitivity would make it less specific, and even with a specificity of almost 91%, more than 13,000 women had a positive mammogram that turned out not to be cancer. These women likely had other procedures, tons of worry, and they had to spend lots of money. And that’s a lot of women.
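Those figures can be roughly reconstructed from the numbers quoted here. This is a back-of-the-envelope sketch, not the study's own calculation: the false positive count is estimated from the rounded 90.6% specificity, so it will not match the published table exactly.

```python
total = 141_284        # women in the study
with_cancer = 728      # disease-positive
missed = 118           # cancers the mammogram did not pick up (false negatives)
specificity = 0.906    # rounded figure quoted above

true_positives = with_cancer - missed
sensitivity = true_positives / with_cancer

without_cancer = total - with_cancer
# Estimate only: healthy women who still had a positive mammogram
false_positives = round(without_cancer * (1 - specificity))

print(f"sensitivity = {sensitivity:.1%}")
print(f"estimated false positives = {false_positives:,}")
```

The estimate lands a little above 13,000, which agrees with the "more than 13,000" figure.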
The bottom line is that it’s unlikely that most of us think about tests in this way. We’re likely not considering the trade-offs of sensitivity and specificity in judging whether the test is right for us. But every time we get a test, we have to remember it doesn’t give us a definitive answer. It gives us a test result, and the interpretation of that result depends on whether those who designed the test decided to worry more about sensitivity or specificity.
So now that you know about sensitivity and specificity, how can you use them to make better decisions about healthcare? Watch next week and find out.