YouTube: https://youtube.com/watch?v=lo82twBZT8Q

Statistics

View count: 152,181
Likes: 7,442
Comments: 573
Duration: 12:05
Uploaded: 2023-06-06
Last sync:2024-12-17 11:30

Citation

Citation formatting is not guaranteed to be accurate.
MLA Full: "Why Is ChatGPT Bad At Math?" YouTube, uploaded by SciShow, 6 June 2023, www.youtube.com/watch?v=lo82twBZT8Q.
MLA Inline: (SciShow, 2023)
APA Full: SciShow. (2023, June 6). Why Is ChatGPT Bad At Math? [Video]. YouTube. https://youtube.com/watch?v=lo82twBZT8Q
APA Inline: (SciShow, 2023)
Chicago Full: SciShow, "Why Is ChatGPT Bad At Math?", June 6, 2023, YouTube, 12:05, https://youtube.com/watch?v=lo82twBZT8Q.
Head to https://linode.com/scishow to get a $100 60-day credit on a new Linode account. Linode offers simple, affordable, and accessible Linux cloud solutions and services.

Sometimes, you ask ChatGPT to do a math problem that an arithmetically-inclined grade schooler can do with ease. And sometimes, ChatGPT can confidently state the wrong answer. It's all due to its nature as a large language model, and the neural networks it uses to interact with us.

Want to hear our ChatGPT dinosaur poem? Check out our patreon at patreon.com/scishow!

Hosted by: Stefan Chin
----------
Support SciShow by becoming a patron on Patreon: https://www.patreon.com/scishow
----------
Huge thanks go to the following Patreon supporters for helping us keep SciShow free for everyone forever: Matt Curls, Alisa Sherbow, Dr. Melvin Sanicas, Harrison Mills, Adam Brainard, Chris Peters, charles george, Piya Shedden, Alex Hackman, Christopher R, Boucher, Jeffrey Mckishen, Ash, Silas Emrys, Eric Jensen, Kevin Bealer, Jason A Saslow, Tom Mosner, Tomás Lagos González, Jacob, Christoph Schwanke, Sam Lutfi, Bryan Cloer
----------
Looking for SciShow elsewhere on the internet?
SciShow Tangents Podcast: https://scishow-tangents.simplecast.com/
TikTok: https://www.tiktok.com/@scishow
Twitter: http://www.twitter.com/scishow
Instagram: http://instagram.com/thescishow
Facebook: http://www.facebook.com/scishow

#SciShow #science #education #learning #complexly
----------

Sources:

https://www.youtube.com/watch?v=1I5ZMmrOfnA
https://www.sciencedirect.com/science/article/pii/S0262885607001096?casa_token=Q13niAJrUtgAAAAA:1Fib_lLmB0EH_C7nbqdI_DfepZHwuy3QDKaAX0hiZVQzFfCNYOkwmUqZLB19yU8vS_fBIgPYSxE
https://books.google.co.uk/books?hl=en&lr=&id=3yj_IdO1zPEC&oi=fnd&pg=PT2&dq=scunthorpe+penistone&ots=thw6jucWMd&sig=8teF04MOIDozpOY_Fp_aCSiZzNE&redir_esc=y#v=onepage&q=penistone&f=false
https://intjem.biomedcentral.com/articles/10.1186/s12245-015-0078-z
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6800670/#R25
https://hal.science/hal-03913837v1/preview/ChatGPT.pdf
https://ai.stackexchange.com/questions/38220/why-is-chatgpt-bad-at-math
https://www.britannica.com/technology/computer/The-first-computer
https://www.technologyreview.com/2023/02/08/1068068/chatgpt-is-everywhere-heres-where-it-came-from/
https://news.mit.edu/2023/large-language-models-in-context-learning-0207
https://www.mdpi.com/2079-9292/10/20/2470
https://cds.cern.ch/record/400313/files/p21
https://www.psychologytoday.com/gb/blog/your-internet-brain/202303/think-chatgpt-is-smart-ask-it-to-do-arithmetic
https://arxiv.org/pdf/2302.03494.pdf
https://arxiv.org/pdf/2301.13867.pdf

Images

https://www.gettyimages.com
https://commons.wikimedia.org/wiki/File:Fulladder.gif
This SciShow video is supported by Linode!

You can get a $100 60-day credit on a new Linode account at linode.com/scishow. You can ask ChatGPT to write sonnets about dinosaurs on skateboards.

Surely it can handle basic addition. But when I asked it this, ChatGPT returned this.

Which is… Not right. Now, it’s not always wrong, but somehow, humanity has developed a computer program that occasionally screws up grade school math. Much like an actual grade schooler.

And that’s weird, right? I mean, if there’s anything computers ought to be great at, it’s, you know, computing. But it turns out there’s a very good reason why ChatGPT is bad at math, and it’s because we’ve spent a lot of time and effort trying to get our computers to think less like calculators, and more like us.

[Intro]

To understand how we got here, it helps to take a step back and see how computers were originally designed to do math.

All modern computers have special components called arithmetic logic units, or ALUs, which do all your behind-the-scenes number-crunching. And the basic building block of an ALU is a kind of electronic circuit called a logic gate. Logic gates receive a set of input values…a series of 1s, 0s, or a mix of both…and then they apply rigid, logical operations to produce an output…which is also either a 1 or a 0.

For example, an AND gate basically asks the question, “Are both my first and second inputs 1s?” If so, it outputs a 1. And if not, it outputs a 0. So 1s are basically stand-ins for “true”, and 0s for “false”.
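
To make the AND gate concrete, here is a minimal sketch of it as a Python function. A real gate is an electronic circuit, not software, so the function name and the printed truth table are just illustrative stand-ins.

```python
def and_gate(a: int, b: int) -> int:
    """Output 1 only when both inputs are 1; otherwise output 0."""
    return 1 if (a == 1 and b == 1) else 0


# Print the full truth table: every combination of two one-bit inputs.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", and_gate(a, b))
```

Only the last row of the table, where both inputs are 1, produces a 1.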

As simple as it seems, the magic happens by taking the outputs of logic gates, and feeding them as inputs into other logic gates. Stringing gates together like this makes circuits in a computer that can do math. For example, you can create a circuit called an adder that can add two binary numbers together.

And by combining adder circuits, you can handle larger and larger numbers! By setting them up the right way, logic gates can perform all your favorite grade school math, and maybe less favorite high school math that’s ultimately based on grade school math. So long as the numbers are in a range the ALU can handle, it’ll perform that math with airtight accuracy.
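
As a rough sketch of how gates chain into an adder, the Python below builds a one-bit "full adder" out of AND, OR, and XOR gates, then strings several of them together. The function names and the 8-bit default width are assumptions made for illustration; the video does not describe a specific circuit layout.

```python
def and_gate(a: int, b: int) -> int:
    return a & b


def or_gate(a: int, b: int) -> int:
    return a | b


def xor_gate(a: int, b: int) -> int:
    return a ^ b


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three single bits; return (sum_bit, carry_out)."""
    partial = xor_gate(a, b)
    sum_bit = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return sum_bit, carry_out


def ripple_carry_add(x: int, y: int, width: int = 8) -> tuple[int, int]:
    """Add two non-negative integers one bit at a time, lowest bit first.

    Returns (result, final_carry). A final carry of 1 means the true sum
    did not fit in `width` bits.
    """
    carry = 0
    result = 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    return result, carry


print(ripple_carry_add(23, 42))    # (65, 0): fits in 8 bits
print(ripple_carry_add(200, 100))  # (44, 1): 300 is out of range for 8 bits
```

The second call shows what "a range the ALU can handle" means in practice: inside the range every answer is exact, and outside it the leftover carry flags the overflow.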

But ALUs aren’t just used in calculators, or Google when you ask if your American friend is right to complain about it being 20 degrees outside. Anything on your computer involving calculations or decision making, like balancing a budget on a spreadsheet or shuffling songs on a playlist, involves a series of mathematical computations performed on an ALU. But those rigid, logical operations can make working with computers on a fundamental level pretty tricky.

Say you want to use a computer to constantly monitor the live-stream of a forest, and spot any fires that pop up. After all, humans need things like sleep and bathroom breaks. So you create an algorithm, or a set of rules for the computer to follow.

Fires are a noticeably different color than your typical tree, so one rule you include is something like “If a pixel is 40% redder than it is on average, it corresponds to a place on fire.” These little rules of thumb are sometimes called heuristics. And while a fairly sophisticated fire-detection algorithm might use a whole bunch of heuristics, it could still fail in unexpected situations.
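
The 40% rule could be sketched in Python roughly like this. It assumes each pixel has a current red value and a long-run average red value to compare against; the function names and the sample numbers are made up for illustration.

```python
def looks_like_fire(red_now: float, red_average: float,
                    threshold: float = 0.40) -> bool:
    """Heuristic: flag a pixel that is at least 40% redder than its average."""
    return red_now >= red_average * (1.0 + threshold)


def flag_pixels(frame: list[float], averages: list[float]) -> list[int]:
    """Return the positions of every pixel the heuristic flags."""
    return [
        index
        for index, (now, average) in enumerate(zip(frame, averages))
        if looks_like_fire(now, average)
    ]


averages = [60.0, 80.0, 100.0]  # hypothetical long-run red values
frame = [62.0, 130.0, 120.0]    # hypothetical current red values
print(flag_pixels(frame, averages))  # [1]
```

The rule only ever looks at how red a pixel is. It has no idea what made the pixel red, so anything that crosses the threshold gets flagged.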

Like, for example, if a setting Sun reflects off a lake in the image, causing the lake to look over 40% redder than usual… A human could take a look and easily tell it wasn’t a real fire, but the computer, running on those very strict rules, would sound a false alarm. It turns out that some complex tasks are pretty difficult for humans to translate into instructions that a rigidly logical computer can follow exactly how we mean it to. But in the last decade, a different approach has exploded in popularity for tackling these tasks.

It’s called a neural network, and as the name implies, it takes inspiration from the neurons in our brains. Neural networks are a kind of algorithm that connect up thousands or even millions of mathematical components called neurons. Much like a logic gate, neurons take numbers as input and, according to some rules, produce numbers as output that can then go on to be inputs for other neurons.

But unlike logic gates, a neural network’s inputs and outputs can be any number that the computer’s hardware can represent, not just one or zero. So we can approximate almost any complex mathematical function by using a large enough neural network. That makes it easier to create reliable solutions to problems that require more nuance.
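
Here is a minimal sketch of what one of those mathematical neurons can look like in Python. The video does not say which rules a neuron follows; a weighted sum passed through a sigmoid function is one common choice and is an assumption here, as are all the weight values, which are arbitrary.

```python
import math


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the inputs plus a bias, squashed into the range 0 to 1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid


def tiny_network(inputs: list[float]) -> float:
    """Two neurons feed a third, the way gate outputs feed other gates."""
    hidden_1 = neuron(inputs, [0.5, -1.2], 0.1)
    hidden_2 = neuron(inputs, [-0.7, 0.9], 0.0)
    return neuron([hidden_1, hidden_2], [1.5, -2.0], 0.3)


print(tiny_network([0.2, 0.8]))
```

Unlike the AND gate, every value flowing through this network is a decimal number rather than a strict 1 or 0.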

Another key difference between neural networks and logic gates is that in order to come up with the rules they follow, neural networks are trained. Back to our forest fire example, a neural network can be shown pictures of forests that are or aren’t on fire, and examples of the outputs we want, like highlighting all the parts that correspond to a real forest fire. By feeding these pairs of inputs and outputs over and over, the network learns what output we actually want for a given input.
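
To give a feel for what "feeding pairs of inputs and outputs over and over" means, here is a heavily simplified sketch: a single neuron trained with plain gradient descent on five made-up examples. The two features (how red a region is and how much it flickers), the data, and the training settings are all assumptions for illustration. A real fire detector would learn from whole images and use a far larger network.

```python
import math


def predict(features: list[float], weights: list[float], bias: float) -> float:
    """One neuron: weighted sum plus bias, squashed into the range 0 to 1."""
    total = sum(x * w for x, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def train(examples, epochs: int = 2000, learning_rate: float = 0.5):
    """Nudge the weights a little after every example, many times over."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = predict(features, weights, bias) - label
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training pairs: ([redness, flicker], 1 = fire, 0 = no fire).
examples = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.9, 0.1], 0),  # red but steady, like a sunset on a lake
    ([0.2, 0.1], 0),
    ([0.1, 0.7], 0),  # flickering but not red
]

weights, bias = train(examples)
for features, label in examples:
    print(features, label, round(predict(features, weights, bias), 3))
```

Nobody typed in a rule about lakes or sunsets. The weights that separate the red-but-steady example from the real fires come out of the repeated corrections.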

And along the way, it writes its own rules to follow, as opposed to having humans program in every single rule accounting for as many scenarios as possible from the get-go. With all their training, neural networks can sometimes create more robust and less error-prone algorithms than we can write using heuristics. And that’s especially true for tasks that involve complex and unstructured data, like human language.

Which is where ChatGPT finally enters the fray. It’s what’s called a large language model, or LLM. It was trained on huge bodies of writing on the internet, like Wikipedia, to take a piece of text as input and produce a piece of text in response as an output.

The general idea has been around for a few years, but ChatGPT is so eerily good at responding to certain requests, it can feel like a major step forward. That’s partially because ChatGPT’s neural network is designed to pay better attention to both the context of a piece of data, and the most important bits of the input text. For example, it can craft long sentences that make way more sense than you’d get by continuously hitting the auto-complete options on your phone.
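
For contrast, here is a toy stand-in for that kind of auto-complete: a sketch that only looks at the single previous word and always picks whichever word most often followed it in some sample text. This is an assumption-laden illustration, not how any particular phone keyboard or ChatGPT actually works, and the sample sentence is made up.

```python
from collections import Counter, defaultdict


def build_table(text: str) -> dict:
    """Count, for each word, which words come right after it."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def autocomplete(table: dict, start: str, length: int = 6) -> str:
    """Keep appending the most common follower of the latest word."""
    words = [start]
    for _ in range(length):
        followers = table.get(words[-1])
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


sample = "the cat sat on the mat and the cat sat on the sofa"
table = build_table(sample)
print(autocomplete(table, "the"))  # the cat sat on the cat sat
```

With only one word of context, the output falls into a loop almost immediately. Paying attention to much more of the surrounding text is what lets a model avoid that.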

But it’s also because ChatGPT was trained with a lot of human-assisted feedback, to specifically curate outputs that the trainers consider “high quality”. We’re glossing over lots of detail, but needless to say ChatGPT has generated a lot of news for its ability to convincingly emulate human responses to all kinds of funky problems. It’s blown open the door to how we interact with computers.

Rather than painstakingly coding sophisticated solutions to many problems, the interface of ChatGPT allows us to basically talk to a computer in natural language to make requests. Which includes asking it to do math! For example, you can type in: “Give me the sum of the first 20 numbers, divided by 4.” And not only does it give the correct answer of 52.5, it demonstrates that it correctly used a famous formula for adding numbers!
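
The video doesn’t name the formula, but the usual shortcut for adding the whole numbers from 1 to n is n(n + 1) / 2, so that is assumed here. A quick Python check of the arithmetic, under the further assumption that "the first 20 numbers" means 1 through 20:

```python
def sum_first_n(n: int) -> int:
    """Add 1 + 2 + ... + n using the shortcut n(n + 1) / 2."""
    return n * (n + 1) // 2


total = sum_first_n(20)
print(total)      # 210
print(total / 4)  # 52.5
```

Adding the numbers one at a time gives the same 210, which matches the 52.5 quoted above once it is divided by 4.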

Which is like… what. WHAT. Since this is a computer correctly doing math, you might think some part of ChatGPT’s enormous network resembles the logic gate structure of an adder, making it capable of doing calculations the same way.

But before you throw your calculator away for good, remember the showstopper at the beginning of this video. Asking ChatGPT an even simpler question, just taking the sum of two large numbers, sometimes gives a wrong answer. Admittedly, in our example, it only gets one digit in the middle of the number incorrect.

But it’s kind of weird. You’d never get that error on a cheap, plastic, solar powered calculator, provided the numbers could fit on the screen. And it’s not an isolated glitch either!

Lots of people have noticed that ChatGPT often fails on reasonably straightforward math when larger numbers are involved. Since outside researchers don’t have direct access to the model… to the rules that the latest version of ChatGPT has taught itself… there’s no fully transparent study available. But based on what we know, some of it likely comes down to the training process.

LLMs are trained to basically regurgitate a collage of words that closely resembles the patterns they’ve encountered in their training data. Some of that data not only includes examples of adding numbers, but also encodes the broader structures of how people talk about numbers and the functions we perform with them. So somewhere deep in ChatGPT, there are probably some bits of the network that resemble basic arithmetic.

After all, the numbers I used in my example don't show up in any internet searches, so it can’t just regurgitate an answer it found online. And in the end, it did still manage to correctly add up most of the digits. In fact, a recent preprint by Chinese researchers found that the latest model of ChatGPT could accurately add and subtract numbers under one trillion about 99% of the time!

Unfortunately, that accuracy drops when it comes to multiplication. The model only managed to get the right answer about two thirds of the time! These failures imply that it doesn’t form perfect, logic-gate style math with unfaltering accuracy, like an ALU would do.

If it does have that little bit of self-written code, it can’t find a way to use it consistently every time it’s supposed to. It’s not great when you’re hoping to get an answer that’s 100% correct, 100% of the time, but you know what ChatGPT’s math skills remind me of? Me!

Humans are prone to using unreliable reasoning too, especially for wordy math problems that cause us to flub even simple arithmetic problems. And how many of us have forgotten to carry a one or two? But with a bit of thought and guidance, we can improve our problem-solving skills.

And very weirdly… so might ChatGPT. There are cases of people coaxing ChatGPT into becoming more accurate with addition by explaining its logic more carefully. Yep, that includes adding up those large numbers!

So it might take some prodding, but ChatGPT has the potential to be reliable. But ultimately, we can’t guarantee its answers will be accurate, like we can for the old-school ALUs. Its reasoning, performance, and abilities are still shaped more by human expression and the data we produce, rather than hard logical rules.

So for now, you probably want to treat ChatGPT’s answers the same way you’d treat those coming from a human: Understanding that whatever their intentions, they’re fallible, sometimes unreliable, and need verifying from other sources of information, even if we think they’re right. But not everything we do requires hard facts and calculations. ChatGPT’s best quality may be its capacity to throw things out there and give us food for thought.

Like, “Give me some catchy title suggestions for my comedic novel about a team of people creating a science show on the internet, building a wholesome and nerdy community in the process." Okay, maybe now I know what not to title it… Taking ChatGPT’s suggestions as a creative starting point, or even combining it with more reliable code that can produce precise outputs, might be the best way forward. And this seems to be the direction its designers are headed in. OpenAI, the organization behind ChatGPT, is testing out connecting their program with the platform Wolfram Alpha, which does contain hard-coded, logical ways of processing math.
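
As a rough idea of what "combining it with more reliable code" could look like, the sketch below sends plain arithmetic to an exact evaluator and passes everything else along to a language model. This is purely illustrative: it is not how OpenAI or Wolfram Alpha connect their systems, and `exact_arithmetic`, `answer`, and the stand-in `language_model` function are hypothetical names.

```python
import ast
import operator

# The only operations the exact evaluator is willing to perform.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def exact_arithmetic(expression: str):
    """Evaluate +, -, *, / on plain numbers with ordinary, rule-based math."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("not a plain arithmetic expression")

    return evaluate(ast.parse(expression, mode="eval").body)


def answer(request: str, language_model) -> str:
    """Use exact math when the request is pure arithmetic, else ask the model."""
    try:
        return str(exact_arithmetic(request))
    except (ValueError, SyntaxError):
        return language_model(request)


def language_model(request: str) -> str:
    # Hypothetical stand-in for a real chatbot call.
    return "(model-generated text for: " + request + ")"


print(answer("123456789012 + 987654321098", language_model))  # 1111111110110
print(answer("write a sonnet about dinosaurs", language_model))
```

The arithmetic path gives the same answer every time, because Python integers are computed exactly. The creative request still goes to the model, which is the split the video is pointing toward.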

But for now, as impressive as ChatGPT is, if you’re looking to crunch numbers rather than draft emails… you might be better off with the calculator. Oh, and by the time you’re watching this video, ChatGPT may have learned to do this math problem correctly. Don’t worry.

With a little trial and error, you might be able to find a new problem it can’t solve yet. Thanks for watching this SciShow video, supported by Linode! Linode is a cloud computing company from Akamai that provides storage space, databases, analytics and more to you or your company.

And they do all of that really well. User reviews ranked Linode above average in ease of use. And by “above average,” I mean easier than other big companies you might be familiar with.

Reviews also ranked Linode above average in ease of setup and quality of support. But user reviews are just one metric. Linode literally wins awards for their customer support.

You can talk to a real person, which is shockingly rare these days, at any time of day and any time of year. After almost two decades of cloud computing, Linode has figured out how to get you the information you need. You can try out Linode by clicking the link in the description down below or going to linode.com/scishow for a $100 60-day credit on a new Linode account.

Thanks to Linode for supporting this SciShow video!

[Outro]