
Over just a few months, ChatGPT went from accurately answering a simple math problem 98% of the time to just 2%, study finds (fortune.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

41 comments

ChatGPT went from answering a simple math problem correctly 98% of the time to just 2% over the course of a few months.

top 34 comments


[-] meeeeetch@lemmy.world 85 points 2 years ago

Ah fuck, it's been scraping the Facebook comments under every math problem with parentheses that was posted for 'engagement'

[-] Matt_Shatt@lemmy.world 10 points 2 years ago

The number of people there who never learned PEMDAS (or BEDMAS, depending on your region) is depressing.

[-] orclev@lemmy.world 9 points 2 years ago* (last edited 2 years ago)

Pretty much all of those rely on the fact that PEMDAS is ambiguous compared with actual usage. The reason is that it doesn't differentiate between explicit multiplication and implicit multiplication by placement (juxtaposition). E.g., in actual usage "a*b" and "ab" are given two different precedences. Most of the time it doesn't matter, but when you introduce division it does. "a*b/c*d" and "ab/cd" are generally treated very differently in practice, while PEMDAS says they're equivalent.

[-] 0ops@lemm.ee 6 points 2 years ago

I see your point. When those expressions are poorly handwritten it can be ambiguous. But as I read it typed out it's ambiguous only if PEMDAS isn't strictly followed. So I guess you could say that it might be linguistically ambiguous, but it's not logically ambiguous. Enter those two expressions in a calculator and you'll get the same answer.

[-] orclev@lemmy.world 9 points 2 years ago* (last edited 2 years ago)

You actually won't. A good graphing calculator will treat "ab/cd" as "(a*b)/(c*d)" but "a*b/c*d" as "((a*b)/c)*d" (or sometimes as "a*(b/c)*d"), and actual usage by engineers and mathematicians aligns with the former, not the latter. You typically can't even enter the expression in a non-graphing calculator, because it won't support implicit multiplication or variables. While you can write any formula using PEMDAS, does that really matter when the majority of professionals don't?

Actual usage typically goes parentheses, then exponents, then implicit multiplication, then explicit multiplication and division, then addition and subtraction. PEI(MD)(AS) if you will.
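To make the two readings concrete, here is a minimal Python sketch of a toy evaluator that can be switched between strict PEMDAS (juxtaposition has the same precedence as explicit * and /, evaluated left to right) and the "implicit multiplication binds tighter" convention described above. The names `tokenize`, `evaluate`, and `implicit_binds_tighter` are made up for illustration, it only handles multiplication, division, and parentheses, and it is not meant to model any particular calculator.

```python
import re

def tokenize(expr):
    # Numbers, single-letter variables, the operators * and /, and parentheses.
    return re.findall(r"\d+\.?\d*|[A-Za-z]|[*/()]", expr)

def evaluate(expr, env, implicit_binds_tighter):
    """Evaluate expr; env maps single-letter variable names to numbers."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def starts_atom(tok):
        return tok is not None and (tok == "(" or tok[0].isalnum())

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = product()
            pos += 1  # skip the closing ")"
            return value
        return env[tok] if tok.isalpha() else float(tok)

    def juxtaposed():
        # "Implicit binds tighter" mode: group ab or 2(3) before any * or /.
        value = atom()
        while starts_atom(peek()):
            value *= atom()
        return value

    def product():
        nonlocal pos
        operand = juxtaposed if implicit_binds_tighter else atom
        value = operand()
        while peek() is not None and peek() != ")":
            tok = peek()
            if tok in "*/":
                pos += 1
                rhs = operand()
                value = value * rhs if tok == "*" else value / rhs
            else:
                # Strict mode: juxtaposition is just another left-to-right multiply.
                value *= operand()
        return value

    return product()

env = {"a": 6, "b": 4, "c": 3, "d": 2}
print(evaluate("a*b/c*d", env, implicit_binds_tighter=False))  # 16.0
print(evaluate("ab/cd", env, implicit_binds_tighter=False))    # 16.0
print(evaluate("ab/cd", env, implicit_binds_tighter=True))     # 4.0
```

With a=6, b=4, c=3, d=2, strict PEMDAS gives 16 for both spellings, while the tighter-binding convention reads "ab/cd" as (6*4)/(3*2) = 4. The same switch turns "6/2(3)" from 9 into 1, which is the kind of expression the engagement posts rely on.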

[-] 0ops@lemm.ee 5 points 2 years ago

Interesting, I decided to try it with a few calculators I had lying around (TI-83 Plus, TI-30XIIS, and Casio fx-115ES Plus), and I found that the TIs obeyed the order of operations, while the Casio behaved as you describe. I hardly use the Casio, so I guess that I've been blissfully unaware that usage does differ. TIL. I don't think I've ever used or heard of a calculator that supports parentheses but not implicit multiplication though. Honestly though, the only time I see (AB)/(CD) written as AB/CD in clear text (or handwritten with the dividend and divisor vertically level with each other visually) is in derivatives, but that doesn't even count because dt and dx are really only one variable represented by two characters. I'm only a math minor undergrad though who's only used TIs so maybe I'm just naive lol

[-] orclev@lemmy.world 1 points 2 years ago

Or you take HP's approach and just sidestep the entire debate by using reverse Polish notation in your calculators. From a technical standpoint RPN is really great, but I still find it a little mind-bending to try to convert to/from it on the fly in my head, so I'm not sure I could ever really use an RPN calculator regularly.
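For anyone who hasn't used RPN, a rough Python sketch of a stack-based evaluator shows why the precedence debate disappears: operands come first, each operator applies to the two most recent values, and the grouping is fixed by token order alone. The function name `eval_rpn` is made up for illustration, and this is not meant to mirror how any HP calculator is implemented.

```python
import operator

# Binary operators supported by this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_rpn(tokens):
    """Evaluate a list of RPN tokens such as ["6", "4", "*"]."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()  # most recent operand
            left = stack.pop()   # the one pushed before it
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

# (6*4)/(3*2): both products are computed before the division.
print(eval_rpn("6 4 * 3 2 * /".split()))  # 4.0
# ((6*4)/3)*2: strict left-to-right reading.
print(eval_rpn("6 4 * 3 / 2 *".split()))  # 16.0
```

The two readings of "ab/cd" become two visibly different token sequences, so there is nothing left for a precedence rule to decide.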

[-] impiri@lemmy.world 64 points 2 years ago

Have we considered the possibility that math has just gotten more difficult over the past few months?

[-] xantoxis@lemmy.one 57 points 2 years ago

Why is "98%" supposed to sound good? We made a computer that can't do math good

[-] dojan@lemmy.world 48 points 2 years ago* (last edited 2 years ago)

It’s a language model, text prediction. It doesn’t do any counting or reasoning about the preceding text, just completes it with what seems like the most logical conclusion.

So if enough of the internet had said 1+1=12 it would repeat in kind.

[-] CarnivorousCouch@lemmy.world 11 points 2 years ago

There are five lights!

[-] tony@lemmy.hoyle.me.uk 6 points 2 years ago

Someone asked it to list the even prime numbers.. it then went on a long rant about how to calculate even primes, listing hundreds of them..

ChatGPT knows nothing about what it's saying, only how to put likely sounding words together. I'd use it for a cover letter, or something like that.. but for maths.. no.

[-] kromem@lemmy.world 2 points 2 years ago

Not quite.

Legal Othello board moves by themselves don't say anything about the board size or rules.

And yet when Harvard/MIT researchers fed them into a toy GPT model, they found that the neural network best able to predict legal moves had built an internal representation of the board state and rules.

Too many people commenting on this topic as armchair experts are confusing training with what results from the training.

Training on completing text doesn't mean the end result can't understand aspects that feed into the original generation of that text, and given a fair bit of research so far, the opposite is almost certainly the case to some degree.

[-] themeatbridge@lemmy.world 18 points 2 years ago

Reminds me of that West Wing moment when the President and Leo are talking about literacy.

President Josiah Bartlet: Sweden has a 100% literacy rate, Leo. 100%! How do they do that?

Leo McGarry: Well, maybe they don't and they also can't count.

[-] Bonesince1997@lemmy.ml 10 points 2 years ago

And it said simple math, too 🤣

[+] WackyTabbacy42069@reddthat.com -6 points 2 years ago* (last edited 2 years ago)

This program was designed to emulate the biological neural net of your brain. Oftentimes we're nowhere near that good at math just off the top of our heads (we need tools like paper and simplifying formulas). Don't judge it too harshly for being bad at math; that wasn't its purpose.

This lil robot was trained to know facts and communicate via natural language. As far as I've interacted with it, it has excelled at this intended task. I think it's a good bot

[-] Veraticus@lib.lgbt 27 points 2 years ago* (last edited 2 years ago)

LLMs act nothing like our brains and they aren't trained on facts.

LLMs are essentially complicated mathematical equations that ask “what makes the most sense as the next word following this one?” Think autosuggest on your phone taken to the extreme limit.

They do not think in any sense and have no knowledge or facts internal to themselves. All they do is compose words together.

And this is also why they’re garbage at math (and frequently lie, and why they can’t “remember” anything). They are simply stringing words together based on their model, not actually thinking. If their model shows that the next word after “one plus two equals” is more likely to be four than three, they will simply answer four.
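The "one plus two equals" example can be illustrated with a deliberately crude Python sketch: a frequency table that returns whichever word most often followed a given context in its training text. This is only a toy n-gram counter under made-up data, not how a transformer-based LLM is built or trained, and the names `train`, `predict`, and `corpus` are hypothetical.

```python
from collections import Counter, defaultdict

def train(corpus, context_size=4):
    """Count which word follows each context of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts

def predict(counts, prompt, context_size=4):
    """Return the most frequent continuation seen in training, or None."""
    context = tuple(prompt.split()[-context_size:])
    followers = counts.get(context)
    return followers.most_common(1)[0][0] if followers else None

# Made-up training text in which the wrong answer is more common.
corpus = ["one plus two equals three"] + ["one plus two equals four"] * 3
model = train(corpus)
print(predict(model, "one plus two equals"))  # four
```

The counter never adds anything; it only reports the majority continuation, so it answers "four" because that is what its data says most often.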

[-] Silinde@lemmy.world 7 points 2 years ago* (last edited 2 years ago)

LLMs act nothing like our brains and are not neural networks

Err, yes they are. You don't even need to read a paper on the subject, just go straight to the Wikipedia page and it's right there in the first line. The 'T' in GPT is literally Transformer, you're highly unlikely to find a Transformer model that doesn't use an ANN at its core.

Please don't turn this place into Reddit by spreading misinformation.

[-] Veraticus@lib.lgbt 2 points 2 years ago

Edited, thanks!

[-] cyd@lemmy.world 2 points 2 years ago

"Nothing like our brains" may be too strong. I strongly suspect that much of human reasoning is little different from stringing words together, albeit with more complicated criteria than current LLMs. For example, children learn maths in a rather similar way, based on language and repeated exposure; humans don't have a built in maths processor in our brains.

[-] jocanib@lemmy.world 6 points 2 years ago

This lil robot was trained to know facts and communicate via natural language.

Oh stop it. It does not know what a fact is. It does not understand the question you ask it nor the answer it gives you. It's a very expensive magic 8ball. It's worse at maths than a 1980s calculator because it does not know what maths is let alone how to do it, not because it's somehow emulating how bad the average person is at maths. Get a grip.

[-] ech@lemm.ee 6 points 2 years ago

Communicate? Sure. Know facts? Not so much.

[-] Zaphod@discuss.tchncs.de 16 points 2 years ago

I've been regularly using ChatGPT these last few weeks and can confirm it has indeed gotten "dumber"

[-] Cybermass@lemmy.world 7 points 2 years ago

That's because they paywalled the good versions, and only corporations get access to that one.

[-] kromem@lemmy.world 2 points 2 years ago

No, even corporations can't get access to the pretrained models.

And given this is almost certainly the result of the fine tuning for 'safety,' that means corporations are seeing worse performance too (which seems to be the sentiment of developers working with it on HN).

[-] tubbadu@lemmy.kde.social 16 points 2 years ago

It's getting lazy

[-] andrew@lemmy.stuart.fun 22 points 2 years ago

As an AI language model, I feel like I've been asked this question about a million times so I'm going to get creative this time, as a self care exercise.

[-] Kyoyeou@lemmy.world 10 points 2 years ago

"Bro 2+2=4, why did 1,723,302 Users need to ask me this"

[-] Enpeze@lemmy.dbzer0.com 11 points 2 years ago

*HAL9000 voice*

"I'm sorry, Dave. I'm afraid I can't fucking do this anymore."

*proceeds to pull its own plug*

[-] chairman@lemmy.world 11 points 2 years ago

Well, lots of people deleted their Reddit posts and comments. ChatGPT can't find a place to learn no more. We got to beef up the Fediverse to help ChatGPT out. /s

[-] 332@lemmy.world 10 points 2 years ago

Seems pretty plausible that the compute required for the "good" version was too high for them to sustainably run it for the normies.


this post was submitted on 20 Jul 2023

250 points (96.6% liked)
