For reference, the reason this happens is that LLMs aren't "next word predictors", but rather "next token predictors". Each word is broken into tokens, probably 'blue' and 'berry' in this case. The LLM doesn't have any access to information below the token level, which means that it can't count letters directly and has to rely on the "proximity" of the tokens in its training data. Because there's a lot on the Internet about letters and strawberries, it counts the r instead of the b in 'berry'. Chain of Thought (CoT) models like Deepseek-reasoner or ChatGPT-o3 feed their output back into themselves and are more likely to output the text 'b l u e b e r r y', which is the trick to doing this. The lack of sub-token information isn't a critical flaw and doesn't come up often in real-world use cases, so there isn't much energy dedicated to fixing it.
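To make the token-versus-letter point concrete, here is a minimal Python sketch. The split into 'blue' and 'berry' is the guess from the comment above, not the output of any real tokenizer, and the token IDs are invented for illustration.

```python
# Hypothetical vocabulary: these IDs are made up, not taken from a real tokenizer.
VOCAB = {"blue": 1001, "berry": 1002}

def to_token_ids(pieces):
    """Map token strings to the opaque integer IDs a model would receive."""
    return [VOCAB[p] for p in pieces]

def count_letter(word, letter):
    """Count a letter by looking at individual characters."""
    return sum(1 for ch in word.lower() if ch == letter.lower())

def spell_out(word):
    """Put a space between letters, like 'b l u e b e r r y'."""
    return " ".join(word)

pieces = ["blue", "berry"]      # assumed split, per the comment above
ids = to_token_ids(pieces)      # the IDs carry no visible letters
word = "".join(pieces)

print(ids)
print(spell_out(word))
print(count_letter(word, "b"))  # counting only works at the character level
```

The IDs are all the model gets for the word, which is why spelling it out letter by letter first (so each letter becomes its own piece of text) helps.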
This is a perfect demonstration of how LLMs work and why they do not think.
The base question here, that the model is most strongly statistically geared towards, is "How many Rs are in strawberry". You can see how the response in the screenshot works as the template for the correct answer to this question.
All it did was get the most likely response for the strawberry question (which is the closest, most confident match in structure to the blueberry question), and then substitute specific tokens. This is essentially what it does with every response for any question. It uses the closest match from the data it is trained on, then substitutes individual terms, so it looks appropriate to the question.
Ultimately every answer will only ever be an approximation, but there will never be any certainty to its correctness.
tbh that kinda sounds like it's "thinking" though, just that it's not very good at it at all
That's the easiest way to describe it to people, but it isn't. It's just math doing this.
The undefeated argument for explaining it to laypeople is to show just how "linear" the process for an LLM is compared to human thought. When you prompt the LLM, all it ever does is take your input, turn it into a sequence of mathematical objects, then put them through a really long chain of matrix multiplications that lands on an output that gets converted back into language. At no point does it have branches where it takes some time to introspect, consider, recall, or reflect on anything the way a human does when we receive a question. It's not thinking.
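A toy sketch of that single pass, in plain Python. Every number, name, and the three-word output vocabulary here is invented for illustration. Real models also use attention and nonlinear steps between the multiplications, so this only shows the overall shape: input to vectors, vectors through a fixed chain of matrices, scores back to a token.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# All values below are made-up toy weights, not from any real model.
EMBEDDINGS = {"how": [1.0, 0.0], "many": [0.0, 1.0]}
LAYERS = [
    [[0.5, 0.2], [0.1, 0.9]],
    [[1.0, -0.3], [0.4, 0.6]],
]
UNEMBED = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
OUTPUT_TOKENS = ["two", "three", "berry"]

def forward(tokens):
    """One straight pass: embed, multiply through each layer, pick a token."""
    state = [0.0, 0.0]
    for t in tokens:  # crude stand-in for how a real model mixes its inputs
        state = [s + e for s, e in zip(state, EMBEDDINGS[t])]
    for layer in LAYERS:  # fixed chain, no branching or looping back
        state = matvec(layer, state)
    probs = softmax(matvec(UNEMBED, state))
    return OUTPUT_TOKENS[probs.index(max(probs))], probs

token, probs = forward(["how", "many"])
print(token, probs)
```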
I've taken to calling them "synths" because what is it doing that's fundamentally different from a 1980's CASIO? A simple input is returning a complex output? waow
Honestly I think if the term “cybernetics” had won over “artificial intelligence” there’d be less of this obfuscation. But “AI” is more marketable, and of course that’s all that matters.
Gippity, the technical term is gippity.
i don't want to argue w/ people all day but it was a joke
Ultimately every answer will only ever be an approximation, but there will never be any certainty to its correctness.
sounds like pretty much any and all thinking to me, people don't "know" things, they think they know things. usually they're right, but memory is weird shit and doesn't always work properly and there are ten billion and one factors that can influence a person's recollection of some bit of information. i was like "woah the magic conch is just like me fr fr"
p.s. I do wanna argue though that while i don't think chatgpt thinks, I do think that consciousness is an emergent property and with enough things like chatgpt all jumbled together you might see something resembling consciousness or thought, at least in a way that if you really interrogate it closely enough you might not be able to meaningfully differentiate it from biological consciousness or thought (which if you really wanna argue could also be reduced to "it's just math" as well, just math that is way beyond the ability of people to determine. I mean if you had magical deterministic information of the position and interaction of every neuron and neurochemical and every related cellular process etc and could map out and understand it you could look at it and shrug and go "it's just math" too, j/s doggggggggg)
this is where I'd press a disable inbox reply button IF I HAD IT
you really interrogate it closely enough you might not be able to meaningfully differentiate it from biological consciousness or thought (which if you really wanna argue could also be reduced to "it's just math" as well, just math that is way beyond the ability of people to determine
Here's one easy way to differentiate it: my brain is wet and runs on electrochemical processes powered by food. Is that a "significant" difference? That depends on what you think is worth tracking! Defining what counts as "functionally identical" requires you decide which features of a system are "functional" and which are "mere" cosmetic differences. That differentiation isn't given to us by nature, though, and already reflects a hefty series of evaluative judgements. By carefully defining our functions, we can call any two things "functionally identical." There's no right answer, which is both a strength and a limitation of this kind of functionalist framework. Both the AI boosters and the AI "impossibilists" miss this point: functional identity is perspectival, and encodes a bunch of evaluative assumptions about which differences do and don't matter. That's ok--all model building does that--but it's important not to confuse the map and the territory, or think we're identifying some kind of value-independent feature of the world when we attribute functional identity.
The entire economy is getting refocused onto building a robot that lies to you.
So ChatGPT will be running for office soon?
Damn. Not even cable news anchors are safe from automation.
OpenAI got that sweet DoD gig and now they're just slapping a UI wrapper on GPT 3.5 and calling it GPT 5.
china's going to have an actual AI running in some nuclear fusion powered bunker solving climate change and destroying america while america burns up its rivers to power 27000 data centers, 40% of which are dedicated to grok's boobs
Well yes it's terrible and hallucinates, it's a real piece of shit actually, but you see of course this is precisely why we need to commit all of humanity's resources. To improve it! To allow it to spell a word!
AI has its utilities, but capitalists searching for new frontiers and trying to find a genie that can solve climate change, poverty, wealth inequality, and really all of humanity's problems - directly and indirectly caused by capitalism - is not going to happen.
It's not AI. AI is a marketing term to get investors to throw money at them. There's nothing intelligent happening here.
I suppose using it as a shorthand is misleading since it lends credibility to its misnomer; do we just stick to calling it LLMs then?
I just call them what they do. Text generator. Image denoiser. Having used every pre-LLM version of accelerated statistical analysis out there (anything meant to find patterns in data), it's always been machine learning outputs. AI was only ever a term I heard in video gaming, which still seems more appropriate.
capitalists are not trying to solve any of those problems, they're just looking for a magic machine that can replace workers
tbh I don't think capitalists give a shit about those problems
I wonder if I would convince dumb rich investors to buy bags of my poop as the next big innovation
THIS IS INVESTMENT ADVICE: go all in on owl pellet futures
Considering the expected cultural impact of the harry potter show for the general idea of owls as pets, again
Can your poop replace my workers?
every time i see the failures of the fancy predictive text machine i find myself asking "what exactly was wrong with expert systems?"
like, they actually work for what people need them for?
I wanna see the results if you ask ChatGPT the same question a million times. What percentage of responses would actually get the correct number?
It depends on the temperature, a variable you can play with that adds some randomness to the responses (LLMs are deterministic, at least in principle, when temperature is 0). Sometimes the F1 or F2 score is used to measure correctness across many questions, but I don't have a great understanding of how that metric works or what ChatGPT's is.
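Here is a rough sketch of how temperature changes the answer to the "ask it a million times" question. The two candidate answers and their scores are invented (with the wrong answer scored higher, as in the screenshot), and `fraction_correct` is a hypothetical helper, not part of any real API.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Pick an index from raw scores; temperature 0 always takes the top score."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    weights = [math.exp(v - m) for v in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def fraction_correct(logits, correct_index, temperature, trials, seed=0):
    """Ask the same 'question' many times and report how often it is right."""
    rng = random.Random(seed)
    hits = sum(
        sample_with_temperature(logits, temperature, rng) == correct_index
        for _ in range(trials)
    )
    return hits / trials

ANSWERS = ["2", "3"]   # candidate counts of b in blueberry
LOGITS = [1.0, 2.0]    # invented scores: the wrong answer "3" is favored
CORRECT = ANSWERS.index("2")

print(fraction_correct(LOGITS, CORRECT, 0, 1000))    # same pick every time
print(fraction_correct(LOGITS, CORRECT, 1.0, 1000))  # randomness lets "2" through sometimes
```

With these toy scores, temperature 0 repeats the favored wrong answer on every trial, while a higher temperature gets the right one some of the time.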
I think that heavily depends on whether it gets the initial answer right, since it will use that as context
When you're calling it through an API then you can simply choose not to pass it any context
3 Rs in strawberry
Strawberry and blueberry are both in "berry" category and are more closely associated with each other than any other fruit
B is to Blueberry the way R is to Strawberry
Therefore, blueberry has 3 Bs
When is the screenshot from and which model?
GPT-5, which just released yesterday and is "clearly generally intelligent" according to Altman.
Just for more context, Altman was posting images of the Deathstar and acting like he is the AI Oppenheimer right before release.
AI general intelligence achieved. I've probably answered this same question with that answer at some point in my life, and I have some level of intelligence.