this is because an LLM is not made for playing chess
You say you produce good oranges but my machine for testing apples gave your oranges a very low score.
No, more like "Your marketing team, sales team, the news media at large, and random hype men all insist your orange machine works amazingly on any fruit if you know how to use it right. It didn't work on my strawberries when I gave it all the help I could, and it was outperformed by my 40-year-old strawberry machine. Please stop selling the idea that it works on all fruit."
This study is specifically a counter to the constant hype that these LLMs will revolutionize absolutely everything, and the constant word choices used in discussion of LLMs that imply they have reasoning capabilities.
An LLM is a poor computational/predictive paradigm for playing chess.
The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.
Yes, I agree wholeheartedly with your clarification.
My career path, as I stated in a different comment in regards to neural networks, is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.
Thus, large language models are well out of my area of expertise in terms of the architecture of their models.
However, fundamentally it boils down to the fact that the specific large language model used was designed to predict text and not necessarily solve problems/play games to "win"/"survive".
(I admit that I'm just parroting what you stated and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to practice explaining to laymen and, dare I say, clients. It helps me feel as if I don't come off too pompously when talking about this subject to others; forgive my tedium.)
Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it's obviously not going to be good at it, at least not without scaffolding.
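For what it's worth, "scaffolding" here could be as simple as a wrapper that only accepts legal moves. A rough sketch, assuming the python-chess library and a hypothetical ask_llm_for_move() helper (not any real project's code):

```python
# Toy illustration of "scaffolding" an LLM chess player: the model only
# proposes moves, and a real chess library rejects anything illegal.
import random
import chess  # python-chess

def ask_llm_for_move(fen: str) -> str:
    """Hypothetical stand-in for an LLM call; would return a move in SAN."""
    return "Nf3"  # placeholder so the sketch runs end to end

def scaffolded_move(board: chess.Board, retries: int = 3) -> chess.Move:
    """Ask the model, but only accept a move the chess library can parse as legal."""
    for _ in range(retries):
        try:
            return board.parse_san(ask_llm_for_move(board.fen()))
        except ValueError:  # illegal or unparseable suggestion
            continue
    # Fall back to any legal move so hallucinated notation can't end the game.
    return random.choice(list(board.legal_moves))

board = chess.Board()
board.push(scaffolded_move(board))
print(board)
```

Even with that kind of guardrail the model is still just pattern-matching notation; the scaffolding only keeps it from forfeiting on gibberish.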
2025 Mazda MX-5 Miata 'got absolutely wrecked' by Inflatable Boat in beginner's boat racing match — Mazda's newest model bamboozled by 1930s technology.
I swear every single article critical of current LLMs is like, "The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole."
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
The press release where OpenAI said we'd never need chess players again
It's also from a company claiming they're getting closer to creating a morphing shape that can match any hole.
ChatGPT has been, hands down, the worst AI coding assistant I've ever used.
It regularly suggests code that doesn't compile or isn't even for the language.
It generally suggests a block of code that is just a copy of the lines I just wrote.
Sometimes it likes to suggest setting the same property like 5 times.
It is absolute garbage and I do not recommend it to anyone.
My favorite thing is how it constantly writes code against libraries that don't exist.
Oh man, I feel this. A couple of times I've had to field questions about a REST API I support, where people ask why they get errors when they supply a specific attribute. That attribute never existed: not in our code, not in our documentation, we never even thought of it. So I say, "Well, that attribute is invalid, I'm not sure where you saw to do that." They get insistent that the code was generated by a very good LLM, so we must be missing something...
Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.
Hardly surprising. LLMs aren't -thinking-, they're just shitting out the next token for any given input of tokens.
That's exactly what thinking is, though.
An LLM is an ordered series of parameterized / weighted nodes which are fed a bunch of tokens, and millions of calculations later out comes the next token, which gets appended and the process repeats. It's like turning a handle on some complex Babbage-esque machine. LLMs use a tiny bit of randomness ("temperature") when choosing the next token so the responses are not identical each time.
But it is not thinking. Not even remotely so. It's a simulacrum. If you want to see this, run ollama with the temperature set to 0, e.g.
ollama run gemma3:4b
>>> /set parameter temperature 0
>>> what is a leaf
You will get the same answer every single time.
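To make the determinism concrete, here is a minimal Python sketch of temperature-scaled sampling (a toy illustration, not any particular model's internals): as the temperature goes to 0, the softmax collapses to argmax, so the same input always yields the same next token.

```python
# Toy sketch of temperature-based next-token sampling.
# At temperature 0 the choice is a plain argmax, hence identical answers every run.
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Pick a token index from raw model scores (logits)."""
    if temperature <= 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by temperature, then softmax into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample proportionally to the probabilities.
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy vocabulary and scores; real models have tens of thousands of tokens.
vocab = ["leaf", "tree", "photosynthesis", "banana"]
logits = [3.2, 2.9, 1.1, -0.5]

print(vocab[sample_next_token(logits, temperature=0)])    # always "leaf"
print(vocab[sample_next_token(logits, temperature=0.8)])  # usually "leaf", sometimes not
```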
I know what an LLM is doing. You don't know what your brain is doing.