Today's Large Language Models are Essentially BS Machines
(quandyfactory.com)
I'm not even sure what this is supposed to be saying. Sounds kind of like a bullshit generator.
Words are encodings of knowledge and their expression and use represent that knowledge, and these machines ingest a repository containing a significant percent of written human communication. It encodes that the words "dog" and "bark" are often used together, but it also encodes that "dog" and "cat" are things that are both "mammals" and "mammals" are "animals", and that the pair of them are much more likely to appear in a human household than a "porpoise". What is this other kind of model of objects that hasn't been in some way represented in all of the internet?
It is not a model of objects. It's a model of words. It doesn't know what those words themselves mean or what they refer to; it doesn't know how they relate together, except that some words are more likely to follow other words. (It doesn't even know what an object is!)
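The "some words are more likely to follow other words" idea can be sketched as a bigram counter. This is a deliberately crude illustration and far simpler than a real LLM, which uses a learned neural network over tokens rather than a count table; the corpus and the function names `build_bigram_counts` and `most_likely_next` are made up for the example.

```python
from collections import Counter, defaultdict

# Toy corpus invented for illustration; not data from any real model.
CORPUS = "the dog barks . the cat meows . the dog barks at the cat ."


def build_bigram_counts(text):
    """Count how often each word is directly followed by each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = build_bigram_counts(CORPUS)
print(most_likely_next(counts, "dog"))
```

A table like this only records adjacency; whether a much larger learned model captures more than that is exactly what the thread is arguing about.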
When we say "cat," we think of a cat. If we then talk about a cat, it's because we love cats, or hate them, or want to communicate something about them.
When an LLM says "cat," it has done so because a tokenization process selected it from a chain of word weights.
That's the difference. It doesn't think or reason or feel at all, and that does actually matter.
This is just the same hand-waving repeated. What does it mean to "know what a word means"? How is a word, indexed into a complex network of word embeddings, meaningfully different as a token from this desired "object model"? Because the indexing and encoding very much does relate words together separately from their likelihood to appear in a sentence together. These embeddings may be learned from language, but language is simply a method of communicating meaning, and notably humans also learn meaning through consuming it.
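To make "relating words through embeddings" concrete, here is a minimal sketch of comparing word vectors by cosine similarity. The three-dimensional vectors are hand-picked for illustration, so the result is true by construction and proves nothing about any real model; learned embeddings have hundreds or thousands of dimensions.

```python
import math

# Hand-picked toy vectors, invented for illustration only.
EMBEDDINGS = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.1],
    "porpoise": [0.9, 0.1, 0.8],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


dog_cat = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["cat"])
dog_porpoise = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["porpoise"])
print(dog_cat > dog_porpoise)
```

The point of the sketch is only the mechanism: similarity is read off the geometry of the vectors, not off how often two words sit next to each other in a sentence.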
What do things like "love" or "want" or "feeling" have to do with a model of objects? How would you even recognize a system that does that and why would it be any more capable than a LLM at producing good and trustable information? Does feeling love for a concept help you explain what a random blogger does? Do you need to want something to produce meaningful output?
This just all seems like poorly defined techno-spiritualism.
For one, ChatGPT has no idea what a cat or dog looks like. It has no understanding of their differences in character of movement. Lacking that kind of non-verbal understanding, when analysing art that's actually in its domain, that is, poetry, it couldn't even begin to make sense of the question "does this poem have feline or canine qualities" -- best it can do is recognise that there are neither cats nor dogs in it and, being stumped, make up some utter nonsense. Maybe it has heard of "catty" and that dogs are loyal and will be looking for those themes, but feline and canine as in elegance? Forget it, unless it has read a large corpus of poetry analysis that uses those terms: it can parrot that pattern matching, but it can't do the pattern matching itself; it cannot transfer knowledge from one domain to another when it has no access to one of those domains.
And that's the tip of the iceberg. As humans we're not really capable of purely symbolic thought so it's practically impossible to appreciate just how limited those systems are because they're not embodied.
(And, yes, Stable Diffusion has some understanding of feline vs. canine as in elegance -- but it's an utter moron in other areas. It can't even count to one).
Then, that all said, and even more fundamentally, ChatGPT (like all other current AI algos we have) is a T2 system, not a T3 system. It comes with rules for how to learn; it doesn't come with rules enabling it to learn how to learn. As such it never thinks -- it cannot think, as in "mull over". It reacts with what passes as a gut in AI land, and never with "oh I'm not sure about this so let me mull it over". It is in principle capable of not being sure but that doesn't mean it can rectify the situation.
Which is obviously false, as a quick try will show. Poems are just language and LLMs understand that very well. That LLMs don't have any idea what cats actually look like or how they move, beyond what they can gather from textbooks, is irrelevant here; they aren't tasked with painting a picture (which the upcoming multi-modal models can do anyway).
Now there can of course be problems that can be expressed in language but not solved in the realm of language. But I find those to be incredibly rare, rare enough that I've never really seen a good example. ChatGPT captures an enormous amount of knowledge about the world, and humans have written about a lot of stuff. Coming up with questions that would be trivial to answer for any human, but impossible for ChatGPT is quite tricky.
Have you ever actually seen an iceberg or just read about them?
ChatGPT doesn't learn. It's a completely static model that doesn't change. All the learning happened in a separate step back when it was created; it doesn't happen when you interact with it. That illusion comes from the text prompt, which includes both your text and its output, getting fed into the model as input. But outside that text prompt, it's just static.
That's because it fundamentally can't mull it over. It's a feed-forward neural network, meaning everything that goes in on one side comes out on the other in a fixed amount of time. It can't do loops by itself. It has no hidden internal monologue. The only dynamic part is the prompt, which is also why its ability to problem-solve improves quite a bit when you require it to do the steps individually instead of just presenting the answer, as that allows the prompt to be its "internal monologue".
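A minimal sketch of that setup, assuming a fixed lookup table as a stand-in for the trained weights (a real model computes the next token with a neural network; the table and the names `frozen_model` and `generate` are hypothetical). The model function never changes; only the context list grows as each output is appended and fed back in.

```python
# Hypothetical stand-in for trained weights: a fixed next-token table.
FROZEN_TABLE = {
    "let's": "think",
    "think": "step",
    "step": "by",
    "by": "step",
}


def frozen_model(context):
    """One pass: read the context, emit one token, modify nothing."""
    if not context:
        return "<end>"
    return FROZEN_TABLE.get(context[-1], "<end>")


def generate(prompt, max_new_tokens):
    """Repeatedly feed the growing context back into the same static model."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = frozen_model(context)
        if token == "<end>":
            break
        context.append(token)  # the context is the only state that changes
    return context


print(generate(["let's"], 3))
```

The loop lives outside the model: each call does a bounded amount of work, and anything resembling "mulling over" has to be written into the context so the next call can read it.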
Which is why I came up with the "feline poetry" example. It's quite a simple concept for a human, even one not particularly poetry-inclined, yet if no one has ever written about the concept it's going to be an uphill battle for ChatGPT. And, no, I haven't tried. I also didn't mean it as some kind of dick-measuring contest; I simply wanted to explain what kind of thing ChatGPT really has trouble with.
As a matter of fact yes, I have. North Cape, Norway.
ChatGPT is also its training procedure if you ask me, same as humanity is also its evolution.
It is not hand-waving; it is the difference between an LLM, which, again, has no cognizance, no agency, and no thought -- and humans, which do. Do you truly believe humans are simply mechanistic processes that when you ask them a question, a cascade of mathematics occurs and they spit out an output? People actually have an internal reality. For example, they could refuse to answer your question! Can an LLM do even something that simple?
I find it absolutely mystifying you claim you've studied this when you so confidently analogize humans and LLMs when they truly are nothing alike.
Those two things can be true at the same time.
"Nothing alike" is kinda harsh, we do have about as much in common with ChatGPT as we have with flies purpose-bred to fly left or right when exposed to certain stimuli.
No, they can't. The question is fundamentally: do humans have any internal thoughts or feelings, or are they algorithms? If you believe other people aren't literally NPCs, then they are not LLMs.
That doesn't even begin to be a dichotomy. Unless you want to claim humans are more than Turing complete (hint: that's not just physically but logically impossible) we can be expressed as algorithms. Including that fancy-pants feature of having an internal world, and more so being aware of having that world (a thermostat also has an internal world but a) it's rather limited and b) the thermostat doesn't have a system to regulate its internal world, the outside world does that for it).
Wow, do you have any proof of this wild assertion? Has this ever been done before or is this simply conjecture?
No. A thermostat is an unthinking device. It has no thoughts or feelings and no "self." In this regard it is the same as LLMs, which also have no thoughts, feelings, or "self."
A thermostat executes actions when a human acts upon it. But it has no agency and does not think in any sense; it does simply what it was designed to do. LLMs are to language as thermostats are to controlling HVAC systems, and nothing more than that.
There is as much chance of your thermostat gaining sentience if we give it more computing power as an LLM.
A Turing machine can compute any computable function. For a thing to exist in the real world it has to be computable, otherwise you break cause and effect itself, as the Church-Turing thesis doesn't really rely on anything but there being implication.
So, no, not proof. More an assertion of the type "Assuming the Universe is not dreamt up by a Boltzmann brain and causality continues to apply, ...".
That's a fair assessment but beside the point: a thermostat has an internal state it can affect (the valve), one that is under its control and not that of silly humans (that is, not directly), aka an internal world.
Also correct. But that's because it's a T1 system, not because the human mind can't be expressed as an algorithm. Rocks are T0 systems and, I think you'll agree, dumber than thermostats; most of what runs on our computers is a T1 system; ChatGPT and everything AI we have is T2; the human mind is T3: our genes don't merely come with instructions for how to learn (that's ChatGPT's training algorithm), but with instructions for learning how to learn. We're as much more sophisticated than ChatGPT, for an appropriate notion of "sophisticated", as thermostats are more sophisticated than rocks.
I apologize if I was unclear when I spoke of an internal world. I meant interior thoughts and feelings. I think most people would agree sentience is predicated on the idea that the sentient object has some combination of its own emotions, motivations, desires, and ability to experience the world.
LLMs have as much of that as a thermostat does; that is, zero. It is a word completion algorithm and nothing more.
Your paper doesn't bother to define what these T-systems are so I can't speak to your categorization. But I think rating the mental abilities of thermostats versus computers versus ChatGPT versus human minds is totally absurd. They aren't on the same scale; they're different kinds of things. Human minds have actual sentience. Everything else in that list is a device, created by humans, to do a specific task and nothing more. None of them are anything more than that.
Have a look here. Key concept is the adaptive traverse, Tn-system then means "a system with that many traverses". What I meant with my comparison there is simply that a rock has a traverse less than a thermostat, and ChatGPT has a traverse less than us.
Addition, multiplication and exponentiation all are on the same scale, yet they're different things. Regarding number of traverses it's absolutely fair to say that it's a scale of quality, not quantity.
Sentience as in the processing of the environment while processing your processing of that environment? Yep that sounds like a T3 system. Going out a bit on a limb, during deep sleep we regress to T2, while dreams are a funky "let's pretend our conditioning/memory is the environment" state. Arachnids apparently can do it, and definitely all mammals. Insects seem to be T2 from the POV of my non-biologist ass.
You are a device created by evolution to figure out whether your genes are adaptive enough to their surroundings to reproduce.
I’m giving up here but evolution did not “design” us. LLMs are designs and created with a purpose in mind and they fulfill that purpose. Humans were not designed.
In cybernetics that's irrelevant as the purpose of a system is what it does. I can design an algorithm that plays pong, I can write a program to evolve one, they might actually end up being identical and no one could tell.
It's entirely not irrelevant. Even if you create a program to evolve pong, that was also designed by a human. As a computer programmer you should know that no computer program will just become pong, what an idiotic idea.
You just keep pivoting away from how you were using words to them meaning something entirely different; this entire argument is worthless. At least LLMs don't change the definitions of the words they use as they use them.
Playing pong. Inputs: ball (and possibly enemy) position, output: paddle left or right. Something like NEAT will very quickly come up with the obvious "track the ball" approach using just as many AST nodes as you would.
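For reference, a hand-written version of the "track the ball" approach might look like the sketch below. It is not output from NEAT or any other evolutionary system; passing the paddle's own position as an input and moving one unit per step are assumptions made for the example, and `track_ball` and `simulate` are hypothetical names.

```python
def track_ball(ball_x, paddle_x):
    """Return "left" or "right" so the paddle moves toward the ball."""
    return "left" if ball_x < paddle_x else "right"


def simulate(ball_x, paddle_x, steps):
    """Apply the policy for `steps` ticks, moving one unit per tick."""
    for _ in range(steps):
        paddle_x += -1 if track_ball(ball_x, paddle_x) == "left" else 1
    return paddle_x


print(simulate(ball_x=10, paddle_x=0, steps=10))
```

With only two outputs the paddle never rests; once it reaches the ball it jitters by one unit around it, which is good enough to keep returning the ball in this toy setup.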
Define your terms. And explain why any of them matter for producing valid and "intelligent" responses to questions.
Why are you so confident they aren't? Do you believe in a soul or some other ephemeral entity that wouldn't leave us as a biological machine?
Define your terms. And again, why is that a requirement for intelligence? Most of the things we do each day don't involve conscious internal planning and reasoning. We simply act and if asked will generate justifications and reasoning after the fact.
It's not that I'm claiming LLMs = humans, I'm saying you're throwing out all these fuzzy concepts as if they're essential features lacking in LLMs to explain their failures in some question answering as something other than just a data problem. Many people want to believe in human intellectual specialness, and more recently people are scared of losing their jobs to AI, so there's always a kneejerk reaction to redefine intelligence whenever an animal or machine is discovered to have surpassed the previous threshold. Your thresholds are facets of the mind that you don't define and have no means to recognize (I assume your consciousness, but I cannot test it), and you have not explained why they're important for fact rather than BS generation.
How the brain works and what's important for various capabilities is not a well understood subject, and many of these seemingly essential features are not really testable or comparable between people and sometimes just don't exist in people, either due to brain damage or a simple quirk in their development. The people with these conditions (and a host of other psychological anomalies) seem to function just fine and would not be considered unthinking. They can certainly answer (and get wrong) questions.
So do LLMs.
Ask it about any NSFW topic and it will refuse.
They seem way more similar than different. The parts where they are different follow trivially from the LLM's architecture (e.g. LLMs are static, tokenizing makes character-based problems difficult, memory is limited to the prompt, no interaction with the external world, no vision, no hearing, ...) and most of that can be overcome by extending the model, e.g. multi-modal models with vision and hearing are on their way, DeepMind is working on models that interact with the real world, etc. This is all coming and coming fast.