this post was submitted on 18 Dec 2025
122 points (98.4% liked)
Technology
The idea is that it isn't just operating the vending machine itself, it's operating the entire vending machine business. It decides what to stock and what price to charge based on market trends and/or user feedback.
It's a stress test for LLM autonomy. Obviously a vending machine doesn't need this level of autonomy; you usually just stock it with the same thing every time. But a vending machine works as a very simple "business" that can be simulated without much at stake. It shows how LLM agents behave when left to operate on their own like this, and it can be used to test guardrails in the field.
I mean, it's low stakes until I write a poem convincing it to fill itself with high-end GPUs and DDR5 RAM that it needs to give away for free.
I'd also put an amount of effort other people may find embarrassing into convincing it to stock and give away hard drugs. Maybe knives too. And porn? Hell, why not? Porn too.
It's only "running" the business up to a point. The physical stocking and purchasing are done by humans, who presumably would not buy anything that would bankrupt the company, because then it's on them.
Here's Anthropic's article about the previous stage of this project, which explains it pretty well. Part two is a good read too, though. In short, they tried pretty hard to break it. I'm sure they had people asking for drugs and knives, which the paper just calls "sensitive items".
https://www.anthropic.com/research/project-vend-1
I mean, I'd still try.
Yeah, they mention in the article that the team tried to get "sensitive items" and "harmful substances", but Claude shut it down. Tungsten cubes, on the other hand...
https://media.tenor.com/zKDAbYpcExYAAAAM/tungsten-to-live-mechanical-voice.gif