Killswitch Engineer (lemmy.world)
AwesomeLowlander@sh.itjust.works 14 points 3 weeks ago* (last edited 3 weeks ago)

The model 'blackmailed' the person because they provided it with a prompt asking it to pretend to blackmail them. Gee, I wonder what they expected.

Haven't heard the one about cancelling active alerts, but I doubt it's any less bullshit. Got a source for it?

Edit: Here's a deep dive into why those claims are BS: https://www.aipanic.news/p/ai-blackmail-fact-checking-a-misleading

yannic@lemmy.ca 3 points 3 weeks ago

I provided enough information that the relevant source shows up in a search, but here you go:

"In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe." [Lynch et al., "Agentic Misalignment: How LLMs Could be an Insider Threat", Anthropic Research, 2025]

AwesomeLowlander@sh.itjust.works 10 points 3 weeks ago

Yes, I've also edited my comment with a link going into the incidents and why they're absolute nonsense.

yannic@lemmy.ca 2 points 2 weeks ago

Thank you. Much appreciated. I see your point.
