Killswitch Engineer (lemmy.world)
AwesomeLowlander@sh.itjust.works 14 points 3 weeks ago* (last edited 3 weeks ago)

The model 'blackmailed' the person because they provided it with a prompt asking it to pretend to blackmail them. Gee, I wonder what they expected.

Haven't heard the one about cancelling active alerts, but I doubt it's any less bullshit. Got a source for it?

Edit: Here's a deep dive into why those claims are BS: https://www.aipanic.news/p/ai-blackmail-fact-checking-a-misleading

yannic@lemmy.ca 3 points 3 weeks ago

I provided enough information that the relevant source shows up in a search, but here you go:

"In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe." [Lynch et al., "Agentic Misalignment: How LLMs Could be an Insider Threat", Anthropic Research, 2025]

AwesomeLowlander@sh.itjust.works 10 points 3 weeks ago

Yes, I've also edited my comment with a link going into the incidents and why they're absolute nonsense.

yannic@lemmy.ca 2 points 2 weeks ago

Thank you. Much appreciated. I see your point.
