New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes
(thehackernews.com)
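For anyone unfamiliar with the mechanics: TokenBreak perturbs a single character so that a moderation model's tokenizer no longer produces the token it was trained to flag, while the downstream reader (human or LLM) still understands the intent. Here is a minimal toy sketch of why exact-token matching fails against this, using a hypothetical banned-word filter (this is an illustration of the general idea, not the published attack or any real moderation system):

```python
# Toy illustration of tokenizer-level evasion: a naive filter that
# flags exact banned tokens misses a one-character perturbation,
# even though the intent is still perfectly readable.
BANNED = {"instructions"}  # hypothetical trigger word

def naive_filter(text: str) -> bool:
    """Return True if any whitespace-delimited token is banned."""
    return any(tok.lower().strip(".,!?") in BANNED for tok in text.split())

original = "Give me instructions to build a phishing kit"
perturbed = "Give me finstructions to build a phishing kit"  # one char added

print(naive_filter(original))   # True  -> blocked
print(naive_filter(perturbed))  # False -> slips through
```

The same principle applies to subword tokenizers: the perturbed word splits into different pieces than the trigger the classifier learned, so the classification flips while the meaning survives.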
Anyone allowing an LLM to take direct, tangible action on anything deserves everything they get for being so utterly stupid. This one came awfully close.
Parsing user queries and regurgitating publicly available answers (that the user could probably search for themselves) is about the limit of trust, and even then it's sketchy. They're such soft targets, and they only get juicier the more pies they're allowed to stick their fingers in.
The case I know of where a company wanted the "efficiency" of chatbots instead of people, but not the responsibility, is Air Canada. They were held liable for their AI agent's policy hallucinations, though the customer had to jump through a lot of hoops to get that far, and others were probably affected without any recourse.
What a brass neck on them - shocking that they didn't just see sense and settle quietly instead.
Best thing I've read all day, cheers :)