209

It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds (hackaday.com)

submitted 4 months ago by muelltonne@feddit.org to c/technology@lemmy.world

97 comments fedilink hide all child comments

(page 2) 50 comments

sorted by: hot top controversial new old

[-] mudkip@lemdro.id 2 points 4 months ago

Great, why aren't we doing it?

[-] Telorand@reddthat.com 1 points 4 months ago

Because it's hard(er than doing nothing) and takes changing habits.

[-] morto@piefed.social 1 points 4 months ago

I used to think it wasn't viable to poison llms, but are you saying there's a chance? [a meme comes to mind]

[-] No1@aussie.zone 1 points 4 months ago

You and me. We just need 248 more volunteers and we can save the world!

[-] NuXCOM_90Percent@lemmy.zip 1 points 4 months ago

found that with just 250 carefully-crafted poison pills, they could compromise the output of any size LLM

That is a very key point.

if you know what you are doing? Yes, you can destroy a model. In large part because so many people are using unlabeled training data.

As a bit of context/baby's first model training:

Training on unlabeled data is effectively searching the data for patterns and, optimally, identifying what those patterns are. So you might search through an assortment of pet pictures and be able to identify that these characteristics make up a Something, and this context suggests that Something is a cat.
Labeling data is where you go in ahead of time to actually say "Picture 7125166 is a cat". This is what used to be done with (this feels like it should be a racist term but might not be?) Mechanical Turks or even modern day captcha checks.

Just the former is very susceptible to this kind of attack because... you are effectively labeling the training data without the trainers knowing. And it can be very rapidly defeated, once people know about it, by... just labeling that specific topic. So if your Is Hotdog? app is flagging a bunch of dicks? You can go in and flag maybe 10 dicks and 10 hot dogs and ten bratwurst and you'll be good to go.

All of which gets back to: The "good" LLMs? Those are the ones companies are paying for to use for very specific use cases and training data is very heavily labeled as part of that.

For the cheap "build up word of mouth" LLMs? They don't give a fuck and they are invariably going to be poisoned by misinformation. Just like humanity is. Hey, what can't jet fuel melt again?

[-] Telorand@reddthat.com 1 points 4 months ago

On that note, if you're an artist, make sure you take Nightshade or Glaze for a spin. Don't need access to the LLM if they're wantonly snarfing up poison.

[-] _cryptagion@anarchist.nexus 1 points 4 months ago

the reason more people haven't adopted that is because they don't work.

[-] Telorand@reddthat.com 0 points 4 months ago

I haven't seen any objective evidence that they don't work. I've seen anecdotal stories, but nothing in the way of actual proof.

[-] Buffalox@lemmy.world 1 points 4 months ago

You can't prove a negative, what you should look for is evidence that it works, without such evidence, there is no reason to believe it does.

[-] Telorand@reddthat.com -1 points 4 months ago* (last edited 4 months ago)

Okay. I have that. Now what?

ETA: also, you can prove a negative, it's just often much harder. Since the person above said it doesn't work, the positive claim is theirs to justify. Whether it's hard or not is not my problem.

[-] Buffalox@lemmy.world 1 points 4 months ago

Okay. I have that. Now what?

Then you have your evidence, and your previous post is nonsensical.

[-] _cryptagion@anarchist.nexus 0 points 4 months ago

Last time I checked out Glaze, around the time it was announced, they refused to release any of their test data, and wouldn’t let people test images they had glazed. Idk why people wouldn’t find it super sus behavior, but either way it’s made moot by the fact that social media compresses images and ruins the glazing anyway, so it’s not really something people creating models worry about. When an artist shares their work, they’re nice enough to deglaze it for us.

[-] _cryptagion@anarchist.nexus -1 points 4 months ago

Well I haven’t seen any objective evidence that god doesn’t exist, but that don’t mean I believe in her.

[-] Telorand@reddthat.com 1 points 4 months ago

Okay. Same. I'm not asking you to believe Glaze/Nightshade works on my word alone. All I said was that artists should try it.

[-] yardratianSoma@lemmy.ca 1 points 4 months ago* (last edited 4 months ago)

Well, I'm still glad offline LLM's exist. The models we download and store are way less popular then the mainstream, perpetually online ones.

Once I beef up my hardware (which will take a while seeing how crazy RAM prices are), I will basically forgo the need to ever use an online LLM ever again, because even now on my old hardware, I can handle 7 to 16B parameter models (quantized, of course).

[-] Sam_Bass@lemmy.world 1 points 4 months ago

Thats a price you pay for all the indiscriminate scraping

[-] Vupware@lemmy.zip 0 points 4 months ago

The only way I could do that was if you had to do a little more work and I would be happy with it but you have a hard day and you don’t want me working on your day so you don’t want me doing that so you can get it all over with your own thing I would be fine if I was just trying not being rude to your friend or something but you don’t want me being mean and rude and rude and you just want me being mean I would just like you know that and you know I would like you and you know what I’m talking to do I would love you to do and you would love you too and you would like you know what to say and you would like you to me

[-] DarkSideOfTheMoon@lemmy.world 0 points 4 months ago

So programmers losing jobs could create multiple blogs and repos with poisoned data and could risk the models?

[-] _cryptagion@anarchist.nexus -1 points 4 months ago

if that's true, why hasn't it worked so far then?

load more comments

this post was submitted on 15 Dec 2025

209 points (99.1% liked)

Technology

84101 readers

225 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws