This new data poisoning tool lets artists fight back against generative AI (www.technologyreview.com)

submitted 2 years ago by ElectroVagrant@lemmy.world to c/technology@lemmy.world

A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways.

The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission.
[...]
Zhao’s team also developed Glaze, a tool that allows artists to “mask” their own personal style to prevent it from being scraped by AI companies. It works in a similar way to Nightshade: by changing the pixels of images in subtle ways that are invisible to the human eye but manipulate machine-learning models to interpret the image as something different from what it actually shows.

[-] vidarh@lemmy.stad.social 5 points 2 years ago

It doesn't need to "develop its own style". That's the point. The more examples of these adversarial images there are in the training set, the better it will learn to disregard the adversarial modifications, and still learn the same style. As much as you might want to stop it from learning a given style, as long as the style can be seen, it can be copied - both by humans and AIs.

[-] RubberElectrons@lemmy.world 1 points 2 years ago

There's a lot of interesting detail to your side of the discussion I may not yet have the knowledge of. How does the eye see? We find edges, gradients, repeating patterns which become textures, etc etc... But our systems can be misdirected, see the blue/yellow dress for example. NNs have the luxury of being rapidly iterated I guess, compared to our lifespans.

I'm asking questions I don't know answers to here: if the only source of input data for a network is subtly corrupted, won't that guarantee corrupted output as well? I don't see how one can "train out" the corruption which misdirects the network without access to some pristine data.

Don't get me wrong, I'm not naive enough to believe this is foolproof, but I do want to understand why this technique doesn't actually work, and by extension better understand how training a nn actually works.

[-] barsoap@lemm.ee 2 points 2 years ago* (last edited 2 years ago)

> if the only source of input data for a network is subtly corrupted, won’t that guarantee corrupted output as well?

We have to distinguish between different kinds of "corruption", here. What you seem to be describing is "if we only feed the model data from rule34, will it ever learn proper human anatomy" and the answer is no, it won't. You'll have to add data which narrows the range of body proportions from cartoonish to, well, real. That's an external source of corruption: Feeding it bad data (for your own definition of "bad"). Garbage in, garbage out.

The corruption that these adversarial models are exploiting though is inherent in the model they're attacking. Take... ropes and snakes and cats (or, generally, mammals). Good example: It is incredibly easy for a cat to mistake a rope for a snake -- it looks exactly the same to the first layers of the visual cortex and evolution would rather have the cat jump away as soon as possible than be bitten, and it doesn't hurt to jump away from a rope (even though the cat might end up being annoyed or ashamed (yes, cats can 110% be self-conscious, different story)), so when there's an unexpected wiggly shape the first layers directly tell the motor cortex to move, short-circuiting any higher processing.

That trait has been written into the network by evolution, very similar to how we train AI models -- conceptually, that is: In both cases the network gets trained for fitness for a purpose (the implementation details are indeed rather different but also irrelevant):

What those adversarial models do kinda looks like this: Take a picture of a rope. Now randomly shift pixels to make the rope subtly more snake-like until you get your cat to jump as reliably as possible, in as many different situations as possible, e.g. even if they're expecting it and staring straight at it. Sell the product for a lot of money. People start posting pictures of ropes, rope manufacturers adjust their weaving patterns. Other cats see those pictures and ropes, some jump, and others only feel a bit, or a lot, uneasy. The ones that jump will not be able to procreate any more, being busy jumping, while the uneasy ones will continue to evolve. After a couple of generations no cat cares about those ropes with shifted pixels any more.
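As a rough illustration of the "shift pixels until the cat jumps" step, here is a minimal sketch on a toy linear classifier. Everything in it is an assumption made for illustration: the random weights `w`, the fake 64x64 "rope" image, the `eps` budget of 2/255 and the helper names are hypothetical, and it uses a plain gradient-sign step rather than random shifts. It is not the actual Nightshade or Glaze algorithm.

```python
import numpy as np

def score(w, b, x):
    """Linear 'snake-ness' score: positive means the toy model says snake."""
    return float(w @ x + b)

def perturb_toward_snake(w, x, eps):
    """Move every pixel by at most eps in the direction that raises the score.

    For a linear model the gradient of the score with respect to x is w,
    so eps * sign(w) is the gradient-sign step. Pixels are clipped to [0, 1].
    """
    return np.clip(x + eps * np.sign(w), 0.0, 1.0)

rng = np.random.default_rng(0)
n_pixels = 64 * 64

# Hypothetical toy model and image, not taken from any real system.
w = rng.normal(0.0, 0.05, n_pixels)
rope = rng.uniform(0.2, 0.8, n_pixels)

# Calibrate the bias so the clean rope sits at a score of about -1 ("rope").
b = -(w @ rope) - 1.0

eps = 2 / 255  # at most two intensity levels per pixel on an 8-bit scale
shifted = perturb_toward_snake(w, rope, eps)

print("clean rope score:  ", score(w, b, rope))
print("shifted rope score:", score(w, b, shifted))
print("largest pixel change:", np.max(np.abs(shifted - rope)))
```

The per-pixel change is tiny, but it adds up across thousands of pixels: the score moves by `eps` times the sum of the absolute weights, which here is enough to push the toy model from "rope" to "snake".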

Whether that trains general immunity against adversarial attacks -- I wouldn't be so sure. It very likely will make the rope/snake distinction more accurate. But even if it doesn't build general immunity, it's an eternal cat and mouse game and no artist will be willing to continue paying for that kind of software when it's going to get defeated within days, anyway, because that's just how fast we can evolve models.

Oh. Back to the definition of corruption: If all the pictures of rope that our models ever see have shifted pixels then it's just going to assume that is the norm, and distinguish it from snakes because the tags say "rope" in one case, and "snake" in the other. The original un-shifted pictures probably won't be an adversarial attack because they're not a product of trying to get cats to jump.
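The point that the tags decide what the model learns can be sketched with a deliberately tiny example. The two features (`shape_cue`, `texture_cue`), their values and the least-squares classifier are all assumptions chosen so the arithmetic is easy to follow; real image models are trained very differently.

```python
import numpy as np

# Each row is [shape_cue, texture_cue]; label -1 = rope, +1 = snake.
# In the clean data, texture happens to agree with shape.
clean_x = np.array([[-1.0, -2.0],   # rope
                    [ 1.0,  2.0]])  # snake
clean_y = np.array([-1.0, 1.0])

# A "shifted" rope: rope shape, snake-like texture, still tagged "rope".
shifted_rope = np.array([-1.0, 2.0])

def fit(x, y):
    """Least-squares linear classifier (minimum-norm solution)."""
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    return w

# Model that has never seen a shifted rope.
w_clean = fit(clean_x, clean_y)

# Model that has seen the shifted rope with its correct "rope" tag.
w_mixed = fit(np.vstack([clean_x, shifted_rope]), np.append(clean_y, -1.0))

score_clean = float(w_clean @ shifted_rope)  # sign > 0 means "snake"
score_mixed = float(w_mixed @ shifted_rope)

print("weights without shifted examples:", w_clean, "-> score", score_clean)
print("weights with shifted examples:   ", w_mixed, "-> score", score_mixed)
```

Without the shifted example the model leans on the texture cue and calls the shifted rope a snake. Once the shifted rope is in the training set under the "rope" tag, the only consistent fit puts zero weight on texture, so the shift stops mattering.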

[-] vidarh@lemmy.stad.social 1 points 2 years ago

Quick iteration is definitely the big thing. (The eye is fun because it's so "badly designed" - we're stuck in a local maximum that just happens to be "good enough" for us to not overcome the big glaring problems)

And yes, if all the inputs are corrupted, the output will likely be too. But 1) they won't all be, and as long as there's a good mix that will "teach" the network over time that the difference between a "corrupted cat" and an "uncorrupted cat" is irrelevant, because both will have most of the same labels associated with them. 2) these tools work by introducing corruption that humans aren't meant to notice, so if the output has the same kind of corruption it doesn't matter. It only matters to the extent the network "miscorrupts" the output in ways we do notice enough so that it becomes a cost drag on training to train it out.

But you can improve on that pretty much with feedback: Train a small network to recognize corruption, and then feed corrupted images back in as negative examples to teach it that those specific things are particularly bad.

Picking up and labelling small sample sets of types of corruption humans will notice is pretty much the worst case realistic effect these tools will end up having. But each such countermeasure will contribute to training sets that make further corruption progressively harder. Ultimately these tools are strictly limited because they can't introduce anything that makes the images uglier to humans, and so you "just" need to teach the models more about the limits of human vision, and in the long run that will benefit the models in any case.

this post was submitted on 23 Oct 2023

569 points (86.3% liked)
