AI insiders seek to poison the data that feeds them (www.theregister.com)

submitted 2 weeks ago by tonytins@pawb.social to c/technology@lemmy.world

Alarmed by what companies are building with artificial intelligence models, a handful of industry insiders are calling for those opposed to the current state of affairs to undertake a mass data poisoning effort to undermine the technology.

Their initiative, dubbed Poison Fountain, asks website operators to add links to their websites that feed AI crawlers poisoned training data. It's been up and running for about a week.

AI crawlers visit websites and scrape data that ends up being used to train AI models, a parasitic relationship that has prompted pushback from publishers. When scraped data is accurate, it helps AI models offer quality responses to questions; when it's inaccurate, it has the opposite effect.

[-] FauxLiving@lemmy.world 2 points 2 weeks ago* (last edited 2 weeks ago)

If you're interested in this line of attack, you can also use similar techniques to defeat models that are trained to do object detection (for example, the ones that detect the location of your license plate) using adversarial noise attacks.

The short version is, if you have a network that does detection, you can run inference with that network on images that have been altered by another network and have the second network use the confidence of the detection network in its loss function. The second model can be trained to create noise, which looks innocuous to human eyes, that maximally disrupts the segmentation/object detection process of the target/detection network.
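
As a rough illustration of the core idea (small, bounded noise chosen to push a detector's confidence down), here is a minimal sketch. It is not the two-network setup described above: instead of training a second model, it takes a single signed-gradient step against a toy logistic "detector". The weights, bias, and pixel values are all hypothetical and chosen only for illustration.

```python
import math

def detector_confidence(pixels, weights, bias):
    """Toy stand-in for a detector: logistic score over a flat pixel list."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def adversarial_noise(weights, epsilon):
    """One signed-gradient step that lowers the confidence.

    For a logistic model, the gradient of the confidence with respect to
    each pixel has the same sign as that pixel's weight, so stepping
    against the sign of the weight pushes the score down while keeping
    every per-pixel change within +/- epsilon.
    """
    return [-epsilon if w > 0 else epsilon if w < 0 else 0.0 for w in weights]

def apply_noise(pixels, noise):
    """Add the noise and clamp to the valid [0, 1] intensity range."""
    return [min(1.0, max(0.0, p + n)) for p, n in zip(pixels, noise)]

# Hypothetical detector parameters and image, for illustration only.
WEIGHTS = [0.9, -0.4, 1.2, 0.7, -0.8, 1.1, 0.5, -0.6]
BIAS = -1.0
IMAGE = [0.8, 0.2, 0.9, 0.7, 0.1, 0.8, 0.6, 0.3]

clean_score = detector_confidence(IMAGE, WEIGHTS, BIAS)
noisy_image = apply_noise(IMAGE, adversarial_noise(WEIGHTS, epsilon=0.1))
noisy_score = detector_confidence(noisy_image, WEIGHTS, BIAS)
print(f"clean={clean_score:.3f} perturbed={noisy_score:.3f}")
```

A trained noise generator does the same job against a real detector without needing direct access to the detector's weights at print time; the sketch only shows why a small per-pixel budget can be enough to move the score.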

You could then print this noise on, say, a transparent overlay and put it on your license plate, and automated license plate readers (ALPRs) would not be able to detect or read your plates. Note: Flock is aware of this technique and has lobbied state lawmakers to make it illegal in some places to put anything on your plate that disrupts automated reading, so check your local laws.

Benn Jordan has actually created and trained such a network; video here: https://www.youtube.com/watch?v=Pp9MwZkHiMQ

He also uploaded his code, PlateShapez, to GitHub: https://github.com/bennjordan

In states where you cannot cover your license plate, you're not restricted from decorating the rest of your car. You could use a similar technique to create bumper stickers that are detected as license plates and place them all over your vehicle. Or even, as Benn suggested, print them with UV ink so they're invisible to humans but very visible to AI cameras, which often use UV lamps to provide night vision/additional illumination.

You could also, if you were so inclined, generate bumper stickers or a vinyl wrap that would leave the detector unable to detect a car at all.

Adversarial noise attacks are one of the bigger vulnerabilities of AI-based systems and they come in many flavors and can affect anything that uses a neural network.

Another example (also from the video) is that you can encode voice commands in ordinary audio. To the listener the audio sounds unchanged, but a device (like Alexa or Siri) will hear a specific command ("Hey Siri, unlock the front door"). Any user-generated audio that you encounter online can have this kind of attack encoded in it. The potential damage is pretty limited because AI assistants don't really control critical functions in your life yet... but you should probably not let your assistant listen to TikTok if it can do more than control your home lighting.

[-] algernon@lemmy.ml 2 points 2 weeks ago

I had a short tootstorm about this, because oh my god, this is some terribly ineffective, useless piece of nothing.

For one, Poison Fountain tells us to join the war effort and cache responses. Okay...

❯ curl -i https://rnsaffn.com/poison2/ --compressed -s
HTTP/2 200
content-disposition: inline
content-encoding: gzip
content-type: text/plain; charset=utf-8
x-content-type-options: nosniff
content-length: 959
date: Sun, 11 Jan 2026 21:17:36 GMT

Yeaah... how am I supposed to cache this? Do I cache one response and then continue serving that for the 50+ million crawlers that visit my sites every day? And you think a single, repetitive thing will poison anything at all? Really?
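
To make the caching complaint concrete, here is a small sketch that checks a response for the standard HTTP caching headers. The header values are copied from the curl output above; the helper name `cache_hints` and the choice of which headers to look for are assumptions made for illustration.

```python
def cache_hints(headers):
    """Return the standard HTTP caching headers present in a response."""
    relevant = ("cache-control", "etag", "last-modified", "expires")
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in relevant if name in lowered}

# Headers copied from the curl output above.
POISON_HEADERS = {
    "content-disposition": "inline",
    "content-encoding": "gzip",
    "content-type": "text/plain; charset=utf-8",
    "x-content-type-options": "nosniff",
    "content-length": "959",
    "date": "Sun, 11 Jan 2026 21:17:36 GMT",
}

hints = cache_hints(POISON_HEADERS)
print(hints or "no caching hints: lifetime and revalidation are left to the client")
```

With no `Cache-Control`, `ETag`, `Last-Modified`, or `Expires`, whoever relays the content has to invent their own freshness policy.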

Then the Poison Fountain explanation goes on to claim that garbage served to the crawlers will end up in the training data. I'm fairly sure the person who set this up never worked with model training, because this is not what happens. Not even the AI companies are that clueless: they do not train on anything and everything, they filter it down.

And what this fountain provides is trivial to filter.
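
The comment does not say how such filtering works. As one illustration of why a single response repeated across many pages is easy to discard, here is a sketch of exact-duplicate removal after light normalization. This is an assumed, deliberately simple stand-in; real training pipelines presumably go further (near-duplicate detection, quality scoring), and nothing here describes any particular company's process.

```python
import hashlib

def fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def drop_duplicates(documents):
    """Keep the first copy of each document, dropping normalized repeats."""
    seen = set()
    kept = []
    for doc in documents:
        key = fingerprint(doc)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept

# Hypothetical crawl: one real page plus the same payload served three times.
crawl = [
    "Real article text.",
    "poisoned  payload",
    "Poisoned payload",
    "poisoned payload\n",
]
print(drop_duplicates(crawl))
```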

It's also mighty hard to set up! It's not just a reverse_proxy https://rnsaffn.com/poison2/, because then you leak all the headers you got. No, you have to make a sanitized request that doesn't leak data. Good luck!
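
A minimal sketch of what a "sanitized request" could look like, assuming a Python back end: the upstream request is built from a fixed set of headers and never copies anything from the visitor's request. The `UPSTREAM_HEADERS` values and the `poison-relay` name are hypothetical, and the sketch only constructs the request without sending it.

```python
import urllib.request

UPSTREAM_URL = "https://rnsaffn.com/poison2/"

# The only headers set explicitly on the upstream request (assumed values).
# Nothing is copied from the visitor's request.
UPSTREAM_HEADERS = {
    "Accept": "text/plain",
    "User-Agent": "poison-relay",  # hypothetical name
}

def sanitized_request(upstream_url):
    """Build a fresh upstream request carrying none of the visitor's headers."""
    return urllib.request.Request(
        upstream_url, headers=dict(UPSTREAM_HEADERS), method="GET"
    )

# Hypothetical headers a crawler or visitor might have sent to our server.
visitor_headers = {
    "User-Agent": "Mozilla/5.0 (visitor)",
    "Cookie": "session=abc123",
    "X-Forwarded-For": "203.0.113.7",
    "Referer": "https://example.org/private",
}

request = sanitized_request(UPSTREAM_URL)
sent_values = {value for _, value in request.header_items()}
leaked = [name for name, value in visitor_headers.items() if value in sent_values]
print("leaked visitor headers:", leaked)
```

The upstream server still sees the relaying server's own IP address; the sketch only covers header leakage.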

Meanwhile, there are a gazillion self-hostable garbage generators and tarpits that you can literally shove in a Docker container and reverse proxy tarpit URLs to, safely, locally. Much more efficient, far more effective. And, seeing as this is practically uncacheable, if I were to use it, I'd have to send all the shit that hits my servers their way. As far as I can tell, this is a single Linode server. It probably wouldn't crumble under my 50 million requests / day, but if ten more people joined the "war effort" without caching, my well-educated guess is that it would fall over and die.
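
For a sense of scale, here is the arithmetic behind those numbers. The 50 million requests per day figure is from the comment; treating the ten additional operators as having the same traffic is an assumption made purely for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def requests_per_second(requests_per_day):
    """Average request rate, ignoring bursts."""
    return requests_per_day / SECONDS_PER_DAY

one_operator = requests_per_second(50_000_000)
# Assumption: ten more sites, each forwarding the same uncached traffic.
eleven_operators = 11 * one_operator

print(f"{one_operator:.0f} req/s alone, {eleven_operators:.0f} req/s with ten more")
# prints "579 req/s alone, 6366 req/s with ten more"
```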

Besides, we have no idea whether poisoning works. We can't measure that. What we can measure is the load on our servers, and this helps fuck all in that regard. The bots will still come, they'll still hit everything, and I'd have additional load due to the network traffic between my server and theirs (remember: the returned response provides no sane indicators that'd allow caching while keeping the responses useful for poisoning purposes).

Not only is this ineffective in poisoning, it's not usable at all in its current state. And they call for joining the war effort. C'mon.

[-] sobchak@programming.dev 1 points 2 weeks ago

I once saw an old lecture where a guy working on Yahoo's spam filters noticed that spammers would create accounts to mark their own spam messages as not spam (in an attempt to trick the spam filters; I guess a kind of Sybil attack), and because of the way the spam filtering models were created and used, it made the spam filtering more effective. It's possible that a wider variety of "poisoned" data can actually help improve models.

[-] GhostFish@piefed.social 1 points 2 weeks ago

Considering these AI companies aim to literally poison our water supplies, this seems poetic. Hopefully it is effective.

[-] recursive_recursion@piefed.ca 1 points 2 weeks ago

In addition to poisoning with bad data, I'd recommend adding logic gates where both recipient and sender test each other on the definition and understanding of trust and consent, which is a major thorn in the side of the corporations, CEOs, and conservatives.

[-] SinningStromgald@lemmy.world -1 points 2 weeks ago

Now that the news has reported on the website, I assume it will quickly get added to do-not-scrape lists for AI (assuming there is such a thing). So the effectiveness of this will depend on other people adopting it.

[-] floofloof@lemmy.ca 1 points 2 weeks ago

They're recommending not that you link to their URL but that you create a back end that caches content from it and serves that content under your own URLs.
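
A minimal sketch of what such a caching back end might look like, under several assumptions: the pool size, the TTL, and the `ResponsePool` name are all hypothetical, and the upstream fetch is injected as a plain callable so the pool can be tried without any network access.

```python
import random
import time

class ResponsePool:
    """Keep up to `size` upstream responses and serve them at random.

    `fetch` is any zero-argument callable returning the upstream body; it
    is injected so the pool can be exercised without network access.
    """

    def __init__(self, fetch, size=100, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch
        self._size = size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = []  # list of (fetched_at, body)

    def get(self):
        now = self._clock()
        # Drop entries older than the TTL so the pool slowly turns over.
        self._entries = [(t, b) for t, b in self._entries if now - t < self._ttl]
        # Top up with at most one upstream fetch per call until the pool is full.
        if len(self._entries) < self._size:
            self._entries.append((now, self._fetch()))
        return random.choice(self._entries)[1]

# Stand-in for the real upstream request, counting how often it is called.
calls = []

def fake_fetch():
    calls.append(1)
    return f"payload-{len(calls)}"

pool = ResponsePool(fake_fetch, size=3)
bodies = {pool.get() for _ in range(1000)}
print(len(calls), "upstream fetches for 1000 served responses")
```

The trade-off is the one raised elsewhere in the thread: a small pool keeps upstream load low but makes the served content repetitive, while a large pool or short TTL pushes more traffic to the upstream server.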

this post was submitted on 11 Jan 2026

11 points (100.0% liked)
