submitted 9 hours ago by Maroon@lemmy.world to c/opensource@lemmy.ml

I came across tools like Nightshade that can poison images. That way, if someone steals an artist's work to train their AI, it learns the wrong stuff and can potentially begin spewing gibberish.

Is there something that I can use on PDFs? There are two scenarios for me:

  1. Content that I already created that is available as a PDF.
  2. I use LaTeX to make new documents, and I want to poison those from scratch if possible rather than as an ad hoc step once the PDF is created.
[-] lily33@lemm.ee 1 points 5 hours ago

I don't think any kind of "poisoning" actually works. It's well known by now that data quality is more important than data quantity, so nobody just feeds training data in indiscriminately. At best it would hamper some FOSS AI researchers that don't have the resources to curate a dataset.

[-] Ledivin@lemmy.world 3 points 3 hours ago

> At best it would hamper some FOSS AI researchers that don't have the resources to curate a dataset.

If you can't source a dataset, then you shouldn't be researching AI. It's the first and single most important step of the entire process.

this post was submitted on 07 Feb 2025
41 points (87.3% liked)
