submitted 2 days ago* (last edited 2 days ago) by geneva_convenience@lemmy.ml to c/privacy@lemmy.ml

Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[-] lazynooblet@lazysoci.al 46 points 2 days ago* (last edited 1 day ago)

My instance gets pillaged once a day for 20 minutes by what I think is a scraper for an LLM.

The scraper grabs every post and profile page and the load on the server triggers alerts but the site stays usable.

I haven't been able to put a stop to it as the requests come from 1500+ IP addresses, with different user agents.
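
A rough sketch of how that pattern could be confirmed from the access log, assuming the web server writes the common "combined" log format (an assumption; the thread does not say which server or format is in use). It buckets requests per minute and counts distinct client IPs and user agents in each bucket, so a 20-minute burst spread over many addresses should stand out. The function names are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout (combined log format):
# ip - user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return (ip, timestamp, user_agent), or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("ua")

def per_minute_stats(lines):
    """Map each minute to (request count, distinct IPs, distinct user agents)."""
    hits = defaultdict(int)
    ips = defaultdict(set)
    uas = defaultdict(set)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines in an unexpected format
        ip, ts, ua = parsed
        minute = ts.replace(second=0, microsecond=0)
        hits[minute] += 1
        ips[minute].add(ip)
        uas[minute].add(ua)
    return {m: (hits[m], len(ips[m]), len(uas[m])) for m in hits}

def busiest_minutes(stats, top=5):
    """Return the `top` minutes with the most requests, busiest first."""
    return sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)[:top]
```

Feeding it the lines of a day's log and printing `busiest_minutes(...)` would show whether the load really is concentrated in one window and how many distinct addresses take part in it.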

[-] gazby@lemmy.zip 6 points 1 day ago

Run your access logs through something that will report the ASN for the client IPs. GoAccess would be my recommendation. It will require access to a GeoIP database, which you can get from MaxMind by signing up for a free API key, or you can download one directly from P3TERX/GeoLite.mmdb on GitHub. We have identified a number of bot networks this way. Happy to help further if you'd like a hand 👍
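
To illustrate the idea without GoAccess, here is a minimal sketch of grouping client IPs by ASN. The prefix table and the `lookup_asn` helper are hypothetical stand-ins that use documentation-only address ranges and AS numbers; in practice the lookup would be backed by a real ASN database such as the GeoLite ones mentioned above.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-ASN table, for illustration only.
# These are documentation address ranges and documentation AS numbers.
EXAMPLE_ASN_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "AS64500 (example)",
    ipaddress.ip_network("198.51.100.0/24"): "AS64501 (example)",
}

def lookup_asn(ip, table=EXAMPLE_ASN_TABLE):
    """Return the ASN label for an IP, or 'unknown' if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    for network, asn in table.items():
        if addr in network:
            return asn
    return "unknown"

def requests_per_asn(client_ips, lookup=lookup_asn):
    """Count requests per ASN, most frequent first."""
    counts = Counter()
    for ip in client_ips:
        counts[lookup(ip)] += 1
    return counts.most_common()
```

If a handful of networks account for most of the scraper traffic, they show up at the top of the list and can be rate limited or blocked as a group. If the requests really are spread evenly over many networks, the list will be long and flat instead.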

[-] phoenixz@lemmy.ca 20 points 1 day ago

Yeah, they're scraping alright and it's all purposefully done in such a way that you can't stop it, you can't control it.

AI companies are criminal as far as I am concerned

[-] foremanguy92_@lemmy.ml 26 points 2 days ago
[-] lazynooblet@lazysoci.al 13 points 2 days ago

I have no idea. I spot check 20 or so IP addresses and they are all from different AS networks. Truly diverse botnet. Feel powerless.

[-] cypherpunks@lemmy.ml 42 points 2 days ago

they were suggesting a solution, this proof-of-work web firewall: https://github.com/TecharoHQ/anubis
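
For anyone unfamiliar with the term, this is a generic hashcash-style sketch of what "proof of work" means. It is not Anubis's actual code or configuration, and the function names and difficulty value are invented for the example. The server hands out a random challenge, the client has to find a nonce whose SHA-256 hash starts with a given number of zeros, and the server checks the answer with a single hash.

```python
import hashlib
import secrets

def make_challenge():
    """Server side: issue a random challenge string (32 hex characters)."""
    return secrets.token_hex(16)

def is_valid(challenge, nonce, difficulty):
    """Server side: one hash is enough to verify a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

def solve(challenge, difficulty):
    """Client side: brute-force nonces until the hash has enough leading zeros."""
    nonce = 0
    while not is_valid(challenge, nonce, difficulty):
        nonce += 1
    return nonce

if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=4)  # about 16**4 hashes on average
    print(challenge, nonce, is_valid(challenge, nonce, 4))
```

The asymmetry is the point: verifying costs one hash, while solving costs many. That is negligible for one person loading a page but adds up for a scraper fetching every post and profile from thousands of addresses.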

[-] lazynooblet@lazysoci.al 15 points 2 days ago

Ah thank you, will check it out

[-] Twig@sopuli.xyz 17 points 2 days ago

I think Anubis would be able to prevent that. Sopuli uses it

[-] lazynooblet@lazysoci.al 5 points 2 days ago

Thanks I'll have a look

this post was submitted on 08 Aug 2025
413 points (99.5% liked)
