419
submitted 2 days ago* (last edited 2 days ago) by geneva_convenience@lemmy.ml to c/fediverse@lemmy.ml

Dropsitenews published a list of websites Facebook uses to train its AI on. Multiple Lemmy instances are on the list as noticed by user BlueAEther

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

you are viewing a single comment's thread
view the rest of the comments
[-] Pamasich@kbin.earth 5 points 1 day ago

If they have a brain, and they do have the experience from Threads, they don't need to scrape Lemmy. They can just set up a shell instance, subscribe to Lemmy communities, and then use federation to get their data for free. That doesn't use robots.txt at all regardless.

this post was submitted on 08 Aug 2025
419 points (99.5% liked)

Fediverse

21127 readers
22 users here now

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".

Getting started on Fediverse;

founded 5 years ago
MODERATORS