submitted 1 day ago* (last edited 1 day ago) by geneva_convenience@lemmy.ml to c/fediverse@lemmy.ml

Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

irotsoma@lemmy.blahaj.zone 31 points 17 hours ago* (last edited 17 hours ago)

I think it's safe to say that all of the LLM companies have been training their systems on any site they can get their hands on for some time. That's why apps like Anubis exist: they try to keep crawlers from killing a site's bandwidth, since LLM companies have decided to ignore robots.txt, copyrights, licenses, and other standard practices.

this post was submitted on 08 Aug 2025
386 points (99.7% liked)
