submitted 1 day ago* (last edited 1 day ago) by geneva_convenience@lemmy.ml to c/fediverse@lemmy.ml

Dropsitenews published a list of websites Facebook uses to train its AI on. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

rimu@piefed.social 24 points 1 day ago

Check out the robots.txt on any Lemmy instance....
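For readers unfamiliar with the file: robots.txt is plain text that groups `Disallow` rules under `User-agent` lines, and a crawler is expected to check it before fetching a page. A minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical file and an invented crawler token `ExampleAIBot` (this is not the actual robots.txt of any Lemmy instance):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleAIBot" is an invented crawler token,
# not taken from any real Lemmy instance's file.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://lemmy.example/post/123"  # hypothetical instance URL

# The named group blocks ExampleAIBot from every path...
print(parser.can_fetch("ExampleAIBot/1.0", url))  # False
# ...while any other agent falls through to the permissive "*" group
# (an empty Disallow means "allow everything").
print(parser.can_fetch("SomeBrowser/5.0", url))  # True
```

Note that `can_fetch` runs inside the crawler. The server never enforces the result, so a scraper that simply skips this check is not stopped by the file at all.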

Pamasich@kbin.earth 5 points 12 hours ago

If they have a brain, and they do have the experience from Threads, they don't need to scrape Lemmy. They can just set up a shell instance, subscribe to Lemmy communities, and then use federation to get their data for free. That doesn't use robots.txt at all regardless.
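The federation route can be sketched as an ActivityPub `Follow` activity sent from a shell instance to a Lemmy community. This is a minimal, hypothetical example: the `shell.example` and `lemmy.example` URLs and the `make_follow` helper are invented for illustration, and a working follower would also need WebFinger discovery and HTTP Signatures on the POST to the community's inbox, both omitted here.

```python
import json


def make_follow(actor: str, community: str, activity_id: str) -> dict:
    """Build a minimal ActivityStreams 'Follow' activity (hypothetical helper)."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": activity_id,
        "type": "Follow",
        "actor": actor,      # the shell instance's user (invented URL)
        "object": community,  # the Lemmy community being followed (invented URL)
    }


follow = make_follow(
    actor="https://shell.example/u/collector",
    community="https://lemmy.example/c/fediverse",
    activity_id="https://shell.example/activities/follow/1",
)
print(json.dumps(follow, indent=2))
```

If the community's instance answers with an `Accept`, it then pushes new posts and comments to the follower's inbox, so the data arrives as federation deliveries rather than as crawled page requests that robots.txt would cover.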

usernamesAreTricky@lemmy.ml 41 points 1 day ago

The article linked in the post body suggests that likely wouldn't have made a difference anyway:

The scrapers ignored common web protocols that site owners use to block automated scraping, including “robots.txt” which is a text file placed on websites aimed at preventing the indexing of content.

mesamunefire@piefed.social 31 points 1 day ago (last edited 1 day ago)

Yeah, I've seen the argument in blog posts that since they are not search engines, they don't need to respect robots.txt. It's really stupid.

AmbitiousProcess@piefed.social 23 points 1 day ago

"No no guys you don't understand, robots.txt actually means just search engines, it totally doesn't imply all automated systems!!!"
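On the semantics being mocked here: a `User-agent: *` group applies to every crawler that honors the file, whatever its purpose, not just search engines. A short sketch with Python's `urllib.robotparser`, assuming a hypothetical wildcard-only robots.txt and two invented agent names:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a single wildcard group that blocks everything.
WILDCARD_ROBOTS = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(WILDCARD_ROBOTS.splitlines())

url = "https://lemmy.example/c/fediverse"  # hypothetical instance URL
agents = ("ExampleSearchBot/2.1", "ExampleTrainingScraper/0.3")  # invented names

# Both print False: "*" matches any user agent, regardless of what it is for.
for agent in agents:
    print(agent, parser.can_fetch(agent, url))
```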

belated_frog_pants@beehaw.org 4 points 23 hours ago
rimu@piefed.social 4 points 18 hours ago

Thieves can smash a window to get into my house but I still lock my doors.

belated_frog_pants@beehaw.org 1 points 1 hour ago

This is more like being there when they come to steal and you ask them to ignore some rooms please.

this post was submitted on 08 Aug 2025
386 points (99.7% liked)

Fediverse


A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".


founded 5 years ago