submitted 4 days ago* (last edited 4 days ago) by geneva_convenience@lemmy.ml to c/fediverse@lemmy.ml

Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[-] scintilla@crust.piefed.social 8 points 4 days ago

Can someone explain why they would need to scrape multiple instances? Are they intentionally going after the fediverse, or is it just a byproduct of Meta trying to get all of human communication?

[-] wuphysics87@lemmy.ml 7 points 4 days ago

The second one

[-] BlueEther@no.lastname.nz 4 points 4 days ago

probably the latter

[-] frongt@lemmy.zip 3 points 4 days ago

It's a lot easier for them to use the same scraper they use on other sites than to build something custom.

[-] LustyArgonianMana@lemmy.world 2 points 4 days ago

Fascism, control, having the money to trawl through less popular socials to find dissidents

[-] halcyoncmdr@lemmy.world 1 points 4 days ago

Instances will not have copies of content from instances they block. So while Meta has Threads... most of the fediverse has blocked it. Since they can't get that data via federation, they scrape. And the instances they scrape will also only have content from their unblocked instances. To ensure they get everything, they have to scrape everything regardless of federation.

this post was submitted on 08 Aug 2025
429 points (99.5% liked)

Fediverse

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".