submitted 2 days ago* (last edited 2 days ago) by geneva_convenience@lemmy.ml to c/fediverse@lemmy.ml

Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[-] Canconda@lemmy.ca 15 points 2 days ago

Does this mean that some of the more unhinged users might actually be chat bots? Or are they just scraping our comments Reddit-style?

[-] mesamunefire@piefed.social 41 points 1 day ago* (last edited 1 day ago)

Scraping, by the look of it.

Also, if you have ever spun up a Lemmy or PieFed instance, you will quickly see these bots pop up. They don't respect robots.txt AT ALL. I estimate 95% of the traffic I get on my tiny little server is AI crawlers.
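For comparison, here is a minimal sketch of the check a well-behaved crawler is supposed to make before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the `/trap/` path, the `ExampleBot` user agent, and the example.org URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a small instance; /trap/ is an invented path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("ExampleBot", "https://example.org/c/fediverse"))
    print(allowed("ExampleBot", "https://example.org/trap/1"))
```

A crawler that skips this check will happily request the disallowed path, which is what makes a disallowed path a usable signal for spotting it.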

A good way to hurt them is to either use Cloudflare's service or create a page that has a link to another generated page, which links to another, and so on. Each hop gets slower. No human would ever click the link, but bots ALWAYS do. It's so funny to see how many are out there in the quagmire of links on my little Python script.
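For anyone curious, a trap like this can be sketched in a few lines with Python's standard `http.server`. This is an illustration of the general idea, not the actual script mentioned above: the `/trap/` path, the half-second step, the 30-second cap, and the port are all assumptions.

```python
import re
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

TRAP_PREFIX = "/trap/"     # hypothetical path; list it under Disallow in robots.txt
MAX_DELAY_SECONDS = 30.0   # assumed cap so a handler thread is not held forever


def parse_depth(path):
    """Return the numeric id from a path like /trap/17, or None for anything else."""
    match = re.fullmatch(re.escape(TRAP_PREFIX) + r"(\d{1,9})", path)
    return int(match.group(1)) if match else None


def delay_for(depth):
    """Each hop deeper waits a little longer, up to the cap."""
    return min(0.5 * depth, MAX_DELAY_SECONDS)


def render_page(depth):
    """Generate a page whose only link points at the next generated page."""
    next_url = f"{TRAP_PREFIX}{depth + 1}"
    return f'<html><body><a href="{next_url}" rel="nofollow">more</a></body></html>'


class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        depth = parse_depth(self.path)
        if depth is None:
            self.send_error(404)
            return
        time.sleep(delay_for(depth))  # slow the crawler down more on every hop
        body = render_page(depth).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the trap; http.server logs client IP and request path for each hit."""
    ThreadingHTTPServer(("", port), TrapHandler).serve_forever()
```

Calling `serve()` starts it. The entry link (for example `/trap/1`) would be placed somewhere a human never sees or clicks, so only crawlers that ignore robots.txt wander in.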

[-] tpyo@lemmy.world 1 points 1 day ago

Does it generate any form of visuals? Like, could you post a screenshot of something that shows how far a bot has traveled? I've heard about these traps, but I'm curious what the one you're describing looks like.

[-] mesamunefire@piefed.social 3 points 1 day ago

I just have an id (1/2/...) in the a href, if that makes sense.

So it's the logs that show the number of iterations. Thousands on a couple of IPs. Script kiddies.

Honestly I didn't think the black hole would work that well. But it reduces the actual traffic by a huge factor.
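To give a rough picture of what reading the iterations out of the logs can look like, here is a small sketch that reports the deepest id each IP reached. It assumes trap links of the form `/trap/<id>` (a hypothetical path) and access lines in the common log style that Python's `http.server` writes; the sample lines and IPs are made up.

```python
import re
from collections import defaultdict

# Assumed line shape: 203.0.113.7 - - [date] "GET /trap/42 HTTP/1.1" 200 -
LINE_RE = re.compile(r'^(\S+) .*"GET /trap/(\d+) ')


def deepest_hop_per_ip(lines):
    """Map each client IP to the highest trap id it requested."""
    deepest = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            ip, depth = match.group(1), int(match.group(2))
            deepest[ip] = max(deepest[ip], depth)
    return dict(deepest)


# Made-up sample lines using documentation-only IP ranges.
SAMPLE = [
    '203.0.113.7 - - [08/Aug/2025 10:00:00] "GET /trap/1 HTTP/1.1" 200 -',
    '203.0.113.7 - - [08/Aug/2025 10:00:02] "GET /trap/2 HTTP/1.1" 200 -',
    '198.51.100.9 - - [08/Aug/2025 10:00:03] "GET /trap/1 HTTP/1.1" 200 -',
    '203.0.113.7 - - [08/Aug/2025 10:00:05] "GET /trap/3 HTTP/1.1" 200 -',
    '198.51.100.9 - - [08/Aug/2025 10:00:06] "GET /index.html HTTP/1.1" 200 -',
]

if __name__ == "__main__":
    for ip, depth in sorted(deepest_hop_per_ip(SAMPLE).items()):
        print(ip, depth)
```

There are no visuals in this approach, just a per-IP number that keeps climbing for as long as a bot stays in the trap.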

[-] davidgro@lemmy.world 15 points 1 day ago

I assume scraping at this point. There are likely a few hobby ones now, but if Lemmy becomes popular then there will be lots of bots for sure.

[-] zeca@lemmy.ml 5 points 1 day ago

I guess they mostly scrape it. To waste resources posting here, they would have to find a way to make money doing so. They put bots posting on Facebook because they think it increases user engagement. They don't want to increase engagement on Lemmy (not that it would work...).

[-] pelespirit@sh.itjust.works 3 points 1 day ago

There are definitely bots here, but they're scraping too.

this post was submitted on 08 Aug 2025
418 points (99.5% liked)
