submitted 1 day ago* (last edited 1 day ago) by geneva_convenience@lemmy.ml to c/fediverse@lemmy.ml

Dropsitenews published a list of websites Facebook uses to train its AI on. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

50 comments
[-] anarchiddy@lemmy.dbzer0.com 56 points 21 hours ago

Unpopular opinion but social media has always been fundamentally public.

Unless they're scraping private DMs on encrypted devices, this should come as no surprise to anyone.

The good news is that nobody has exclusive rights to data on federated platforms, unlike other sites that will ransom their users' data for private use. Let's not forget that many of us migrated here because the other site wanted to lock down its API and user data so that it could auction it to Google for profit.

[-] hyacin@lemmy.ml 30 points 19 hours ago

Ahahahahaha, so it's going to be a self-hating Meta AI bot?

[-] Sandouq_Dyatha@lemmy.ml 49 points 21 hours ago

Imagine being a techbro talking to your meta ai chatbot and he says "unlimited genocide on the first world, start jihad on krakkker entity"

[-] sharkfucker420@lemmy.ml 90 points 23 hours ago* (last edited 23 hours ago)

Poison thy well comrades. Become more unhinged /s

[-] NinjaGinga@hexbear.net 21 points 20 hours ago* (last edited 20 hours ago)

Take away that /s, it's praxis now!

[-] Carl@hexbear.net 36 points 21 hours ago* (last edited 20 hours ago)

lemmygrad

imagining Zuck launching his "everybody gets ten virtual friends" initiative and accidentally re-radicalizing your parents and grandparents in the other direction.

[-] captainlezbian@lemmy.world 10 points 17 hours ago

Oh that's certainly a decision they made

[-] Gullible@sh.itjust.works 59 points 23 hours ago

I understand why they did it, but scraping a website that freely offers nearly the entirety of its data via federation is a dick move

[-] CrispyFern@hexbear.net 44 points 22 hours ago

The bot trained on hexbear and lemmygrad vs the bot trained on .world: approaching-1approaching-2

[-] Ram_The_Manparts@hexbear.net 46 points 22 hours ago
[-] WittyProfileName2@hexbear.net 8 points 15 hours ago

Fuck yeah! My "Bigfoot is actually a big cellar spider and that's why it's always blurry in pictures" theory is gonna be broadcast to everyone's grandmother!

[-] Frogmanfromlake@hexbear.net 21 points 21 hours ago

Lol rip to the AI that trains on my ramblings.

[-] Assian_Candor@hexbear.net 21 points 22 hours ago

Noooo my contentarinos nooooo

[-] artifex@piefed.social 50 points 23 hours ago

So every AI’s gonna identify as an Arch user with striped socks now?

[-] oxysis@lemmy.blahaj.zone 30 points 23 hours ago

Forcibly feminizing the ai, one pair of thigh highs at a time

[-] ada@lemmy.blahaj.zone 11 points 22 hours ago

They are scraping the blahaj cdn...

[-] SexUnderSocialism@hexbear.net 29 points 22 hours ago

I'll be upping my use of Maoist Standard English and PIGPOOPBALLS in response to this revelation.

[-] Alaskaball@hexbear.net 42 points 23 hours ago

Damn zuckbot's gonna end up being a commie-bot that posts absurdist memes about beans if it's harvesting hexbear posts for content

[-] CloutAtlas@hexbear.net 24 points 22 hours ago

The AI wasting hours of processing power having an internal struggle session re: outdoor cats before simply replying with ":pigpoopballs" on a platform that doesn't have that emoji

[-] Maeve@kbin.earth 43 points 23 hours ago

Going straight to palantir

[-] SaneMartigan@aussie.zone 27 points 23 hours ago

now I feel I should upload my asshole pic.

[-] wuphysics87@lemmy.ml 15 points 22 hours ago

Your proctologist already has

[-] mesamunefire@piefed.social 25 points 23 hours ago* (last edited 23 hours ago)

PeerTube as well: 46 instances.

Oh and https://mastodon.sdf.org as well.

[-] mesamunefire@piefed.social 13 points 23 hours ago

Just FYI: @SDF@mastodon.sdf.org wanted to let you know.

[-] Erika3sis@hexbear.net 24 points 23 hours ago

Honestly, I already figured my posts were probably being used to train an LLM without my consent.

[-] nickwitha_k@lemmy.sdf.org 16 points 22 hours ago

I'm more concerned about the non-consensual scraping causing excess load on the servers. Taking content without a license to train their energy-wasting autocomplete, which is used commercially for little besides trying to cheapen labor and pocket the money, is a problem too. But I hate having servers impacted by their bullshit.
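To see how much of a server's load actually comes from crawlers, you can tally self-identified AI user agents in the access log. The following is a minimal sketch. It assumes nginx/Apache "combined" log lines, where the user agent is the last quoted field. The token list is illustrative: it uses published crawler names such as GPTBot, CCBot, ClaudeBot and meta-externalagent. The names AI_CRAWLER_TOKENS, classify and crawler_share are hypothetical.

```python
import re
from collections import Counter

# Illustrative, not exhaustive: lowercase substrings of self-identified AI crawler user agents.
AI_CRAWLER_TOKENS = ("gptbot", "ccbot", "claudebot", "meta-externalagent")

# Assumes the "combined" log format, where the user agent is the last quoted field on the line.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def classify(line: str) -> str:
    """Label one log line as 'ai_crawler', 'other', or 'unparsed'."""
    match = UA_RE.search(line)
    if not match:
        return "unparsed"
    user_agent = match.group(1).lower()
    if any(token in user_agent for token in AI_CRAWLER_TOKENS):
        return "ai_crawler"
    return "other"


def crawler_share(lines) -> float:
    """Fraction of parseable requests that came from self-identified AI crawlers."""
    counts = Counter(classify(line) for line in lines)
    parsed = counts["ai_crawler"] + counts["other"]
    return counts["ai_crawler"] / parsed if parsed else 0.0
```

This only counts crawlers that announce themselves. Scrapers that spoof a browser user agent land in "other", so the true share may be higher than this reports.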

[-] rimu@piefed.social 23 points 23 hours ago

Check out the robots.txt on any Lemmy instance...

[-] usernamesAreTricky@lemmy.ml 41 points 23 hours ago

Linked article in the body suggests that likely wouldn't have made a difference anyway

The scrapers ignored common web protocols that site owners use to block automated scraping, including “robots.txt” which is a text file placed on websites aimed at preventing the indexing of context
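robots.txt is purely advisory. A well-behaved crawler reads it and checks each URL against the rules for its own user-agent token before making a request, and nothing stops a scraper that skips the check. Below is a minimal sketch of that check using Python's standard `urllib.robotparser`. The rules in ROBOTS_TXT are a hypothetical example, not the file any Lemmy instance actually serves. GPTBot and meta-externalagent are used as examples of published AI-crawler tokens, and may_fetch is an illustrative name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawlers entirely and keep all
# other bots out of the API. Not copied from any real instance.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: *
Disallow: /api/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return what robots.txt *asks* of this crawler for this URL.

    user_agent should be the crawler's product token (e.g. "GPTBot/1.2");
    robotparser only compares the part before the first "/".
    Nothing enforces the answer: compliance is voluntary.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("GPTBot/1.2", "https://lemmy.example/post/1"))
print(may_fetch("Mozilla/5.0", "https://lemmy.example/api/v3/post/list"))
```

Because the check runs on the crawler's side, a scraper that never calls anything like may_fetch is unaffected. That is why operators fall back on server-side blocking or traps.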

[-] mesamunefire@piefed.social 31 points 23 hours ago* (last edited 23 hours ago)

Yeah, I've seen the argument in blog posts that since they are not search engines, they don't need to respect robots.txt. It's really stupid.

[-] AmbitiousProcess@piefed.social 23 points 23 hours ago

"No no guys you don't understand, robots.txt actually means just search engines, it totally doesn't imply all automated systems!!!"

[-] crazycraw@crazypeople.online 14 points 22 hours ago

I thought we all knew and were training it wrong on purpose..

...as a joke.

[-] BlueEther@no.lastname.nz 20 points 1 day ago* (last edited 23 hours ago)

aussie.zone and beehaw.org are on the list as well

[-] heyWhatsay@slrpnk.net 8 points 20 hours ago

Just make sure to add banana truck to the critical dialogue, and most importantly clown penis.

[-] ada@lemmy.blahaj.zone 12 points 22 hours ago

Our cdn is there... Joy...

[-] socsa@piefed.social 10 points 21 hours ago

Definitely called this. Can we have private voting now? These people are scraping the fediverse and the current state of things is a privacy nightmare.

[-] Deceptichum@quokk.au 14 points 20 hours ago* (last edited 20 hours ago)

You cannot have private voting. The Fediverse is open; that information has to be shared for it to work, unless you want to make it more open to vote manipulation.

Even the PieFed implementation wasn’t great, basically giving every user a second account that sends the vote instead.

[-] v4ld1z@lemmy.zip 16 points 1 day ago

Aw hell nah

[-] Canconda@lemmy.ca 15 points 1 day ago

Does this mean that some of the more unhinged users might actually be chat bots? Or are they just scraping our comments reddit style?

[-] mesamunefire@piefed.social 40 points 23 hours ago* (last edited 23 hours ago)

Scraping by the look of it.

Also, if you have ever spun up a Lemmy or PieFed instance, you will quickly see these bots pop up. They don't respect robots.txt AT ALL. I estimate 95% of the traffic I get on my tiny little server is AI crawlers.

A good way to hurt them is to either use Cloudflare's service or create a page with a link to another generated page, which links to another generated page, and so on, slowing down a little more each time. No human would ever click the link, but bots ALWAYS do. It's so funny to see how many are out there in the quagmire of links on my little Python script.
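The commenter's script isn't shown. What follows is a hypothetical minimal sketch of the same link-maze idea using only Python's standard library. Every page under an assumed `/maze` path links to a few generated child pages, and each level deeper waits longer before responding. MAZE_ROOT, the delay constants and all function names are illustrative choices, not taken from the original script.

```python
import hashlib
import html
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAZE_ROOT = "/maze"   # hypothetical path, linked only from an anchor humans won't see
BASE_DELAY = 0.5      # seconds added per level (assumed tuning value)
MAX_DELAY = 30.0      # cap so every request still finishes eventually
LINKS_PER_PAGE = 3


def in_maze(path: str) -> bool:
    return path == MAZE_ROOT or path.startswith(MAZE_ROOT + "/")


def depth_of(path: str) -> int:
    """Number of generated segments below MAZE_ROOT (0 for the root page)."""
    rest = path[len(MAZE_ROOT):]
    return len([segment for segment in rest.split("/") if segment])


def delay_for(depth: int) -> float:
    """Linear slowdown per level, capped at MAX_DELAY."""
    return min(BASE_DELAY * depth, MAX_DELAY)


def child_links(path: str, n: int = LINKS_PER_PAGE) -> list[str]:
    """Derive child paths by hashing the current path: the same page always
    shows the same links, but there is always another level below."""
    base = path.rstrip("/")
    links = []
    for i in range(n):
        token = hashlib.sha256(f"{base}:{i}".encode()).hexdigest()[:12]
        links.append(f"{base}/{token}")
    return links


def render_page(path: str) -> str:
    items = "".join(
        f'<li><a href="{html.escape(href)}">more</a></li>' for href in child_links(path)
    )
    return f"<!doctype html><html><body><ul>{items}</ul></body></html>"


class MazeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not in_maze(self.path):
            self.send_error(404)
            return
        time.sleep(delay_for(depth_of(self.path)))  # deeper pages answer more slowly
        body = render_page(self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the maze; one thread per request so a slow crawler doesn't block others."""
    ThreadingHTTPServer((host, port), MazeHandler).serve_forever()
```

Call `serve()` to run it. Adding `Disallow: /maze` to robots.txt keeps compliant crawlers out, so only the ones ignoring it wander in. Each sleeping request holds a thread, so under heavy crawling an async server would be cheaper to run.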

[-] davidgro@lemmy.world 15 points 23 hours ago

I assume scraping at this point. There's likely a few hobby ones now, but if Lemmy becomes popular then there will be lots of bots for sure.

[-] Photuris@lemmy.ml 9 points 21 hours ago

I hate the internet now

this post was submitted on 08 Aug 2025
369 points (99.7% liked)

Fediverse

21089 readers
625 users here now

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".

Getting started on Fediverse;

founded 5 years ago