submitted 3 weeks ago* (last edited 3 weeks ago) by hellinkilla@hexbear.net to c/chapotraphouse@hexbear.net

My commentary:

The list is a 1659-page PDF with one URL per line. Here is where hexbear appears, in illustrious context:

onlinecasinorank-kh.com
verkorkst-kreativ-shop.de
demellierlondon.com
www.aprokosailor.com
gabriel.by
hexbear.net
shop.simplefunforkids.com
vdownload-16.sb-cd.com
images.cnwomen.com.cn:80
ftp.pigwa.net
cdn-legacy.iclrs.org
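
If you want to pull the same kind of excerpt yourself, here is a minimal sketch. It assumes you have already extracted the PDF to plain text with one URL per line. The function name `find_with_context` and the `radius` parameter are made up for illustration and are not from the article or the leak.

```python
def find_with_context(lines, needle, radius=5):
    """Return (line_number, surrounding_lines) for every line containing needle.

    lines  -- list of strings, one URL per line (assumed input format)
    needle -- lowercase domain to look for, e.g. "hexbear.net"
    radius -- how many neighbouring lines to keep on each side (hypothetical default)
    """
    hits = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - radius)
            # Slicing past the end of a list is safe in Python, so no upper clamp is needed.
            hits.append((i + 1, lines[start:i + radius + 1]))
    return hits


# The excerpt quoted above, used as stand-in data for the full list.
sample = """onlinecasinorank-kh.com
verkorkst-kreativ-shop.de
demellierlondon.com
www.aprokosailor.com
gabriel.by
hexbear.net
shop.simplefunforkids.com
vdownload-16.sb-cd.com
images.cnwomen.com.cn:80
ftp.pigwa.net
cdn-legacy.iclrs.org""".splitlines()

hits = find_with_context(sample, "hexbear.net")
for line_number, context in hits:
    print(f"match on line {line_number}:")
    for url in context:
        print("   ", url)
```

On the real file you would read the lines from disk instead of `sample`. The substring match is deliberately loose, so it will also catch subdomains.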

Original post cross-posted from: https://lemmy.ml/post/34374494

Drop Site News published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[-] ragebutt@lemmy.dbzer0.com 17 points 3 weeks ago

Facebook will do it if the content is public. The only thing you can do is take the instance private

[-] sexywheat@hexbear.net 16 points 3 weeks ago

Cloudflare made a tool to charge AI bots to browse/scrape your website (not sure how well it works though). However, I don't think HexBear is gonna be using Cloudflare any time soon. But the tech does exist.

[-] ragebutt@lemmy.dbzer0.com 17 points 3 weeks ago

The fact that its existence is public means that Meta has almost certainly found a way around it

[-] combat_brandonism@hexbear.net 11 points 3 weeks ago

there's also anubis.

this post was submitted on 09 Aug 2025
116 points (100.0% liked)
