rulebots.txt (lemmy.world)
itsnicodegallo@lemm.ee 8 points 2 months ago

As annoying as this is, it's meant to prevent LLMs from being trained on Reddit content, and LLM training is probably the greater of the two evils.

GroupNebula563@lemmy.world 37 points 2 months ago

That's all well and good, but how many LLMs do you think actually respect robots.txt?

colin@lemmy.uninsane.org 14 points 2 months ago

from my limited experience, about half? i had to finally set up a robots.txt last month after Anthropic decided it would be OK to crawl my Wikipedia mirror from about a dozen different IP addresses simultaneously, non-stop, without any rate limiting, and bring it to its knees. fuck them for it, but at least it stopped once i added robots.txt.

Facebook, Amazon, and a few others are ignoring that robots.txt, on the other hand. they have the decency to do it slowly enough that i'd never notice unless i checked the logs, at least.
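
For anyone who wants to sanity-check their own rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `ClaudeBot` and `SomeOtherBot` user-agent tokens, and the `example.org` URLs are all assumptions for illustration, not the actual file or crawlers described above. It only shows how a well-behaved crawler would interpret the rules; nothing here forces a crawler to comply, and `Crawl-delay` is a non-standard directive that many crawlers ignore.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the user-agent tokens and paths are assumptions,
# not the real file from the comment above.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /search
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler that honors robots.txt would run checks like these before each request.
page = "https://example.org/wiki/Main_Page"
search = "https://example.org/search?q=test"

claudebot_may_fetch = parser.can_fetch("ClaudeBot", page)    # blocked site-wide
other_may_fetch = parser.can_fetch("SomeOtherBot", page)     # falls under "*"
other_may_search = parser.can_fetch("SomeOtherBot", search)  # /search is disallowed
other_delay = parser.crawl_delay("SomeOtherBot")             # seconds between requests

print(claudebot_may_fetch, other_may_fetch, other_may_search, other_delay)
```

Crawlers that skip this check entirely, like the ones ignoring the file above, have to be handled server-side instead, for example with rate limiting or blocking by user agent or IP range.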

this post was submitted on 21 Aug 2024
321 points (100.0% liked)
