submitted 1 day ago* (last edited 1 day ago) by throws_lemy@lemmy.nz to c/linux@programming.dev

LLM scrapers are taking down FOSS projects' infrastructure, and it's getting worse.

[-] grrgyle@slrpnk.net 59 points 1 day ago* (last edited 1 day ago)

Wow, that was a frustrating read. I did not know it was quite that bad. Just to highlight one quote:

they don’t just crawl a page once and then move on. Oh, no, they come back every 6 hours because lol why not. They also don’t give a single flying fuck about robots.txt, because why should they. [...] If you try to rate-limit them, they’ll just switch to other IPs all the time. If you try to block them by User Agent string, they’ll just switch to a non-bot UA string (no, really). This is literally a DDoS on the entire internet.

[-] jatone@lemmy.dbzer0.com 24 points 1 day ago

the solution here is to require logins. them's the breaks, unfortunately. it'll eventually pass as the novelty wears off.

[-] possiblylinux127@lemmy.zip 6 points 22 hours ago

Alternative: require a proof of work calculation.
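
For anyone wondering what a proof-of-work check looks like, here is a minimal hashcash-style sketch. To be clear, this is an illustration under my own assumptions, not how Anubis or any particular WAF implements it: the function names (`make_challenge`, `solve`, `verify`), the `challenge:nonce` string format, and the choice of SHA-256 with a leading-zero-bits target are all hypothetical.

```python
import hashlib
import os


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical format)."""
    return os.urandom(16).hex()


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one cheap hash to check the client's answer.

    The answer is valid if SHA-256("challenge:nonce") starts with
    at least `difficulty_bits` zero bits.
    """
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    value = int.from_bytes(digest, "big")
    return value >> (256 - difficulty_bits) == 0


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until one passes verification.

    Expected cost is about 2**difficulty_bits hashes.
    """
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty_bits=16)
    print(challenge, nonce, verify(challenge, nonce, 16))
```

The point of the asymmetry: the server does one hash to verify, while the client does roughly 2^difficulty hashes to solve. A single human visitor barely notices that, but a scraper hitting millions of pages pays the cost every time.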

This is exactly what we need to do. You'd think a FOSS WAF that can do this would exist out there somewhere.

[-] LiveLM@lemmy.zip 2 points 19 hours ago

There is. That screenshot you see in the article is a picture of a brand new one, Anubis.

Yeah, I realised that after posting. I think we need a better one that lets legitimate users in more easily, though.

[-] possiblylinux127@lemmy.zip 1 points 17 hours ago

It kind of sucks but it is the best we have for the moment

this post was submitted on 21 Mar 2025
170 points (98.9% liked)
