192
submitted 1 day ago* (last edited 1 day ago) by throws_lemy@lemmy.nz to c/linux@programming.dev

LLM scrapers are taking down FOSS projects' infrastructure, and it's getting worse.

[-] refalo@programming.dev 3 points 1 day ago* (last edited 1 day ago)

I don't like the approach of banning nonresidential IPs. I think it's discriminatory and unfairly blocks out corporate/VPN users and others we might not even be thinking about. I realize there is a bot problem but I wish there was a better solution. Maybe purely proof-of-work solutions will get more popular or something.

[-] sudo@programming.dev 0 points 1 day ago

Proof of Work is a terrible solution because it assumes computational costs are a significant expense for scrapers compared to proxy costs. It'll never come close to costing the same as residential proxies, and meanwhile every smartphone user will be complaining about your website draining their battery.

You can do something like only challenging data center IPs, but you'll have to do better than Proof-of-Work. Canvas fingerprinting would work.

[-] refalo@programming.dev 2 points 9 hours ago

> Proof of Work is a terrible solution

Hard disagree, because:

> it assumes computational costs are a significant expense for scrapers compared to proxy costs

The assumption is correct. PoW has been proven to significantly reduce bot traffic... meanwhile the mere existence of residential proxies has exploded the availability of easy bot campaigns.

> Canvas fingerprinting would work.

Demonstrably false... people already do this with abysmal results. Need to visit a clownflare site? Endless captcha loops. No thanks

[-] sudo@programming.dev 1 points 19 minutes ago

> The assumption is correct. PoW has been proven to significantly reduce bot traffic.

What you're doing is filtering out bots that can't be bothered to execute JavaScript. You don't need a computationally heavy PoW task to do that.

> meanwhile the mere existence of residential proxies has exploded the availability of easy bot campaigns.

Correct, and that's why they are the number one expense for any scraping company. Any scraper that can't be bothered to spin up a headless browser isn't going to cough up the dough for residential proxies.

> Demonstrably false… people already do this with abysmal results. Need to visit a clownflare site? Endless captcha loops. No thanks

That's not what "demonstrably false" even means. Canvas fingerprinting filters out bots better than PoW. What you're complaining about is overly strict settings that end up denying some users. Set your Anubis difficulty too high and you'll have users waiting a long time while their batteries drain.
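
For anyone following the argument, here is a rough sketch of the kind of hash-based proof-of-work challenge being debated. It is a generic hashcash-style scheme, not Anubis's actual implementation; the function names, the `challenge:nonce` format, and the difficulty value are all assumptions made up for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single SHA-256 hash checks a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until the hash meets the difficulty."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    # Hypothetical challenge string; a real server would issue a random one.
    nonce = solve("example-challenge", difficulty=12)
    print(nonce, verify("example-challenge", nonce, 12))
```

The solver needs about 2^difficulty hashes on average, while verification is always one hash. Each extra difficulty bit doubles the work for every client alike, so a phone and a data center scraper pay the same number of hashes, which is the trade-off both sides of this thread are arguing over.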

this post was submitted on 21 Mar 2025
192 points (99.0% liked)
