VitoRobles@lemmy.today 8 points 1 week ago

How exactly does a website stop a web scraper from one specific org?

I mean, isn't that the whole point of web scraping? If it's publicly available, anybody, including people like ICE, will find a way to get the data.

astronaut_sloth@mander.xyz 13 points 1 week ago

Yeah, stopping web scrapers isn't technically impossible, but a lasting, effective solution is hard to come by. One easy way is to block their user-agent, assuming the scraper uses an identifiable one, but that's easily circumvented. Another easy and somewhat more effective way is to block the IP addresses of scrapers and caching services, but that turns into a game of whack-a-mole. You could also put content behind a paywall or login and not approve a certain org, but that only works for certain use cases and is also easy to circumvent. If stopping a single org's scraping is your hill to die on, good luck.

That said, I'm all for fighting ICE, even if it's futile. Just slowing them down and frustrating them is useful.
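
For anyone curious what the user-agent and IP blocking mentioned above might look like in practice, here's a rough sketch in Python. Everything in it is an assumption for illustration: the `is_blocked` helper, the user-agent substrings, and the address ranges (which are reserved documentation ranges, not real scraper addresses). A real site would more likely do this in its web server or firewall config.

```python
import ipaddress

# Hypothetical blocklists, for illustration only (not real scraper data).
BLOCKED_AGENT_SUBSTRINGS = ("examplebot", "example-scraper")
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),  # reserved documentation range
    ipaddress.ip_network("2001:db8::/32"),   # reserved documentation range
]


def is_blocked(user_agent: str, remote_addr: str) -> bool:
    """Return True if the request matches either blocklist."""
    # User-agent check: trivial for a scraper to dodge by changing the header.
    ua = (user_agent or "").lower()
    if any(fragment in ua for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True

    # IP check: harder to dodge, but the list needs constant upkeep
    # as the scraper moves to new addresses (the whack-a-mole part).
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # unparseable address: this sketch just lets it through
    return any(addr in network for network in BLOCKED_NETWORKS)
```

Calling `is_blocked("ExampleBot/1.0", "198.51.100.7")` matches on the user-agent alone, while the same client sending `"Mozilla/5.0"` from that address sails through, which is exactly the easy circumvention described above.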

this post was submitted on 24 Mar 2025
128 points (100.0% liked)
