submitted 3 months ago by plinky@hexbear.net to c/technology@hexbear.net

respecting robots.txt won't last long i suspect

nat_turner_overdrive@hexbear.net 10 points 3 months ago

"While many companies respect robots.txt instructions, some do not. Several large companies, including Perplexity, have been caught using proxies and surreptitious user agents to circumvent and ignore robots.txt."

respecting robots.txt already died

thethirdgracchi@hexbear.net 8 points 3 months ago

robots.txt has been dead for over a decade. Massive web scraping is big business at this point.

nat_turner_overdrive@hexbear.net 8 points 3 months ago

For sure, I just mean in the context of AI, since OP suggested it wouldn't be respected much longer.

this post was submitted on 23 Jul 2024
41 points (100.0% liked)