submitted 1 day ago* (last edited 1 day ago) by db0@lemmy.dbzer0.com to c/div0@lemmy.dbzer0.com

After yet another bot scraping wave forced me to do sysadmin work at 3am, and after I ranted about it on lemmy, @self@awful.systems linked me to a post that referenced iocaine, which sounded like a perfect way to get back at bots that don't respect our resources and time.

At the same time, we recently on-boarded @tenchiken@lemmy.dbzer0.com as an extra sysadmin to reduce the "bus factor" of our instance (say hello), and they graciously offered some spare compute they had lying around. So I thought, since serving iocaine to bots doesn't really require any serious uptime, why not put those resources to good use?

So after a couple of hours of messing around with things, I've now deployed iocaine to protect our instance as well as fediseer. This should hopefully start messing with these bastards right back by serving them some surrealistic nonsense I had squirreled away.

If you want to see this in action, set your user agent to GPTBot and visit our instance. If you find yourself trapped in iocaine somehow, just let us know.
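For anyone who would rather test this from a script than change their browser settings, here is a minimal sketch using only the Python standard library. The `GPTBot` user agent comes from the post; the function names are made up for illustration, and the URL you pass in is up to you (the instance's front page is an assumption on my part, not something the post spells out).

```python
import urllib.request


def build_request(url: str, user_agent: str = "GPTBot") -> urllib.request.Request:
    """Build (but do not send) a GET request that announces a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_preview(url: str, user_agent: str = "GPTBot", limit: int = 500) -> str:
    """Send the request and return the first `limit` bytes of the body as text."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read(limit).decode("utf-8", errors="replace")
```

Calling `fetch_preview` once with the default user agent and once with a normal browser string should make the difference visible, assuming the user agent alone is enough to trigger the redirect for your IP.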

[-] naevaTheRat@lemmy.dbzer0.com 24 points 1 day ago

So cool that you have to booby trap sites and insert delays and stuff because a handful of rich psychopaths refuse to respect robots.txt

[-] tenchiken@lemmy.dbzer0.com 31 points 1 day ago

/me waves in awkward sysadmin fashion

Looking forward to nonsense gibberish on the Internet being put to good use instead of politics!

[-] SnokenKeekaGuard@lemmy.dbzer0.com 6 points 1 day ago* (last edited 1 day ago)

Well if you like nonsense gibberish.

!aneurysmposting@sopuli.xyz

And welcome aboard!

[-] flicker@lemmy.dbzer0.com 5 points 1 day ago

Welcome aboard!

[-] SerotoninSwells@lemmy.world 8 points 1 day ago

👋 Hi, I really hope iocaine works for you and I think it still might be wise to temper expectations. Some background, I work in bot detection and mitigation.

I quickly tried reading through their code and documentation but I don't see the main detection mechanism that determines human vs bot other than what you mentioned as an example. If it's user agent based, it is trivially easy to spoof as you already know. I am finding in my work that these companies do not keep the user agent they report in their documentation when challenged.

My second concern was the page the reverse proxy served when spoofing my user agent. The DOM was nowhere close to that of Lemmy and I think it's important to point out that a simple check for specific elements on the page will keep the bot from poisoning itself.
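To make the "simple check for specific elements" concrete, here is a rough sketch of what such a check could look like, which is also handy for testing whether a decoy page would pass it. The element ids used in the example are hypothetical placeholders, not Lemmy's actual markup, and the class and function names are invented for illustration.

```python
from html.parser import HTMLParser


class MarkerFinder(HTMLParser):
    """Collect which of the wanted element ids appear in a document."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "id" and value in self.wanted_ids:
                self.found.add(value)


def looks_like_expected_page(html, wanted_ids):
    """Return True only if every wanted id is present in the HTML."""
    finder = MarkerFinder(wanted_ids)
    finder.feed(html)
    finder.close()
    return finder.found == finder.wanted_ids


# Hypothetical ids; a real check would use markers from the genuine frontend.
real_page = '<html><body><div id="app"><nav id="navbar"></nav></div></body></html>'
decoy_page = "<html><body><p>surreal nonsense</p></body></html>"
print(looks_like_expected_page(real_page, ["app", "navbar"]))
print(looks_like_expected_page(decoy_page, ["app", "navbar"]))
```

A decoy that fails a check like this is easy for a crawler to discard, which is the concern: the closer the poisoned page's structure is to the real frontend, the less this kind of filter helps the bot.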

I admit I could be too close to this problem to see other solutions, and I really hope it works. It sucks that this is a problem. I wish there were more open source options too.

If for some reason this solution doesn't work, and if anyone is interested in help, I am more than happy to freely offer my knowledge.

[-] db0@lemmy.dbzer0.com 6 points 1 day ago

Thanks. Iocaine doesn't do detection, it only does the poisoning. The detection is currently manual: we do it based on user agents and IP ranges. These bots are extraordinarily stupid atm, which is the biggest issue. The ones causing us downtime were hitting obsolete domains and stupid links constantly. They are very, very crude. They are not yet sophisticated enough to check the DOM, but they can tell when they've been blocked and switch to proxies. Sending them to iocaine is meant to keep them from realizing they're blocked.

Obviously someone smart can easily defeat it, even by just respecting our resources. But these fuckers are very greedy atm. We'll have to evolve along with them.
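As a rough illustration of the manual rule described above (match on user agent or IP range, then quietly route to the poison instead of blocking), here is a minimal sketch. It is not our actual config: the substring list only contains the `GPTBot` agent mentioned in the post, the network is a reserved documentation range used as a placeholder, and `should_poison` is a made-up name.

```python
import ipaddress

# Assumption: lowercase substrings to look for in the User-Agent header.
BLOCKED_AGENT_SUBSTRINGS = ("gptbot",)

# Assumption: placeholder range (203.0.113.0/24 is reserved for documentation).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def should_poison(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request should be sent to the decoy backend."""
    ua = user_agent.lower()
    if any(fragment in ua for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Unparseable address: fall through to normal handling.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The point of returning a routing decision rather than an error is the one made above: the bot gets a normal-looking response, so it has no signal that it should switch to proxies.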

[-] tenchiken@lemmy.dbzer0.com 6 points 1 day ago

For what it's worth, this is just damage control and a first step. Deployment was trivial compared to most other ideas, so it seemed worth at least giving it a go.

Our expectations are very much tempered, but trying to be optimistic on even a small reprieve.

Thanks for the DOM detail!

[-] SerotoninSwells@lemmy.world 3 points 1 day ago

I feel you and feel for you. I really do hope you get a reprieve because dealing with this is nonsense.

[-] fxomt@lemmy.dbzer0.com 14 points 1 day ago

Nice :D i thought of this before but didn't know there was a software for it, it looks great. Glad we're using it, fuck AI crawlers.

[-] flicker@lemmy.dbzer0.com 8 points 1 day ago

Good looking out! Appreciate it!

Also, anything that adds surreal nonsense is something I support, especially using it on bots.

[-] mindbleach@sh.itjust.works 3 points 1 day ago

(say hello)

Arise, Chicken.

[-] Andromxda@lemmy.dbzer0.com 1 points 1 day ago
[-] Blaze@lemmy.dbzer0.com 4 points 1 day ago
[-] wizardbeard@lemmy.dbzer0.com 2 points 1 day ago

Just idle curiosity: Any particular reason for iocaine vs. any of the other similar projects (found this list on the iocaine homepage) out there?

[-] db0@lemmy.dbzer0.com 4 points 1 day ago* (last edited 1 day ago)

It was just the first one I ran into, and it was easy to deploy.

this post was submitted on 17 May 2025
75 points (98.7% liked)
