Some thoughts on how useful Anubis really is. Combined with comments I read elsewhere about scrapers starting to solve the challenges, I'm afraid Anubis will soon be outdated and we'll need something else.

[-] rtxn@lemmy.world 6 points 3 months ago* (last edited 3 months ago)

The current version of Anubis was made as a quick "good enough" solution to an emergency. The article is very enthusiastic about explaining why it shouldn't work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.

The purpose is to reduce the flood to a manageable level, not to block every single scraper request.

[-] AnUnusualRelic@lemmy.world 1 points 3 months ago

The problem is that the purpose of Anubis was to make crawling more computationally expensive, and crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile more required cycles on top of what's currently asked, but it's a balancing act before it really becomes an annoyance for the meat popsicle users.
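
For a rough sense of that balancing act: each extra bit of proof-of-work difficulty doubles the expected number of hashes, and the phone/scraper gap stays constant in ratio. A back-of-envelope sketch (the hash rates below are assumptions for illustration, not measurements):

```python
# Back-of-envelope cost of a leading-zero-bits SHA-256 challenge.
# Assumed hash rates, purely illustrative: a phone running JS vs.
# a scraper running native code. Each extra difficulty bit doubles the work.
PHONE_RATE = 1e5    # hashes/sec (assumption)
SCRAPER_RATE = 1e7  # hashes/sec (assumption)

for bits in (12, 16, 20):
    expected_hashes = 2 ** bits
    print(f"{bits} bits: ~{expected_hashes:,} hashes; "
          f"phone ~{expected_hashes / PHONE_RATE:.2f}s, "
          f"scraper ~{expected_hashes / SCRAPER_RATE:.4f}s")
```

Whatever difficulty is tolerable on a mid-range phone is trivial for a datacenter, which is exactly why piling on cycles only goes so far.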

[-] rtxn@lemmy.world 1 points 3 months ago

That's why the developer is working on a better detection mechanism. https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/

[-] poVoq@slrpnk.net 0 points 3 months ago* (last edited 3 months ago)

And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.

I feel people that complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦

[-] interdimensionalmeme@lemmy.ml 0 points 3 months ago

Unless you have a dirty heatsink, no amount of hammering would make the server overheat

[-] poVoq@slrpnk.net 1 points 3 months ago

Are you explaining my own server to me? 🙄

[-] interdimensionalmeme@lemmy.ml 0 points 3 months ago

What CPU made after 2004 doesn't have automatic temperature control?
I don't think there is any, unless you somehow managed to disable it.
Even a Raspberry Pi without a heatsink won't overheat to the point of shutdown.

[-] poVoq@slrpnk.net 1 points 3 months ago

You are right, it is actually worse: usually it just overloads the CPU so badly that it starts to throttle, and then I can't even access the server via SSH anymore. But sometimes it also crashes the server so that it reboots, and yes, that can happen on modern CPUs as well.

[-] mobotsar@sh.itjust.works 0 points 3 months ago

Is there a reason other than avoiding infrastructure centralization not to put a web server behind cloudflare?

[-] poVoq@slrpnk.net 2 points 3 months ago

Yes, because Cloudflare routinely blocks entire IP ranges and puts people into endless captcha loops. It also snoops on all traffic and collects a lot of metadata about all your site visitors, and if you let them terminate TLS, they can even analyse the passwords that people use to log into the services you run. It's basically a huge surveillance dragnet and probably a front for the NSA.

[-] bjoern_tantau@swg-empire.de 0 points 3 months ago

Cloudflare would need your HTTPS keys, so they could read all the content you worked so hard to encrypt. If I wanted to do bad shit, I would apply at Cloudflare.

[-] mobotsar@sh.itjust.works 1 points 3 months ago* (last edited 3 months ago)

Maybe I'm misunderstanding what "behind cloudflare" means in this context, but I have a couple of my sites proxied through cloudflare, and they definitely don't have my keys.

I wouldn't think using a cloudflare captcha would require such a thing either.

[-] bjoern_tantau@swg-empire.de 2 points 3 months ago* (last edited 3 months ago)

Hmm, I should look up how that works.

Edit: https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/#custom-ssltls

They don't need your keys because they have their own CA. No way I'd use them.

Edit 2: And with their own DNS they could easily route any address through their own servers if they wanted to, without anyone noticing. They are entirely too powerful. Is there some way to prevent this?

[-] starkzarn@infosec.pub 1 points 3 months ago

That's because they just terminate TLS at their end. Your DNS record is "poisoned" by the orange cloud and their infrastructure answers for you. They happen to have a trusted root CA, so they just present one of their own certificates with a SAN that matches your domain, and your browser trusts it. Bingo: TLS termination at CF servers. They then have the traffic in cleartext and just re-encrypt it to your origin server if you enforce TLS, but at that point it's meaningless.
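
You can check this yourself by looking at who actually signed the certificate your browser receives. A minimal sketch (the hostname is a placeholder; for an orange-clouded site the issuer will be one of Cloudflare's CAs, not your own):

```python
import socket
import ssl

hostname = "example.com"  # placeholder: a domain proxied through Cloudflare

ctx = ssl.create_default_context()
with socket.create_connection((hostname, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
        cert = tls.getpeercert()

# The issuer shows who terminated TLS; for a proxied site it's a Cloudflare CA.
print("issuer:", cert["issuer"])
# The SAN entries match your domain, which is why the browser trusts it.
print("SANs:  ", cert["subjectAltName"])
```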

[-] moseschrute@crust.piefed.social 0 points 3 months ago

Out of curiosity, what's the issue with Cloudflare? Aside from the constant worry that they may strong-arm you into their enterprise pricing if your site is too popular lol. I understand supporting open source, but why not let companies handle the expensive bits as long as they're willing?

I guess I can answer my own question. If the point of the Fediverse is to remove a single point of failure, then I suppose Cloudflare could become a single point to take down the network. Still, we could always pivot away from those types of services later, right?

[-] rtxn@lemmy.world 4 points 3 months ago

New developments: just a few hours before I post this comment, The Register posted an article about AI crawler traffic. https://www.theregister.com/2025/08/21/ai_crawler_traffic/

Anubis' developer was interviewed and they posted the responses on their website: https://xeiaso.net/notes/2025/el-reg-responses/

In particular:

Fastly's claims that 80% of bot traffic is now AI crawlers

In some cases for open source projects, we've seen upwards of 95% of traffic being AI crawlers. For one, deploying Anubis almost instantly caused server load to crater by so much that it made them think they accidentally took their site offline. One of my customers had their power bills drop by a significant fraction after deploying Anubis. It's nuts.

So, yeah. If we believe Xe, OOP's article is complete hogwash.

[-] unexposedhazard@discuss.tchncs.de 4 points 3 months ago

This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity.

Well it doesn't fucking matter what "makes sense to you" because it is working...
It's being deployed by people who had their sites DDoS'd to shit by crawlers, and they are very happy with the results, so what even is the point of trying to argue here?

[-] Klear@quokk.au 2 points 3 months ago* (last edited 3 months ago)

If that sounds familiar, it’s because it’s similar to how bitcoin mining works. Anubis is not literally mining cryptocurrency, but it is similar in concept to other projects that do exactly that

Did the author only now discover cryptography? It's like a cryptocurrency, just without currency, what a concept!

[-] SkaveRat@discuss.tchncs.de 2 points 3 months ago

It's a perfectly valid way to explain it, though

If you try to show up with "cryptography" as an explanation, people will think of encrypting messages, not proof of work

"Cryptocurrency with the currency" really is the perfect single sentence explanation

[-] VitabytesDev@feddit.nl 1 points 3 months ago

I love that domain name.

[-] CrackedLinuxISO@lemmy.dbzer0.com 1 points 3 months ago* (last edited 3 months ago)

There are some sites where Anubis won't let me through. Like, I just get immediately bounced.

So RIP dwarf fortress forums. I liked you.

[-] sem@lemmy.blahaj.zone 2 points 3 months ago

I don't get it, I thought it allows all browsers with JavaScript enabled.

[-] TwiddleTwaddle@lemmy.blahaj.zone 1 points 3 months ago

I'm constantly unable to access Anubis sites on my primary mobile browser and have to switch over to Fennec.

[-] Dremor@lemmy.world 1 points 3 months ago

Anubis is not a challenge like a captcha. Anubis is a resource waster, forcing crawlers to solve a cryptographic challenge (basically like mining bitcoin) before being allowed in. That's how it defends so well against bots: they do not want to waste their resources on needless computing, so they just cancel the page load before it even happens and go crawl elsewhere.
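
Roughly, the mechanism looks like this (a minimal sketch of the proof-of-work idea, not Anubis's actual code; the challenge string and difficulty are made up):

```python
import hashlib
from itertools import count

def solve(challenge: str, difficulty: int) -> int:
    """Find a nonce such that sha256(challenge + nonce) starts with
    `difficulty` zero hex digits. The client burns CPU here; the server
    verifies the answer with a single cheap hash."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce

# Hypothetical values for illustration.
nonce = solve("anubis-challenge-token", 4)
print("solved with nonce", nonce)

# Server-side verification: one hash, no loop.
assert hashlib.sha256(
    f"anubis-challenge-token{nonce}".encode()
).hexdigest().startswith("0000")
```

The asymmetry is the point: the visitor does thousands of hashes once, the server does one hash per verification.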

[-] tofu@lemmy.nocturnal.garden 0 points 3 months ago

No, it works because the scraper bots don't have it implemented yet. Of course the companies would rather not spend additional compute resources, but their pockets are deep and some already adapted and solve the challenges.

[-] Dremor@lemmy.world 1 points 3 months ago

Whether they solve it or not doesn't change the fact that they have to use more resources for crawling, which is the objective here. And by contrast, the website sees a lot less load compared to before Anubis was in use. In any case, I see it as a win.

But despite that, it has its detractors, like any solution that becomes popular.

But let's be honest, what are the arguments against it?
It takes a bit longer to access a site the first time? Sure, but it's not like you have to click or type anything.
It executes foreign code on your machine? Literally 90% of the web does these days. Just disable JavaScript and see how many websites are still functional. I'd be surprised if even a handful are.

The only ones who benefit from a site not having Anubis are web crawlers, be they AI bots, indexing bots, or script kiddies trying to find a vulnerable target.

[-] EncryptKeeper@lemmy.world 1 points 3 months ago* (last edited 3 months ago)

The point was never that Anubis challenges are something scrapers can’t get past. The point is it’s expensive to do so.

Some bots don’t use JavaScript and can’t solve the challenges and so they’d be blocked, but there was never any point in time where no scrapes could solve them.

[-] ryannathans@aussie.zone 0 points 3 months ago

Yeah, it has seemed like a bit of a waste of time; once that difficulty gets scaled up and the expiration scaled down, it's gonna get annoying to use the web on phones.

[-] non_burglar@lemmy.world 1 points 3 months ago

I had to get my glasses to re-read this comment.

You know why anubis is in place on so many sites, right? You are literally blaming the victims for the absolute bullshit AI is foisting on us all.

[-] ryannathans@aussie.zone 0 points 3 months ago

Yes, I manage cloudflare for a massive site that at times gets hit with millions of unique bot visits per hour

[-] non_burglar@lemmy.world 0 points 3 months ago

So you know that this is the lesser of the two evils? Seems like you're viewing it from client's perspective only.

No one wants to burden clients with Anubis, and Anubis shouldn't exist. We are all (server operators and users) stuck with this solution for now because there is nothing else at the moment that keeps these scrapers at bay.

Even the author of Anubis doesn't like the way it works. We all know it's just more wasted computing for no reason, except that big tech doesn't care about anyone.

[-] ryannathans@aussie.zone 1 points 3 months ago

My point is, and the author's point is, that it's not the computation that's keeping the bots away right now. It's the obscurity and the challenge itself getting in the way.

[-] possiblylinux127@lemmy.zip 0 points 3 months ago* (last edited 3 months ago)

Anubis sucks

However, the number of viable options is limited.

[-] seralth@lemmy.world 1 points 3 months ago

Yeah but at least Anubis is cute.

I'll take sucks-but-cute over a dead internet and endless swarms of zergling crawlers.
