submitted 15 hours ago* (last edited 15 hours ago) by along_the_road@beehaw.org to c/technology@beehaw.org
Megaman_EXE@beehaw.org 8 points 5 hours ago

Is there a reason these outages seem to have increased recently?

t3rmit3@beehaw.org 1 points 13 minutes ago* (last edited 13 minutes ago)

From the blog post OP linked in a comment:

We made an unrelated change that caused a similar, longer availability incident two weeks ago on November 18, 2025. In both cases, a deployment to help mitigate a security issue for our customers propagated to our entire network and led to errors for nearly all of our customer base.

It seems that their method for propagating new security configurations to their servers is not a gradual or group-based rollout: it pushes certain changes to all servers at once, so uncaught bugs hit everything instead of just an initial test group.
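To illustrate the difference, here is a minimal sketch of a group-based rollout with health validation and rollback, compared against pushing to everyone at once. It is not Cloudflare's system: the `rollout_in_waves` function, the wave fractions, the server names, and the `broken_check` health check are all hypothetical. "Deploying" here just records which config version each server runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class RolloutResult:
    succeeded: bool
    affected: int             # servers that ran the new config at some point
    deployed: Dict[str, str]  # final config version per server


def rollout_in_waves(
    servers: Sequence[str],
    new_config: str,
    old_config: str,
    is_healthy: Callable[[str, str], bool],
    wave_fractions: Sequence[float] = (0.01, 0.1, 0.5, 1.0),
) -> RolloutResult:
    """Push new_config to cumulative fractions of the fleet, halting and rolling back on failure.

    Hypothetical sketch: "deploying" just records which config version each server runs.
    """
    deployed = {server: old_config for server in servers}
    done = 0  # number of servers already on new_config
    for fraction in wave_fractions:
        # Cumulative fleet size after this wave (at least one server, at most the whole fleet).
        target = min(len(servers), max(1, round(len(servers) * fraction)))
        wave = servers[done:target]
        for server in wave:
            deployed[server] = new_config
        # Health-validate only the servers that just received the change.
        if not all(is_healthy(server, new_config) for server in wave):
            for server in servers[:target]:
                deployed[server] = old_config  # quick rollback to the previous version
            return RolloutResult(False, target, deployed)
        done = target
    return RolloutResult(True, done, deployed)


fleet = [f"edge-{i:03d}" for i in range(100)]


def broken_check(server: str, config: str) -> bool:
    # Hypothetical health check: the "v2-broken" config fails on every server.
    return config != "v2-broken"


staged = rollout_in_waves(fleet, "v2-broken", "v1", broken_check)
at_once = rollout_in_waves(fleet, "v2-broken", "v1", broken_check, wave_fractions=(1.0,))
print(staged.affected, at_once.affected)  # 1 100
```

With the same bad change, the staged rollout stops after the first 1% wave and rolls it back, while the all-at-once push (a single wave of 1.0) hits the whole fleet before anything notices.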

In particular, the projects outlined below should help contain the impact of these kinds of changes:

Enhanced Rollouts & Versioning: Similar to how we slowly deploy software with strict health validation, data used for rapid threat response and general configuration needs to have the same safety and blast mitigation features. This includes health validation and quick rollback capabilities among other things.

"Fail-Open" Error Handling: As part of the resilience effort, we are replacing the incorrectly applied hard-fail logic across all critical Cloudflare data-plane components. If a configuration file is corrupt or out-of-range (e.g., exceeding feature caps), the system will log the error and default to a known-good state or pass traffic without scoring, rather than dropping requests. Some services will likely give the customer the option to fail open or closed in certain scenarios. This will include drift-prevention capabilities to ensure this is enforced continuously.
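The "fail-open" item can be sketched the same way. This is a hypothetical example, not Cloudflare's code: `FEATURE_CAP`, the JSON feature-list format, the `Scorer` class, and its toy scoring rule are all assumptions. A new config that is corrupt or exceeds the feature cap is logged and rejected, and the last known-good config stays in use; if no valid config exists yet, a fail-open service passes traffic without scoring, while a fail-closed one drops it.

```python
import json
import logging
from typing import List, Optional

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("config-loader")

FEATURE_CAP = 200  # hypothetical upper bound on features per config file


class ConfigError(ValueError):
    """Raised when a configuration file is corrupt or out of range."""


def parse_config(raw: str) -> List[str]:
    """Parse a JSON list of feature names and enforce the feature cap."""
    try:
        features = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ConfigError(f"corrupt config: {exc}") from exc
    if not isinstance(features, list) or not all(isinstance(f, str) for f in features):
        raise ConfigError("config must be a JSON list of feature names")
    if len(features) > FEATURE_CAP:
        raise ConfigError(f"{len(features)} features exceeds cap of {FEATURE_CAP}")
    return features


class Scorer:
    def __init__(self, fail_open: bool = True) -> None:
        self.fail_open = fail_open                   # per-service choice: open or closed
        self.known_good: Optional[List[str]] = None  # last config that passed validation

    def reload(self, raw: str) -> bool:
        """Apply a new config if valid; otherwise log and keep the known-good state."""
        try:
            self.known_good = parse_config(raw)
            return True
        except ConfigError as exc:
            log.warning("rejected new config, keeping known-good state: %s", exc)
            return False

    def handle(self, request: dict) -> Optional[dict]:
        """Score a request, or decide what to do when no valid config exists."""
        if self.known_good is None:
            if self.fail_open:
                return {**request, "score": None}  # pass traffic without scoring
            return None                            # fail closed: drop the request
        # Toy score: count how many configured features the request has set.
        score = sum(1 for feature in self.known_good if request.get(feature))
        return {**request, "score": score}


scorer = Scorer()
print(scorer.handle({"path": "/"}))  # no config yet: passed through unscored
scorer.reload(json.dumps(["has_cookie", "is_tor"]))
scorer.reload(json.dumps([f"f{i}" for i in range(500)]))  # over the cap: rejected and logged
print(scorer.handle({"has_cookie": True}))  # still scored with the two-feature config
```

The key design choice is that a bad config never replaces a good one: validation happens before the swap, so an oversized or corrupt file degrades to "keep running on yesterday's rules" instead of "drop every request".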

Blackmist@feddit.uk 1 points 2 hours ago

Lack of NSA funding to run their man-in-the-middle platform that everyone likes.

Kolanaki@pawb.social 6 points 4 hours ago* (last edited 4 hours ago)

Something Cloudflare said recently about the last big outage is that there is a bug in a part of their system that isn't their own code or product, and the developer of that component isn't fixing it.

Megaman_EXE@beehaw.org 2 points 4 hours ago

Interesting! Thanks for the information.

kent_eh@lemmy.ca 3 points 5 hours ago* (last edited 2 hours ago)

Without having looked into this specific outage, I'd suggest that things like deferred maintenance and "cost optimizing" technical staffing are often contributing factors (at least in my experience).

this post was submitted on 05 Dec 2025
127 points (100.0% liked)
