Then what are they doing? It seems very cumbersome to have to take a drive offline for routine maintenance.
They don’t do anything.
They have lots and lots of redundancy, and when enough drives fail, they decommission the entire server and/or rack.
The big players operate at a very different scale than the rest of us.
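To make the "let it fail, then retire the whole box" idea concrete, here is a toy sketch. The `Server` class, the `should_decommission` function, and the 25% threshold are all made up for illustration; real operators pick their own numbers and tooling.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    total_drives: int
    failed_drives: int = 0


def should_decommission(server: Server, max_failed_fraction: float = 0.25) -> bool:
    """Return True once the share of failed drives reaches the threshold.

    The 0.25 default is an arbitrary assumption, not an industry figure.
    """
    return server.failed_drives / server.total_drives >= max_failed_fraction


server = Server(name="storage-01", total_drives=24)
for _ in range(6):
    server.failed_drives += 1  # a drive dies; nobody is sent to repair it

print(should_decommission(server))
```

The point is that no per-drive repair step exists anywhere in the loop: failures just accumulate until the whole unit is pulled.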
Hardware-backed RAID, with error monitoring and patrol read. iSCSI or similar to present that to a virtualization layer. VMFS or similar atop that. Files atop that to represent virtual drives. Virtual machines atop that.
Patrol read starts catching errors long before SMART will. Those drives get replicated to (and replaced by) hot spares, online. Failing drives then get replaced with new hot spares.
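Roughly, the hot-spare swap could be modeled like this. It is only a sketch: `Drive`, `RaidArray`, `handle_patrol_results`, and the error threshold are hypothetical names and values, and a real controller does the data copy in firmware rather than in anything resembling this code.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    serial: str
    patrol_errors: int = 0  # errors found by background patrol reads


@dataclass
class RaidArray:
    members: list
    hot_spares: list
    pulled: list = field(default_factory=list)

    def handle_patrol_results(self, error_threshold: int = 1) -> None:
        """Swap any member that hit the error threshold for a hot spare."""
        for i, drive in enumerate(self.members):
            if drive.patrol_errors >= error_threshold and self.hot_spares:
                spare = self.hot_spares.pop(0)
                # A real controller would copy the data onto the spare here,
                # while the array stays online.
                self.members[i] = spare
                self.pulled.append(drive)


array = RaidArray(
    members=[Drive("A"), Drive("B", patrol_errors=3), Drive("C")],
    hot_spares=[Drive("S1")],
)
array.handle_patrol_results()
print([d.serial for d in array.members])  # ['A', 'S1', 'C']
```

Drives in `pulled` are the ones someone physically replaces later, and the replacements go back into `hot_spares`.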
But all of that is irrelevant, because at the enterprise level, they are scaling their applications horizontally, with distributed containers. So even if they needed to do fsck at the guest filesystem level (or even if they weren't using virtualization) they would just redeploy the containers to a different node and then direct traffic away from the one that needs the maintenance.
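As a rough illustration of "redeploy the containers to a different node", here is a toy drain function. `drain_node` and the node and container names are invented for the example; a real orchestrator does the scheduling and traffic shifting for you.

```python
def drain_node(placement: dict[str, list[str]], node: str) -> dict[str, list[str]]:
    """Move every container off `node`, spreading them over the remaining nodes."""
    remaining = {n: list(c) for n, c in placement.items() if n != node}
    if not remaining:
        raise ValueError("no other node available to take the containers")
    for container in placement.get(node, []):
        # Pick the least-loaded remaining node (ties go to the first one listed).
        target = min(remaining, key=lambda n: len(remaining[n]))
        remaining[target].append(container)
    return remaining


placement = {"node-a": ["web-1", "web-2"], "node-b": ["web-3"], "node-c": []}
print(drain_node(placement, "node-a"))
# {'node-b': ['web-3', 'web-2'], 'node-c': ['web-1']}
```

Once the node is empty it can be taken down for fsck, reimaged, or scrapped without the application noticing.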
We don't do maintenance. We just have redundancy and backups, then replace failed components.