this post was submitted on 27 Aug 2023
121 points (94.2% liked)
Over a decade ago, I worked in a big tech company that had a scheduled downtime on one Saturday a month. That was for database schema changes.
When you're changing the structure of how you keep track of customer data, you need to make sure that no customers are making changes at that same time. So you take the whole customer-facing service down for a little while, make the schema changes, test them, and then bring the customer-facing service back up. Ideally this takes a few minutes ... but you're prepared for it to take hours.
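The sequence described above (take the service down, change the schema, test, bring it back up) can be sketched roughly like this. It is only an illustration: `Service`, `maintenance_window`, `migrate_with_downtime`, and the `customers`/`email` names are all made up for the example, and SQLite stands in for whatever database the company actually used.

```python
import sqlite3
from contextlib import contextmanager


class Service:
    """Hypothetical stand-in for the customer-facing service."""

    def __init__(self):
        self.accepting_requests = True


@contextmanager
def maintenance_window(service):
    # Stop taking customer traffic for the duration of the block.
    service.accepting_requests = False
    try:
        yield
    finally:
        # Bring the service back up even if the migration failed.
        service.accepting_requests = True


def migrate_with_downtime(service, conn):
    with maintenance_window(service):
        # Make the schema change while no customers can write.
        conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
        # Test the change before reopening the service.
        columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
        if "email" not in columns:
            raise RuntimeError("schema change did not apply")
        conn.commit()
```

In a real system the "test" step is where the minutes can turn into hours, which is why the window gets scheduled generously.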
As the technology improved, and as the developers learned better how to make changes to the system without requiring deep interventions, long downtime for schema changes became less necessary ... for that particular business.
Pretty much every tech company has to learn how to do these sorts of changes for itself, though.
This is the most informed answer in this thread. It really does come down to schema changes. There are ways to avoid downtime even during schema changes, but they're often complicated. For example, you don't see YouTube go offline for schema changes, because they're willing to make that effort and investment, even for very large databases.
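One common way to do a schema change without downtime is the "expand, backfill, contract" pattern: add the new column alongside the old ones, copy data over in small batches while the service keeps running, and only drop the old columns once nothing reads them. A minimal sketch, assuming a made-up `customers` table and using SQLite purely for illustration (this is not a claim about how YouTube or anyone else does it):

```python
import sqlite3


def expand(conn):
    # Add the new column as nullable so existing writers keep working unchanged.
    conn.execute("ALTER TABLE customers ADD COLUMN full_name TEXT")
    conn.commit()


def backfill(conn, batch_size=2):
    """Fill the new column in small batches; return how many rows were updated."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE customers "
            "SET full_name = first_name || ' ' || last_name "
            "WHERE id IN (SELECT id FROM customers "
            "             WHERE full_name IS NULL LIMIT ?)",
            (batch_size,),
        )
        # Commit per batch so no single transaction stays open for long.
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

The "contract" step (dropping `first_name` and `last_name`) is left out on purpose: it only happens after every reader and writer has been switched to the new column, which is the complicated part.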
Lots of other database tasks can happen while remaining online. For backups, use a read-only connection. For upgrades, you should have a distributed and scaled database, so take them down in sections during upgrades. For "cleaning up," you can do vacuum operations on part of your database while it's live. Etc etc.
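The "take them down in sections" idea is usually called a rolling upgrade. Here is a toy simulation of it; `Node` and `rolling_upgrade` are hypothetical names, and the actual drain and upgrade steps are reduced to flipping fields on an object.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    in_service: bool = True


def rolling_upgrade(nodes, new_version, batch_size=1):
    """Upgrade nodes batch by batch; return the fewest nodes ever in service."""
    if batch_size >= len(nodes):
        raise ValueError("batch_size must leave at least one node in service")
    min_in_service = len(nodes)
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            node.in_service = False  # drain this node; the others keep serving
        min_in_service = min(min_in_service, sum(n.in_service for n in nodes))
        for node in batch:
            node.version = new_version  # stand-in for the real upgrade
            node.in_service = True      # put it back into rotation
    return min_in_service
```

With four nodes and a batch size of one, three nodes are serving at every point during the upgrade, so users see reduced capacity rather than an outage.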
Ultimately, there is almost never a technical reason why a database has to go offline. It's a matter of devotion to the stability and uptime of your infra. Toss enough engineering hours at a database problem and you can pretty much have 100% uptime in the scope of maintenance (not incidents, of course). But even with incidents, there are fail-over plans, replicas, and a ton of other things you can do to stay online. Instead of downtime, you have degraded performance that the users may not even notice.
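As a rough picture of what "degraded instead of down" looks like, here is a sketch of reads falling back to a replica when the primary is unavailable. Everything in it (`FakeDB`, `Unavailable`, `read_with_failover`) is invented for the example; real failover involves health checks, replication lag, and promotion, none of which are modeled here.

```python
class Unavailable(Exception):
    """Raised when a database cannot be reached."""


class FakeDB:
    """Hypothetical in-memory stand-in for a database server."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data
        self.up = up

    def get(self, key):
        if not self.up:
            raise Unavailable(self.name)
        return self.data[key]


def read_with_failover(key, primary, replicas):
    """Try the primary first, then each replica; return (value, server name)."""
    for db in [primary, *replicas]:
        try:
            return db.get(key), db.name
        except Unavailable:
            continue  # this server is down, try the next one
    raise Unavailable("all databases are down")
```

If the primary is down, reads still succeed from a replica. Writes would need more machinery, which is where the "enough engineering hours" part comes in.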
The other big one that usually requires downtime is the network. You may not be touching your game servers all that often, but if you need to do a major OS upgrade on a load balancer or switch, everything behind it loses connectivity. And unless you're talking about one of the big hitters like WoW, they're probably not funding redundant dual network paths that would let you take it down without downtime.
If you are running metal, and the health of your entire network relies on a single load balancer or a single network switch, you're far from being production-ready from a redundancy and scaling perspective.
I don't disagree, but at the same time, running a whole setup that is fully ready for hot-swap live failover whenever you have maintenance tasks to do is potentially just not desirable when you have the option of taking the environment down instead. After all, gamers are pretty much conditioned to expect it at this point.
This is basically "ready for production 101." It's even easier to run an entire service on a computer under a desk, but this isn't how you run stuff in production.
Even if it's "easier" in the short term, you'll be paying more for not being production-ready in the long term (and get a reputation for not having good uptime).
Yeah, I feel you're wildly overestimating the setup that's in place at smaller online games companies. We're not talking about Activision or some high-frequency fixed income trading firm here. "Give me something that people can play on that costs as close to nothing as possible" is usually the main driver.
Gross