submitted 9 months ago* (last edited 9 months ago) by shootwhatsmyname@lemm.ee to c/technology@lemmy.world
[-] cbarrick@lemmy.world 72 points 9 months ago

Pouring one out for the SREs at Meta

[-] kinther@lemmy.world 41 points 9 months ago

Someone is having a really bad day today. I wonder if your phone dies when you get a certain number of pages or push notifications.

[-] Alk@lemmy.world 41 points 9 months ago* (last edited 9 months ago)

Fun story. I had a flip phone years ago and you could have multiple recipients to a single text. And if the text was multiple pages, it would split into several texts. And you could resend already sent texts.

So one time I put my girlfriend's phone number in all 20 recipient slots. I then filled the text to the max size, though I don't remember how many it split into. I then resent it over and over. This all took like 2 or 3 minutes.

Her phone was sending notifications over and over for the entire rest of the day. I'd guess at least 8 hours, probably more.

[-] 18_24_61_b_17_17_4@lemmy.world 10 points 9 months ago

Fucking hell I used to love doing that! Man that brought back some memories. Would do it to my co-worker and just piss myself laughing.

[-] lightnegative@lemmy.world 1 points 8 months ago

The classic txt bomb. I used to do this if I had unused txts at the end of the month.

[-] maynarkh@feddit.nl 5 points 9 months ago

No, but it's unusable. I had a weird bug on one of my phones that sent an SMS over as fast as it could as long as the phone was on. I wrote the initial SMS, the contents were something like "hey, wanna hang?", and the poor guy on the other side was blasted with several hours of literally constant notifications.

Luckily my plan at the time had unlimited free SMS.

[-] alilbee@lemmy.world 24 points 9 months ago

Looking at the downmeter shot someone posted above, it's half the SREs in the country. Not sure what the root cause will be, but damn that's a lot of money down the tubes. I would not want to be the person who cost Meta and Google their precious thirty 9's of availability lol.

[-] ObviouslyNotBanana@lemmy.world 5 points 9 months ago
[-] SpaceNoodle@lemmy.world 9 points 9 months ago
[-] marcos@lemmy.world 3 points 9 months ago

Nah, what was that muddy country from Dilbert?

[-] PlutoniumAcid@lemmy.world 10 points 9 months ago
[-] alilbee@lemmy.world 3 points 9 months ago

The country where all of those services are maintained and hosted in... Just colloquial shorthand, not trying to be exclusionary.

[-] ObviouslyNotBanana@lemmy.world 2 points 9 months ago

Ok thanks for the clarification!

[-] merc@sh.itjust.works 1 points 9 months ago

The country where all of those services are maintained and hosted in…

For Meta, Google, etc. that's a number of countries all over the world.

[-] alilbee@lemmy.world 1 points 9 months ago* (last edited 9 months ago)

That's fair. Y'all, I was really not trying to be shitty. It was just shorthand I used, thinking of their HQs. No ill intent, and I apologize for any harm it caused.

[-] don@lemm.ee 2 points 9 months ago
[-] merc@sh.itjust.works 3 points 9 months ago

It's likely there's a root cause, like a fiber cut or some other major infrastructure issue. But, Down Detector doesn't really put a scale on their graphics, so it could be that it's a huge issue at Meta and a minor issue that's just noticeable for everyone else. In that case, Meta could be the root cause.

If everyone is mailing themselves their passwords, shutting their phones on and off, restarting their browsers, etc. because Meta wasn't working, it could have knock-on effects for everyone else. Could also be that because Meta is part of the major ad duopoly, the issue affected their ad system, which affected everyone interacting with a Meta ad, which is basically everyone.

[-] alilbee@lemmy.world 2 points 9 months ago

I've been an SRE for a few large corps, so I've definitely played this game. I'm with you that it was likely just the FB identity or ad provider causing most of these issues. So glad I'm out of that role now and back to DevOps, where I'm no longer on call.

[-] merc@sh.itjust.works 1 points 8 months ago

Yeah. And when the outage is due to something external, it's not too stressful. As long as you don't have absolutely insane bosses, they'll understand that it's out of your control. So, you wait around for the external system to be fixed, then check that your stuff came back up fine, and go about your day.

I personally liked being on call when the on-call compensation was reasonable. Like, on-call for two 12-hour shifts over the weekend? Two 8-hour days off. If you were good at maintaining your systems you had quiet on-call shifts most of the time, and you'd quickly earn lots of days off.

[-] alilbee@lemmy.world 1 points 8 months ago

Yeah I'd be less worried about internal pressures (which should be minimal at a halfway decently run org) and more about the externals. I don't think you would actually end up dealing with anything, but I'd know those reliant huge corps are pissed.

Man, your on-call situation sounds rad! I was salaried and just traded off on-call shifts with my team members, no extra time off. Luckily though, our systems were pretty quiet so it hardly ever amounted to much.

[-] merc@sh.itjust.works 1 points 8 months ago

I think you want people to want to be on call (or at least be willing to be on call). There's no way I'd ever take a job where I was on-call and not compensated for being on-call. On-call is work. Even if nothing happens during your shift, you have to be ready to respond. You can't get drunk or get high. You can't go for a hike. You can't take a flight. If you're going to be so limited in what you're allowed to do, you deserve to be compensated for your time.

But, since you're being compensated, it's also reasonable that you expect to have to respond to something. If your shifts are always completely quiet, either you or the devs aren't adding enough new features, or you're not supervising enough services. You should have an error budget, and be using that error budget. Plus, if you don't respond to pages often enough, you get rusty, so when there is an event you're not as ready to handle it.

[-] guacupado@lemmy.world 1 points 8 months ago* (last edited 8 months ago)

Second half is the closest answer in this thread.

[-] Semi-Hemi-Demigod@kbin.social 2 points 8 months ago

Hopefully they won't need to cut their way into the data center this time.

[-] ALostInquirer@lemm.ee 1 points 8 months ago
[-] el_abuelo@lemmy.ml 3 points 8 months ago
[-] lightnegative@lemmy.world 1 points 8 months ago

Google terminology leaking its way into the mainstream

[-] cbarrick@lemmy.world 1 points 8 months ago
[-] lightnegative@lemmy.world 1 points 8 months ago

Lots of places have SREs now, thanks to Google. Like I said, a Google thing leaked into the mainstream.

this post was submitted on 05 Mar 2024
927 points (97.5% liked)
