Lemmy.world status update 2023-07-05 (lemmy.world)

submitted 2 years ago* (last edited 2 years ago) by ruud@lemmy.world to c/lemmyworld@lemmy.world

Another day, another update.

More troubleshooting was done today. What we did:

Yesterday evening @phiresky@lemmy.world did some SQL troubleshooting with some of the lemmy.world admins. After that, phiresky submitted some PRs to GitHub.
@cetra3@lemmy.ml created a Docker image containing 3 PRs: Disable retry queue, Get follower Inbox Fix, Admin Index Fix
We started using this image, and saw a big drop in CPU usage and disk load.
We saw thousands of errors per minute in the nginx log for old clients trying to access the websockets (which were removed in 0.18), so we added a return 404 in nginx conf for /api/v3/ws.
We updated lemmy-ui from RC7 to RC10, which fixed a lot, including the issue with replying to DMs.
We found that the many 502 errors were caused by an issue in Lemmy/markdown-it.actix or whatever, causing nginx to temporarily mark an upstream as dead. As a workaround we can either 1.) use only 1 container or 2.) set ~~proxy_next_upstream timeout;~~ max_fails=5 in nginx.
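
For anyone running their own instance, the two nginx changes described above might look roughly like the fragment below. This is only a sketch: the upstream name `lemmy`, the hostnames `lemmy-1` and `lemmy-2`, and port 8536 are assumptions for illustration, not the actual lemmy.world config.

```nginx
# Hypothetical upstream with two Lemmy containers (names and port are assumed).
upstream lemmy {
    # Allow 5 failed attempts before nginx marks a backend as unavailable.
    server lemmy-1:8536 max_fails=5;
    server lemmy-2:8536 max_fails=5;
}

server {
    listen 80;

    # The websocket API was removed in 0.18; answer old clients directly
    # instead of passing their requests on to the backend.
    location /api/v3/ws {
        return 404;
    }

    location / {
        proxy_pass http://lemmy;
    }
}
```

By default nginx uses max_fails=1, so a single failed attempt takes a backend out of rotation for the fail_timeout period (10 seconds by default). Raising it to 5 makes nginx more tolerant of occasional errors from one container.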

Currently we're running with 1 lemmy container, so the 502 errors are completely gone so far, and because of the fixes in the Lemmy code everything seems to be running smoothly. If needed we could spin up a second lemmy container using the ~~proxy_next_upstream timeout;~~ max_fails=5 workaround, but for now it seems to hold with 1.

Thanks to @phiresky@lemmy.world , @cetra3@lemmy.ml , @stanford@discuss.as200950.com, @db0@lemmy.dbzer0.com , @jelloeater85@lemmy.world , @TragicNotCute@lemmy.world for their help!

And not to forget, thanks to @nutomic@lemmy.ml and @dessalines@lemmy.ml for their continuing hard work on Lemmy!

And thank you all for your patience, we'll keep working on it!

Oh, and as a bonus, an image (thanks Phiresky!) of the change in bandwidth after implementing the new Lemmy Docker image with the PRs.

Edit: So as soon as the US folks wake up (hi!) we seem to need the second Lemmy container for performance. That's now started. I noticed the proxy_next_upstream timeout setting didn't work (or I didn't set it properly), so I used max_fails=5 for each upstream, which does actually work.

[-] kionay@lemmy.world 113 points 2 years ago

You guys had better quit it with all this amazing transparency or it's going to completely ruin every other service for me. Seriously though, amazing work and amazing communication.

[-] phiresky@lemmy.world 112 points 2 years ago* (last edited 2 years ago)

server load is too low, everyone upvote more stuff so i can optimize more

edit: guess there is some more work to be done 😁

[-] woelkchen@lemmy.world 21 points 2 years ago

Upvote causes an endless spinner on Liftoff. 😁

[-] marsokod@lemmy.world 13 points 2 years ago

I don't understand your graph. It says you are measuring gigabit/sec but shouldn't the true performance rating be gigabeans/sec for a Lemmy instance?

[-] Spectator@lemmy.world 80 points 2 years ago

I'm not sure wtf you just said, but lemmy.world feels very smooth today, so thank you for your continued hard work!

[-] Rootiest@lemmy.world 42 points 2 years ago

Test:

Upvote if you can see this comment. 👍

[-] 0235@lemmy.world 38 points 2 years ago

Appreciate that these updates use the yyyy-mm-dd format :D

[-] DreamlandLividity@lemmy.world 11 points 2 years ago* (last edited 2 years ago)

ISO-8601. The only correct format!

[-] radfordhound@programming.dev 34 points 2 years ago

It's so smooth now; the speed difference is insane! You all are doing excellent work!

[-] xavier666@lemm.ee 33 points 2 years ago

Even though i'm not from this instance, this is such a nice way of keeping the users posted about changes. I wish more companies (I know this is not a company) went straight to the point, instead of using vague terms like "improved stability, fixed few issues with an update" when things are changed. I hope all instance owners follow this trend.

[-] ruud@lemmy.world 13 points 2 years ago

The owner of your instance has been a big help. You've also chosen a good instance!

[-] xavier666@lemm.ee 10 points 2 years ago

@sunaurus@lemm.ee is awesome. He keeps us aware of what's happening, planned maintenance hours, etc. His commits on making Lemmy scale horizontally are what kept lemm.ee snappy even when we had a huge influx of users. I hope Lemmy as a whole continues this ethos of collaboration.

[-] ekZepp@lemmy.world 29 points 2 years ago

[-] isaachernandez@lemmy.world 25 points 2 years ago

The change is noticeable. Good job guys.

Thanks for the updates.

[-] MetricExpansion@lemmy.world 24 points 2 years ago

I'm very curious: does a single Lemmy instance have the ability to horizontally scale to multiple machines? You can only get so big of a machine. You did mention a second container, so that would suggest that the Lemmy software is able to do so, but I'm curious if I'm reading that right.

[-] DoomBot5@lemmy.world 20 points 2 years ago

A single instance, no. You run multiple instances on multiple machines, then put a frontend (nginx in this case) in front of them to distribute the traffic among them.

[-] dyslexicdainbroner@lemmy.world 23 points 2 years ago

How great is it to be a part of history in the making -

This is Web 3 in its fomenting -

Headlines ~5yrs:

The ending of Web 2 was unceremonious and just ugly. u/spez and moron@musk watched as their social media networks signaled the end of Web 2 and slowly dissolved. Blu bird’s value disintegrated and Reddit’s hopes for IPO did likewise. Twitter and Reddit dissolved into odorous flatulence as centralization fell apart to the world’s benefit. Decentralized/federated social media such as Mastodon and Lemmy made their convoluted progress and led Web 3’s development and growth…

This is how history is made, it’s ugly and convoluted but comes out sweeet…

[-] KSPAtlas@sopuli.xyz 21 points 2 years ago

Shouldn't the correct HTTP status code for a removed API be 410? 404 only indicates that the resource wasn't found, while 410 indicates that the resource has been removed.

[-] Hupf@feddit.de 13 points 2 years ago

Or 418 for the wrong API being used :^)

[-] Kodiack@lemmy.world 21 points 2 years ago* (last edited 2 years ago)

Awesome work - things seem to be running much more smoothly today.

Do you have anything behind a CDN by chance? Looking at the lemmy.world IPs, the server appears to be hosted in Europe and web traffic goes directly there? IPv4 seems to be resolving to a Finland-based address, and IPv6 seems to be resolving to a Germany-based address.

If you put the site behind a CDN, it should significantly reduce your bandwidth requirements and greatly drop the number of requests that need to hit the origin server. CDNs would also make content load faster for people in other parts of the world. I'm in New Zealand, for example, and I'm seeing 300-350 ms latency to lemmy.world currently. If static content such as images could be served via CDN, that would make for a much snappier browsing experience.

[-] ruud@lemmy.world 13 points 2 years ago

Yes that's one of the things on our To Do list

[-] pathief@lemmy.world 21 points 2 years ago* (last edited 2 years ago)

Is it safe to use 2FA yet?

[-] ruud@lemmy.world 8 points 2 years ago

It doesn't really work, I think. Haven't tested yet.

[-] mintiefresh@lemmy.world 19 points 2 years ago

Wow. So much smoother today.

Great work.

You dropped this 👑

[-] solidgrue@lemmy.world 19 points 2 years ago

Gadzooks! These are huge fixes. Compliments to the team, you guys pulled off a small miracle today.

[-] shotgun_crab@lemmy.world 18 points 2 years ago

You guys are absolute legends, thanks for the update!

[-] lwuy9v5@lemmy.world 18 points 2 years ago

That's so awesome! Look at that GRAPH!

I'd volunteer to be a technical troubleshooter - very familiar with docker/javascript/SQL, not super familiar with rust - but I'm sure yall also have an abundance of nerds to lend a hand.

[-] pleasemakesense@lemmy.world 8 points 2 years ago

You should try to contact one of the admins of this server (Ruud is very busy tho, lots of mentions) and see if you could be of any help. I am sure they would appreciate even just the offer 😄

[-] GnothiSeauton@lemmy.world 18 points 2 years ago

This is why having a big popular instance isn't all bad. It helps detect and fix the scaling problems and inefficiencies for all the other 1000s of instances out there!

[-] AlmightySnoo@lemmy.world 8 points 2 years ago

This, if everyone kept just spreading out to smaller instances as suggested in the beginning, while still a sensible thing to do, no one would have noticed these performance issues. We need to think a few years out, assuming Lemmy succeeds and Reddit dies, and expect that "small instance" will mean 50k users.

[-] MR_GABARISE@lemmy.world 17 points 2 years ago

This is better optimization than most enterprise devs will see in their lifetimes.

[-] Zzombiee2361@lemmy.world 12 points 2 years ago

Some companies would rather throw more hardware at the problem and make the devs work on another useless feature no one uses.

[-] clutchmatic@lemmy.world 11 points 2 years ago

Some managers of the devs are not that interested in significant optimizations... Depends on what incentives and company culture drive them.

[-] CIA_chatbot@lemmy.world 16 points 2 years ago* (last edited 2 years ago)

It blows my mind with the amount of traffic you guys must be getting that you are only running one container and not running in a k8s cluster with multiple pods (or similar container orchestration system)

Edit: misread that a second was coming up, but still crazy that this doesn’t take some multi node cluster with multiple pods. Fucking awesome

[-] DelvianSeek@lemmy.world 16 points 2 years ago

You guys are absolutely amazing. So many thanks to you @Ruud and the entire admin/troubleshooting team! Thank you.

[-] dreadedsemi@lemmy.world 14 points 2 years ago

My upvote can go through fast now

Good work

[-] MiddleWeigh@lemmy.world 11 points 2 years ago* (last edited 2 years ago)

I took a SM break for a few days, and it's running noticeably better today...I think. (:

Thanks a bunch for floating us degenerates.

[-] WolfhoundRO@lemmy.world 11 points 2 years ago

Really great job, guys! I know from my experience in SRE that this kind of debugging, monitoring and fixing can be a real pain, so you have all my appreciation. I'm even determined to donate on Patreon if it's available.

[-] ruud@lemmy.world 8 points 2 years ago

Donation links are on the frontpage. Thanks!

[-] Datzevo@lemmy.world 10 points 2 years ago

You know, there's something about dealing with the lagginess of the past few days that makes me appreciate how fast and responsive things are after the update. It's nice to see the community grow, and it makes the experience at Lemmy feel authentic.

[-] _Rho_@lemmy.world 10 points 2 years ago* (last edited 2 years ago)

As a data engineer, I'd be interested in hearing more about the SQL troubleshooting.

EDIT: It looks like !lemmyperformance@lemmy.ml is a good place to subscribe to for more technical info on some of these performance improvements.

Also the Lemmy GitHub of course contains more information on bugs/enhancements/etc.

[-] anal_enjoyer@reddthat.com 9 points 2 years ago

Love these updates! Love the transparency!

[-] ef9357@lemmy.one 9 points 2 years ago

Love the transparency. Thanks to the entire team!

[-] IanM32@lemmy.world 9 points 2 years ago

A lot of this stuff is pretty opaque to me, but I still read through it because I love how detailed they are in sharing what's going on under the hood and how it relates to problems users experience. Kudos to you guys!

[-] KonQuesting@lemmy.sdf.org 8 points 2 years ago

Thanks for the updates! Seeing the details of how you work through these early issues is valuable to those of us thinking of starting an instance.

[-] slashzero@hakbox.social 8 points 2 years ago* (last edited 2 years ago)

As a Performance Engineer myself, these are the kind of performance improvements I like to see. Those graphs look wonderful. Nice job to all.

[-] DharkStare@lemmy.world 8 points 2 years ago

Great job. Everything seems to be working smoothly for me now. The past several days have been a bit rough but now it's all working.

[-] Hynek29@lemmy.world 8 points 2 years ago

Great job guys! It really feels more responsive today

[-] nostalgicgamerz@lemmy.world 7 points 2 years ago* (last edited 2 years ago)

Can we have an update on the status of Lemmy.world and how close our ties with Meta's Threads are going to be? Threads is going to support ActivityPub, but time has shown that this is an attempt to try to kill this open platform and eventually replace it with theirs once they get everyone in their ecosystem. (Embrace, Extend... Extinguish) Mastodon has said today that they don't mind sleeping with vipers when their demise / dissolution is in Meta's best interest.

Please tell me we are defederating from Meta....or let us know what to expect

EDIT: I originally stated that Mastodon told them to fuck off, but I got confused with Fosstodon (who did that). Mastodon doesn't mind being in bed with Meta