Here, you can dig into what posted days before the pull request you read:
https://github.com/LemmyNet/lemmy/issues/2877#issuecomment-1685314733
June 4:
joins are better than
in
queries with potentially thousands of inserted IDs.
Given that more than 8 JOIN statements is something PostgreSQL specifically concerns itself with (join_collapse_limit). I hand-edit the query with a single IN clause and the performance problem disappears. 8 full seconds becomes less than 200ms against 5,431,043 posts. And that 200ms is still high, as I was extremely over-reaching with "LIMIT 1000" in case the end-user went wild with blocking lists or some other filtering before reaching the final "LIMIT 10". When I change it to "LIMIT 20" in the subquery, it drops almost in half to 115ms... still meeting the needs of the outer "LIMIT 10" by double. More of the core query filtering can be put into the IN subquery, as we aren't dealing with more than 500 length pages (currently limited to 50).
SELECT
"post"."id" AS post_id, "post"."name" AS post_title,
-- "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url",
-- "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local",
"person"."id" AS p_id, "person"."name",
-- "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated",
-- "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin",
-- "person"."bot_account", "person"."ban_expires",
"person"."instance_id" AS p_inst,
"community"."id" AS c_id, "community"."name" AS community_name,
-- "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted",
-- "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner",
-- "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods",
"community"."instance_id" AS c_inst,
-- "community"."moderators_url", "community"."featured_url",
("community_person_ban"."id" IS NOT NULL) AS ban,
-- "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published",
-- "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local",
--"post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank",
-- "community_follower"."pending",
("post_saved"."id" IS NOT NULL) AS save,
("post_read"."id" IS NOT NULL) AS read,
("person_block"."id" IS NOT NULL) as block,
"post_like"."score",
coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") AS unread
FROM (
((((((((((
(
(
"post_aggregates"
INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
)
INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
)
LEFT OUTER JOIN "community_person_ban"
ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
)
INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
)
LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
)
LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
)
LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
)
LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
)
LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
)
LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
)
LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
)
LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
)
WHERE
post_aggregates.id IN (
SELECT id FROM post_aggregates
WHERE "post_aggregates"."creator_id" = 3
ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 1000
)
AND
(((((((
(
(("community"."deleted" = false) AND ("post"."deleted" = false))
AND ("community"."removed" = false))
AND ("post"."removed" = false)
)
AND ("post_aggregates"."creator_id" = 3)
)
AND ("post"."nsfw" = false))
AND ("community"."nsfw" = false)
)
AND ("local_user_language"."language_id" IS NOT NULL)
)
AND ("community_block"."person_id" IS NULL)
)
AND ("person_block"."person_id" IS NULL)
)
ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;
If it isn't social hazing, then what is going on here? Why has this issue gone on since May and servers are crashing every day?
who would have predicted that Elon Musk would do all the wild things he did with Twitter. Reddit pissing everyone off in June... pretty odd how audiences are behaving in 2023 towards all this. Oh yha, Threads, that coming on the scene too. 2023 has really been odd for audiences.
The SQL speaks for itself, but I don't know what's going on in terms of why people are treating social media platforms like Lemmy, Twitter, Threads, Reddit this year so unusually. This SQL statement kind of thing has been covered in so many books, conferences, etc. It's like forgotten history now in the era of Elon Musk X and Reddit Apollo times.
I don't know what to say other than I can try to hire a translator or teacher to explain how this SQL problem is obvious and well understood 13 years ago. I mean, there was a whole "NoSQL movement" because of this kind of thing. But I clearly can't get people to hear past all the Elon Musk, Threads, Lemmy from Reddit ... and I'm left describing it as 'social hazing' or whatever is gong on with social media.
Lemmy has like 5 different Rust programming communities, but nobody fixing Lemmy. It's surreal in 2023 the Elon Musk X days. I think it's making all of us uncomfortable. The social movement underway.