History / sync is known as Message Archive Management (MAM), & every normal modern client & server supports it. OMEMO uses the same double-ratchet encryption & multiple-client model as Matrix (with the same old client key dropping issues, sadly). You are correct that by default it does not support groups; however, FOSS Jitsi (& Zoom, for that matter) is powered by XMPP under the hood, & Jitsi can be stood up by yourself.
Personally, three of my circles have opted for separate Mumble servers for voice comms (I run one of them from my living room), as video is only rarely needed & the system resource usage is minimal. Having webcams on is sometimes seen as a chore & a distraction. The only time video is helpful in my experience is screen sharing, which is different, but screen sharing is the worst tool for code pairing or debugging. Sharing a terminal using upterm provides a crisper view & lower data/system requirements, & observers can optionally drive the remote session.
Snikket is meant to be super simple to self-host. Ejabberd has a web GUI that can make configuration easier.