Every situation is different, but I've solved similar problems in the past with a work queue. So things like writes to a database are queued in a central worker task that has more knowledge about lock orders and can do bulk operations to reduce the time locks are needed, resulting in greater throughput at the risk of data loss if there's a crash. We did this mostly for write performance on an embedded product (we used SD cards with bursty writes).
So applying this to the more general problem of mutexes, I wonder if having a single thread manage locks would work. I'm thinking that single thread would store all locks as a bit field, and threads would interact with it like this:
- Thread calls Lock(locks_needed)
- Lock creates a future that unlocks when all of the requested resources are available, with two timeouts: wait and error
- The future checks the bit field to see whether all requested locks are available; if the wait timeout is reached, the future is moved to a priority list, which pauses all other futures from locking the resource until this future resolves
- If the error timeout is reached, the future resolves with that error
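
To make the steps above concrete, here's a rough sketch in Python's asyncio, where the event loop stands in for the single lock-managing thread. The names (LockManager, LockTimeout, _Request, unlock) are all made up for illustration, and it skips the atomic fast path entirely, so treat it as a sketch of the idea rather than a tested design.

```python
import asyncio
from collections import deque


class LockTimeout(Exception):
    """The error timeout expired before every requested lock was free."""


class _Request:
    """Hypothetical entry in the priority list (compared by identity)."""
    def __init__(self, mask):
        self.mask = mask


class LockManager:
    def __init__(self):
        self.held = 0            # bit i set => resource i is locked
        self.priority = deque()  # requests that outlived their wait timeout
        self._changed = asyncio.Event()

    def _notify(self):
        # Wake every waiter so it re-checks the bit field.
        self._changed.set()
        self._changed = asyncio.Event()

    def _available(self, mask, me):
        if self.held & mask:
            return False
        # Priority requests queued ahead of us reserve their resources.
        for req in self.priority:
            if req is me:
                break
            if req.mask & mask:
                return False
        return True

    async def lock(self, mask, wait_timeout, error_timeout):
        loop = asyncio.get_running_loop()
        start = loop.time()
        me = None
        try:
            while True:
                if self._available(mask, me):
                    self.held |= mask
                    return
                elapsed = loop.time() - start
                if elapsed >= error_timeout:
                    raise LockTimeout(f"could not lock {mask:#b}")
                if me is None and elapsed >= wait_timeout:
                    # Wait timeout reached: move to the priority list.
                    me = _Request(mask)
                    self.priority.append(me)
                    continue
                limit = error_timeout if me else min(wait_timeout, error_timeout)
                try:
                    await asyncio.wait_for(self._changed.wait(), limit - elapsed)
                except asyncio.TimeoutError:
                    pass  # a timeout elapsed; loop around and re-evaluate
        finally:
            if me is not None:
                self.priority.remove(me)
                self._notify()

    def unlock(self, mask):
        self.held &= ~mask
        self._notify()
```

A caller would do something like `await manager.lock(0b011, wait_timeout=0.01, error_timeout=1.0)` and later `manager.unlock(0b011)`. Once a request lands in the priority list, newer requests that touch any of its bits are held back, which is the "degrades to a priority queue" behavior.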
This has quite a bit more overhead (perhaps it could be largely eliminated in the happy path case with atomic operations), but it should resolve the livelock issue since it degrades to a priority queue. It could work even better in an async context since you wouldn't need a separate async loop (have an AsyncLock entry point as well).