Super interesting read! Thanks for sharing.
I'd be interested to see how much of a difference a sparse representation would make. It's an optimization I've found useful on AoC (Advent of Code) and similar cellular automata problems, but I have no clue how it would mesh with the other optimizations made here. I'd guess its effectiveness also depends heavily on the particular ruleset you're simulating, as well as your starting state.
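
For anyone unfamiliar, here's a minimal sketch of what I mean by a sparse representation. It assumes Conway's Game of Life rules (B3/S23) on an unbounded 2D grid purely for illustration, which is not necessarily the ruleset from the post; the `step` function and the `glider` pattern are my own hypothetical names. Only live cells are stored, so the work per generation scales with the live population rather than the grid area.

```python
from collections import Counter


def step(live):
    """Advance one generation. `live` is a set of (x, y) live cells.

    Assumes Conway's Life (B3/S23); swap the final condition for other rulesets.
    """
    # Count live neighbours, but only for cells adjacent to at least one live cell.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next generation with exactly 3 neighbours (birth or
    # survival), or with 2 neighbours if it is already alive (survival).
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


# Illustrative starting state: a glider, with y increasing downward.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)
print(sorted(state))
```

This is where the ruleset and starting state matter: a pattern that stays small keeps the set tiny, while a ruleset that fills most of the grid makes the set and the hashing overhead larger than a dense array would be.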