I just use Rust for this. You can make the binaries fairly small if you put a bit of effort in. Plus it's not a niche language, and you get the benefit of a huge community. And your code is pretty much fast by default.
The only real downside is the compilation time, which is a lot better than it used to be but still isn't great.
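To put the "bit of effort" on binary size in concrete terms, a release profile along these lines in Cargo.toml is a common starting point. The keys are standard Cargo profile settings, but the values here are an assumed sketch rather than something tuned or measured for a particular project:

```toml
# Cargo.toml: size-focused release profile (illustrative values)
[profile.release]
opt-level = "z"     # optimize for size rather than speed
lto = true          # link-time optimization across crates
codegen-units = 1   # fewer parallel codegen units, more room to optimize
panic = "abort"     # abort on panic instead of unwinding
strip = true        # strip symbols from the final binary
```

Note the trade-offs: `opt-level = "z"` can cost some runtime speed, and `lto` plus `codegen-units = 1` make release builds slower, so this pulls against the compile time complaint.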
Unlikely. You'd do packet processing in hardware, either through some kind of peripheral or, if you're using RISC-V, by adding custom instructions.