I'm starting to get more and more HDR content, and I'm noticing an issue with my Jellyfin server. In nearly all cases, the HDR content has to be transcoded and tone mapped. All of it is in 4k.
My little Quadro P400 just can't keep up. Encoder and decoder usage hovers around 15-17%, but the GPU core usage is pinned at 100% the entire time, and my framerate doesn't exceed 19fps, which makes the video skip so badly it's unwatchable.
What's a reasonable upgrade? I'm thinking about the P4000, but that might be excessive. Also, it needs to fit in a low-profile slot.
Edit: I'm shocked at how much good feedback I received on this post. Hopefully someone else will stumble on it in the future and be able to learn something. Ultimately, I decided to purchase a used RTX A2000 for just about $250. It's massively overkill for transcoding/tone mapping 4k, but once I'm brave enough to risk breaking my Proxmox install and setting up vGPU, I'm hoping to take advantage of the Tensor cores for AI object detection in my Blue Iris VM. Also, the A2000 supports AV1, and while I don't need that at the moment, it will be nice to have in the future, I think.
Final Edit: I replaced the Quadro P400 with an RTX A2000 today. With the P400, transcoding 4k HEVC HDR to 4k HEVC (or h264) SDR with tone mapping resulted in a transcode rate of about 19fps with 100% GPU usage. With the A2000, I'm getting a transcode rate of about 120fps with around 30% GPU usage; plenty of room for growth if I add 1 or 2 users to the server. For $250, it was well worth the upgrade.
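For a rough sense of what those numbers mean, here's a quick back-of-the-envelope sketch. It assumes the source plays at 23.976fps (typical for movies, but not something I measured) and that throughput splits evenly between streams, which is a simplification. The function names are just made up for illustration.

```python
# Assumption: source content plays at 23.976 fps (24000/1001), typical for films.
SOURCE_FPS = 24000 / 1001


def realtime_multiple(transcode_fps, source_fps=SOURCE_FPS):
    """Return how many times faster than playback the transcode runs."""
    return transcode_fps / source_fps


def max_realtime_streams(transcode_fps, source_fps=SOURCE_FPS):
    """Rough upper bound on simultaneous real-time transcodes.

    Assumes total throughput divides evenly across streams.
    """
    return int(transcode_fps // source_fps)


if __name__ == "__main__":
    # Transcode rates reported above for each card.
    for card, fps in (("P400", 19), ("A2000", 120)):
        print(
            f"{card}: {realtime_multiple(fps):.1f}x real time, "
            f"about {max_realtime_streams(fps)} real-time stream(s)"
        )
```

Anything below 1x real time can't keep up with playback, which matches the skipping I saw on the P400. Under the same assumptions, the A2000 figure works out to about five streams' worth of throughput, though real-world scaling probably isn't that clean.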
Unfortunately, my CPU does not support quicksync; I'm using dual E5-2650v2s in the server that hosts Jellyfin. It's been a while since I researched it, but quicksync needs an Intel integrated GPU, and these Ivy Bridge-EP Xeons don't have one (quicksync itself goes back to Sandy Bridge on the consumer chips). I've been wanting to upgrade for a while, but it really comes down to the fact that it runs all of my VMs and containers just fine, and there's always somewhere else I find to spend my money.
Regardless, the Jellyfin docs say that tone mapping is only supported via CUDA, which would mean I couldn't use quicksync anyway.