I just found https://www.arliai.com/ who offer LLM inference for quite cheap. Without rate-limits and unlimited token generation. No-logging policy and they have an OpenAI compatible API.
I've been using runpod.io previously but that's a whole different service as they sell compute and the customers have to build their own Docker images and run them in their cloud, by the hour/second.
Should I switch to ArliAI? Does anyone have some experience with them? Or can recommend another nice inference service? I still refuse to pay $1.000 for a GPU and then also pay for electricity when I can use some $5/month cloud service and it'd last me 16 years before I reach the price of buying a decent GPU...
Edit: Saw their $5 tier only includes models up to 12B parameters, so I'm not sure anymore. For larger models I'd need to pay close to what other inference services cost.
Edit2: I discarded the idea. 7B parameter models and one 12B one is a bit small to pay for. I can do that at home thanks to llama.cpp