The circle of life
(lemmy.ml)
Post funny things about programming here! (Or just rant about your favourite programming language.)
You should be able to get very decent performance with 128 GB of VRAM running Qwen 3.6 with something like https://github.com/itigges22/ATLAS, especially if you use MTP (multi-token prediction): https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF
A friend of mine gets something like 50 tokens a second with it, and output quality is quite decent.
Yeah, but depending on your location (and usage), you might be burning more than $5/mo in electricity to run that shit. Not to mention the cost of buying all that hardware, especially at current inflated prices.
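A rough way to check the electricity figure for your own setup is average power draw × hours of use × your price per kWh. The sketch below is just that arithmetic; the wattage, hours, and price in the example are hypothetical placeholders, not measurements of any particular rig.

```python
def monthly_electricity_cost(avg_watts: float, hours_per_day: float,
                             price_per_kwh: float, days: float = 30.0) -> float:
    """Estimate one month of electricity cost for a machine.

    avg_watts: average draw while running (not the PSU rating).
    """
    kwh = avg_watts / 1000.0 * hours_per_day * days  # energy used in kWh
    return kwh * price_per_kwh


# Hypothetical example: 300 W average draw, 8 h/day, $0.30/kWh
cost = monthly_electricity_cost(300, 8, 0.30)
print(f"${cost:.2f}/month")  # roughly $21.60/month
```

Idle draw counts too if the box stays on 24/7, so plug in realistic hours rather than just your active usage.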
If you have to buy 128 GB of RAM in 2026, it's going to be a long time before you come out ahead versus paying $20/mo for some AI subscription.
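The "long time before you come out ahead" point is a simple break-even calculation: hardware cost divided by what you save each month (subscription minus electricity). A minimal sketch follows; the $3,000 hardware figure is a hypothetical placeholder, not a quoted price.

```python
import math


def break_even_months(hardware_cost: float, subscription_per_month: float,
                      electricity_per_month: float) -> float:
    """Months until local hardware pays for itself versus a subscription.

    Returns math.inf if running locally costs as much as (or more than)
    the subscription every month, i.e. it never breaks even.
    """
    monthly_savings = subscription_per_month - electricity_per_month
    if monthly_savings <= 0:
        return math.inf
    return hardware_cost / monthly_savings


# Hypothetical: $3,000 of hardware vs. a $20/mo subscription, $5/mo electricity
months = break_even_months(3000, 20, 5)
print(f"{months:.0f} months (~{months / 12:.1f} years)")
# prints "200 months (~16.7 years)"
```

This ignores resale value and hardware lifespan, both of which can move the answer a lot in either direction.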
Yeah, that's true. Depending on electricity costs, you could be better off with a subscription, especially with DeepSeek, which is incredibly cheap now.
How does it compare to the largest DeepSeek and Claude Opus 4.6? I got used to blazing-fast speed and accurate results. I'm not buying a server and 128 GB of RAM just to run a model similar to GPT-4.
ATLAS has some benchmarks in the repo, and it's comparable to Opus 4.6. You don't actually even need 128 GB for that: an 8-bit quantized model runs in around 32 GB and still performs quite well.
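The ~32 GB figure roughly follows from parameter count × bits per weight. For the 27B model linked above, 8-bit weights alone come to about 27 GB, leaving a few GB for the KV cache and runtime overhead. The sketch below only estimates weight memory; it assumes 1 GB = 10^9 bytes and ignores context length, which in practice adds more memory as the KV cache grows.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10**9 bytes).

    Does NOT include the KV cache, activations, or runtime overhead.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# 27B parameters, taken from the model name in the link above
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(27, bits):.1f} GB")
# 16-bit: ~54.0 GB
# 8-bit: ~27.0 GB
# 4-bit: ~13.5 GB
```

Real quantization formats store per-block scale factors (GGUF's Q8_0, for example, works out to about 8.5 bits per weight), so actual file sizes come out slightly above these numbers.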