[-] yogthos@lemmy.ml 7 points 7 hours ago

You should be able to get very decent performance with 128GB of VRAM running Qwen 3.6 with something like https://github.com/itigges22/ATLAS, especially if you run MTP: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF

A friend of mine gets something like 50 tokens a second with it, and output quality is quite decent.

[-] OwOarchist@pawb.social 7 points 7 hours ago

Yeah, but depending on your location (and usage), you might be burning more than $5/mo in electricity to run that shit. Not to mention the costs of buying all that hardware ... especially at current inflated rates.

If you have to buy 128GB of RAM in 2026, it's going to be a long time before you come out ahead vs paying $20/mo for some AI subscription.
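
For a rough sense of the break-even point, here is a minimal sketch. The $5/mo electricity and $20/mo subscription figures are the ones above; the $1,500 hardware price is a made-up placeholder, not a real quote, so plug in your own number.

```python
import math

def breakeven_months(hardware_cost, electricity_per_month, subscription_per_month):
    """Months until buying hardware beats a subscription, or None if it never does."""
    monthly_saving = subscription_per_month - electricity_per_month
    if monthly_saving <= 0:
        # Electricity alone costs as much as the subscription: no break-even.
        return None
    return math.ceil(hardware_cost / monthly_saving)

# Hypothetical $1,500 of hardware, $5/mo electricity, $20/mo subscription.
print(breakeven_months(1500, 5, 20))  # 100 months, a bit over 8 years
```

This ignores resale value, hardware failure, and subscription price changes, so treat it as a ballpark only.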

[-] yogthos@lemmy.ml 4 points 7 hours ago

Yeah, that's true. Depending on your electricity costs, you could be better off with a subscription, especially with DeepSeek, which is incredibly cheap now.

[-] dragnucs@lemmy.ml 1 points 5 hours ago

How does it compare to the largest DeepSeek and Claude Opus 4.6? I got used to blazing fast speed and accurate results. I'm not buying a server and 128GB of RAM just to run a model similar to GPT-4.

[-] yogthos@lemmy.ml 3 points 4 hours ago

ATLAS has some benchmarks in the repo, and it's comparable to Opus 4.6. You don't actually even need 128GB for that: an 8-bit quantized model will run in around 32GB and still perform quite well.
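
As a back-of-the-envelope check on that 32GB figure, here is a rough sketch of weight memory versus quantization width. It assumes the 27B parameter count from the model name linked earlier in the thread and counts weights only; KV cache, activations, and runtime overhead come on top and depend on context length.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

# Assumed 27B parameters; the bit widths are illustrative quantization levels.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(27, bits):.1f} GB")
```

At 8 bits that comes to about 27 GB for the weights, which leaves a few GB of headroom within 32GB for everything else.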

this post was submitted on 25 May 2026
142 points (96.7% liked)

Programmer Humor
