I'd say you're looking for something like an 80GB VRAM GPU. That'd be industry-grade hardware (an Nvidia A100, for example).
And to squeeze it into 80GB, the model would need to be quantized to 4 or 5 bits. There are LLM VRAM calculators available where you can plug in your numbers, like this one.
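If you want a back-of-the-envelope version of what those calculators do: weights take roughly `parameters × bits-per-weight / 8` bytes, and the KV cache and activations add several GB on top. Here's a minimal sketch; the 120B parameter count is just an illustration I picked, not the model from the post, so plug in your own number:

```python
# Weights-only estimate; KV cache and activations add several GB on top,
# which is why ~4-5 bit quantization is what fits under 80 GB here.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Hypothetical 120B-parameter model, purely as an illustration:
for bits in (16, 8, 5, 4):
    print(f"{bits}-bit: ~{weight_gb(120, bits):.0f} GB of weights")
# 16-bit: ~240 GB, 8-bit: ~120 GB, 5-bit: ~75 GB, 4-bit: ~60 GB
```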
Another option would be to rent this kind of hardware by the hour in some datacenter (at about $2 to $3 per hour). Or do inference on a CPU with a wide memory interface, like an Apple M3 or an AMD Epyc. But those are pricey too, and you'd need to buy them alongside an equal amount of (fast) RAM.
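The reason the memory interface matters: generating each token means streaming all the quantized weights through the CPU once, so tokens per second is roughly memory bandwidth divided by model size. A rough sketch below; the bandwidth figures are ballpark assumptions on my part, not measured values:

```python
# Token generation on CPU is roughly memory-bandwidth bound: every new token
# requires reading all the (quantized) weights once from RAM.
def tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_gb

model_gb = 70  # e.g. a large model at 4-5 bit, per the estimate above
for name, bw_gb_s in [("dual-channel DDR5 desktop", 80),
                      ("Apple M3 Max", 400),
                      ("AMD Epyc, 12-channel DDR5", 460)]:
    print(f"{name}: ~{tokens_per_sec(model_gb, bw_gb_s):.1f} tokens/s")
```

So a desktop with two DIMMs crawls along at about a token per second, while the wide-interface chips get into usable territory.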