
Do I need industry-grade GPUs, or can I scrape by getting decent TPS with a consumer-level GPU?

GenderNeutralBro@lemmy.sdf.org 4 points 11 hours ago (last edited 10 hours ago)

If you're running a consumer-level GPU, you'll be working with 24 GB of VRAM at most (RTX 4090, RTX 3090, or Radeon 7900 XTX).

90b model = 90GB at 8-bit quantization (plus some extra based on your context size and general overhead, but as a ballpark estimate, just going by the model size is good enough). You would need to drop down to 2-bit quantization to have any hope of fitting it in a single consumer GPU. At that point you'd probably be better off using a smaller model with less aggressive quantization, like a 32b model at 4-bit quantization.
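To make that arithmetic concrete, here's a quick back-of-envelope calculation (weights only, ignoring the context/KV-cache overhead, just like the ballpark above):

```python
# Size of the weights alone: params * bits_per_weight / 8 bytes.
# Ignores KV cache and runtime overhead, matching the ballpark above.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * bits_per_weight / 8

for bits in (8, 4, 2):
    print(f"90B @ {bits}-bit: ~{weights_gb(90, bits):.1f} GB")
# 90B @ 8-bit: ~90.0 GB, @ 4-bit: ~45.0 GB, @ 2-bit: ~22.5 GB

print(f"32B @ 4-bit: ~{weights_gb(32, 4):.1f} GB")
# 32B @ 4-bit: ~16.0 GB -- fits in 24 GB with room left for context
```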

So forget about consumer GPUs for that size of model. Instead, you can look at systems with integrated memory, like a Mac with 96-128GB of memory, or something similar. HP has announced a mini PC that might be good, and Nvidia has announced a dedicated AI box as well. Neither of those are available for purchase yet, though.

You could also consider using multiple consumer GPUs. You might be able to get several RTX 3090s for less than a Mac with the same amount of memory, but you'll be drawing several times more power to run them, so keep that in mind.
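As a minimal sketch of what that multi-GPU route can look like with Hugging Face transformers (accelerate and bitsandbytes installed; the model ID is just a placeholder, not a specific recommendation):

```python
# Hedged sketch: shard a 4-bit-quantized model across several consumer GPUs.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "some-org/some-70b-model"  # placeholder; substitute your model

# 4-bit quantization via bitsandbytes so the weights can fit across 2-3x 24 GB cards
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # accelerate splits the layers across all visible GPUs
)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

llama.cpp can do a similar split across cards with its --tensor-split flag, if that's your runtime instead.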

this post was submitted on 10 Jan 2025