It's fine until you run out of disk space (lemmy.world)

submitted 2 years ago by hypertown@lemmy.world to c/programmerhumor@lemmy.ml

[-] mvirts@lemmy.world 48 points 2 years ago

Exactly how I plan to deploy LLMs on my desktop 😹

[-] dan@upvote.au 14 points 2 years ago

You should be able to fit a model like Llama 2 in 64 GB of RAM, but output will be pretty slow if it's CPU-only. GPUs are a lot faster, but you'd need at least 48 GB of VRAM, for example two 3090s (24 GB each).
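As a rough sanity check on those figures: the memory taken by the weights scales as parameter count times bytes per parameter. The sketch below assumes the 70B variant of Llama 2 (the comment does not say which size is meant) and counts weights only, ignoring the KV cache and runtime overhead. `weight_gb` is an illustrative helper, not part of any library.

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: the 70B-parameter Llama 2 variant.
N_PARAMS = 70e9

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_gb(N_PARAMS, bits):.0f} GB")
```

Under these assumptions only the 4-bit case (roughly 35 GB) fits comfortably in 64 GB of RAM or 48 GB of VRAM, so the numbers above presumably refer to a quantized model.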

[-] PolarisFx@lemmy.dbzer0.com 6 points 2 years ago* (last edited 2 years ago)

Amazon had some promotion in the summer with a cheap 3060, so I grabbed that, and for Stable Diffusion it was more than enough. So I thought, oh, I'll try out Llama as well. After 2 days of dicking around, trying to load a whack of models, I spent a couple bucks and spooled up a runpod instance. It was more affordable than I thought, definitely cheaper than buying another video card.

[-] dan@upvote.au 4 points 2 years ago

As far as I know, Stable Diffusion is a far smaller model than Llama. The fact that a model as large as Llama can even run on consumer hardware is a big achievement.

[-] PolarisFx@lemmy.dbzer0.com 2 points 2 years ago* (last edited 2 years ago)

I had a couple of 13B models loaded in, and it was OK. But I really wanted a 30B, so I got a runpod. I'm using it for the API; I went with spot pricing and it's like $0.70/hour.

I didn't know what to do with it at first, but when I found SillyTavern I kinda got hooked.
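To put the "cheaper than buying another video card" point in numbers, here is a small break-even sketch. The $0.70/hour rate comes from the comment above; the card price is a purely hypothetical placeholder, so substitute whatever the card you are considering actually costs.

```python
def break_even_hours(card_price_usd: float, hourly_rate_usd: float) -> float:
    """Hours of rental that cost as much as buying the card outright."""
    return card_price_usd / hourly_rate_usd


SPOT_RATE_USD = 0.70             # from the comment above
HYPOTHETICAL_CARD_PRICE = 700.0  # assumption, not a quoted price

hours = break_even_hours(HYPOTHETICAL_CARD_PRICE, SPOT_RATE_USD)
print(f"Renting costs less until roughly {hours:.0f} hours of use")
```

This ignores electricity, resale value, and the fact that spot prices fluctuate, so treat it as a first-order estimate only.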

[-] barsoap@lemm.ee 2 points 2 years ago* (last edited 2 years ago)

Both SD 1.5 and SDXL run on 4 GB cards; you really want fp16, though.

In principle it should be possible to get decentish performance out of e.g. an RX480 by using the (forced) 32-bit precision to do bigger Winograd convolutions (severely reducing the number of FMAs needed), but don't expect AMD to write kernels for that; ROCm is barely working on mid-range cards in the first place.
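For readers unfamiliar with the trick: Winograd's minimal filtering algorithm trades multiplications for additions. The smallest standard case, F(2,3), produces two outputs of a 3-tap filter with 4 multiplications instead of 6. Larger tiles save more but are more sensitive to rounding error, which is presumably why the extra fp32 precision would help. The sketch below is a 1-D illustration only, not a GPU kernel, and the function names are made up for this example.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter computed directly: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3): 4 data multiplications."""
    # Filter transform; it depends only on g, so it can be precomputed once.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform (additions only) followed by the 4 multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1.0, 2.0, 3.0, 4.0]  # four input samples
g = [0.5, -1.0, 2.0]      # three filter taps
print(direct_f23(d, g), winograd_f23(d, g))
```

Both functions return the same pair of values; the saving comes from doing the filter transform once and reusing it across every tile of the input.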

Meanwhile, I actually ended up doubling my swap because 16 GB of RAM is kinda borderline for merging SDXL models. OOM might kick in, it might not, and in any case your system is going to lock up without earlyoom.
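If you want to check your headroom before starting a merge, something like the following works on Linux. It is a minimal sketch: it reads the `MemAvailable` and `SwapFree` fields that the kernel exposes in `/proc/meminfo` (reported in kB, which are really KiB). The helper names are hypothetical, and how much memory a merge needs depends on your tool, so any threshold you compare against is your own estimate.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def headroom_gib(info: dict) -> float:
    """Available RAM plus free swap, converted from KiB to GiB."""
    kib = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    return kib / 1024**2


def read_headroom_gib(path: str = "/proc/meminfo") -> float:
    """Convenience wrapper for a live Linux system."""
    with open(path) as f:
        return headroom_gib(parse_meminfo(f.read()))
```

On a Linux box, `read_headroom_gib()` returns the combined figure. This only tells you whether the allocation can succeed; it does not stop the system from thrashing once it is deep into swap.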

[-] mvirts@lemmy.world 2 points 2 years ago

*laughs in top-of-the-line 2012 hardware* 😭

[-] j4k3@lemmy.world 3 points 2 years ago

I need it just for the initial load on transformers-based models, to then run them in 8-bit. It is ideal for that situation.
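The comment does not name a toolchain. One common way to get an 8-bit model is the Hugging Face Transformers library with the bitsandbytes backend, sketched below under that assumption. It needs the `transformers`, `accelerate`, and `bitsandbytes` packages plus a CUDA GPU, and `model_id` is whatever checkpoint you have access to; none of this is confirmed as the commenter's actual setup.

```python
def load_model_8bit(model_id: str):
    """Load a causal language model with its weights quantized to 8-bit."""
    # Imported lazily so that defining this function needs no extra packages.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",  # let accelerate place layers on the available devices
    )
```

How much host memory the loading step itself touches varies with the checkpoint format and library version, which is where spare swap can help on a RAM-limited machine.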

[-] mvirts@lemmy.world 2 points 2 years ago

That does make a lot of sense

[-] UFODivebomb@programming.dev 2 points 2 years ago

Same. I'm patient

this post was submitted on 02 Oct 2023

1022 points (99.1% liked)
