The Rule (lemmy.ml)
submitted 1 month ago by roon@lemmy.ml to c/196@lemmy.blahaj.zone
[-] Sabata11792@ani.social 6 points 1 month ago* (last edited 1 month ago)

Some apps allow you to offload to both GPU and CPU while loading the active part of the model. I have an old SSD set up as swap that gives me 500 GB of "usable" RAM.

It is horrendously slow and pointless, but you can do it. I got about 2 tokens in 10 minutes on a 70B model on a 1080 Ti before I gave up.

[-] AeonFelis@lemmy.world 5 points 1 month ago

Even if they used more powerful hardware than you, the model they ran is still almost 6 times bigger - so if you got two tokens in 10 minutes, one token in 30 minutes for them sounds plausible.
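The arithmetic behind that estimate can be sketched as follows. This is only a back-of-the-envelope check: it assumes time per token grows linearly with model size, which the thread does not establish, and it rounds "almost 6 times" up to 6. The function and variable names are made up for illustration.

```python
# Rough scaling of the figures quoted in the thread.
# Assumption (not from the thread): time per token scales linearly with model size.

def minutes_per_token(tokens: int, minutes: float) -> float:
    """Average wall-clock minutes spent per generated token."""
    return minutes / tokens

# Reported run: about 2 tokens in 10 minutes on the 70B model.
observed = minutes_per_token(tokens=2, minutes=10)

# "Almost 6 times bigger", rounded up to 6 for the estimate.
size_ratio = 6
estimate = observed * size_ratio

print(f"observed: {observed:.0f} min/token")
print(f"estimate for the larger model: {estimate:.0f} min/token")
```

Under those assumptions the reported run works out to 5 minutes per token, so 30 minutes per token for a model six times larger is in the same ballpark.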

[-] Sabata11792@ani.social 4 points 1 month ago

I would have to use an entire 1 TB drive for swap, but I'm sure I could manage 1 token before the heat death of the universe.
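For a rough sense of why a full 1 TB drive comes up, here is a minimal sketch of the weight footprint at a few precisions. The precisions are assumptions, since the thread does not say how the models were quantized, and the figures cover weights only (no KV cache or runtime overhead). The "6x" size is the rounded ratio from the comment above, not a confirmed parameter count.

```python
# Approximate size of model weights alone, in GB (1 GB = 1e9 bytes).
# Assumptions (not from the thread): bytes per parameter for each precision.

def weights_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Billions of parameters times bytes per parameter gives GB directly."""
    return params_billion * bytes_per_param

small = 70         # the 70B model mentioned in the thread
large = small * 6  # "almost 6 times bigger", rounded up

for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(
        f"{label}: {weights_size_gb(small, bytes_per_param):.0f} GB (70B), "
        f"{weights_size_gb(large, bytes_per_param):.0f} GB (6x)"
    )
```

At 16-bit weights the larger model would need roughly 840 GB under these assumptions, which is most of a 1 TB drive.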

[-] AeonFelis@lemmy.world 4 points 1 month ago

I'd worry less about the heat death of the universe and more about your hardware's heat from all that load.

this post was submitted on 25 Jul 2024
656 points (100.0% liked)