
It should be good at conversation and creative writing; it'll be for worldbuilding.

Best if uncensored, as I prefer that over the censorship kicking in when I least want it.

I'm fine with those roleplaying models as long as they can actually give me ideas and talk to me logically.

[-] a2part2@lemmy.zip 1 points 6 days ago

Can't you just increase context length at the cost of paging and slowdown?

[-] Smokeydope@lemmy.world 2 points 6 days ago

At some point you'll run out of VRAM on the GPU. You can make room for more context by offloading some model layers to system RAM, at the cost of slower generation.
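
For anyone wanting to try that balance, here's a minimal sketch of the trade-off using llama-cpp-python; the model path, layer count, and context size are placeholders, so adjust them for your card:

```python
from llama_cpp import Llama

# Hypothetical GGUF model path; swap in whatever model you actually run.
llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=8192,        # bigger context = more VRAM spent on the KV cache
    n_gpu_layers=24,   # fewer GPU layers frees VRAM for context, but slows generation
)

out = llm("Sketch a coastal trading city for my setting.", max_tokens=256)
print(out["choices"][0]["text"])
```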

[-] a2part2@lemmy.zip 1 points 6 days ago

Yes, but if he's worldbuilding, a larger, slower model might just be an acceptable compromise.

I was getting OOM errors doing speech-to-text on my 4070 Ti. I know (now) that I should have gone for the 3090 Ti. Such is life.

[-] Pyro@programming.dev 1 points 6 days ago

At a certain point, layers will be pushed to RAM, leading to incredibly slow inference. You don't want to wait hours for the model to generate a single response.
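
To put rough numbers on why context eats VRAM, here's a back-of-the-envelope estimate of the KV-cache size, assuming classic 7B-class dimensions and an fp16 cache (models with grouped-query attention or a quantized KV cache need considerably less):

```python
# Back-of-the-envelope KV-cache estimate (assumed 7B-class dims, fp16, no GQA).
n_layers = 32
hidden_size = 4096
bytes_per_elem = 2                      # fp16
n_ctx = 8192                            # desired context length

kv_bytes = 2 * n_layers * n_ctx * hidden_size * bytes_per_elem   # 2 = keys + values
print(f"KV cache: {kv_bytes / 1024**3:.1f} GiB")                 # ~4.0 GiB on top of the weights
```

Doubling the context roughly doubles that figure, which is where the OOM errors come from.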
