What is a good model that runs on 6GB Vram? (discuss.online)

submitted 6 days ago by OmegaLemmy@discuss.online to c/localllama@sh.itjust.works


It should be good at conversation and creative writing; it'll be for worldbuilding.

Best if uncensored, as I prefer that over censorship kicking in when I least want it.

I'm fine with those roleplaying models as long as they can actually give me ideas and talk to me logically.

top 10 comments


[-] Smokeydope@lemmy.world 6 points 6 days ago* (last edited 6 days ago)

Try the IQ4_XS quant of Mistral Nemo.

If you want a more roleplay based model with more creativity at the cost of other things you can try the arliai finetune of nemo.

If you want the model to remember more of the conversation, you need to bump its context size up. You can trade GPU layers for context size, go down a quant, or switch to a smaller model like Llama 8B.
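To get a feel for why context size costs memory, here is a rough sketch of the usual key/value cache estimate: two tensors (K and V) per layer, each holding `n_kv_heads * head_dim` values per token. The dimensions used below (40 layers, 8 KV heads, head size 128, 16-bit cache) are assumptions for a Nemo-sized model; check the actual model config before trusting the numbers, and note that real runtimes add compute buffers on top.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate size of the KV cache for a given context length.

    The leading 2 accounts for one K and one V tensor per layer.
    bytes_per_elem=2 assumes a 16-bit cache (an assumption, not a given).
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Assumed dimensions for illustration only; read them from the model's config.
for n_ctx in (4096, 8192, 16384, 32768):
    size = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128, n_ctx=n_ctx)
    print(f"{n_ctx:>6} tokens -> {size / GIB:.2f} GiB")
```

Under these assumptions the cache grows linearly with context, so doubling the context doubles the memory it needs.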

[-] a2part2@lemmy.zip 1 points 6 days ago

Can't you just increase context length at the cost of paging and slowdown?

[-] Smokeydope@lemmy.world 2 points 6 days ago

At some point you'll run out of VRAM on the GPU. You make room for more context by offloading some of the model's layers to system RAM, which makes it slower.
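A deliberately crude sketch of that trade-off: treat the weights as spread evenly over the layers, set aside some VRAM for context and buffers, and see how many layers still fit. The function name, the 6.5 GiB model size, and the reserved amounts are all hypothetical; in llama.cpp the resulting number would roughly correspond to what you pass as `--n-gpu-layers`, but the real accounting is more detailed than this.

```python
import math

def gpu_layers_that_fit(vram_gib, model_gib, n_layers, reserved_gib):
    """Estimate how many layers fit on the GPU.

    Assumes weights are split evenly across layers (only roughly true).
    reserved_gib stands for context cache, compute buffers and anything
    else that must stay in VRAM; it is a guess you have to tune.
    """
    per_layer_gib = model_gib / n_layers
    free_gib = max(vram_gib - reserved_gib, 0.0)
    return min(n_layers, math.floor(free_gib / per_layer_gib))

# Hypothetical 6.5 GiB quantized model with 40 layers on a 6 GiB card.
small_ctx = gpu_layers_that_fit(6.0, 6.5, 40, reserved_gib=1.5)
large_ctx = gpu_layers_that_fit(6.0, 6.5, 40, reserved_gib=3.0)
print(small_ctx, large_ctx)
```

Reserving more memory for context leaves fewer layers on the GPU, and every layer that ends up in system RAM slows generation down.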

[-] a2part2@lemmy.zip 1 points 6 days ago

Yes, but if he's world building, a larger, slower model might just be an acceptable compromise.

I was getting OOM errors doing speech to text on my 4070 Ti. I know (now) that I should have gone for the 3090 Ti. Such is life.

[-] Pyro@programming.dev 1 points 6 days ago

At a certain point, layers will be pushed to RAM leading to incredibly slow inference. You don't want to wait hours for the model to generate a single response.

[-] icecreamtaco@lemmy.world 2 points 5 days ago* (last edited 5 days ago)

7B models have been very good for about a year now. Lumimaid is my current favorite, but I have to quantize it one step too far since it's an 8B. Noromaid was the best one before that.

[-] hendrik@palaver.p3x.de 2 points 6 days ago* (last edited 6 days ago)

Uh, that's not much VRAM. What kind of model sizes fit into a GPU like that? Does a 7B parameter model fit, quantized to 4bit? With whatever context length you need?
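For a back-of-the-envelope answer, the weights alone take about parameters times bits per weight divided by 8 bytes. A minimal sketch follows; the bit widths are illustrative assumptions, since real quant formats store extra scaling data and land somewhat above their nominal bit count, and the context cache still has to fit on top.

```python
GIB = 1024 ** 3

def weight_gib(n_params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GiB.

    Ignores per-format overhead and any tensors kept at higher precision,
    so treat the result as a lower bound rather than an exact file size.
    """
    return n_params_billion * 1e9 * bits_per_weight / 8 / GIB

# Illustrative combinations; the bits-per-weight values are assumptions.
for params, bits in ((7, 4.0), (8, 4.0), (8, 4.5), (12, 4.5)):
    print(f"{params}B at {bits} bits/weight: about {weight_gib(params, bits):.2f} GiB")
```

By this estimate a 7B or 8B model at around 4 bits leaves a couple of GiB free on a 6 GB card for context, while larger models need some layers in system RAM.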

[-] OmegaLemmy@discuss.online 2 points 6 days ago

Yeah, Llama 3.1 8B works, although it spills a bit into RAM

It's not as slow as one might expect

[-] hendrik@palaver.p3x.de 1 points 6 days ago* (last edited 6 days ago)

Uh, I forgot Llama 3 has 8B parameters. What about something like L3-8B-Lunaris? Though that's not the latest and greatest anymore, and it's tuned for roleplay. Maybe it's worth a try, but there are probably better ones out there.

I use Mistral-Nemo-Instruct-2407 for pretty much everything. I think it's a great all-rounder and can do anything from answering questions about facts to dialogue to storywriting, and it's not censored at all. But it has 12B parameters unless I'm mistaken...

Does your worldbuilding have to be fast? Because if you're fine with it being very slow, you can just run it on the CPU, without any graphics card. I usually do that. It'll take a few minutes to ingest the prompt and come up with an output, but I don't really mind that for use cases like storywriting or creative worldbuilding. (Software would be something like llama.cpp, ollama, LocalAI, koboldcpp, ...)

Otherwise I think you'd need to find a fine-tune of a <=8B parameter model that fits. There are enough of them out there. But I found that writing prose or story arcs is a bit more challenging than other tasks, and I believe worldbuilding might be, too. So I guess it's not as easy as finding a random roleplay or chatbot model.

[-] OmegaLemmy@discuss.online 2 points 6 days ago

I'll see, maybe I could work with 12B after all... Maybe...

this post was submitted on 31 Jan 2025

15 points (89.5% liked)
