eager_eagle@lemmy.world 7 points 2 weeks ago

I bet he just wants a card to self-host models and not give companies his data, but the amount of VRAM is indeed ridiculous.

jeena@piefed.jeena.net 4 points 2 weeks ago

Exactly, I'm in the same situation now, and the 8 GB in those cheaper cards doesn't even let you run a 13B model. I'm trying to work out whether I can run a 13B model on a 3060 with 12 GB.
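
(A rough back-of-the-envelope sketch, not from the thread: with 4-bit quantization a 13B model's weights come to roughly 6.5 GB, which leaves headroom for the KV cache and runtime overhead on a 12 GB card. The numbers below are illustrative assumptions, not measurements.)

```python
# Rough, illustrative VRAM estimate for quantized model weights.
# Assumptions: bits_per_weight for the quantization scheme, plus a flat
# overhead_gb budget for KV cache and runtime; real usage varies.
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.0,
                     overhead_gb: float = 2.0) -> float:
    weights_gb = params_billion * 1e9 * (bits_per_weight / 8) / 1e9
    return weights_gb + overhead_gb

print(estimate_vram_gb(13))                      # ~8.5 GB: plausible on a 12 GB 3060
print(estimate_vram_gb(13, bits_per_weight=8))   # ~15 GB: too big for 12 GB
```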

TheHobbyist@lemmy.zip 4 points 2 weeks ago

You can. I'm running a 14B DeepSeek model on mine. It achieves 28 t/s.
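
(A minimal sketch of how one might verify a figure like that, assuming a local ollama server on its default port 11434 and a model tag such as deepseek-r1:14b; the commenter's exact model tag and setup aren't stated.)

```python
# Measure generation speed against a local ollama server.
# Assumes ollama is running on its default port and the model is already pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",   # assumed tag, not confirmed in the thread
        "prompt": "Explain VRAM in one paragraph.",
        "stream": False,
    },
).json()

# eval_count = generated tokens, eval_duration = generation time in nanoseconds
tps = resp["eval_count"] / resp["eval_duration"] * 1e9
print(f"{tps:.1f} tokens/s")
```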

jeena@piefed.jeena.net 1 point 2 weeks ago

Oh nice, that's faster than I imagined.

levzzz@lemmy.world 1 point 2 weeks ago

You need a pretty large context window to fit all the reasoning; ollama forces 2048 tokens by default, and a larger window uses more memory.
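
(A small sketch of one way to raise the window per request, again assuming the local ollama REST API; 8192 is an arbitrary example value, and a larger window does cost more VRAM.)

```python
# Request a larger context window per call via the "num_ctx" option.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",          # assumed tag, as above
        "prompt": "Summarise this thread.",
        "stream": False,
        "options": {"num_ctx": 8192},        # ollama's default is 2048
    },
).json()
print(resp["response"])
```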

Viri4thus@feddit.org 1 point 2 weeks ago

I also have a 3060. Can you detail which framework (sglang, ollama, etc.) you are using and how you got that speed? I'm having trouble reaching that level of performance. Thx
