Running local models is good now (vickiboykis.com)

submitted 1 day ago by yogthos@lemmy.ml to c/technology@lemmy.ml

19 comments

pimat@feddit.org 10 points 1 day ago

Stopped reading at 64 GB of RAM in an M2 Mac... That's not a realistic example to start with.

yogthos@lemmy.ml 10 points 1 day ago

You can run the Gemma 4 and Qwen3.5 MoE models with as little as 12 GB of VRAM at 30-40 tokens per second (Q4/Q5 quantization), and they both blow GPT-4o and DeepSeek R1 out of the water. But 64 GB of RAM is also not really out of scale with the cost of a shop tool in other trades. If you're a professional who is confident of a positive return on the investment, or a hobbyist with the budget for a luxury "shop", that cost is well within the consumer market. That's not everybody, of course, but it's not some inconceivable fantasy.

The key point is that local models keep getting more efficient and usable. You need high-end consumer-grade hardware today, but given how fast things are improving, it's entirely likely that you'll be able to get the same capability on even smaller hardware in a few months.

pimat@feddit.org 4 points 1 day ago

I really appreciate you taking the time to reply. From your point of view this makes sense, of course, and I hope you are right about the upcoming improvements. I did some experiments with an M1 Mac mini and was quickly disappointed, but maybe I'll give it another shot. Thanks again, I'm always open to being corrected and love to learn new stuff.

lichtmetzger@discuss.tchncs.de 2 points 1 day ago (last edited 1 day ago)

It doesn't have to be a Mac. My GPD Win Max 2 also has 64 GB, at a much lower price, and it can somehow allocate 55 GB to the integrated GPU (AMD Radeon 780M) for running models with Ollama. I can even combine that with an external GPU on the OCuLink port to increase the total memory.

It takes between 30 seconds and 5 minutes to get a reply, but it does work, and it's mainly useful for going over my project and asking how to improve the codebase.

Quality-wise, it's good enough for boilerplate code and small improvements. I wouldn't trust it to work on big features in larger projects, but I don't trust LLMs in general for that. I don't see a big difference compared to ChatGPT and Gemini (which is a win for local hosting!). The usual caveats always apply: all models have their problems, and people tend to overhype LLMs in general.

setsubyou@lemmy.world 4 points 1 day ago

Why not? I have a 2020 M1 MBP with 64 GB too. But you don't need that much for the models in the article.

this post was submitted on 16 Jun 2026

28 points (96.7% liked)