36
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
this post was submitted on 22 Feb 2026
36 points (68.8% liked)
Asklemmy
53252 readers
168 users here now
A loosely moderated place to ask open-ended questions
Search asklemmy ๐
If your post meets the following criteria, it's welcome here!
- Open-ended question
- Not offensive: at this point, we do not have the bandwidth to moderate overtly political discussions. Assume best intent and be excellent to each other.
- Not regarding using or support for Lemmy: context, see the list of support communities and tools for finding communities below
- Not ad nauseam inducing: please make sure it is a question that would be new to most members
- An actual topic of discussion
Looking for support?
Looking for a community?
- Lemmyverse: community search
- sub.rehab: maps old subreddits to fediverse options, marks official as such
- !lemmy411@lemmy.ca: a community for finding communities
~Icon~ ~by~ ~@Double_A@discuss.tchncs.de~
founded 6 years ago
MODERATORS
Even for people who generally like the function of AI (which seem to be fairly rare here) the absolutely obscene climate impact and implications for peopes jobs and livelihoods, privacy breaches, and general internet enshittification is surely reason enough to be against it.
That I why I like small, specialized, locally hosted AI. Runs acceptably fast and quite on my gaming PC, it's private, and I can give it knowledge is small doses in specific topics and projects.
Which model do you use and what are your specs? I ran a couple using an RTX5060 with 16gb and it's too slow to be usable for larger models while the smaller ones are mostly useless.
I also have a 5060 (ti) with 16GB of RAM. I tend to use GPT-OSS:20B or Qwen3:14B with a context of ~30k. I have custom system prompt for my style of reponse I like on open web ui. That takes up about 14GB of my 16GB VRAM
But yeah it is slower and not as "smart" as the cloud based models, but I think the inconvenience of the speed and having to fact check/test code is worth the privacy and environmental trade offs
Ive had good success on similar hardware (5070 + more ram) with GLM-4.7-Flash, using llama.cpp's
--cpu-moeflag - I can get up to 150k context with it at 20ish tok/sec. I've found it to be a lot better for agentic use than GPT-OSS as well, it seems to do a much more in depth reasoning effort, so while it spends more tokens it seems worth it for the end result.