submitted 1 month ago* (last edited 1 month ago) by catty@lemmy.world to c/selfhosted@lemmy.world

I've tried coding with them, and every one I've tried fails unless it's a really, really basic, small function, the kind you write as a newbie. Compare that to, say, 4o-mini, which can spit out more sensible stuff that actually works.

I've tried asking for explanations, and they just regurgitate sentences that can be irrelevant or wrong, or they get stuck in a loop.

So, what can I actually use a small LLM for? Which ones? I ask because I have an old laptop whose GPU can't really handle anything above 4B in a timely manner. 8B runs at about 1 t/s!

[-] HelloRoot@lemy.lol 10 points 1 month ago* (last edited 1 month ago)

Sorry, I'm just gonna dump some links from my bookmarks that were related and interesting to read, because I'm traveling and have to get up in a minute, but I've been interested in this topic for a while. All of the links discuss at least some use cases. For some reason Microsoft is really into tiny models and has made big breakthroughs there.

https://reddit.com/r/LocalLLaMA/comments/1cdrw7p/what_are_the_potential_uses_of_small_less_than_3b/

https://github.com/microsoft/BitNet

https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/

https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090

[-] some_guy@lemmy.sdf.org 8 points 1 month ago

I installed Llama. I've not found any use for it. I mean, I've asked it for a recipe because recipe websites suck, but that's about it.

[-] GreenKnight23@lemmy.world 19 points 1 month ago

you can do a lot with it.

I heated my office with it this past winter.

[-] iii@mander.xyz 7 points 1 month ago

Converting free text to standardized formats such as JSON

[-] MadMadBunny@lemmy.ca 0 points 1 month ago

Oh—do you happen to have any recommendations for that?

[-] iii@mander.xyz 8 points 1 month ago

DeepSeek-R1-Distill-Qwen-1.5B
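To sketch what that looks like in practice: the fiddly part is that tiny models rarely emit clean JSON, so you prompt hard for JSON-only output and then parse defensively. Rough Python sketch; the schema, prompt wording, and canned model reply here are invented for illustration, and the actual generation call (llama.cpp server, Ollama, whatever you run) is left out:

```python
import json

def build_prompt(free_text: str) -> str:
    # Ask the model for JSON and nothing else; small models need it spelled out.
    return (
        "Extract the fields name, date, and amount from the text below.\n"
        "Reply with a single JSON object and nothing else.\n\n"
        f"Text: {free_text}"
    )

def parse_model_output(raw: str) -> dict:
    # Small models often wrap JSON in code fences or chatter;
    # grab the first {...} span and parse just that.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model output")
    return json.loads(raw[start:end + 1])

# A canned reply, the way a 1.5B model tends to produce it:
reply = 'Sure! Here is the JSON:\n```json\n{"name": "Bob", "date": "2025-06-16", "amount": 42}\n```'
print(parse_model_output(reply)["name"])  # -> Bob
```

Some runtimes can also constrain generation with a grammar (llama.cpp's GBNF) so the model physically can't emit non-JSON, which helps a lot at this size.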

[-] Mordikan@kbin.earth 6 points 1 month ago

I've used smollm2:135m for projects in DBeaver, building larger queries. The box it runs on has Intel HD 530 graphics and an old i5-6500T processor; it doesn't seem to really stress the CPU.

UPDATE: I apologize to the downvoter for not masochistically wanting to build a 1000 line bulk insert statement by hand.

[-] MTK@lemmy.world 6 points 1 month ago

Have you tried RAG? I believe they're actually pretty good at searching and compiling content retrieved via RAG.

So in theory you could connect one to all of your local documents and use it for quick questions. Or connect it to your Signal/WhatsApp/SMS chat history to ask questions about past conversations.

[-] catty@lemmy.world 1 points 1 month ago

No, what is it? How do I try it?

[-] MTK@lemmy.world 5 points 1 month ago

RAG is basically like telling an LLM "look here for more info before you answer" so it can check out local documents to give an answer that is more relevant to you.

Just search for "open web ui rag" and you'll find plenty of explanations and tutorials.
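The idea in miniature: retrieve the most relevant document, then stuff it into the prompt. This toy sketch uses a bag-of-words retriever instead of real embeddings (an actual setup like Open WebUI uses an embedding model and a vector database), and the example documents are made up; it only shows the retrieve-then-prompt shape:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real RAG uses an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "The backup job runs every night at 02:00 via cron.",
    "Grafana dashboards live behind the reverse proxy.",
]

def retrieve(question: str) -> str:
    # Return the document most similar to the question.
    q = embed(question)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def build_prompt(question: str) -> str:
    # The LLM answers from the retrieved context, not from its weights.
    context = retrieve(question)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

print(retrieve("When does the backup run?"))
# -> The backup job runs every night at 02:00 via cron.
```

The point for small models: the answer comes from your documents, so the model only has to read, not know.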

[-] iii@mander.xyz 2 points 1 month ago* (last edited 1 month ago)

I think RAG will be surpassed by LLMs in a loop with tool calling (aka agents), with search being one of the tools.
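The loop itself is tiny. A toy sketch with a stubbed model and a stubbed search tool, just to show the shape (a real setup would call an actual LLM with function calling, and the search tool would hit a real search API):

```python
def search(query: str) -> str:
    # Stand-in for a real web-search tool.
    return "llama.cpp supports GGUF quantized models."

TOOLS = {"search": search}

def fake_llm(messages):
    # Stand-in model: requests the search tool once, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search", "args": {"query": "llama.cpp model formats"}}
    return {"answer": "llama.cpp runs GGUF quantized models."}

def agent(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(5):  # cap the loop so a confused model can't spin forever
        step = fake_llm(messages)
        if "tool" in step:
            result = TOOLS[step["tool"]](**step["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return step["answer"]
    return "gave up"

print(agent("What model format does llama.cpp use?"))
# -> llama.cpp runs GGUF quantized models.
```

RAG then just becomes one more tool the model can decide to call, rather than something bolted on in front of every query.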

[-] interdimensionalmeme@lemmy.ml 3 points 1 month ago

LLMs that train LoRas on the fly then query themselves with the LoRa applied

[-] CrayonDevourer@lemmy.world 5 points 1 month ago* (last edited 1 month ago)

Currently I've been using a couple of local AIs. The first takes the audio from a Twitch stream and converts it to text, so there's context about the conversation. A second AI, an LLM fed the first AI's transcription plus the Twitch chat, then stores 'facts' about specific users so they can be referenced quickly by a streamer who has ADHD, in order to be more personable.

That way, the guy can ask User X how their mother's surgery went. Or he can remember that User K has a birthday coming up. Or remember that User G's son just got a PS5 for Christmas, and wants a specific game.

It allows him to be more personable because he has issues remembering details about his users. It's still kind of a big alpha test at the moment, because we don't know the best way to display the 'data', but it functions as an aid.
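The fact-storing half can be sketched in a few lines. This is just an illustration of the shape of it, not the actual code: the prompt wording, the canned model reply, and the merge logic are all invented here.

```python
def extraction_prompt(user: str, chat_lines: list[str]) -> str:
    # Ask the small model for durable facts only, one per line.
    joined = "\n".join(chat_lines)
    return (
        f"From these chat messages by {user}, list durable personal facts "
        "worth remembering, one per line. Skip small talk.\n\n" + joined
    )

def store_facts(db: dict, user: str, model_output: str) -> dict:
    # Merge the model's line-per-fact output into a per-user fact list,
    # skipping duplicates so repeated mentions don't pile up.
    facts = [ln.strip("- ").strip() for ln in model_output.splitlines() if ln.strip()]
    db.setdefault(user, [])
    for f in facts:
        if f not in db[user]:
            db[user].append(f)
    return db

db = {}
canned = "- Mother has surgery scheduled next week\n- Birthday is in July"
store_facts(db, "UserX", canned)
print(db["UserX"])
# -> ['Mother has surgery scheduled next week', 'Birthday is in July']
```

Persisting `db` to a JSON file between streams is the obvious next step; the hard part in practice is deciding what counts as a "durable" fact, which is exactly what the prompt is doing the heavy lifting for.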

[-] shnizmuffin@lemmy.inbutts.lol 2 points 1 month ago

Hey, you're treating that data with the respect it demands, right? And you definitely collected consent from those chat participants before you Hoover'd up their [re-reads example] extremely Personal Identification Information AND Personal Health Information, right? Because if you didn't, you're in violation of a bunch of laws and the Twitch TOS.

[-] CrayonDevourer@lemmy.world 3 points 1 month ago* (last edited 1 month ago)

If I say my name is Doo doo head in a public park, and someone happens to overhear it, they can do with that information whatever they want. Same thing. If you wanna spew your personal life on Twitch, there are bots that listen to all of the channels everywhere on Twitch. They aren't violating any laws, or the Twitch TOS. So, *buzzer* WRONG.

Right now, the same thing is being done to you on Lemmy. And Reddit. And Facebook. And everywhere else.

Look at a bot called "FrostyTools" for Twitch. It reads Twitch chat and uses an AI to provide summaries of the chat every 30 minutes or so. If that's not violating the TOS, then neither am I. And thousands upon thousands of people use FrostyTools.

I have the consent of the streamer, I have the consent of Twitch (through their developer API), and upon using Twitch, you give the right to them to collect, distribute, and use that data at their whim.

[-] aksdb@lemmy.world 5 points 1 month ago

> So, *buzzer* WRONG.

Quite arrogant after you just constructed a faulty comparison.

> If I say my name is Doo doo head, in a public park, and someone happens to overhear it - they can do with that information whatever they want. Same thing.

That's absolutely not the same thing. Overhearing something that is in the background is fundamentally different from actively recording everything going on in a public space. You film yourself or some performance in a park and someone happens to be in the background? No problem. You build a system to identify everyone in the park and collect recordings of their conversations? Absolutely a problem, depending on the jurisdiction. The intent of the recording(s) and the reasonable expectations of the people recorded are factored in in many jurisdictions, and being in public doesn't automatically entail consent to being recorded.

See for example https://www.freedomforum.org/recording-in-public/

(And just to clarify: I am not arguing against your explanation of Twitch's TOS, only against the bad comparison you brought.)

[-] kattfisk@lemmy.dbzer0.com 2 points 1 month ago

You're both getting side-tracked by this discussion of recording. The recording is likely legal in most places.

It's the processing of that unstructured data to extract and store personal information that is problematic. At that point you go from simply recording a conversation of which you are a part, to processing and storing people's personal data without their knowledge, consent, or expectation.

[-] CrayonDevourer@lemmy.world 0 points 1 month ago* (last edited 1 month ago)

> You build a system to identify everyone in the park and collect recordings of their conversations? Absolutely a problem, depending on the jurisdiction.

Literally not. The police use this right now to record your location and time seen using license plates all over the nation - with private corporations providing the service.

> and being in public doesn't automatically entail consent to being recorded.

And yes, it's called the 'reasonable expectation of privacy'. Public venues are not 'private' locations, and thus do not require consent. You can, quite literally, record anyone in public.

Even the link you provided agrees.

[-] tfm@europe.pub 1 points 1 month ago

In the US maybe but not in Germany, Austria and probably most countries in Europe.

[-] catty@lemmy.world 1 points 1 month ago

Doesn't Twitch own all the data that's written, and doesn't their TOS state something like you can't store the data locally yourself?

[-] CrayonDevourer@lemmy.world 2 points 1 month ago* (last edited 1 month ago)

I'm not storing their data. I'm feeding it to an LLM which infers things and storing that data. Other Twitch bots store twitch data too. Everything from birthdays to imaginary internet points.

[-] catty@lemmy.world -1 points 1 month ago

lol. Way to contradict yourself.

[-] catty@lemmy.world -1 points 1 month ago

Was this system vibe coded? I get the feeling it was...

[-] CrayonDevourer@lemmy.world 1 points 1 month ago* (last edited 1 month ago)

There's not actually that much code. It's like 8 lines for an AI 'agent', and maybe another 16 lines for 'tools'. I'm using Streamlink to grab the audio stream, and PulseAudio has a 'monitor' device you can use to listen to what's playing on the speakers. Throw it on a very minimal Linux distro in a VM, and that's it.

I don't do 'vibe coding', but that IS where I got the idea. People who do 'vibe coding' nowadays aren't just plugging things into a generic AI; they're spinning up 'agents' and making tools via MCP, and then those agents are tasked with specific things and use the tools to directly write to files, search the internet, read documents, etc.

[-] carl_dungeon@lemmy.world 1 points 1 month ago
[-] interdimensionalmeme@lemmy.ml 2 points 1 month ago

There is no expectation of privacy in public spaces. Participants in these streams, which are open to all, are not prohibited from repeating what they have heard.

[-] carl_dungeon@lemmy.world 1 points 1 month ago

Right, and what I was saying was: even if it wasn't "public", single-party consent means the person recording can be that single party, so it's still a non-issue.

[-] kattfisk@lemmy.dbzer0.com 0 points 1 month ago

Repeating what they heard is very different from automatically processing the chat to harvest personal information about the participants.

Just because some data is publicly available doesn't mean all processing of that data is legal and moral.

[-] interdimensionalmeme@lemmy.ml -1 points 1 month ago

It is qualitatively equivalent. Since any single piece of information could have been copied, it is safe to assume all of it has been.

Although I would be on board with supporting an expectation of privacy in public spaces and making private CCTV recording illegal.

[-] Hadowenkiroast@piefed.social 0 points 1 month ago

Sounds like Salesforce for a Twitch setting. Cool use case; must make for fun moments when he mentions such things.

[-] jlow@discuss.tchncs.de 1 points 1 month ago

Esp. if the LLM just hallucinates 50% of the "facts" about the users 👌

[-] CrayonDevourer@lemmy.world 3 points 1 month ago* (last edited 1 month ago)

That hasn't been a problem at all for the 200+ users it's tracking so far for about 4 months.

I don't know a human that could ever keep up with this kind of thing. People just think he's super personable, but in reality he's not. He's just got a really cool tool to use.

He's managed some really good numbers because being that personal with people brings them back and keeps them chatting. He'll be pushing for partner after streaming for only a year and he's just some guy I found playing Wild Hearts with 0 viewers one day... :P

[-] catty@lemmy.world 0 points 1 month ago

Surely none of that uses a small LLM <= 3B?

[-] CrayonDevourer@lemmy.world 2 points 1 month ago* (last edited 1 month ago)

Yes. The small LLM isn't retrieving data; it's just understanding the context of the text well enough to know what 'facts' need to be written to a file. I'm using the publicly released DeepSeek models from a couple of months ago.

[-] ikidd@lemmy.world 4 points 1 month ago

It'll work for quick bash scripts and one-off things like that. But there's usually not enough context window unless you're using a 24 GB GPU or such.

[-] smayonak@lemmy.world 1 points 1 month ago

Snippets are a great use.

I use StableCode on my phone as a programming tutor for learning Python. It is outstanding in both speed and accuracy for this task. I have it generate definitions, which I copy and paste into Anki, the flashcard app. Whenever I'm on a bus or airplane, I just start studying. I wish it could also quiz me interactively.
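The copy-paste step can be automated, by the way: Anki's text import accepts tab-separated front/back pairs. A small sketch (the example cards and the output filename stand in for whatever your model actually generates):

```python
import csv
import io

def to_anki_tsv(cards: list[tuple[str, str]]) -> str:
    # Anki's "Import file" dialog accepts tab-separated front/back rows.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for front, back in cards:
        writer.writerow([front, back])
    return buf.getvalue()

# Pairs like a small code model might generate from "define these methods":
cards = [
    ("list.append(x)", "Add x to the end of the list in place."),
    ("dict.get(k, d)", "Return dict[k] if k exists, else d."),
]
tsv = to_anki_tsv(cards)
with open("python_cards.tsv", "w") as f:  # import this file into Anki
    f.write(tsv)
print(tsv.splitlines()[0])
```

From there it's one File → Import in Anki, mapping column 1 to Front and column 2 to Back.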

[-] entwine413@lemm.ee 4 points 1 month ago* (last edited 1 month ago)

I've integrated mine into Home Assistant, which makes it easier to use their voice commands.

I haven't done a ton with it yet besides set it up, though, since I'm still getting Proxmox configured on my gaming rig.

[-] Passerby6497@lemmy.world 1 points 1 month ago

What are you using for voice integration? I really don't want to buy and assemble their solution if I don't have to

[-] entwine413@lemm.ee 1 points 1 month ago

I just use the companion app for now. But I am designing a HAL9000 system for my home.

[-] RickyRigatoni@retrolemmy.com 3 points 1 month ago

I have it roleplay scenarios with me and sometimes I verbally abuse it for fun.

[-] ragingHungryPanda@lemmy.zip 3 points 1 month ago

I've run a few models that I could on my GPU. I don't think the smaller models are really good enough. They can do stuff, sure, but to get anything out of it, I think you need the larger models.

They can be used for basic things, though. There are coder-specific models you can look at; DeepSeek and Qwen Coder are some popular ones.

[-] scottrepreneur@lemmy.world 1 points 1 month ago

I've been coming to similar conclusions in some local adventures. It's decent, but not as able to handle larger contexts.

[-] swelter_spark@reddthat.com 3 points 1 month ago

7B is the smallest I've found useful. If I had very little VRAM, I'd try a smaller quant before going to a lower parameter count.

[-] surph_ninja@lemmy.world 1 points 1 month ago

Learning/practice, and any use that feeds in sensitive data you want to keep on-prem.

Unless you’re set to retire within the next 5 years, the best reason is to keep your resume up to date with some hands-on experience. With the way they’re trying to shove AI into every possible application, there will be few (if any) industries untouched. If you don’t start now, you’re going to be playing catch up in a few years.

[-] irmadlad@lemmy.world 1 points 1 month ago

As cool and neato as I find AI to be, I haven't really found a good use case for it in the selfhosting/homelabbing arena. Most of my equipment is ancient and lacks the GPU necessary to drive that bus.

[-] herseycokguzelolacak@lemmy.ml 1 points 1 month ago

For coding tasks you need web search and RAG. Then it's not the size of the model that matters, since even the largest models find their solutions online.

[-] catty@lemmy.world 1 points 1 month ago

Any suggestions for solutions?

[-] herseycokguzelolacak@lemmy.ml 1 points 1 month ago

Not off the top of my head, but there must be something. llama.cpp and vllm have basically solved the inference problem for LLMs. What you need is a RAG solution on top that also combines it with web search.

[-] 30p87@feddit.org 0 points 1 month ago
[-] imsufferableninja@sh.itjust.works 0 points 1 month ago

absolutely nothing

this post was submitted on 16 Jun 2025
35 points (94.9% liked)