Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
- Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
- No spam posting.
- Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
- Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
- Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
- No trolling.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
Isn't this using a lot of computing power?
Not really. It uses some GPU power while it's actively generating a response, but otherwise it just sits idle.
You hear that said about AI because companies are desperately throwing more and more resources at it to get 0.3% better results, and because people collectively run an enormous number of prompts all the time.
But on a personal level it's not really any different from any other computation. People render videos all the time and no one complains about the resource usage, because companies aren't trying to sell bloated video-rendering services to gardening businesses.
I've been testing Ollama in Docker/WSL, with the idea that if I like it, I'll eventually move my GPU into my home server and get an upgrade for my gaming PC. When you run a model, Ollama has to load the whole thing into VRAM. I use the 8 GB models, so loading takes 20 to 40 seconds; after that, each response is really fast and the GPU hit is pretty small. After five minutes of inactivity (the default, I think), it unloads the model to free up VRAM.
Basically, this means you either wait a bit for the model to warm up or extend that timeout so it stays loaded longer. The catch is that I can't really use my GPU for anything else while the LLM is loaded.
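If you'd rather keep the model warm than wait for it to reload, Ollama's REST API accepts a `keep_alive` field on `/api/generate`, and a request that names a model but has no prompt just loads it. Below is a minimal sketch using only the standard library, assuming the server listens on its default `http://localhost:11434`; the model tag `llama3.1:8b` and the one-hour value are only examples.

```python
import json
import urllib.request
from typing import Union

# Assumption: Ollama is reachable at its default address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_preload_request(model: str, keep_alive: Union[str, int] = "1h") -> urllib.request.Request:
    """Build a POST that loads `model` into VRAM without generating any text.

    `keep_alive` sets how long the model stays loaded after the last request,
    e.g. "30m" or "1h"; -1 keeps it loaded indefinitely and 0 unloads it now.
    """
    payload = {"model": model, "keep_alive": keep_alive, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model: str, keep_alive: Union[str, int] = "1h") -> dict:
    """Send the preload request and return Ollama's JSON reply as a dict."""
    with urllib.request.urlopen(build_preload_request(model, keep_alive)) as resp:
        return json.loads(resp.read())
```

Calling `preload("llama3.1:8b", keep_alive="1h")` once should load the model and keep it resident for an hour after the last request. To change the default for every model instead, Ollama also reads the `OLLAMA_KEEP_ALIVE` environment variable, which can be passed to the container with `docker run -e OLLAMA_KEEP_ALIVE=1h ...`. The trade-off is the same as above: the longer the model stays loaded, the longer that VRAM is unavailable for anything else.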
I haven't tracked power usage, but apart from the VRAM requirements it doesn't seem very resource-intensive. Maybe I just haven't done anything complex enough yet.
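For anyone who wants to actually measure this on an NVIDIA card, one option is to poll `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the PATH of whatever machine runs it (the WSL side or the Docker host); it queries `power.draw` (watts) and `memory.used` (MiB) in CSV form and parses one sample per GPU. Some cards report `[N/A]` for power, which this sketch turns into NaN rather than failing.

```python
import subprocess
from typing import List, NamedTuple


class GpuSample(NamedTuple):
    power_watts: float
    memory_used_mib: float


# With csv,noheader,nounits, nvidia-smi prints bare numbers, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_float(field: str) -> float:
    """Convert one CSV field; bracketed values such as "[N/A]" become NaN."""
    field = field.strip()
    return float("nan") if field.startswith("[") else float(field)


def parse_samples(output: str) -> List[GpuSample]:
    """Parse nvidia-smi CSV output into one GpuSample per GPU line."""
    samples = []
    for line in output.strip().splitlines():
        power, mem = (_to_float(f) for f in line.split(","))
        samples.append(GpuSample(power, mem))
    return samples


def read_gpu() -> List[GpuSample]:
    """Run nvidia-smi once and return the current readings."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_samples(out)
```

Calling `read_gpu()` once while a model is generating and again while it sits loaded but idle should show how much of the cost is the active computation versus just holding the model in VRAM.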