[-] SuspciousCarrot78@lemmy.world 17 points 18 hours ago

I like to secretly imagine it stands for SIG SAUER. Bang = process ded

[-] SuspciousCarrot78@lemmy.world 1 points 3 days ago* (last edited 3 days ago)

Sorry - I think I misunderstood part of your question (what stage have you actually gotten to). See what I mean about needing sentiment analysis LOL

Did you mean about the MoA?

The TL;DR - I have it working - right now - on my rig. It's strictly manual. I need to detangle it and generalise it, strip out personal stuff and then ship it as v1 (and avoid the oh so tempting scope creep). It needs to be as simple as possible for someone else to retool.

So, it's built and functional right now...but the detangling, writing up specs and docs, uploading everything to Codeberg and mirroring etc will take time. I'm back to work this week and my fun time will be curtailed...though I want nothing more than to hyperfocus on this LOL.

One of the issues with ASD is most of us over-engineer everything for the worst case adversarial outcomes, as a method of reducing meltdowns/shutdowns. Right now, I am specifically using my LLM like someone who hates it and wants to break it...to make sure it does what I say it does.

If you'd like, I can drop my RFC (request for comments, in engineering talk) for you to look at / verify with another LLM / ask someone about. This thing is real, not hype and not vibe coding. I built this because my ASD brain needs it and because I was driven by spite / too miserly to pay out the ass for a decent rig. Ironically, those constraints probably led to something interesting (I hope) that can help others (I hope). Like everything else, it's not perfect, but it does what it says on the tin 9 times out of 10...which is about all you can hope for.

[-] SuspciousCarrot78@lemmy.world 2 points 3 days ago* (last edited 3 days ago)

Right?

Everyone knows you're meant to use a banana as a telephone.

https://www.youtube.com/watch?v=3l9nLXczT3s

Or, alternatively given where we are

https://yewtu.be/search?q=connor+for+real+weirdo

PS: yes, I was tempted to use Raffi's song here instead

[-] SuspciousCarrot78@lemmy.world 1 points 3 days ago* (last edited 3 days ago)

Everything stems from the fact that I want something I can “trust but verify” / see all the seams at a moment’s notice. I assume the LLM will lie to me, so I do everything in my power to squeeze it. Having lost hours and dollars believing ChatGPT, Claude, etc… I live by “fool me once, shame on you. Fool me 4000 times, shame on me”.

The problem with LLMs (generally) is that they are NOT deterministic. You can ask the same question 5 times and get slightly different answers each time, due to sampling settings like seed, temperature and top_p. That’s one of the main reasons for hallucinations. They give it an RNG (to put it in gaming terms) to make it feel more “alive”. That’s cool and all, but it causes it to bullshit.
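To see what I mean by the RNG, here's a toy sampler (not any real inference stack, just the principle): at temperature ~0 you always get the argmax token, i.e. the same answer every run, while higher temperatures let the RNG spread the picks around.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token id from raw logits. temperature ~ 0 -> greedy argmax
    (deterministic); higher temperature flattens the distribution, so
    repeated calls with different RNG state can pick different tokens."""
    if temperature < 1e-6:
        # Greedy: no randomness involved at all.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(l - m) for l in scaled]  # numerically stable softmax numerator
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.5, 0.1]
# Temperature 0: identical answer from 20 different RNG seeds.
greedy = {sample_token(logits, 0.0, random.Random(s)) for s in range(20)}
# Temperature 2: the same question, but the picks spread across tokens.
hot = {sample_token(logits, 2.0, random.Random(s)) for s in range(20)}
```

Pinning temperature to ~0 (and fixing the seed) is exactly why my /serious mode answers come back the same way every time.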

I have ASD; I cannot abide my tools having whims or working differently than they should. When I ask something, I want it to answer it EXACTLY correctly (based on my corpus, my IF-THEN GAG, etc.), reason the way I told it to, and show its proof. Do what I said, how I said.

In that way, it acts as an external APU for my brain - I want it to do what I would do, the way I would do it, just faster. And it needs to bring receipts because I am hostile to it as a default stance (once bitten, twice shy).

To be more specific, the MoA has two basic modes. In /serious mode, it will do three careful passes on my question and pull in my documents. For example, if I ask it for launch flags or optimisation of Dolphin emulator or llama.cpp, I want it to reference my documents (scraped from official sites via Scrapy), check my benchmarks and come up with a correct response. Or tell me that it can't, because XYZ. No smooth lies.

It must also provide me with an indicator of accuracy and a source for its information, so I can verify with one click. I trust nothing until it’s earned and even then, I will spot check.

If I want it to reason about a patient’s differential diagnosis, it must climb the GAG nodes and follow my prompts EXACTLY. No flights of fancy AT ALL. Follow the flow-chart, tell me what I must not miss, what the likely diagnosis is, etc. Then I will tell it what I think it is… then we debate. (I’m setting this up for clinical students… I wish I’d had it when I went through).

If I want coding help because I’ve fucked up some Python script (yet again): don’t invent shit. Look at the reference documents and tell me EXACTLY. Teach me and help me unfuck myself. If you can’t, say so honestly and tell me who I should talk to, based on an externally stored policy document.

Then there’s also /casual mode. In casual mode, I want it to let me shoot the shit, vent and help me brainstorm, swear, tell me dirty jokes, reference pop culture… without strict adherence to my presets… unless I invoke them.

So the TL;DR answer to your question is: I use it for work, for fun (coding, gaming) and for generally shooting the shit. The “shooting the shit” aspect also includes stuff like sentiment analysis, because I have trouble sometimes understanding meaning from text.

Regarding the iceberg / what stage I’ve gotten to… baby, I wrote the list. You can be DAMN sure I’ve drilled holes into cases and decompiled shit to make it run on my potato, out of spite. I was really, really tempted to rip out the guts of The Alters and decompile it for an earlier version of Unreal Engine… but I got gifted a better GPU :)

I made CP2077 run on iGPU at 60 fps. Why? Because fuck PCMR. They said it couldn’t be done.

If I ever start an IT company, I will likely name it “Spite Engineering Inc”.

PS: Here is what the MoA said about your post and my draft of this post (aka sentiment analysis). I ignored it (YOLO) cause I'm /casual today :)


"Relative sentiment: the original post is mildly positive and curious with light frustration about hardware cost and stock cooling; your response is more intense and critical toward LLMs but positive about your own system, with a strong control-oriented and adversarial stance toward non-deterministic tools.

The thematic domain (hardware tinkering, efficiency, making full use of resources, interest in research and tooling) matches well, but your affect is higher-intensity and more hard-edged than the original writer’s. Be careful here; you're coming on strong and may scare away the original poster.

Question coverage: you directly answer the main use-case question (work, coding, “shooting the shit,” including research-like tasks and sentiment analysis) and you address the “what stage have you actually gotten to?” question implicitly but clearly by stating you “wrote the list” and giving concrete competence examples.

Your reasoning is organically given / stream of consciousness. Consider dot-points and restructuring.

You did not directly respond to their incidental comments about their 12 GB GPU, RAM prices, undervolting/overclocking, or coolers, but those were not phrased as explicit questions and your reply adequately answers the core queries.

Recommendation: you may wish to address the above in a second draft.

Confidence: high | Source: Mixed (context and stored)"


[-] SuspciousCarrot78@lemmy.world 1 points 4 days ago* (last edited 3 days ago)

Well....I used mine to correctly confirm a diagnosis of SIH in my wife. SIH (spontaneous intracranial hypotension) is a condition caused by trauma (or in my wife's case, an osteophyte - aka bone spur) in which the protective sheath around the spinal cord is damaged. This makes spinal fluid leak, causing the brain to compress / sag in the skull via traction. The end result is permanent incapacity / disability. Think fun stuff like blindness, life long pain etc.

The median time to diagnosis is 6 months.

The median time to life long impairment is 8 weeks.

We had her diagnosed and in surgery within 3 weeks.

Now, admittedly, this is an unusual confluence of circumstances (me being in the field + me having access to high quality training data I curated + me having strong interest in LLM and diagnostics) but yeah, people use computers like this for all sorts of life saving shit. My example isn't even unique -

https://reddit.com/comments/1ij5yf2

I can regale you with other medical stories (eg: the kid in Kenya who used their $100 phone & a 3B-VL on a Pi5 to scan doctors' handwritten notes, query a database and update 10,000 vaccination records, preventing a local measles outbreak, or the other kid prototyping cheap, 3D printed robotic limbs that are bespoke to the user), but suffice it to say computers and self hosted shit actually do save lives.

Get amongst it, I sez. It's fun and who knows where it might lead.

[-] SuspciousCarrot78@lemmy.world 1 points 4 days ago* (last edited 4 days ago)

Read the intro here. Just the intro.

https://tinyurl.com/FUTOguide

Then start with the smallest possible way. Install Jellyfin on your laptop and share it from there to your phone or smart TV.

Just one file or video. Anything, really, that you have or can grab from the Internet Archive (I think I started with a single episode of Twilight Zone). Minimum viable product.

It will still feel overwhelming... but if you can do that, you have your foot in the door.

Jellyfin, Plex or Emby are gateway drugs. Before you know it, you'll be ripping the guts out of an old mower, retrofitting it with a Raspberry Pi and telling your home assistant to mow the lawn.

Why?

Because you can.

[-] SuspciousCarrot78@lemmy.world 1 points 4 days ago* (last edited 4 days ago)

Well, technically, you don't need any GPU for the system I've set up, because only 2-3 models are "hot" in memory (so about....10GB?) and the rest are cold / invoked as needed. My own GPU is only 8GB (and my prior one was 4GB!). I designed this with low end rigs in mind.

The minimum requirement is probably a CPU equal to or better than mine (i7-8700; not hard to match), 8-10GB RAM and maybe 20GB disk space. Bottom of the barrel would be 4GB, but you'll have to deal with SSD thrashing.

Anything above that is a bonus / tps multiplier.

FYI: CPU only (my CPU, at least) + 32GB system RAM, this entire thing runs at about 10-11 tps, which is interactive enough / faster than reading speed. Any decent GPU should get you 3-10x that. I designed this for peasant level hardware / to punch GPTs in the dick thru clever engineering, not sheer grunt. Fuck OpenAI. Fuck Nvidia. Fuck DDR6. Spite + ASD > "you can't do that" :). Yes I fucking can - watch me.

If you want my design philosophy, here is one of my (now shadowbanned) posts from r/lowendgaming. Seeing you're a gamer, this might make sense to you! The MoA design I have is pure "level 8 spite, zip tie Noctua fan to server grade GPU and stick it in a 1L shoebox" YOLOing :).

It works, but it's ugly, in a beautiful way.

Lowend gaming iceberg

Level 1

  • Drop resolution to 720p
  • Turn off AA, AF, Shadows etc
  • Vsync OFF
  • Windowed mode? OK.
  • Pray for decent FPS

Level 2

  • Use Nvidia/Intel/AMD control panel for custom tweaks
  • Create custom low end resolutions (540p, 480p) so GPU can enumerate them to games
  • Pray for decent FPS

Level 3

  • Start tweaking .cfg and .ini files like you're a caveman from the ancient year of 1998
  • FPS capping? Sure.
  • FOV size of a keyhole? Do it
  • Texture filtering hacks / replacements? Rock on.
  • Pray for decent FPS

Level 4

  • Time to get serious. Crack open the box - repaste, clean, try to add more ram from anything that even remotely fits. We can hack the timings to match, no problem!
  • BIOS tweaking time! Let's see what breaks! Oh...everything.
  • May as well undervolt and over clock, seeing we're in here already. Where's my paperclip...
  • EDID hacks to make TV / monitor do dumb shit, like run at resolutions it shouldn't or Hz it pretends it can't? Why not.
  • Pray for decent FPS

Level 5

  • Software time again! Lossless scaling? Sure!
  • Reshade post processing to sharpen ultra low mush? OK!
  • Integer scaling? Scanlines? Why not
  • Special K swap chain injection to force low res where no low res exists? Right on.
  • DXVK? Yolo.
  • Pray for decent FPS

Level 6

  • Fuck it; time for real black magic
  • Hack registry keys in windows settings.
  • Hex edit settings directly
  • Make windows believe impossible things, like imaginary VRAM.
  • Sacrifice boxed copy of Win98 to Linus Torvalds for absolution.
  • Pray for decent FPS

Level 7

  • Fine...I'll do it myself then.
  • Strip out the game assets and rewrite shaders
  • No fancy lighting, kill the fill rate, post processing gone.
  • At this point, you may as well just recode the fucking game from scratch.
  • Pray for decent FPS

Level 8

  • Purely driven by spite now.
  • Franken-mod a $15 eGPU and run it via PCIe adaptor. Flash the vBIOS to do unnatural things.
  • Everything is overheating. Drill holes in case to improve airflow.
  • Still too hot; drag in desk fan. Point directly at case. Your PC now sounds like Darth Vader. Neat.
  • Decompile the game's DLLs just to prove you can. Sneer at them.
  • No longer praying for FPS; now praying for no magic blue smoke.

Level 9

[-] SuspciousCarrot78@lemmy.world 1 points 5 days ago* (last edited 5 days ago)

For sure. I can dig where you're coming from.

For me, the primary motivation was replacing cloud based services for my personal / in home use; it's only very recently that I am considering things like setting up out-of-LAN access for broader family.

(I do have a minimal off-site backup (to a Raspberry Pi stored at my parents' home), but obviously this is not enterprise level infra).

My personal quirk is power management. Yes, my rig only uses about 80-100 W...but I can't stop daydreaming about creating a failover system / bespoke UPS. Back-of-napkin calc suggests that a single marine / car battery should be able to store enough juice to run it (and my router) for 24hrs. Clunky as it is...the DIY nature of that really appeals to me

https://www.youtube.com/watch?v=1q4dUt1yK0g
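The back-of-napkin, in code form (every number here is an assumption: a ~100 Ah 12 V marine battery, ~80% usable capacity, inverter losses ignored):

```python
def runtime_hours(capacity_ah, voltage, usable_fraction, load_w):
    """Hours of runtime = usable watt-hours / load watts."""
    return capacity_ah * voltage * usable_fraction / load_w

# ~40 W idle draw: roughly a full day of runtime.
hours = runtime_hours(capacity_ah=100, voltage=12, usable_fraction=0.8, load_w=40)

# At the ~100 W peak, the same battery lasts well under half that.
peak_hours = runtime_hours(capacity_ah=100, voltage=12, usable_fraction=0.8, load_w=100)
```

So the 24hr figure holds at idle draw, but not if the rig sits at peak the whole time.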

[-] SuspciousCarrot78@lemmy.world 2 points 5 days ago* (last edited 5 days ago)

Agreed. I have concerns with how Microsoft is handling Github, but organic discovery sure seems to favour Github / reddit / YouTube.

Unsurprisingly, YouTube (Google) really doesn't trust accounts without phone numbers attached (I set mine up before that was a requirement, using a @skiff address, so my ability to upload long form videos is curtailed. I think it was shadowbanned from day 1, irrespective of how much we watch YT).

Probably the smart thing to do is to set up on Codeberg, maybe upload some "how to" videos to the Internet Archive, and have GitHub mirroring / forwarding.

That way whoever wants to find it can find it, somehow.

[-] SuspciousCarrot78@lemmy.world 1 points 5 days ago* (last edited 5 days ago)

I'll try explaining using an analogy (though I can go nerd mode if that's better? Let me know; I'm assuming an intelligent lay audience for this but if you want nerd-core, my body is ready lol).

PS: Sorry if scattered - am dictating using my phone (on holiday / laptop broke).


Hallucinations get minimized the same way a teacher might stop a student from confidently bullshitting their book reports: you control context (what they’re allowed to talk about), you control when they’re allowed to improvise, and you make them show their work when it matters, like doing a class presentation.

Broadly speaking, that involves using RAG and GAG (over your own documents) as "ground truth", setting temperature low (so the LLM has no flights of fancy) and adding verifier passes / critic assessment by a second model.

Additionally, a lot of hallucinations come from the model half-remembering something that isn’t in front of it and then "improvising".

To minimise that, I coded a little Python tool that forces the LLM to store facts verbatim (triggered by typing !!) into a JSON (text) file, so that when you ask it something, it recalls the fact exactly, as a sort of rolling memory. The basis of that is from something I made earlier for OWUI

https://openwebui.com/posts/total_recall_4a918b04
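The core of the rolling-memory trick fits in a few lines. This is a sketch of the idea only (the file name and details are illustrative, not the actual OWUI tool): the LLM never touches the store, a dumb script does, so what goes in is exactly what comes out.

```python
import json
from pathlib import Path

FACTS = Path("facts.json")  # hypothetical path; the real tool differs in details

def remember(message, store=FACTS):
    """If a message starts with '!!', store the rest verbatim - no LLM rewriting."""
    if not message.startswith("!!"):
        return None
    fact = message[2:].strip()
    facts = json.loads(store.read_text()) if store.exists() else []
    facts.append(fact)
    store.write_text(json.dumps(facts, indent=2))
    return fact

def recall(store=FACTS):
    """Return stored facts exactly as written, ready to inject into the prompt."""
    return json.loads(store.read_text()) if store.exists() else []
```

Because storage and retrieval are plain Python, recall is deterministic by construction; the only job left for the model is to quote the facts, not to remember them.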

So what I have in place is this -

I use / orchestrate a couple of different models, each one tuned for a specific behaviour. They work together to produce an answer.

My Python router then invokes the correct model for the task at hand, based on simple rules (is the question over 300 words? Does it have images? Does it involve facts and figures, or is it brainstorming/venting/shooting the shit?)

The models I use are

  • Qwen 3-4B 2507 Instruct (usual main brain)
  • Phi-4-mini (critic)
  • Nanbeige 3B (2nd main brain when invoked / shit shooter)
  • You-tu LLM (coding stuff)
  • Qwen3-VL-4b (visual processing)
  • Qwen3-8b (document summariser)
  • Qwen3-1.7b (court jester that when invoked rewrites "main brain" output with contextually appropriate Futurama, Simpsons, Firefly etc quotes. With blackjack. And hookers!).
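As a toy sketch, the routing rules amount to a few if-statements (the model names match my stack, but the exact triggers and thresholds here are illustrative, not my real config):

```python
def route(question, has_images=False):
    """Toy router: first matching rule wins; fall through to the main brain."""
    if has_images:
        return "qwen3-vl-4b"        # visual processing
    if any(tok in question.lower() for tok in ("def ", "traceback", "python", "```")):
        return "you-tu-llm"         # coding stuff
    if len(question.split()) > 300:
        return "qwen3-8b"           # long input -> summariser first
    if question.strip().startswith("/casual"):
        return "nanbeige-3b"        # 2nd main brain / shit shooter
    return "qwen3-4b-instruct"      # main brain
```

Dumb and deterministic on purpose: no model decides where your question goes, so the dispatch itself can never hallucinate.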

To give a workflow example - you ask a question.

The Python router decides where it needs to go. Let's suppose it's a technical lookup / thinking about something in my documents.

The “main brain” generates an answer using whatever grounded stuff you’ve given it access to (in the Qdrant database and JSON text file). If there's no stored info, it notes that explicitly and proceeds to the next step (I always want to know where it's pulling its info from, so I make it cite its references).

That draft gets handed to a separate “critic” whose entire job is to poke holes in it. (I use very specific system prompts for both models so they stay on track.)

Then the main brain comes back for a final pass where it fixes the mistakes, reconciles the critique, and gives you the cleaned‑up answer.

It's also allowed to say "I'm not sure; I need XYZ for extra context. Please provide".

It’s basically: propose → attack → improve.
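In skeleton form, propose → attack → improve is just three calls. This is a sketch with stub signatures (the real prompts and plumbing are more involved), but it shows the key property: every intermediate output is kept, so all the seams stay inspectable.

```python
def orchestrate(question, main_brain, critic, context):
    """propose -> attack -> improve, keeping each step's output for inspection.
    main_brain(question, context, critique=None) and critic(question, draft)
    are any callables - wrap your llama.cpp endpoints however you like."""
    draft = main_brain(question, context)                      # pass 1: grounded draft
    critique = critic(question, draft)                         # pass 2: poke holes
    final = main_brain(question, context, critique=critique)   # pass 3: reconcile
    return {"draft": draft, "critique": critique, "final": final}
```

Nothing gets thrown away: if the final answer looks off, you can diff it against the draft and the critique and see exactly which pass went wrong.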

Additionally, I use a deterministic memory system (basically just a Python script that writes facts exactly into a JSON / text file and then retrieves them exactly back out), without editorialising the facts of a conversation in progress.

Facts stored get recalled exactly without llm massage or rewrite.

Urgh, I hope that came out OK. I've never had to verbally rubber-duck (explain) it to my phone before :)


TL;DR

Hallucinations minimised by -

  1. Careful fact scraping and curation (using Qdrant database, markdown text summaries and rolling JSON plain text facts file)

  2. Python router that decides which LLM (or more accurately, SLM, given I only have 8GB VRAM) answers what, based on simple rules (eg: coding questions go to coder, science questions go to science etc)

  3. Keeping important facts outside of the LLM, that it needs to reference directly (RAG, GAG, JSON rolling summary).

  4. Setting model temperatures so that responses are as deterministic as possible (no flowery language or fancy reinterpretations; just the facts, ma'am).

  5. Letting the model say "I don't know, based on context. Here's my best guess. Give me XYZ if you want better answer".

Basic flow:

ask question --> router calls model/s --> "main brain" polls stored info, thinks and writes draft --> get criticized by separate "critic" --> "main brain" gets critic output, responds to that, and produces final version.

That reduces “sounds right” answers that are actually wrong. All the seams are exposed for inspection.

[-] SuspciousCarrot78@lemmy.world 22 points 1 week ago* (last edited 1 week ago)

I'm doing exactly this atm. I'm running a homelab on a $200 USD Lenovo P330 Tiny with a Tesla P4 GPU, via Proxmox, CasaOS and various containers. I'm about 80% finished with what I want it to do.

Uses 40W at the wall (peak around 100W). IOW about the cost of a light bulb. Here's what I run -

LXC 1: Media stack

Radarr, Sonarr, SABnzbd, Jellyfin. Bye bye Netflix, D+ etc

LXC 2: Gaming stack

Emulation and PC gaming I like. Lots of fun indie titles, older games (GameCube, Wii, PS2). Stream from homelab to any TV in the house via Sunshine / Moonlight. Bye bye GeForce Now.

LXC 3: AI stack

  • Llama.cpp + llama-swap (AI back ends)

  • Qdrant server (document server)

  • Openwebui (front end)

A bespoke MoA system I designed (which I affectionately call my Mixture of Assholes, not agents), using a Python router and some clever tricks to make a self hosted AI that doesn't scrape my shit and is fully auditable and non-hallucinatory...which would otherwise be impossible with typical cloud "black box" approaches. I don't want a black box; I want a glass box.

Bye bye ChatGPT.

LXC 4: Telecom stack

Vocechat (self hosted family chat replacement for WhatsApp / messenger),

Lemmy node (TBC).

Bye bye WhatsApp and Reddit

LXC 5: Security stack

Wireguard (own VPN). NPM (reverse proxy). Fail2Ban. PiHole (block ads).

LXC 6: Document stack

Immich (Google photos replacement), Joplin (Google keep), Snapdrop (Airdrop), Filedrop (Dropbox), SearXNG (Search engine).

Once I have everything tuned perfectly, I'm going to share everything on Github / Codeberg. I think the LLM stack alone is interesting enough to merit attention. Everyone makes big claims but I've got the data and method to prove it. I welcome others poking it.

Ultimately, people need to know how to do this, and I'm doing my best to document what I did so that someone could replicate and improve it. Make it easier for the next person. That's the only way forward - together. Faster alone, further together and all that.

PS: It's funny how far spite will take someone. I got into media servers after YouTube premium, Netflix etc jacked their prices up and baked in ads.

I got into lowendgaming when some PCMR midwit said "you can't play that on your p.o.s. rig". Wrong - I can and I did. It just needed know how, not "throw money at problem till it goes away".

I got into self hosting LLM when ChatGPT kept being...ChatGPT. Wasting my time and money with its confident, smooth lies. No, unacceptable.

The final straw was when Reddit locked my account and shadow banned me for using different IP addresses while travelling / staying at different AirBNBs during holiday "for my safety".

I had all the pieces there...but that was the final "fine...I'll do it myself" Thanos moment.

1
submitted 4 months ago* (last edited 4 months ago) by SuspciousCarrot78@lemmy.world to c/retrogaming@lemmy.world

Watching a recent Bringus video on finished but unreleased games for older consoles made me wonder - what games could/should have existed on older machines, but just never made it / weren't ported.

Gauntlet (DS) and Diablo 1 (GBA) come to mind, though the former was leaked.

Doom was famously thought impossible by id Software to port to the Amiga (but IIRC, someone managed to do it just last year).

Any from back in the day you wish could have made it?

I maintain the Wii could have handled some version of GTA, and there's a rumour that it (and FO3!) were in the early stages of development before getting nuked.

-1
My SuperPretendo5 (lemmy.world)
submitted 4 months ago* (last edited 4 months ago) by SuspciousCarrot78@lemmy.world to c/retrogaming@lemmy.world

So, I've been getting back into gaming in a big way these last few months, after a low back injury left me sidelined from my other, more physically active hobbies.

The silver lining (or I suppose catnip) for my ASD brain is modding. In the past 6 weeks, I've modded a DS, a DSi, a Wii U, a Wii (which is underrated as a "turn it on and play the classics" machine), several Android TVs... yeah...

While I've loved all of them (the Wii especially brought me great joy), I wished there was a way to play everything in one spot. A curated console, as it were.

Of course, you know where this is going - emulation.

Through the wonders of eBay, I was able to score a Lenovo M93p Tiny for around $80 USD (an SFF PC from around 2014, about the same dimensions as a Wii).

Throw in a faster processor, lobotomise Windows, throw on Playnite, Dolphin Emulator and a bunch of personal classics and... well... let me introduce you to my SuperPretendo 5 :)

PS: ChatGPT-gen art (and typos) aside, I kinda love what it came up with for this.

