submitted 2 months ago by qaz@lemmy.world to c/programmer_humor@programming.dev

[-] Xylight@lemdro.id 13 points 2 months ago

There is a reason there is sometimes a notable decrease in quality of the same AI model a while after it's released.

Hosts of the models (like OpenAI or Microsoft) may have switched to a quantized version of their model. Quantization is a common practice to increase power efficiency and make the model easier to run, by essentially rounding the weights of the model to a lower precision. This decreases VRAM and storage usage significantly, at the cost of a bit of quality: the more aggressive the quantization (fewer bits per weight), the worse the quality.
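To make the "rounding the weights" part concrete, here's a minimal sketch of symmetric 8-bit quantization in plain Python. The weight values and the function names `quantize_int8` and `dequantize` are made up for illustration, and real serving stacks typically use per-block or per-channel schemes rather than one scale for everything.

```python
# Minimal sketch of symmetric 8-bit weight quantization (illustrative only).
# Assumption: one shared scale for all weights; real formats are more elaborate.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats; the rounding error cannot be undone."""
    return [q * scale for q in quantized]

weights = [0.8213, -0.0047, 0.3141, -1.2700, 0.0009]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", q)
print("restored:", restored)
print("max rounding error:", max_error)
```

Small weights like -0.0047 collapse to zero here, which is the kind of information loss that shows up as reduced quality.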

For example, the base model will likely be in FP16 (16-bit floating point, the usual unquantized baseline). They may switch to a Q8 version, which nearly halves the size of the model, with about a 3-7% decrease in quality.
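A rough back-of-the-envelope version of that size claim, as a sketch. The 7-billion parameter count is hypothetical, and the block layout (32 int8 weights sharing one 16-bit scale, similar in spirit to llama.cpp's Q8_0) is an assumption, not something a given host is known to use.

```python
# Rough weight-storage arithmetic for a hypothetical 7B-parameter model.
# Assumption: the 8-bit format stores blocks of 32 int8 weights, each block
# carrying one 16-bit scale. Other formats have different overheads.

PARAMS = 7_000_000_000   # hypothetical parameter count
BLOCK_SIZE = 32          # assumed weights per block
SCALE_BITS = 16          # assumed width of the per-block scale

fp16_bits_per_weight = 16
q8_bits_per_weight = 8 + SCALE_BITS / BLOCK_SIZE  # 8.5 bits under these assumptions

def size_gb(bits_per_weight, params=PARAMS):
    """Storage in gigabytes (10^9 bytes) for the weights alone."""
    return params * bits_per_weight / 8 / 1e9

fp16_gb = size_gb(fp16_bits_per_weight)
q8_gb = size_gb(q8_bits_per_weight)
ratio = q8_gb / fp16_gb

print(f"FP16: {fp16_gb:.1f} GB, Q8: {q8_gb:.2f} GB, ratio: {ratio:.3f}")
```

The per-block scale is why it's "nearly" half rather than exactly half.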

[-] MonkeMischief@lemmy.today 4 points 2 months ago* (last edited 2 months ago)

Expertly explained. Thank you! It's pretty rad what you can get out of a quantized model on home hardware, but I still can't understand why people are trying to use it for anything resembling productivity.

It sounds like the typical tech industry:

"Look how amazing this is!" (Full power)

"Uh...uh oh, that's unsustainable. Let's quietly drop it." (Way reduced power)

"People are saying it's not as good, we can offer them LLM+ plus for better accuracy!" (3/4 power with subscription)

[-] mcv@lemmy.zip 2 points 2 months ago

But if that's how you're going to run it, why not also train it in that mode?

[-] Xylight@lemdro.id 2 points 2 months ago

That is a thing, and it's called quantization-aware training. Some open-weight models like Gemma do it.

The problem is that you need to re-train the whole model for that, and if you also want a full-quality version you need to train a lot more.

It is still less precise, so it'll still be worse quality than full precision, but it does reduce the effect.
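If it helps, here's a toy sketch of the mechanics, assuming the common "fake quantization plus straight-through estimator" approach. Everything in it (the single weight, the grid spacing `STEP`, the target value, the function names) is hypothetical, and it only shows the training loop shape, not how Gemma or any other model is actually trained.

```python
# Toy sketch of quantization-aware training with a straight-through estimator.
# One weight and a made-up target; this shows the mechanics only.

STEP = 0.25  # assumed spacing of the quantization grid (hypothetical)

def fake_quant(w, step=STEP):
    """Round to the nearest grid point, as the deployed model would."""
    return round(w / step) * step

def train(target, lr=0.01, epochs=200):
    w = 0.0                  # full-precision weight kept only during training
    xs = [1.0, 2.0, 3.0]     # made-up inputs; labels are target * x
    for _ in range(epochs):
        for x in xs:
            w_q = fake_quant(w)            # forward pass sees the quantized weight
            error = w_q * x - target * x
            grad = 2 * error * x           # gradient of squared error w.r.t. w_q
            w -= lr * grad                 # straight-through: treat d(w_q)/d(w) as 1
    return w, fake_quant(w)

w, w_q = train(target=0.6)
print("full-precision weight:", w)
print("deployed (quantized) weight:", w_q)
```

The deployed weight can only land on a grid point next to the target, which is the "still less precise" part. The benefit in a real model comes from the other weights learning to compensate for that rounding during training.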

[-] mudkip@lemdro.id 0 points 2 months ago

Your response reeks of AI slop

[-] Xylight@lemdro.id 1 points 2 months ago

4/10 bait

[-] mudkip@lemdro.id 1 points 2 months ago

Is it, or is it not, AI slop? Why are you using so heavily markdown formatting? That is a telltale sign of an LLM being involved

[-] psud@aussie.zone 1 points 2 months ago

heavily markdown formatting

They used one formatting mark, and it's the most common. What are you smoking, and may I have some?

[-] Xylight@lemdro.id 1 points 2 months ago

I am not using an llm but holy bait

Hop off the reddit voice

[-] mudkip@lemdro.id 1 points 2 months ago

...You do know what platform you're on? It's a REDDIT alternative

this post was submitted on 10 Oct 2025

653 points (99.2% liked)
