HP Z2 Mini G1a Review: Running GPT-OSS 120B Without a Discrete GPU (www.storagereview.com)

submitted 11 hours ago by sith@lemmy.zip to c/localllama@sh.itjust.works

4 comments

GPT-OSS 20B and 120B Models on AMD Ryzen AI Processors (www.amd.com)

submitted 1 day ago by sith@lemmy.zip to c/localllama@sh.itjust.works

11 comments


When DeepSeek V4 and R2? (sh.itjust.works)

submitted 5 days ago by pepperfree@sh.itjust.works to c/localllama@sh.itjust.works

26 comments

Fine tuned models for summarisation? (discuss.online)

submitted 4 days ago by OmegaLemmy@discuss.online to c/localllama@sh.itjust.works

6 comments

I have a db with a lot of data that all needs precise summarisation. I would do it myself if it weren't 20 thousand fields long.

It is about 300k tokens, and Gemini 2.5 struggles with it, missing points and making up facts.

Splitting it into smaller sections is not an option, because even when separated they can take up 30k tokens, and the info that needs summarisation may span ranges of 100k tokens.

I learnt that fine-tuned models may give better results than general-purpose ones, and now I'm wondering if there is anything that handles a high token count for summarisation.

Any help would be appreciated, even if it's to suggest another general-purpose model with better coherency.

So image generation is where it's at? (sh.itjust.works)

submitted 6 days ago by rkd@sh.itjust.works to c/localllama@sh.itjust.works

13 comments

Total noob to this space, correct me if I'm wrong. I'm looking at getting new hardware for inference and I'm open to AMD, NVIDIA or even Apple Silicon.

It feels like consumer hardware comparatively gives you more value generating images than trying to run chatbots. Like, the models you can run at home are just dumb to talk to. But they can generate images of comparable quality to online services if you're willing to wait a bit longer.

Like, GPT OSS 120b, assuming you can spare 80GB of memory, is still not GPT 5. But Flux Schnell is still Flux Schnell, right? So if diffusion is the thing, NVIDIA wins right now.

Other options might even be better for other uses, but chatbots are comparatively hard to justify. Maybe for more specific cases like code completion with zero latency or building a voice assistant, I guess.

Am I way off the mark?

I clustered four Framework Mainboards to test huge LLMs (www.jeffgeerling.com)

submitted 6 days ago by cm0002@lemmy.world to c/localllama@sh.itjust.works

2 comments

I built a private AI mini-cluster with Framework Desktop (www.youtube.com)

submitted 1 week ago by fubarx@lemmy.world to c/localllama@sh.itjust.works

4 comments


It's uh drugs, yea that's it! (lemmy.world)

submitted 1 week ago by cm0002@lemmy.world to c/localllama@sh.itjust.works

8 comments

First try at local openai/gpt-oss-20b (lemmy.world)

submitted 1 week ago by fubarx@lemmy.world to c/localllama@sh.itjust.works

5 comments

Just tried the new open-source 20b-parameter OpenAI gpt-oss model on my laptop. Here's what I got.

Have no idea why it needed to generate code for a multi-threaded Fibonacci calculator! Funny part is it did all that, then just loaded the original multiplication request into Python and printed out the result.

On the plus side, the performance was pretty decent.

when a construction worker mods a pc + airflow questions (feddit.it)

submitted 1 week ago by brokenlcd@feddit.it to c/localllama@sh.itjust.works

1 comment

after finally having some free time between exams and work, and enough money to build it, i decided to assemble a decent pc, both for inference and general usage. due to a limited budget i picked up a refurbished thinkcentre m700 and a 12GB 3060. the problem? the thinkcentre is an sff pc, so the card would never fit, and since it uses a proprietary psu i couldn't upgrade it to something that could power the card.

so that's when the quest began to see how i could ever shoehorn a card + psu into this mess.

the first thing that arrived was the thinkcentre, so i got to work trying to find a way to turn on both the pc's and the gpu's psus at the same time. i needed a power line that would come on as soon as the pc turned on to drive a relay, which would in turn switch on the gpu psu.

Luckily the pc had two SATA power connectors, one of which i opted to use for an ssd, so the 12V line was free. it was a bit annoying since it used a CPU molex, but the box of scrap parts took care of that:

i ended up adding the relay on the 12V line to turn on the other supply

and the original connector that was in the pc on the 5V to power the ssd.

then it came time to fit the harness inside the pc. i managed to snake it in, even if i had to mess with zip ties since i had spliced the ssd wire the wrong way around. but in the end, the pc side came out pretty well:

fast forward a couple of weeks (courtesy of the postal system shipping my package to the other side of the country by mistake), i got the card, the psu and the riser.

since i wasn't able to find a riser that turned 90° to the right, i had to place the gpu above the psu and make a bracket to hold it up, since the riser cable was as stiff as a rock. plus i had the idea that after it was all buttoned up, the psu fan would pull air through the gpu as well, somewhat aiding it.

after mocking it up with books, it didn't look too bad so i went on with it.

so now i had to make the bracket, the holes in the top cover to allow both the riser and the switched line out of the case, and find out how to hold and protect this whole mess... so to the workshop i work at we go.

luckily they allowed me in on sundays so i could use all the tools we had in there. (the joys of working as a small artisan :-D)

i have to admit, having a card worth so much in the midst of aluminium shavings felt wrong in a way i can't explain, like having a laptop next to a pool.

first things first: the holes in the case. i just roughly marked where they were supposed to go and added some leeway to allow the panel to slide open. the riser hole was cut with an angle grinder, while the switched line hole was drilled to 12mm with a christmas tree drill bit:

now i had to find something that could cover up the sharp edges of the cut, to protect both the riser cable and my fingers. luckily we had just bought new band saw blades, and the blade protectors fit perfectly for this job:

now to the psu and bracket for the gpu: my idea was to add two plates to anchor the gpu to the psu, using the card's pci mount to bolt it on, and then add some brackets so the psu could screw in where the case screws went, locking it all in place:

it's ugly as sin, but in the end it was going to be covered up, so it didn't matter.

the card was locked in place with a nut and bolt in the hole where the screw to secure the card would go, and a bolt/washer/wing nut set to hold the other side, in between the two slot "teeth" the card has.

now i just needed something to hold up the back of the card, since holding it just from the faceplate felt like an extremely dumb idea.

an L extrusion with some of the blade protector on top did the job, i was even able to use the psu's fan screws to lock it in place:

now it was mechanically sturdy; it just lacked a shell to cover it up. among the scraps i found a sheet of something that would work. i only know it by its brand name, but it's essentially a foam panel sandwiched between two aluminium plates: if you cut through only one plate, you can bend it and it looks pretty good. so i went with it.

i added L brackets on the pc panel with rivets to hold it steady, and made some holes in the panel to let the card exhaust out of both the front and the back.

(frankly if it wasn't for the psu cables i would have made it out of plexiglass, since seeing the card suspended like this is beautiful)

now it was just time to bring it out of the workshop and button it all up:

and that's it. i'm surprised it took only around a week to build it all, excluding the exodus the gpu had to make to reach me.

after running it for the first time with my usual model (a nemo 12B) i have to say... holy shit if there isn't a difference between running at 3 tok/s on the deck and 30 tok/s. i was expecting an increase, but not a 10x one. right now i'm converting some 24B models to exl3 at 3bpw to finally see how they fare.
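As a rough back-of-the-envelope check on whether a 24B model at 3 bpw fits on the 12GB 3060: the quantized weights take about parameters × bits per weight ÷ 8 bytes. This is a minimal sketch that assumes the average bpw applies to every weight and ignores the KV cache, activations and runtime overhead, so treat the result as a lower bound rather than what exl3 will actually allocate. The function names are hypothetical.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (1e9 bytes).

    Assumes the average bits-per-weight applies uniformly; KV cache,
    activations and framework overhead are NOT included.
    """
    # 1e9 params * (bits / 8) bytes per param = GB directly
    return params_billion * bits_per_weight / 8


def headroom_gb(vram_gb: float, params_billion: float, bits_per_weight: float) -> float:
    """VRAM left after loading the weights (negative means it won't fit)."""
    return vram_gb - weight_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    vram = 12  # RTX 3060 12GB, nominal figure
    for bpw in (3.0, 4.0):
        print(f"24B @ {bpw} bpw: weights ~{weight_gb(24, bpw):.1f} GB, "
              f"headroom ~{headroom_gb(vram, 24, bpw):.1f} GB")
```

At 3 bpw the weights come to roughly 9 GB, leaving about 3 GB for context and overhead; at 4 bpw the weights alone would already fill the card.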

the only problem i have left now is that during the conversion (the heaviest workload i've managed to throw at it) the card reaches 77°C, and i'm not sure whether it's dangerous, due to thermal stress, for the card to be cycled between 77°C and 51°C while it writes to the hdd.

the problem isn't the case airflow, but the fact that the pc sits on an under-desk shelf: the gpu and psu fans push the heat backwards and outwards, but the hot air still rises toward the top, where the card takes in air.

i'm already looking into putting fans in the cubby under the desk, but i'm also looking into undervolting the gpu so it runs cooler, since from what i understand the performance loss is minimal up to a certain point.

the problem with that is that nvidia doesn't expose the core voltage in its linux drivers (... torvalds was right on this front). i found that there is a workaround to do it with LACT, but i'm afraid it's going to void the card's warranty or damage the card itself. what do you think? (i'm going to post the question separately as well so people don't have to go through a bible's worth of build montage)
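To put numbers on the 77°C / 51°C swing described above, one option is to log temperature and power draw during a conversion run. This is a minimal sketch that polls `nvidia-smi` with its `--query-gpu` and `--format=csv,noheader,nounits` options every two seconds and prints the min/max/swing when stopped with Ctrl+C. It assumes a single GPU (only the first line of output is read), and the helper names are hypothetical. It only measures the cycling; it says nothing about whether that cycling is actually harmful.

```python
import subprocess
import time

QUERY = "temperature.gpu,power.draw"  # °C and W when queried with nounits


def parse_sample(line: str) -> tuple[float, float]:
    """Parse one 'temperature, power' CSV line from nvidia-smi."""
    temp_str, power_str = (field.strip() for field in line.split(","))
    try:
        power = float(power_str)
    except ValueError:  # some cards report "[N/A]" for power draw
        power = float("nan")
    return float(temp_str), power


def read_sample() -> tuple[float, float]:
    """Query nvidia-smi once and parse the first GPU's line."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def summarize(temps: list[float]) -> dict[str, float]:
    """Lowest, highest and peak-to-peak temperature over the logged run."""
    return {"min": min(temps), "max": max(temps), "swing": max(temps) - min(temps)}


if __name__ == "__main__":
    temps: list[float] = []
    try:
        while True:
            temp, power = read_sample()
            temps.append(temp)
            print(f"{time.strftime('%H:%M:%S')}  {temp:.0f} °C  {power:.1f} W")
            time.sleep(2)
    except KeyboardInterrupt:
        if temps:  # avoid min()/max() on an empty list
            print(summarize(temps))
```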

i want to thank all the peeps in the !localllama@sh.itjust.works and !pcmasterrace@lemmy.world communities for helping me understand the technicalities of this whole mess, since i never had hardware this powerful at hand.

especially @Smokeydope@lemmy.world and @brucethemoose@lemmy.world from the localllama community for helping me figure out if it was even worthwhile to do this, and for giving me clues for setting up an environment to run it all.

and @fuckwit_mcbumcrumble@lemmy.dbzer0.com from the pcmasterrace community for helping me figure out airflow issues.

Qwen-Image is here (aussie.zone)

submitted 1 week ago* (last edited 1 week ago) by Eyekaytee@aussie.zone to c/localllama@sh.itjust.works

11 comments

🚀 Meet Qwen-Image — a 20B MMDiT model for next-gen text-to-image generation. Especially strong at creating stunning graphic posters with native text. Now open-source.

🔍 Key Highlights:

🔹 SOTA text rendering — rivals GPT-4o in English, best-in-class for Chinese

🔹 In-pixel text generation — no overlays, fully integrated

🔹 Bilingual support, diverse fonts, complex layouts

🎨 Also excels at general image generation — from photorealistic to anime, impressionist to minimalist. A true creative powerhouse.

Blog: https://qwenlm.github.io/blog/qwen-image/

Hugging Face: https://huggingface.co/Qwen/Qwen-Image

Model Scope: https://modelscope.cn/models/Qwen/Qwen-Image/summary

GitHub: https://github.com/QwenLM/Qwen-Image

Technical Report: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf

WaveSpeed Demo: https://wavespeed.ai/models/wavespeed-ai/qwen-image/text-to-image

Demo: https://modelscope.cn/aigc/imageGeneration?tab=advanced

MindLink-32B and MindLink-72B available on Huggingface (sh.itjust.works)

submitted 1 week ago by pepperfree@sh.itjust.works to c/localllama@sh.itjust.works

2 comments

Built on Qwen, these models incorporate our latest advances in post-training techniques. MindLink demonstrates strong performance across various common benchmarks and is widely applicable in diverse AI scenarios.


How To Run Deepseek R1 671b Fully Locally On a $2000 EPYC Server (digitalspaceport.com)

submitted 2 weeks ago by cm0002@lemmy.world to c/localllama@sh.itjust.works

1 comment

My 2.5 year old laptop can write Space Invaders in JavaScript now, using GLM-4.5 Air and MLX (simonwillison.net)

submitted 2 weeks ago by copacetic@discuss.tchncs.de to c/localllama@sh.itjust.works

4 comments

zai-org/GLM-4.5-Air · Hugging Face (huggingface.co)

submitted 2 weeks ago* (last edited 2 weeks ago) by Eyekaytee@aussie.zone to c/localllama@sh.itjust.works

22 comments

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs

Blog post: https://z.ai/blog/glm-4.5

Hugging Face:

https://huggingface.co/zai-org/GLM-4.5

https://huggingface.co/zai-org/GLM-4.5-Air

AI and You Against the Machine: Guide so you can own Big AI and Run Local - YouTube (www.youtube.com)

submitted 3 weeks ago* (last edited 3 weeks ago) by afk_strats@lemmy.world to c/localllama@sh.itjust.works

5 comments

I've been following the work that went into this video for a couple of months and have grown to love Level1Techs.

Check out their forum and especially Ubergarm

Mistral AI bets on transparency by making its environmental impact public (www.lemonde.fr)

submitted 3 weeks ago by Eyekaytee@aussie.zone to c/localllama@sh.itjust.works

0 comments

Mistral AI bets on transparency by making its environmental impact public. The French artificial intelligence startup, along with Ademe and Carbone 4, has published a study on the impact of training and using its models, particularly on CO₂ emissions.

Courage was truly prescient. (lemmy.world)

submitted 3 weeks ago* (last edited 3 weeks ago) by Smokeydope@lemmy.world to c/localllama@sh.itjust.works

0 comments

Le Chat dives deep. | Mistral AI (mistral.ai)

submitted 4 weeks ago by Eyekaytee@aussie.zone to c/localllama@sh.itjust.works

0 comments

What’s new in Le Chat.

Deep Research mode: Lightning fast, structured research reports on even the most complex topics.

Voice mode: Talk to Le Chat instead of typing with our new Voxtral model.

Natively multilingual reasoning: Tap into thoughtful answers, powered by our reasoning model — Magistral.

Projects: Organize your conversations into context-rich folders.

Advanced image editing directly in Le Chat, in partnership with Black Forest Labs.

Voxtral - Audio Understanding Based on Mistral (mistral.ai)

submitted 4 weeks ago by General_Effort@lemmy.world to c/localllama@sh.itjust.works

0 comments

https://huggingface.co/mistralai/Voxtral-Mini-3B-2507

https://huggingface.co/mistralai/Voxtral-Small-24B-2507

These are Apache 2.0, as usual from Mistral, but without training data.

Audio Flamingo 3 - Fully Open Large Audio Language Models (research.nvidia.com)

submitted 1 month ago by General_Effort@lemmy.world to c/localllama@sh.itjust.works

57 comments

https://huggingface.co/nvidia/audio-flamingo-3

Demo: https://audioflamingo3.github.io/

Another great open-source LM from China: Kimi2 has 1000 billion parameters with only 32 billion active (huggingface.co)

submitted 1 month ago by herseycokguzelolacak@lemmy.ml to c/localllama@sh.itjust.works

1 comment
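The "1000 billion parameters with only 32 billion active" figure comes from the Mixture-of-Experts design: a router scores the experts for each token and only the top-k of them actually run, so most expert weights sit idle on any given token. This is a minimal sketch of a generic top-k softmax router in the style of Mixtral, with toy scalar "experts"; it is not Kimi K2's actual router, whose scoring differs, and in real models attention and other dense layers also count toward the active parameters. The expert functions and gate values are purely illustrative.

```python
import math
from typing import Callable, Sequence


def softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, gate_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]],
                k: int) -> tuple[float, list[int]]:
    """Evaluate only the k highest-scoring experts and mix their outputs."""
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalize over chosen experts
    y = sum(w * experts[i](x) for w, i in zip(weights, top))
    return y, top


def make_expert(scale: float) -> Callable[[float], float]:
    # Hypothetical toy stand-in for a feed-forward expert.
    return lambda x: scale * x


if __name__ == "__main__":
    experts = [make_expert(s) for s in (0.0, 1.0, 2.0, 3.0)]
    y, used = moe_forward(2.0, [0.0, 5.0, 5.0, -1.0], experts, k=2)
    print(y, used)  # only experts 1 and 2 were evaluated
    print(f"active fraction: {32e9 / 1000e9:.1%}")  # headline figures from the title
```

With two equally scored experts the router splits the weight 50/50, giving 3.0 from experts 1 and 2 while experts 0 and 3 are never called; the headline figures work out to 3.2% of the parameters being active per token.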

Very large amounts of gaming gpus vs AI gpus (ani.social)

submitted 1 month ago by TheMightyCat@ani.social to c/localllama@sh.itjust.works

8 comments

cross-posted from: https://ani.social/post/16779655

GPU                            VRAM   Price (€)  Bandwidth (TB/s)  FP16 TFLOPS  €/GB  € per TB/s  € per FP16 TFLOPS
NVIDIA H200 NVL                141GB  36284      4.89              1671         257   7423        21
NVIDIA RTX PRO 6000 Blackwell  96GB   8450       1.79              126.0        88    4720        67
NVIDIA RTX 5090                32GB   2299       1.79              104.8        71    1284        22
AMD RADEON 9070XT              16GB   665        0.6446            97.32        41    1031        7
AMD RADEON 9070                16GB   619        0.6446            72.25        38    960         8.5
AMD RADEON 9060XT              16GB   382        0.3223            51.28        23    1186        7.45
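The three derived columns are simple ratios of price to VRAM, bandwidth and FP16 throughput. This is a minimal sketch that recomputes them from the table's own figures and adds the naive arithmetic behind the "many gaming GPUs" idea: how many cards of one model reach a target VRAM, and at what total price. The helper names are hypothetical, and the card-count estimate deliberately ignores PCIe/network overhead and multi-GPU scaling losses, which is exactly the open question in this post.

```python
import math

# (VRAM in GB, price in €, memory bandwidth in TB/s, FP16 TFLOPS), from the table
GPUS = {
    "NVIDIA H200 NVL": (141, 36284, 4.89, 1671),
    "NVIDIA RTX PRO 6000 Blackwell": (96, 8450, 1.79, 126.0),
    "NVIDIA RTX 5090": (32, 2299, 1.79, 104.8),
    "AMD RADEON 9070XT": (16, 665, 0.6446, 97.32),
    "AMD RADEON 9070": (16, 619, 0.6446, 72.25),
    "AMD RADEON 9060XT": (16, 382, 0.3223, 51.28),
}


def ratios(name: str) -> dict[str, float]:
    """Price per GB of VRAM, per TB/s of bandwidth and per FP16 TFLOPS."""
    vram, price, bandwidth, tflops = GPUS[name]
    return {
        "eur_per_gb": price / vram,
        "eur_per_tbs": price / bandwidth,
        "eur_per_tflops": price / tflops,
    }


def cards_for_vram(name: str, target_gb: float) -> tuple[int, int]:
    """Naive count and total price of cards needed to reach target_gb of VRAM.

    Ignores interconnect overhead and imperfect scaling across cards.
    """
    vram, price, _, _ = GPUS[name]
    n = math.ceil(target_gb / vram)
    return n, n * price


if __name__ == "__main__":
    for name in GPUS:
        r = ratios(name)
        print(f"{name:30} {r['eur_per_gb']:7.1f} {r['eur_per_tbs']:8.0f} {r['eur_per_tflops']:6.2f}")
    print(cards_for_vram("AMD RADEON 9070XT", 141))
```

For example, matching the H200 NVL's 141 GB with 9070 XTs takes nine cards at €5,985 in total, before any of the overhead the post asks about.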

This post is part "hear me out" and part asking for advice.

Looking at the table above, AI GPUs are a pure scam, and it would make much more sense (at least looking at this) to use gaming GPUs instead, either through a Frankenstein setup of PCIe switches or a high-bandwidth network.

So my question is whether somebody has built a similar setup and what their experience has been, what performance hit to expect from the overhead, and whether it can be made up for by having way more raw performance for the same price.

Running Local LLMs with Ollama on openSUSE Tumbleweed (news.opensuse.org)

submitted 1 month ago by cm0002@programming.dev to c/localllama@sh.itjust.works

26 comments

Apple needs to spend real money to bring in outside talent — and that likely means acquiring a leading AI startup. It has already kicked the tires on Perplexity and will seriously consider Mistral (www.bloomberg.com)

submitted 1 month ago by Eyekaytee@aussie.zone to c/localllama@sh.itjust.works

3 comments

https://archive.md/SFoGf

pls no

LocalLLaMA

3538 readers

17 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

Rules:

Rule 1 - No harassment or personal character attacks on community members. i.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency. i.e. no comparing the usefulness of models to that of NFTs, no claiming that the resources required to train a model are anything close to those of maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms. i.e. no statements such as "LLMs are basically just simple text prediction like what your phone keyboard's autocorrect uses, and they're still using the same algorithms as <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

founded 2 years ago

MODERATORS

SkySyrup@sh.itjust.works

pax@sh.itjust.works

noneabove1182@sh.itjust.works

Smokeydope@lemmy.world

MonsterBug@sh.itjust.works