[-] brucethemoose@lemmy.world 1 points 3 hours ago* (last edited 3 hours ago)
  • Tinygrad is (so far) software only, ostensibly sort of a lightweight PyTorch replacement.

  • Tinygrad is (so far) not really used for much, not even research or tinkering.

Between that and the lead dev's YouTube antics, it kinda seems like hot air to me.

[-] brucethemoose@lemmy.world 2 points 3 hours ago* (last edited 3 hours ago)

It could be if it’s run locally.

If your agents run on your hardware and navigate crappy apps and websites and such for you, what do you need the corporate cloud for? How can they show ads or monetize you through that?

That’s the war raging right now: open weights vs. closed weights.

[-] brucethemoose@lemmy.world 2 points 4 hours ago* (last edited 4 hours ago)

They aren't specialized though!

There are a lot of directions "AI" could go:

  • Is autoregressive bitnet going to take off? In that case, the compute becomes extremely light, and the thing to optimize for is memory bandwidth and cache.
  • Or diffusion or something with fewer passes like that? In that case, we go the opposite direction, throw bandwidth out the window, and optimize for matmul compute.
  • What if it's both? In that case, one wants a truckload of ternary adders and not too much else (see the toy sketch after this list).
  • Or what if some other form of sparsity takes over? Given the effectiveness of quantization and MoE, there's clearly a ton of sparsity to take advantage of. Nvidia already bet on this, but it hasn't taken off yet.
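
For a sense of why the bitnet/ternary case changes the optimization target, here's a toy NumPy sketch (purely illustrative, not any real kernel): with weights in {-1, 0, +1}, a matvec needs no multiplies at all, so arithmetic gets dirt cheap and feeding weights to the adders (bandwidth/cache) becomes the bottleneck.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # ternary weights in {-1, 0, +1}
x = rng.standard_normal(8)            # activations

# Reference: an ordinary matvec, one multiply per weight.
y_ref = W @ x

# Ternary version: no multiplies, just masked adds and subtracts.
y = np.where(W == 1, x, 0.0).sum(axis=1) - np.where(W == -1, x, 0.0).sum(axis=1)

assert np.allclose(y, y_ref)
```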

There's all sorts of wild directions the sector could go. Having the flexibility of an ASIC die would be a huge benefit for AMD, as they don't have to 'commit' to any particular direction like Nvidia's monolithic dies. If a new trend takes off, they can take an existing die and swap out the ASIC relatively quickly, without taping out a whole new GPU.

[-] brucethemoose@lemmy.world 3 points 6 hours ago* (last edited 6 hours ago)

With AMD’s IP, they could make a hybrid chip: for example, a bitnet ASIC hanging off a GPU for flexible, CUDA-compatible compute where needed.

Nvidia sorta does this now (with tensor cores being a separate part of the die), but with their history of MCM designs, AMD could take it to an extreme.

[-] brucethemoose@lemmy.world 1 points 7 hours ago

I'd recommend LocalSend for situations where you're on the same network!

https://localsend.org/

[-] brucethemoose@lemmy.world 2 points 2 days ago* (last edited 2 days ago)

One more thing: I saw you mention context management.

Mistral (24B) models are really bad at long context, but that's not true of every model. I find that Qwen 32B and Gemma 27B are solid at 32K (which is a huge body of text), and with the right backend settings you can easily run either at 64K with very minimal VRAM overhead.

Specifically, run Gemma with the latest llama.cpp server (which, as of like yesterday, automatically uses sliding window attention), or Qwen (and most other models) with exllamav2 or exllamav3, which quantize the KV cache down to Q4 very efficiently.

This way you don’t need to manage context: you can feed the LLM the whole adventure so it doesn’t forget anything, and streaming responses will be instant since the prompt is always cached.
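
As a concrete sketch of the "feed it everything" approach against a local llama.cpp server (the /completion endpoint and cache_prompt field exist in recent builds, but verify the exact names against your version; the file path here is made up):

```python
import requests

# Send the whole adventure log every turn; with cache_prompt the server
# reuses the KV cache for the shared prefix, so only the new text is
# actually processed and responses start almost instantly.
adventure_log = open("adventure.txt").read()  # hypothetical transcript

resp = requests.post("http://localhost:8080/completion", json={
    "prompt": adventure_log + "\nDM:",
    "cache_prompt": True,  # reuse the cached prefix across turns
    "n_predict": 300,
})
print(resp.json()["content"])
```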

[-] brucethemoose@lemmy.world 1 points 2 days ago

Oh, one thing about ST specifically: its default sampling presets are catastrophic last I checked. Like, they’re designed for ancient models, and while I have nothing against the UI, it is kinda from a different era.

For Gemma and Qwen, I’ve been using like 0.2-0.7 temp, at least 0.05 MinP, 1.01 rep penalty (not something insane like 1.1), and maybe 0.3-ish DRY, though like you said, DRY/XTC can really mess up some tasks.
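
For reference, those numbers as a config blob (parameter names as llama.cpp-style backends expose them; the values are just my starting points, not gospel):

```python
# Conservative sampler preset for Gemma/Qwen-style models.
SAMPLERS = {
    "temperature": 0.5,      # anywhere in the 0.2-0.7 range
    "min_p": 0.05,           # at least this much tail pruning
    "repeat_penalty": 1.01,  # gentle, not something insane like 1.1
    "dry_multiplier": 0.3,   # optional; disable for strict/rules tasks
}
```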

[-] brucethemoose@lemmy.world 2 points 2 days ago* (last edited 2 days ago)

Another suggestion: be careful with your sampling. Use a low temperature and high MinP for queries involving rules, and a higher temperature (+ samplers like DRY) when you're trying to tease out interesting ideas.

I would even suggest an alt frontend like mikupad that exposes token probabilities, so you can go to any point in the reply, look through every “idea” the LLM had internally, and regen from that point if you wish. It’s also good for debugging sampling issues when you get an incorrect answer, as sometimes the LLM gets it right internally but bad sampling parameters pick a bad token.
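
If you want the same token-probability view without mikupad, many local backends also fill in the legacy OpenAI completions `logprobs` field; a sketch, assuming an OpenAI-compatible server at a placeholder URL (not every backend implements this):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.completions.create(
    model="local",      # placeholder; use whatever your backend reports
    prompt="The grapple rule works like this:",
    max_tokens=24,
    temperature=0.2,    # low temp, as for any rules query
    logprobs=5,         # top-5 candidate tokens at each position
)
# One dict of {token: logprob} per generated token: every "idea" it had.
print(resp.choices[0].logprobs.top_logprobs)
```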

[-] brucethemoose@lemmy.world 1 points 2 days ago

As long as it supports network inference between machines with heterogeneous cards, it would work for what I have in mind.

It probably doesn’t, heh, especially with non-Nvidia cards. But the middle layer may work with a generic OpenAI-compatible backend like the llama.cpp server.

[-] brucethemoose@lemmy.world 2 points 2 days ago

Late to the post, but look into SGLang, OP!

In a nutshell, it’s a framework for letting LLMs “fill in blanks” instead of generating entire replies, so you could script your rules into the response structure for the model to grab onto. It’s all locally runnable (with the right hardware, unfortunately).
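
A minimal sketch of what that looks like with SGLang's frontend language (the endpoint, rules, and names here are all made up; see their docs for the real setup):

```python
import sglang as sgl

# Assumes a local SGLang server is already running on this port.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def dm_turn(s, rules, action):
    # The scaffolding is scripted; the model only fills in the blanks.
    s += "House rules:\n" + rules + "\n"
    s += "Player action: " + action + "\n"
    s += "Ruling: " + sgl.gen("ruling", max_tokens=64, stop="\n") + "\n"
    s += "Outcome: " + sgl.select("outcome", choices=["success", "failure"])

state = dm_turn.run(
    rules="Climbing the tower wall requires a DC 12 check.",
    action="The rogue scales the tower wall.",
)
print(state["ruling"], state["outcome"])
```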

Also, there are some newer, less sycophantic DM specific models. I can look around if you want.

[-] brucethemoose@lemmy.world 4 points 2 days ago

Is it a virus that affects the brain?

Yes! It’s called engagement optimization. And the world’s collective ignorance of “don’t feed the trolls.”

[-] brucethemoose@lemmy.world 8 points 3 days ago

It feels unsustainable, right? Like the value of this tsunami of advertising has to be inflated, especially with bots/agents taking over traffic. People’s tolerance for junk isn’t infinite. At some point the illusion has to crack, and the advertising bubble will pop and burn the internet/app ecosystems down, hopefully…

1
submitted 4 days ago* (last edited 4 days ago) by brucethemoose@lemmy.world to c/usa@lemmy.ml

In a nutshell, he’s allegedly frustrated by too few policies favorable to him.

63
submitted 1 month ago* (last edited 1 month ago) by brucethemoose@lemmy.world to c/world@lemmy.world
  • The IDF is planning to displace close to 2 million Palestinians to the Rafah area, where compounds for the delivery of humanitarian aid are being built.
  • The compounds are to be managed by a new international foundation and private U.S. companies, though it's unclear how the plan will function after the UN and all aid organizations announced they won't take part.
17
Qwen3 "Leaked" (huggingface.co)

Qwen3 was apparently posted early, then quickly pulled from HuggingFace and Modelscope. The large ones are MoEs, per screenshots from Reddit:

[screenshots]

Including a 235B (22B active) and a 30B (3B active).

Context appears to 'only' be 32K, unfortunately: https://huggingface.co/qingy2024/Qwen3-0.6B/blob/main/config_4b.json
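
(You can check that field yourself; swapping /blob/ for /resolve/ gives the raw file, and `max_position_embeddings` is the standard HF config field for context length:)

```python
import json, urllib.request

url = ("https://huggingface.co/qingy2024/Qwen3-0.6B/"
       "resolve/main/config_4b.json")
cfg = json.load(urllib.request.urlopen(url))
print(cfg["max_position_embeddings"])  # reportedly 32768
```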

But it's possible they're still training them to 256K:

[from reddit]

Take it all with a grain of salt; configs could change with the official release, but it appears it is happening today.

23
submitted 1 month ago* (last edited 1 month ago) by brucethemoose@lemmy.world to c/localllama@sh.itjust.works

This is one of the "smartest" models you can fit on a 24GB GPU now, with no offloading and very little quantization loss. It feels big and insightful, like a better (albeit dry) Llama 3.3 70B with thinking, and with more STEM world knowledge than QwQ 32B, but it comfortably fits thanks to the new exl3 quantization!

[Quantization loss]

You need to use a backend that supports exl3, like (at the moment) text-generation-webui or (soon) TabbyAPI.

13
submitted 1 month ago* (last edited 1 month ago) by brucethemoose@lemmy.world to c/localllama@sh.itjust.works

Seems there's not a lot of talk about relatively unknown finetunes these days, so I'll start posting more!

OpenBuddy's been on my radar, but this one is very interesting: QwQ 32B post-trained on OpenBuddy's dataset, apparently with QAT applied (though it's kinda unclear) and context extended. Observations:

  • Quantized with exllamav2, it seems to show lower distortion levels than normal QwQ. It works conspicuously well at 4.0bpw and 3.5bpw.

  • Seems good at long context. Have not tested 200K, but it's quite excellent in the 64K range.

  • Works fine in English.

  • The chat template is funky. It seems to mix up the <think> and <|think|> tags in particular (why don't they just use ChatML?), and needs some wrangling with your own template (a minimal ChatML-style sketch follows this list).

  • Seems smart, can't say if it's better or worse than QwQ yet, other than it doesn't seem to "suffer" below 3.75bpw like QwQ does.
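
For the template wrangling, here's a minimal ChatML-style prompt builder, i.e. the format I wish they'd used (the `<think>` prefill is a guess; match whatever tags the model actually emits):

```python
# ChatML scaffold with a hypothetical <think> prefill for the
# reasoning block; adjust the think tags to the model's actual output.
def build_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n"
    )

print(build_prompt("You are a concise DM.", "Summarize the heist plan."))
```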

Also, I reposted this from /r/LocalLLaMA, as I feel the community generally should do going forward; given its spirit, it seems like we should be on Lemmy instead?

31
submitted 1 month ago* (last edited 1 month ago) by brucethemoose@lemmy.world to c/asklemmy@lemmy.world

So I had a clip I wanted to upload to a lemmy comment:

  • Tried it as an (avc) mp4... Failed.
  • OK, too big? I shrink it to 2MB, then 1MB. Failed.
  • VP9 Webm maybe? 2MB, 1MB, failed. AV1? Failed.
  • OK, fine, no video. Let's try an animated AVIF. Failed. It seems Lemmy doesn't even take static AVIF images.
  • WebP animation then... Failed. Animated PNG, failed.

End result: I have to burden the server with a massive, crappy-looking GIF after trying a dozen formats. With all due respect, this is worse than some aging service like Reddit that doesn't support new media formats.

For reference, I'm using the web interface. Is this just a format restriction of lemmy.world, or an underlying software support issue?

-2

53% of Americans approve of Trump so far, according to a newly released CBS News/YouGov poll conducted Feb. 5 to 7, while 47% disapproved.

A large majority, 70%, said he was doing what he promised in the campaign, per the poll that was released on Sunday.

Yes, but: 66% said he was not focusing enough on lowering prices, a key campaign trail promise that propelled Trump to the White House.

44% of Republicans said Musk and DOGE should have "some" influence, while just 13% of Democrats agreed.

1
submitted 4 months ago* (last edited 4 months ago) by brucethemoose@lemmy.world to c/politics@lemmy.world

Here's the Meta formula:

  • Put a Trump friend on your board (Ultimate Fighting Championship CEO Dana White).
  • Promote a prominent Republican as your chief global affairs officer (Joel Kaplan, succeeding liberal-friendly Nick Clegg, president of global affairs).
  • Align your philosophy with Trump's on a big-ticket public issue (free speech over fact-checking).
  • Announce your philosophical change on Fox News, hoping Trump is watching. In this case, he was. "Meta, Facebook, I think they've come a long way," Trump said at a Mar-a-Lago news conference, adding of Kaplan's appearance on the "Fox and Friends" curvy couch: "The man was very impressive."
  • Take a big public stand on a favorite issue for Trump and MAGA (rolling back DEI programs).
  • Amplify that stand in an interview with Fox News Digital. (Kaplan again!)
  • Go on Joe Rogan's podcast and blast President Biden for censorship.
18
submitted 5 months ago* (last edited 5 months ago) by brucethemoose@lemmy.world to c/politics@lemmy.world

Reality check: Trump pledged to end the program in 2016.

Called it. When push comes to shove, Trump is always going to side with the ultra-rich.

11
submitted 5 months ago* (last edited 5 months ago) by brucethemoose@lemmy.world to c/politics@lemmy.world

Trump, who has remained silent thus far on the schism, faces a quickly deepening conflict between his richest and most powerful advisors on one hand, and the people who swept him to office on the other.

All this is stupid. But I know one thing:

Trump is a billionaire.

And I predict his followers are going to learn who he’ll side with when push comes to shove.

Also, Bannon’s take is interesting:

Bannon tells Axios he helped kick off the debate with a now-viral Gettr post earlier this month calling out a lack of support for the Black and Hispanic communities in Big Tech.

8
submitted 5 months ago* (last edited 5 months ago) by brucethemoose@lemmy.world to c/leopardsatemyface@lemmy.world

I think the title explains it all… Even right-wing influencers can have their faces eaten. And Twitter views are literally their livelihood.

Trump's conspiracy-minded ally Laura Loomer, New York Young Republican Club president Gavin Wax and InfoWars host Owen Shroyer all said their verification badges disappeared after they criticized Musk's support for H1B visas, railed against Indian culture and attacked Ramaswamy, Musk's DOGE co-chair.

