[-] brucethemoose@lemmy.world 9 points 4 days ago

The election made me realize just how few voters will ever see this headline.

[-] brucethemoose@lemmy.world 14 points 5 days ago* (last edited 5 days ago)

You know why? Because these teamsters spend two hours a day sitting in a car listening to right-wing radio jockeys telling them how Democrats are only interested in policing their language and getting them in trouble for flirting with women.

Bingo.

Or scrolling their feeds. Or swiping. Depending on the age. Every single Democrat leader needs to see that statistic, so they can stop whining about how policy was their problem.

Democrats are messaging like it's 1950. And hot take, but the only thing that elected Biden was COVID-19, and only because it wedged itself into people's lives like absolutely nothing else can.

[-] brucethemoose@lemmy.world 44 points 5 days ago* (last edited 5 days ago)

I kinda get Republicans taking oil money and lying, but it's completely unreal that we elected an honest-to-god climate change denier. Trump actually thinks it's BS, out loud, even with his own military telling him this is a national security threat.

History is going to crucify him (and Americans), and the sad part is he's too old to ever suffer from it, or any of the consequences.

[-] brucethemoose@lemmy.world 5 points 5 days ago* (last edited 5 days ago)

Yeah, well Alibaba nearly (and sometimes) beat GPT-4 with a comparatively microscopic model you can run on a desktop. And released a whole series of them. For free! With a tiny fraction of the GPUs any of the American trainers have.

Bigger is not better, but OpenAI has also just lost their creative edge, and all Altman's talk about scaling up training with trillions of dollars is a massive con.

o1 is kind of a joke; CoT and reflection strategies have been known for a while. You can do it for free yourself, to an extent, and some models have tried to finetune this in: https://github.com/codelion/optillm
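For the curious, here's a rough sketch of the DIY two-pass "think, then critique" trick against any OpenAI-compatible endpoint. The URL and model name are just placeholders for whatever you're running locally, not anything specific:

```python
# Rough sketch of DIY CoT + reflection: one pass to reason, one pass to critique.
# Assumes a local OpenAI-compatible server; URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="unused")
question = "If a train leaves at 3pm going 60 mph, how far has it gone by 5:30pm?"

# Pass 1: ask the model to reason step by step.
draft = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "system", "content": "Reason step by step before giving an answer."},
        {"role": "user", "content": question},
    ],
).choices[0].message.content

# Pass 2: have it critique its own draft and produce a corrected final answer.
final = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "system", "content": "Critique the draft answer for mistakes, then give a corrected final answer."},
        {"role": "user", "content": f"Question: {question}\n\nDraft answer: {draft}"},
    ],
).choices[0].message.content
print(final)
```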

But one sad thing OpenAI has seemingly accomplished is to "salt" the open LLM space. There's way less hacky experimentation going on than there used to be, which makes me sad, as many of those "old" innovations still run circles around OpenAI.

[-] brucethemoose@lemmy.world 18 points 5 days ago

Level 203, write it out as a fanfic

[-] brucethemoose@lemmy.world 14 points 6 days ago* (last edited 6 days ago)

To be fair, I'm not sure every single abstaining Palestine supporter in the US could have tipped the vote.

I know there are communities in PA and such, but still...

[-] brucethemoose@lemmy.world 12 points 6 days ago

The US president-elect advised the Russian president not to escalate the war in Ukraine and reminded him of “Washington’s sizeable military presence in Europe”, the Post reported.

It added that Trump expressed interest in follow-up conversations on “the resolution of Ukraine’s war soon”.

This is so weird, almost worded like he just discovered/barely grasps the war in Ukraine.

"We have a big military in Europe. Huge."

Putin nods. "Ah. I didn't know that."

His grudge against Zelensky though... there's no way he forgot that.

[-] brucethemoose@lemmy.world 63 points 6 days ago

Ugh.

I just wish misogynistic comments/voice logs could be taped to abusers' foreheads.

[-] brucethemoose@lemmy.world 9 points 6 days ago* (last edited 6 days ago)

And the Dems seemingly decided that immigration was not an issue to tackle; instead the topic was treated as a sign of how backwards and hateful the Republicans are for focusing on it. So instead of actually addressing the concerns of voters as a whole, they focused on the issues that rile up the core voters in the Democratic Party.

Thing is, Harris was like the most anti-immigrant Dem candidate in ages, policy wise. She shamelessly adopted some populist economic policy, same as Trump. Popular policy was there, and it was pretty loud in my own news feed.

Not that I disagree with you about the Dems talking to themselves, speaking to voters instead of listening to them, all that shenanigans with the primary and Biden's fitness... but how does that even compare to the controversies surrounding and following Trump?

I say the Dems would have been screwed even if they'd set Biden aside early, even if they'd pounded and focused on issues that actually mattered to voters. It doesn't matter! That's just drama; most persuadable voters never hear it. No, Dems lost the information war. Trump delivered his message to his voters, hence he didn't lose a soul, while the Dems campaigned like it's 1950 and didn't have the 'luxury' of a pandemic to hammer their message straight into voters' lives.

Just to reiterate all this, I don't understand how everyone is underestimating the massive impact warped social feeds have on the average person's life now. That's literally all that affected everyone I know, bar one or two nutcases like me who don't have a Facebook or TikTok account.

[-] brucethemoose@lemmy.world 20 points 6 days ago* (last edited 6 days ago)

Is this the end of the country as we know it?

Betteridge's law. So no. Still, I don't disagree with the sentiment...

What drives American politics now is, rather, the unfettered power of money, much of it managed by groups outside party control who do not have to declare their funding sources and can make or break candidates depending on their willingness to follow a preordained set of policy prescriptions.

The Democrats, meanwhile, can talk all they want about serving the interests of all Americans, but they too rely on dark money representing the interests of Wall Street, big tech companies and more, and are all but doomed to come off as hypocritical and insincere as a result.

And yet, The Guardian still lives in another era.

What drives politics now is attention. That's it. The more eyeballs you have, the more people see your message from their feed, their favorite influencer, whatever, the more support you have.

It's that simple.

Doesn't matter if it's cynical, or a blatant lie, that's irrelevant. People's information bubbles aren't checked or audited anymore.

Americans are not thinking of high concepts like the American Dream or isolationism/protectionism, they're voting by how they feel, and they feel Trump is their friend because he blots out the sun. Even if the Dems were more cynical than him, they'd never have a chance unless they played that game.

[-] brucethemoose@lemmy.world 21 points 6 days ago* (last edited 6 days ago)

Completely disagree. I know tons of conservatives who lovingly voted for Trump, or voted for him "for the country." I've never even met a single rebel voter.

Small sample size, but still.

The election was a popularity contest, people live in their phone bubbles, and Democrats campaigned like it's the 1950s. People voted for Trump thinking he's a hero, eyes wide open, because that's what their information environment is.

Hotter take, but the genie is out of the bottle, and Dems are going to keep losing until they start campaigning like influencer con artists. Fight fire with fire.

If they don't like it? Tough. They should have regulated social media when they had the chance instead of taking their money.

[-] brucethemoose@lemmy.world 13 points 6 days ago* (last edited 6 days ago)

Young voters voted for Trump. Tons of kids support him: https://www.axios.com/2024/11/07/young-men-voters-trump-2024-exit-polls

Unionized workers had grassroots support for Trump: https://www.axios.com/2024/09/18/teamsters-endorsement-harris-trump-2024

Staunch democrats sat out, or protest voted: https://www.axios.com/local/detroit/2024/11/07/why-biden-outperformed-harris-throughout-metro-detroit

Ignorantia juris non excusat. Ignorance of the law excuses not.

America is objectively, measurably, ignorant, trapped in bubbles, and we collectively did little to lift each other out. We are going to face justice for it.

322
submitted 1 month ago* (last edited 1 month ago) by brucethemoose@lemmy.world to c/selfhosted@lemmy.world

I see a lot of talk of Ollama here, which I personally don't like because:

  • The quantizations they use tend to be suboptimal

  • It abstracts away llama.cpp in a way that, frankly, leaves a lot of performance and quality on the table.

  • It abstracts away things that you should really know for hosting LLMs.

  • I don't like some things about the devs. I won't rant, but I especially don't like the hint they're cooking up something commercial.

So, here's a quick guide to get away from Ollama.

  • First step is to pick your OS. Windows is fine, but if setting up something new, Linux is best. I favor CachyOS in particular, for its great Python performance. If you use Windows, be sure to enable hardware-accelerated scheduling and disable the shared memory (sysmem) fallback.

  • Ensure the latest version of CUDA (or ROCm, if using AMD) is installed. Linux is great for this, as many distros package them for you.

  • Install Python 3.11.x, 3.12.x, or at least whatever your distro supports, and git. If on Linux, also install your distro's "build tools" package.

Now for actually installing the runtime. There are a great number of inference engines supporting different quantizations, forgive the Reddit link but see: https://old.reddit.com/r/LocalLLaMA/comments/1fg3jgr/a_large_table_of_inference_engines_and_supported/

As far as I am concerned, 3 matter to "home" hosters on consumer GPUs:

  • Exllama (and by extension TabbyAPI), as a very fast, very memory-efficient "GPU only" runtime, supports AMD via ROCm and Nvidia via CUDA: https://github.com/theroyallab/tabbyAPI

  • Aphrodite Engine. While not strictly as vram efficient, it's much faster with parallel API calls, reasonably efficient at very short context, and supports just about every quantization under the sun and more exotic models than exllama. AMD/Nvidia only: https://github.com/PygmalionAI/Aphrodite-engine

  • This fork of kobold.cpp, which supports more fine grained kv cache quantization (we will get to that). It supports CPU offloading and I think Apple Metal: https://github.com/Nexesenex/croco.cpp

Now, there are also reasons I don't like llama.cpp, but one of the big ones is that sometimes its model implementations have... quality-degrading issues, or odd bugs. Hence I would generally recommend TabbyAPI if you have enough vram to avoid offloading to CPU, and can figure out how to set it up. Roughly: clone the TabbyAPI repo and follow the install steps in its documentation.

This can go wrong; if anyone gets stuck, I can help with that.

  • Next, figure out how much VRAM you have.

  • Figure out how much "context" you want, aka how much text the LLM can ingest. If a model has a context length of, say, "8K" that means it can support 8K tokens as input, or less than 8K words. Not all tokenizers are the same; some, like Qwen 2.5's, can fit nearly a word per token, while others are more in the ballpark of half a word per token or less (see the tokenizer sketch after this list).

  • Keep in mind that the actual context length of many models is an outright lie, see: https://github.com/hsiehjackson/RULER

  • Exllama has a feature called "kv cache quantization" that can dramatically shrink the VRAM the "context" of an LLM takes up. Unlike llama.cpp, its Q4 cache is basically lossless, and on a model like Command-R, an 80K+ context can take up less than 4GB! It's essential to enable Q4 or Q6 cache to squeeze in as much LLM as you can into your GPU.

  • With that in mind, you can search huggingface for your desired model. Since we are using tabbyAPI, we want to search for "exl2" quantizations: https://huggingface.co/models?sort=modified&search=exl2

  • There are all sorts of finetunes... and a lot of straight-up garbage. But I will post some general recommendations based on total vram:

  • 4GB: A very small quantization of Qwen 2.5 7B. Or maybe Llama 3B.

  • 6GB: IMO llama 3.1 8B is best here. There are many finetunes of this depending on what you want (horny chat, tool usage, math, whatever). For coding, I would recommend Qwen 7B coder instead: https://huggingface.co/models?sort=trending&search=qwen+7b+exl2

  • 8GB-12GB: Qwen 2.5 14B is king! Unlike its 7B counterpart, I find the 14B version of the model incredible for its size, and it will squeeze into this vram pool (albeit with very short context/tight quantization for the 8GB cards). I would recommend trying Arcee's new distillation in particular: https://huggingface.co/bartowski/SuperNova-Medius-exl2

  • 16GB: Mistral 22B, Mistral Coder 22B, and very tight quantizations of Qwen 2.5 32B are possible. Honorable mention goes to InternLM 2.5 20B, which is alright even at 128K context.

  • 20GB-24GB: Command-R 2024 35B is excellent for "in context" work, like asking questions about long documents, continuing long stories, anything involving working "with" the text you feed to an LLM rather than pulling from its internal knowledge pool. It's also quite good at longer contexts, out to 64K-80K more-or-less, all of which fits in 24GB. Otherwise, stick to Qwen 2.5 32B, which still has a very respectable 32K native context, and a rather mediocre 64K "extended" context via YaRN: https://huggingface.co/DrNicefellow/Qwen2.5-32B-Instruct-4.25bpw-exl2

  • 32GB: same as 24GB, just with a higher bpw quantization. But this is also the threshold where lower bpw quantizations of Qwen 2.5 72B (at short context) start to make sense.

  • 48GB: Llama 3.1 70B (for longer context) or Qwen 2.5 72B (for 32K context or less)

Again, browse huggingface and pick an exl2 quantization that will cleanly fill your vram pool + the amount of context you want to specify in TabbyAPI. Many quantizers such as bartowski will list how much space they take up, but you can also just look at the available filesize.

  • Now... you have to download the model. Bartowski has instructions here, but I prefer to use this nifty standalone tool instead: https://github.com/bodaay/HuggingFaceModelDownloader

  • Put it in your TabbyAPI models folder, and follow the documentation on the wiki.

  • There are a lot of options. Some to keep in mind are chunk_size (higher than 2048 will process long contexts faster but take up lots of vram, less will save a little vram), cache_mode (use Q4 for long context, Q6/Q8 for short context if you have room), max_seq_len (this is your context length), tensor_parallel (for faster inference with 2 identical GPUs), and max_batch_size (parallel processing if you have multiple users hitting the tabbyAPI server, but more vram usage).

  • Now... pick your frontend. The tabbyAPI wiki has a good compilation of community projects, but Open Web UI is very popular right now: https://github.com/open-webui/open-webui I personally use exui: https://github.com/turboderp/exui

  • And be careful with your sampling settings when using LLMs. Different models behave differently, but one of the most common mistakes people make is using "old" sampling parameters for new models. In general, keep temperature very low (<0.1, or even zero) and rep penalty low (1.01?) unless you need long, creative responses. If available in your UI, enable DRY sampling to tamp down repetition without "dumbing down" the model with too much temperature or repetition penalty. Always use a MinP of 0.05 or higher and disable other samplers. This is especially important for Chinese models like Qwen, as MinP cuts out "wrong language" answers from the response. (There's a quick query sketch at the end of this post.)

  • Now, once this is all setup and running, I'd recommend throttling your GPU, as it simply doesn't need its full core speed to maximize its inference speed while generating. For my 3090, I use something like sudo nvidia-smi -pl 290, which throttles it down from 420W to 290W.
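To make the earlier tokens-vs-words point concrete, here's a minimal sketch using Hugging Face tokenizers. The model IDs are just examples, and this only needs the transformers package (it downloads tokenizer files, not the model weights):

```python
# Count how many tokens the same text costs under different tokenizers.
# Requires `pip install transformers`; pulls only tokenizer files, not weights.
from transformers import AutoTokenizer

text = "Self-hosting LLMs gets much easier once the context/VRAM math clicks."

for model_id in ["Qwen/Qwen2.5-7B-Instruct", "gpt2"]:
    tok = AutoTokenizer.from_pretrained(model_id)
    n_tokens = len(tok.encode(text))
    print(f"{model_id}: {len(text.split())} words -> {n_tokens} tokens")
```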

Sorry for the wall of text! I can keep going, discussing kobold.cpp/llama.cpp, Aphrodite, exotic quantization and other niches like that if anyone is interested.
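P.S. since sampling trips people up the most, here's a minimal sketch of querying the finished server with the conservative settings above, via the OpenAI client. The port and the min_p pass-through are assumptions about a default-ish TabbyAPI setup; check your own config and your server's docs for the exact parameter names:

```python
# Minimal sketch: hit a local OpenAI-compatible endpoint (TabbyAPI, kobold.cpp, etc.)
# with conservative sampling. min_p isn't in the official OpenAI schema, so it is
# passed via extra_body; whether it's honored depends on the backend.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="your-loaded-model",      # TabbyAPI serves whatever model you loaded
    messages=[{"role": "user", "content": "Summarize this paragraph: ..."}],
    temperature=0.1,                # low temperature for factual tasks
    max_tokens=512,
    extra_body={"min_p": 0.05},     # cut low-probability / wrong-language tokens
)
print(resp.choices[0].message.content)
```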

1
submitted 1 month ago* (last edited 1 month ago) by brucethemoose@lemmy.world to c/localllama@sh.itjust.works

https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e

Qwen 2.5 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B just came out, with some variants in some sizes just for math or coding, and base models too.

All Apache licensed, all 128K context, and the 128K seems legit (unlike Mistral).

And it's pretty sick, with a tokenizer that's more efficient than Mistral's or Cohere's and benchmark scores even better than Llama 3.1 or Mistral in similar sizes, especially with newer metrics like MMLU-Pro and GPQA.

I am running 32B locally, and it seems super smart!

As long as the benchmarks aren't straight up lies/trained, this is massive, and just made a whole bunch of models obsolete.

Get usable quants here:

GGUF: https://huggingface.co/bartowski?search_models=qwen2.5

EXL2: https://huggingface.co/models?sort=modified&search=exl2+qwen2.5
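If you'd rather poke at one of those GGUFs straight from Python, here's a minimal sketch with huggingface_hub and llama-cpp-python. The repo ID and filename below are illustrative guesses at bartowski's naming, so check the actual pages above for the real ones:

```python
# Sketch: download a (hypothetically named) GGUF quant and run it locally.
# Requires `pip install huggingface_hub llama-cpp-python`.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="bartowski/Qwen2.5-14B-Instruct-GGUF",   # illustrative repo ID
    filename="Qwen2.5-14B-Instruct-Q4_K_M.gguf",     # illustrative filename
)

llm = Llama(model_path=path, n_ctx=8192, n_gpu_layers=-1)  # -1 = offload all layers to GPU
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "One-line summary of the Qwen 2.5 release?"}],
)
print(out["choices"][0]["message"]["content"])
```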

65
submitted 2 months ago* (last edited 2 months ago) by brucethemoose@lemmy.world to c/asklemmy@lemmy.world

Obviously there's not a lot of love for OpenAI and other corporate API generative AI here, but how does the community feel about self hosted models? Especially stuff like the Linux Foundation's Open Model Initiative?

I feel like a lot of people just don't know there are Apache/CC-BY-NC licensed "AI" they can run on sane desktops, right now, that are incredible. I'm thinking of the most recent Command-R, specifically. I can run it on one GPU, and it blows expensive API models away, and it's mine to use.

And there are efforts to kill the power cost of inference and training with stuff like matrix-multiplication-free models, open source and legally licensed datasets, cheap training... and OpenAI and such want to shut down all of this because it breaks their monopoly, where they can just outspend everyone scaling, stealing data, and destroying the planet. And it's actually a threat to them.

Again, I feel like corporate social media vs the fediverse is a good analogy, where one is kinda destroying the planet and the other, while still niche, problematic and a WIP, kills a lot of the downsides.

29

Senior U.S., Qatari, Egyptian and Israeli officials will meet on Thursday under intense pressure to reach a breakthrough on the Gaza hostage and ceasefire deal.

The heads of the Israeli security and intelligence services told Netanyahu at the meeting on Wednesday that time is running out to reach a deal and emphasized that delay and insistence on certain positions in the negotiations could cost the lives of hostages, a senior Israeli official said.

85
submitted 3 months ago by brucethemoose@lemmy.world to c/news@lemmy.world
36

HP is apparently testing these upcoming APUs in a single, 8-core configuration.

The Geekbench 5 ST score is around 2100, which is crazy... but not what I really care about. Strix Halo will have a 256-bit memory bus and 40 CUs, which will make it a monster for local LLM inference (rough math below).

I am praying AMD sells these things in embedded motherboards with a 128GB+ memory config. Especially in an 8-core config, as I'd rather not burn money and TDP on a 16 core version.
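Back-of-the-envelope on why the bus width matters, assuming LPDDR5X-8000 (my assumption, not a confirmed spec): decode speed for a memory-bound LLM is roughly memory bandwidth divided by the size of the quantized weights.

```python
# Rough, assumption-heavy math: tokens/s during decode ~= bandwidth / weight size.
bus_bits = 256
transfers_per_s = 8000e6                                # assuming LPDDR5X-8000 (not confirmed)
bandwidth_gbs = bus_bits / 8 * transfers_per_s / 1e9    # ~256 GB/s

weights_gb = 40                                         # e.g. a ~70B model at ~4.5 bits per weight
print(f"~{bandwidth_gbs:.0f} GB/s -> ~{bandwidth_gbs / weights_gb:.1f} tokens/s")
```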

12

cross-posted from: https://lemmy.world/post/16629163

Supposedly for petty personal reasons:

The woman who controls the company, Shari Redstone, snatched defeat from the jaws of victory last week as she scuttled a planned merger with David Ellison's Skydance Media.

Redstone had spent six months negotiating a complicated deal that would have given control of Paramount to Ellison and RedBird Capital, only to call it off as it neared the finish line.

The chief reason for her decision: Her reluctance to let go of a family heirloom she fought very hard to get.

I cross posted this from c/Avatar, but I am a Trekkie too and don't like this one bit.

FYI previous articles seemed to imply the Sony deal is dead.
