[-] hendrik@palaver.p3x.de 2 points 16 hours ago* (last edited 16 hours ago)

CPU-only. It's an old Xeon workstation without any GPU, since I mostly do one-off AI tasks at home and never felt any urge to buy one (yet). Model size would be something between 7B and 32B with that. Context length is something like 8192 tokens. I have a bit less than 30GB of RAM to spare, since I'm doing other stuff on that machine as well.

And I'm picky with the models. I dislike the condescending tone of ChatGPT and newer open-weight models. I don't want it to blabber or praise me for my "genius" ideas. It should be creative, have some storywriting abilities, be uncensored and not overly agreeable. The best model I've found for that is Mistral-Nemo-Instruct, and I currently run a Q4_K_M quant of it. That does about 2.5 t/s on my computer (which isn't a lot, but somewhat acceptable for what I do). Mistral-Nemo isn't the latest and greatest any more, but I really prefer its tone and it performs well on a wide variety of tasks. And I mostly do weird things with it: let it give me creative advice, be a dungeon master or a late-80s text adventure. Or mimic a radio host and feed it into TTS for a radio show. Or write a book chapter or a bad rap song. I'm less concerned with the popular AI use-cases like answering factual questions or writing computer code. So I'd like to switch to a newer, more "intelligent" model. But that proves harder than I imagined.
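For reference, a back-of-envelope sketch of why a 12B model as a Q4_K_M quant fits comfortably in under 30GB of RAM (the ~4.8 bits/weight average for Q4_K_M is an approximation, and the overhead figure is a rough assumption):

```python
def quant_ram_gb(params_b: float, bits_per_weight: float = 4.8) -> float:
    """Rough RAM needed for model weights alone: parameters * bits / 8, in GB.
    4.8 bits/weight is an approximate average for Q4_K_M quants."""
    return params_b * bits_per_weight / 8

# Mistral-Nemo has ~12B parameters; add roughly 1-2 GB on top
# for the KV cache (at ~8k context) and runtime overhead.
print(f"{quant_ram_gb(12):.1f} GB for the weights alone")
```

The same arithmetic explains why 32B is about the ceiling here: ~19 GB of weights plus overhead brushes up against the 30GB budget.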

(Occasionally I do other stuff as well, but that's few and far between. For those cases I'll rent a datacenter GPU on runpod.io for a few bucks an hour. That's the main reason why I haven't bought a GPU of my own yet.)

[-] hendrik@palaver.p3x.de 3 points 1 day ago* (last edited 1 day ago)

I think there are some posts out there (on the internet / Reddit / ...) with people building crazy rigs with old 3090s or something. I don't have any experience with that. If I were to run such a large model, I'd use a quantized version and rent a cloud server for that.

And I don't think computers can fit arbitrarily many GPUs. I don't know the actual limit; let's say it's 4 per machine. Then you need 5 computers to fit your 18 cards. So add a few thousand dollars for those, plus a fast network/interconnect between them.

I can't make any statement about performance. I'd imagine such a scenario might work for MoE models with an appropriate design, while for everything else performance would be abysmal. But that's only my speculation. We'd need to find people who have actually done this.

Edit: Alternatively, buy an Apple Mac Studio with 512GB of unified RAM. They're fast as well (probably way faster than your idea?) and maybe cheaper. Seems an M3 Ultra Mac Studio with 512GB costs around $10,000. With half that amount of RAM (256GB), it's only about $7,100.

[-] hendrik@palaver.p3x.de 5 points 1 day ago* (last edited 1 day ago)

Well, I wouldn't call them a "scam". They're meant for a different use-case. In a datacenter, you also have to pay for rack space and all the servers which accommodate the GPUs. So you can either pay for 32 times as many servers stuffed with Radeon 9060XTs, or you buy H200 cards. Sure, you'll pay 3x as much for the cards themselves. But you'll save on the number of servers and everything that comes with them: hardware cost, space, electricity, air-con, maintenance... And less interconnect makes everything way faster.

Of course, at home different rules apply. And it depends a bit on how many cards you want to run, what kind of workload you have, whether you're fine with AMD or need CUDA...

[-] hendrik@palaver.p3x.de 2 points 1 day ago

Thanks for the random suggestion! Installed it already. Sadly, as a drop-in replacement it doesn't provide any speedup on my old machine; it's exactly the same number of tokens per second... Guess I have to read up on ik_llama.cpp and pick a different quantization of my favourite model.

[-] hendrik@palaver.p3x.de 3 points 1 day ago

Thanks. I'll factor that in next time someone asks me for a recommendation. I personally have Kobold.CPP on my machine, which seems to be more transparent about such things.

[-] hendrik@palaver.p3x.de 4 points 1 day ago* (last edited 1 day ago)

Is there any background information available on Ollama becoming less open? It's marked MIT-licensed in the repo of my Linux distribution and on their GitHub.

[-] hendrik@palaver.p3x.de 5 points 1 day ago

Wasn't Mistral AI supposed to be one of the European (French) answers to the mostly US companies doing AI? From that perspective it wouldn't be super great if a US company were to buy them. And the company goals don't match either, as Apple is mostly concerned with its own products. So I'd say they'd likely dismantle the company, and we'd have one less somewhat-open AI company.

[-] hendrik@palaver.p3x.de 4 points 5 days ago* (last edited 5 days ago)

Good question. I don't know when Lemmy got the feature that mods can see all votes, but it looks to me like someone is agitated/frustrated and is going through the logs. We had some discussion back then about people doing their thing in their communities, and then some random people who aren't even subscribed do drive-by downvotes... which is a bit frustrating. And AI is one of the many polarizing topics here. People have tried discussing it in peace, but it's not very easy. Maybe OP got caught in the turmoil of this. Or they pissed off that person, and then the next downvote was one too many... I don't really know. And the person calling out people by name sounds a bit agitated. I'd say someone in that state of mind is likely to react a bit more extremely. And they're concerned with voting fraud and brigading in general.

[-] hendrik@palaver.p3x.de 4 points 6 days ago

Ah thanks. That makes sense.

[-] hendrik@palaver.p3x.de 12 points 6 days ago* (last edited 6 days ago)

Ah right, maybe that was it. I remember seeing the post as well. You got "called out" by name publicly, for supposed "brigading", and told to F off. That must be the reason for this?! https://discuss.tchncs.de/post/34853477

[-] hendrik@palaver.p3x.de 9 points 6 days ago* (last edited 6 days ago)

You're a bit more easygoing with the downvotes than the average Lemmy user. Those rarely downvote, while around 30% of your votes are downvotes. Maybe that triggered someone, if you did something like scroll through a community and hand out several downvotes in a row. But I don't think you're doing anything wrong here.

[-] hendrik@palaver.p3x.de 6 points 6 days ago* (last edited 6 days ago)

To be fair, this is a 1.1-tonne test vehicle, while the Japanese maglev is an entire train. I guess the real issue, though, is getting it from a tech demonstration to a prototype and then to an actual product. China doesn't even have a prototype with this, and the maglev in Japan is just a prototype. I think it's on a test track to show off a few times a day, but doesn't transport people anywhere. And other people have failed early on with the entire vacuum-tube idea: Elon Musk promised the same thing a decade ago, and it's scrapped now.

5
submitted 4 months ago* (last edited 4 months ago) by hendrik@palaver.p3x.de to c/localllama@sh.itjust.works

I'm developing a small Python webapp as some sort of finger exercise. Mostly a chatbot. I'm using the Quart framework, which is pretty much like Flask, just async. Now I want to connect that to an LLM inference endpoint. And while I could do the HTTP requests myself, I'd prefer something that does that for me. It should support the usual OpenAI-style API; in the end I'd like it to connect to things like Ollama and KoboldCPP. No harm if it supports image generation, agents, tools, or vector databases, but that's optional.
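For comparison, doing the requests by hand really isn't much code. A minimal stdlib-only sketch against an OpenAI-style `/v1/chat/completions` endpoint (the base URL, model name, and Ollama's default port are assumptions/placeholders):

```python
import json
from urllib import request

def build_chat_request(model: str, user_msg: str, max_tokens: int = 256) -> dict:
    """Payload in the OpenAI chat-completions shape that Ollama and
    KoboldCPP's compatibility endpoints also accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": max_tokens,
    }

def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response body."""
    return response["choices"][0]["message"]["content"]

def chat(base_url: str, model: str, user_msg: str) -> str:
    """Blocking call; from Quart, wrap it in asyncio.to_thread() to stay async."""
    payload = json.dumps(build_chat_request(model, user_msg)).encode()
    req = request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

Something like `chat("http://localhost:11434/v1", "mistral-nemo", "hi")` would then be the whole round trip, assuming Ollama's default port.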

I've tried Langchain, but I don't think I like it very much. Are there other Python frameworks out there? What do you like? I'd prefer something relatively lightweight that gets out of the way. Ideally provider-agnostic, but I'm mainly looking for local solutions like the ones I mentioned.

Edit: Maybe something that also connects to a Runpod endpoint, to do inference on demand (later on)? Or at least something which I can adapt to that?

24
submitted 5 months ago by hendrik@palaver.p3x.de to c/android@lemdro.id

I've been using Etar for years now. But the Samsung calendar app on my wife's phone looks way better, while in Etar I'm missing things like the appointment titles once the view gets crowded. And the all-day events and birthdays aren't that prominent either. Plus Etar lacks some features, like adding notes/emojis to days.

Is there a better calendar app out there? It has to be open source and somehow connect to my Nextcloud. Those are my requirements. But I believe all calendar apps can sync via CalDAV anyway.

13
submitted 6 months ago* (last edited 6 months ago) by hendrik@palaver.p3x.de to c/localllama@sh.itjust.works

Seems Meta has been doing some research lately into replacing the current tokenizers with new/different representations:

43
submitted 7 months ago by hendrik@palaver.p3x.de to c/android@lemdro.id

I got a new phone. I skipped a few generations and am now running the current GrapheneOS, based on Android 15. I've moved most of the apps, but now I'd like to install my 3 banking apps and 5 discount-program spyware apps. I figure I'd best separate them from the rest of the arbitrary stuff: banking apps so they can't be messed with, and shady discount programs so those apps can't mess with me and my data...

The internet has a lot of information about Shelter, work profiles, the new(?) private spaces... But I can't tell which advice is current and which is outdated... What's the current best practice?

51
submitted 7 months ago* (last edited 7 months ago) by hendrik@palaver.p3x.de to c/fediverse@lemmy.world

During the summer, the European Commission made the decision to stop funding Free Software projects within the Next Generation Internet initiative (NGI). This decision results in a loss of €27 million for software freedom. Since 2018, the European Commission has supported the Free Software ecosystem through NGI, which provided funding and technical assistance to Free Software projects. This decision unfortunately exposes a larger issue: software freedom in the EU needs more stable, long-term financial support. The ease with which this funding was cut underlines that need.

CC BY-SA 4.0 - SFSCON 2024

Cross-posted from the FSFE Peertube Channel

81
submitted 10 months ago* (last edited 10 months ago) by hendrik@palaver.p3x.de to c/piracy@lemmy.dbzer0.com

Seems they recently changed something on Spotify, and all the tools I've tried fail now. And DownOnSpot, which seemed promising, has received a cease-and-desist letter and got taken down. What do you people use? I want something that actually fetches the audio from Spotify, not just rips it from YouTube. And it has to work as of now. Does the latest commit of DownOnSpot work? Back when I tested it a few weeks ago, it failed due to some API changes. Are there other tools floating around?

1
submitted 10 months ago* (last edited 8 months ago) by hendrik@palaver.p3x.de to c/localllama@sh.itjust.works

I just found https://www.arliai.com/, who offer LLM inference for quite cheap: no rate limits, unlimited token generation, a no-logging policy, and an OpenAI-compatible API.

I've been using runpod.io previously, but that's a whole different kind of service: they sell compute, and customers have to build their own Docker images and run them in their cloud, billed by the hour/second.

Should I switch to ArliAI? Does anyone have experience with them? Or can recommend another nice inference service? I still refuse to pay $1,000 for a GPU and then also pay for electricity, when I can use some $5/month cloud service and it'd last me 16 years before I reach the price of a decent GPU...
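The break-even arithmetic behind that, for what it's worth (using the $1,000 GPU price and $5/month tier from above, and ignoring electricity):

```python
gpu_cost_usd = 1000        # rough price of a decent GPU
cloud_usd_per_month = 5    # cheap inference tier

# Months of cloud subscription that equal the one-time GPU price.
months = gpu_cost_usd / cloud_usd_per_month
print(f"{months:.0f} months, i.e. about {months / 12:.1f} years")
```

That works out to 200 months, roughly 16.7 years, which is where the "16 years" above comes from.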

Edit: Saw that their $5 tier only includes models up to 12B parameters, so I'm not sure anymore. For larger models I'd need to pay close to what other inference services cost.

Edit2: I discarded the idea. Some 7B models and a single 12B one is a bit little to pay for. I can do that at home, thanks to llama.cpp.

110
submitted 11 months ago* (last edited 11 months ago) by hendrik@palaver.p3x.de to c/fediverse@lemmy.world

tl;dr: Be excellent to each other, do something constructive here?

I'm not sure anymore where the Threadiverse is headed. (The Threadiverse being this threaded part of the Fediverse, i.e. Lemmy, MBin, PieFed, ...)
In my time here, I've met a lot of nice people, had meaningful conversations, and learned lots of things. At the same time, it's always been a mixed bag. We've always had quite a few argumentative people here, trolls, ... I've seen people hate on and yell at each other, and do all kinds of destructive things. My issue with that is: negative behavior disproportionately affects the atmosphere. And I'd argue we have nowhere near enough nice behavior to even that out.

I haven't seen Lemmy grow for quite some time now. Seems it's leveling off at a bit less than 50k monthly active users, and I don't see how that'd change. I'm missing some clear vision/idea of where we want to be headed. And I miss an atmosphere that makes people want to join or stay here, of all the places on the internet. The saying is: "If you don't go forwards, you go backwards." I'm not sure if this applies... At least we're not shrinking anymore.

And I'm always unsure whether the tone and atmosphere here are changing subtly and gradually. I've always disagreed with a few dynamics here, but lately it feels like we're on the decline, at least to me. I occasionally keep an eye on the votes on my comments, and it seems I'm getting fewer of them. Sometimes I reply to a post and not a single person interacts. Even OP seems to have abandoned their post moments after writing it. And even for nuanced, longer replies, I regularly don't get more than one or two upvotes. I think that used to be a bit better at some point. And I see the same thing happening with other people's comments, so it's not just me writing low-quality comments. What does work is stating simple truths; I regularly get some incoming votes with those. But my vision of this place isn't spreading simple truths, it's having proper and meaningful discussions, learning things and new perspectives, or just mingling and talking with people. Judging by the votes I observe, though, that isn't what the community here appreciates.

Another pet peeve of mine is the link-aggregator aspect of Lemmy. I'd say at least 80% of Lemmy is about dumping political (or tech) news articles. Lots of them don't generate any engagement, and lots of them are really low-effort: OP just dumps something somewhere, with no body text added and no info about what's interesting about it. And people don't even read those articles; they just read the title and react (emotionally) to that. In the end, probably neither OP nor the audience read the article, and it just litters the place, burying and diminishing other, meaningful content. (With that said: there are also nice (news) discussions going on at the same time. And Lemmy is meant to be a link aggregator. It's just that my perception is that it's skewed towards low quality, low engagement and random noise.)

A few people here also don't really like political debate, and there's no escape from it on Lemmy, since so much revolves around it. Nowadays politics is about strong opinions, emotions and emotional reactions, and often limited to that. The dynamics of Lemmy reinforce the negative aspects of this, because the time you're most incentivized to reply or react is when something triggers a strong emotion in you: for example, you strongly disagree with a comment, and that makes you want to counter it and write your own opinion underneath. If you agree, you don't feel a strong emotion and you don't reply. And the majority of users seem to also forget to upvote in that case, as I laid out earlier. We also don't write nuanced answers, dissect complex things and examine them from all angles. That's just effort, and it's not as rewarding for the brain as pointing out that someone is wrong. So it fosters an argumentative atmosphere.

Prospect

I think we have several ways of steering the community:

  1. Technology: Features in the software, design choices that foster good behavior.
  2. Moderation: Give toxic people the boot, or delete content that drags the place down. What remains, then, is nice people and no adverse content.
  3. The community

I'd say 1 and 2 go without saying. (Not that everything is perfect with those...) But it really boils down to 3: The community. This is a fairly participatory place. We are the ones shaping the tone and atmosphere. And it's our place. It's kind of our obligation to care for it if we want to see it go somewhere. Isn't it?

So what's your vision of this place? Do you have some idea on where you'd like it to go? Practical ideas on how to achieve it?
Do you even agree with my perception of the dynamics here, and the implications and conclusions I came up with?


hendrik

joined 3 years ago