[-] Mechanize@feddit.it 10 points 5 days ago

You could try asking in !caffeitalia@feddit.it too; feddit.it is an Italian-speaking instance

29

A few days ago ROCm 6.4 was officially added to the Arch repositories - which is great - but it made my current setup completely explode - which is less great - and right now I don't have the will to descend into gdb hell and claw my way back out...

So I've taken this opportunity to set up a podman (docker alternative) container to use the older, and for me working, ROCm 6.3.3. On the plus side this has made it even easier to test new things and do random stuff: I will probably port my Vulkan setup too, at a later date.

Long story short I've decided to clean it up a bit, place a bunch of links and comments, and share it with you all in the hope it will help someone out.

You still need to handle the necessary requirements on your host system to make everything work, but I have complete trust in you! Even if it doesn't work, it is a starting point that I hope will give you some direction on what to do.

BTW I'm not an expert in this field, so some things can undoubtedly be improved.

Assumptions

  • To make this simpler I will assume, and advise you to use, this kind of folder structure:
base_dir
 ├─ROCm_debian_dev
 │  └─ Dockerfile
 └─llamacpp_rocm6.33
    ├─ logs
    │   └─ logfile.log
    ├─ workdir
    │   └─ entrypoint.sh
    ├─ Dockerfile
    └─ compose.yaml
  • I've tested this on Arch Linux. You can probably make it work on basically any reasonably current distro, but it's untested.

  • You should follow the basic requirements from the AMD documentation, and cross your fingers. You can probably find a more precise guide on your distro wiki. Or just install any and all ROCm and HIP related SDKs. Sigh.

  • I'm using podman, which is an alternative to docker. It has some idiosyncrasies - which I will not get into because they would require another full write-up - so if you use docker it is possible you'll need to modify some things. I can't help you there.

  • This is given with no warranty: if your computer catches on fire, it is on you (code MIT/Apache 2 license, the one you prefer; text CC BY-SA 4.0). More at the end.

  • You should know what 'generation' of card yours is. ROCm works in mysterious ways and each card has its problems. Generally you can just steamroll forward without much care, but you still need to find which HSA_OVERRIDE_GFX_VERSION your card needs to run under. For example for a rx6600xt/rx6650xt it would be gfx1030 and HSA_OVERRIDE_GFX_VERSION=10.3.0. Some info here: Compatibility Matrix. You can (not so) easily search for the correct gfx and HSA codes on the web; a quick way to see what your card reports is sketched right after this list. I don't think the 9xxx series is currently supported, but I could be wrong.

  • There's an official Docker image in the llama.cpp repository, you could give that one a go. Personally I like doing them myself, so I understand what is going on when I inevitably bleed on the edge - in fact I didn't even consider the existence of an official Dockerfile until after writing this post. Welp. Still, they are two different approaches, pick your poison.
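A quick way to check what your card reports, run on the host (this assumes the rocminfo tool from your host ROCm packages is installed; take it as a hedged sketch, not gospel):

rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
# Example: a rx6650xt prints gfx1032, which you would still build/override as gfx1030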

Dockerfile(s)

These can, at a high level, be described as the recipe with which we will set up the container (not exactly a virtual machine, but close enough for our purposes) that will compile and run llama.cpp for us.

I will put here two Dockerfiles: one can be used as a fixed base, while the second one can be rebuilt every time you want to update llama.cpp.

Now, this will create a new image each time. We could use a volume (like a virtual directory shared between the host machine and the container) to just git pull the new code instead of cloning it fresh, but that would throw away most of the advantages of running this in a container. TLDR: for now don't overthink it and go with the flow.
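Just to make that alternative concrete, a purely hypothetical sketch (the host path is a placeholder, the base image name matches the one built below, and this is not the approach used in the rest of the post):

# Clone once on the host, mount it into the container, and update it in place.
git clone https://github.com/ggml-org/llama.cpp.git /path/on/host/llama.cpp
podman run --rm -v /path/on/host/llama.cpp:/src localhost/rocm-6.3.3_ubuntu-dev:latest \
    bash -c "cd /src && git pull"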

Base image

This is a pretty basic recipe: it gets the official dev-ubuntu image by AMD and then augments it to be suitable for our needs. You can easily use other versions of ROCm (for example dev-ubuntu-24.04:6.4-complete) or even other Ubuntu releases. You can find the filtered list of the images here: Link

Could we use a lighter image? Yes. Should we? Probably. Maybe next time.

tbh I've tried other images with no success, or they needed too much effort for a minimal reward: this Just Works™. YMMV.

base_dir/ROCm_debian_dev/Dockerfile

# This is the one that currently works for me, you can
# select a different one:
#   https://hub.docker.com/r/rocm/dev-ubuntu-24.04/tags
FROM docker.io/rocm/dev-ubuntu-24.04:6.3.3-complete
# 6.4.0
# FROM docker.io/rocm/dev-ubuntu-24.04:6.4-complete

# We update and then install some stuff.
# In theory we could delete more things to make the final
# image slimmer.
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    cmake \
    libcurl4-openssl-dev \
    && rm -rf /var/lib/apt/lists/*

It is a big image, over 30GB once extracted (around 6GB to download for 6.3.3-complete and around 4GB for 6.4-complete).

Let's build it:

cd base_dir/ROCm_debian_dev/
podman build -t rocm-6.3.3_ubuntu-dev:latest .

This will build it and add it to your local images (you can see them with podman images) with the name rocm-6.3.3_ubuntu-dev and the tag latest. You can obviously change them as you see fit. You can even give multiple tags to the same image: a common pattern is to use a more specific tag and then also add the latest tag to the most recent one you have generated, so you don't have to change the other scripts that reference it. More info here: podman tag
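A small sketch of that multi-tag pattern (the date-based tag is just an example of mine, pick whatever scheme you prefer):

podman build -t rocm-6.3.3_ubuntu-dev:2025-06-01 .
podman tag rocm-6.3.3_ubuntu-dev:2025-06-01 rocm-6.3.3_ubuntu-dev:latest
podman images   # both tags should now point at the same image ID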

The real image

The second image is the one that will handle the compilation, and then the execution, of the llama.cpp binaries (llama-server and llama-bench), and you need to customize it:

  • You should modify the number after the -j based on the number of virtual cores that your CPU has, minus one. You can use nproc in a terminal to check it (a one-liner for this is sketched right after this list).
  • You have to change the AMDGPU_TARGETS code based on your gfx version! Pay attention, because the correct one is probably not the one returned by rocminfo: for example the rx6650xt is gfx1032, but that is not directly supported by ROCm. You have to use the supported (and basically identical) gfx1030 instead.
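A tiny, purely illustrative helper for the -j value (it just prints your virtual core count minus one):

echo "-j$(( $(nproc) - 1 ))"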

If you want to compile with a ROCm image newer than 6.3 you need to swap the commented cmake lines. Still, I have no idea if it works or if it is even supported by llama.cpp.

More info, and some tips, here: Link

base_dir/llamacpp_rocm6.33/Dockerfile

FROM localhost/rocm-6.3.3_ubuntu-dev:latest

# This could be shortened, but I like to have multiple
# steps to make it clear, and show how to achieve
# things in different ways.
WORKDIR /app
RUN git clone https://github.com/ggml-org/llama.cpp.git
WORKDIR /app/llama.cpp
RUN mkdir build_hip
WORKDIR build_hip
# This will run the cmake configuration.
# Pre  6.4 -DAMDGPU_TARGETS=gfx1030
RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S .. -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
# Post 6.4 -DGPU_TARGETS=gfx1030
# RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S .. -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
# Here we build the binaries, both for the server and the bench.
RUN cmake --build . --config Release -j7 --target llama-server
RUN cmake --build . --config Release -j7 --target llama-bench

To build this one we will need to use a different command:

cd base_dir/llamacpp_rocm6.33/
podman build --no-cache -t rocm-6.3.3_llamacpp:b1234 .

As you can see we have added the --no-cache flag: this is to make sure that the image actually gets rebuilt, otherwise podman would just keep outputting the same image over and over from the cache - because the recipe didn't change. This time the tag is a b1234 placeholder: you should use the current release build number or the current commit short hash of llama.cpp (you can easily find them when you start the binary, or by going to the GitHub page) to remember at which point you compiled, and use the dynamic latest tag as a supplementary bookmark. The current date is a good candidate too.
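If you want to double-check which llama.cpp build actually ended up in an image, a hedged one-liner (it assumes the image name and working directory from the Dockerfile above, and that --version prints build info as in current llama.cpp builds; it may grumble about not finding a GPU, but the version line should still show up):

podman run --rm localhost/rocm-6.3.3_llamacpp:b1234 ./bin/llama-server --version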

If something doesn't feel right - for example your GPU is not being used when you make a request to the server - you should go back and read the cmake configuration step logs, to check that everything required was correctly detected and there are no errors.

Let's compose it up

Now that we have two images that have built without any kind of error, we can use them to reach our goal. I've heavily commented the compose file, so just read and modify it directly. Don't worry too much about all the lines, but if you are curious - and you should be - you can easily search for them and find a bunch of explanations that are surely better than what I could write here without taking up too much space.

Being a yaml file - bless the soul of whoever decided that - pay attention to the whitespace! It matters!

We will use two volumes: one will point to the folder where you have downloaded your GGUF files; the second one will point to where we have the entrypoint.sh file. We are putting the script into a volume instead of baking it into the container so you can easily modify it and experiment.
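One likely gotcha, since the script is executed straight from that volume: it has to be executable on the host. Assuming the folder layout from the beginning of the post:

chmod +x base_dir/llamacpp_rocm6.33/workdir/entrypoint.sh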

A small model that you could use as a benchmark to see if everything is working is Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf.

base_dir/llamacpp_rocm6.33/compose.yaml

# Benchmark model: https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf
# benchmark command:
#    ./bin/llama-bench -t 7 -m /app/models/Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf -ngl 99 -fa 1 -ctk q4_0 -ctv q4_0
#    ./bin/llama-bench -t 7 -m /app/models/Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf -ngl 99
services:
    llamacpp-server:
        # If you have renamed the image, change it here too!
        image: localhost/rocm-6.3.3_llamacpp:latest
        # The subsequent two lines are needed to enter the image and directly use bash:
        # start it with [podman-compose up -d|docker compose up -d]
        # and then docker attach to the container with
        # [podman|docker] attach ID
        # You'll need to change the entrypoint.sh file too, just with the
        # shebang and a line straight up calling `bash`, as content.
        stdin_open: true
        tty: true
        # end bash section, Comment those two lines if you don't need shell
        # access. Or leave them.
        group_add:
            # The video group is needed on most distros to access the GPU
            # the render group is not present in some and needed
            # in others. Try it out.
            - "video" # 985 # video group - "render" # 989 # render
        environment:
            # FIXME: Change this with the right one!
            # If you have a wrong one it will _not work_.
            - HSA_OVERRIDE_GFX_VERSION=10.3.0
        devices:
            - /dev/kfd:/dev/kfd
            - /dev/dri:/dev/dri
        cap_add:
            - SYS_PTRACE
        logging:
            # The default logging driver is journald, which I despise
            # because it can pollute the journal pretty hard.
            #
            # The none driver will not save the logs anywhere.
            # You can still attach to the container, but you will lose
            # the lines before the attachment.
            # driver: none
            #
            # The json-file option is deprecated, so we will use the
            # k8s-file one.
            # You can use `podman-compose logs -f` to keep tabs, and it will not
            # pollute the system journal.
            # Remember to `podman-compose down` to stop the container.
            # `ctrl+c`ing the logs will do nothing.
            driver: k8s-file
            options:
                max-size: "10m"
                max-file: "3"
                # You should probably use an absolute path.
                # Really.
                path: ./logs/logfile.log
        # This is mostly a fix for how podman's network stack works.
        # If you are offline when starting the image it would just not
        # start, erroring out. Running it in host mode solves this,
        # but it has other cons.
        # Reading the issue (https://github.com/containers/podman/issues/21896) it is
        # probably fixed, but I still have to test it out.
        # It mainly means that you can't run multiple of these at once, because they would
        # take the same port. Luckily you can change the port from the llama-server
        # command in the entrypoint.sh script.
        network_mode: "host"
        ipc: host
        security_opt:
            - seccomp:unconfined
        # These you really need to CHANGE.
        volumes:
            # FIXME: Change these paths! Only the left side before the `:`.
            #        Use absolute paths.
            - /path/on/your/machine/where/the/ggufs/are:/app/models
            - /path/to/base_dir/llamacpp_rocm6.33/workdir:/workdir
        # It doesn't work with podman-compose
        # restart: no
        entrypoint: "/workdir/entrypoint.sh"
        # To make it easy to use I've added a number of env variables
        # with which you can set the llama.cpp command params.
        # More info in the bash script, but they are quite self explanatory.
        command:
            - "${MODEL_FILENAME:-Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf}"
            - "${GPU_LAYERS:-22}"
            - "${CONTEXT_SIZE:-8192}"
            - "${CALL_TYPE:-bench}"
            - "${CPU_THREADS:-7}"

Now that you have meticulously modified the above file let's talk about the script that will launch llama.cpp.

base_dir/llamacpp_rocm6.33/workdir/entrypoint.sh

#!/bin/bash
cd /app/llama.cpp/build_hip || exit 1
MODEL_FILENAME=${1:-"Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf"}
GPU_LAYERS=${2:-"22"}
CONTEXT_SIZE=${3:-"8192"}
CALL_TYPE=${4:-"server"}
CPU_THREADS=${5:-"7"}

if [ "$CALL_TYPE" = "bench" ]; then
  ./bin/llama-bench -t "$CPU_THREADS" -m /app/models/"$MODEL_FILENAME" -ngl "$GPU_LAYERS"
elif [ "$CALL_TYPE" = "fa-bench" ]; then
  ./bin/llama-bench -t "$CPU_THREADS" -m /app/models/"$MODEL_FILENAME" -ngl "$GPU_LAYERS" -fa 1 -ctk q4_0 -ctv q4_0
elif [ "$CALL_TYPE" = "server" ]; then
  ./bin/llama-server -t "$CPU_THREADS" -c "$CONTEXT_SIZE" -m /app/models/"$MODEL_FILENAME" -fa -ngl "$GPU_LAYERS" -ctk q4_0 -ctv q4_0
else
  echo "Valid modalities are \"bench\", \"fa-bench\" or \"server\""
  exit 1
fi

exit 0

This is straightforward. It enters the folder (inside the container) where we built the binaries and then calls the right command, decided by the CALL_TYPE parameter. I've set it up to handle some common options, so you don't have to change the script every time you want to run a different model or change the number of layers loaded into VRAM.

The beauty of it is that you could put a .env file in the llamacpp_rocm6.33 folder with the params you want to use, and just start the container.

An example .env file could be:

base_dir/llamacpp_rocm6.33/.env

MODEL_FILENAME=Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf
GPU_LAYERS=99
CONTEXT_SIZE=8192
CALL_TYPE=bench
CPU_THREADS=7
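You can also override the values on the fly without touching the .env file, since compose interpolation reads the shell environment too (this works with docker compose; I'd expect podman-compose to behave the same, but treat it as an assumption):

CALL_TYPE=server GPU_LAYERS=99 podman-compose up -d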

Some notes:

  • For now it uses flash attention by default with a quantized context. You can avoid this by deleting the -fa and the -ctk q4_0 -ctv q4_0. Experiment around.
  • You could add more params or environmental variables: it is easy to do. How about one for the port number? (A sketch of that is right after this list.)
  • Find more info about llama.cpp server here: Link.
  • And the bench here: Link.
  • For now I've set up three modes: server, bench (a plain benchmark) and fa-bench (a benchmark with FlashAttention enabled).
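As a purely hypothetical sketch of that port idea (SERVER_PORT is a name I'm making up, it is not in the original files): in compose.yaml you would add - "${SERVER_PORT:-8080}" as a sixth entry under command:, and in entrypoint.sh something like:

SERVER_PORT=${6:-"8080"}
# ...then pass it along in the server branch:
#   ./bin/llama-server ... --port "$SERVER_PORT"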

Time to start it

Starting it is just a command away:

cd base_dir/llamacpp_rocm6.33/
podman-compose up -d
podman-compose logs -f

When everything is completely loaded (and you started it in server mode, i.e. CALL_TYPE=server), open your browser and go to http://127.0.0.1:8080/ to be welcomed by the llama.cpp webui, and test whether the GPU is actually being used. (I have my fingers crossed for you!)
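If you prefer the terminal, a quick smoke test (the /health and OpenAI-compatible endpoints should exist in recent llama-server builds, but consider this a hedged sketch, and it only applies to server mode):

curl http://127.0.0.1:8080/health
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"Say hi in one word."}],"max_tokens":8}'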

Now that everything is working, have fun with your waifus and/or husbandos! ...Sorry, I meant: be productive with your helpful assistant!

When you are done, in the same folder, run podman-compose down to mercilessly kill them off.

Licensing

I know, I know. But better safe than sorry.

All the code, configurations and comments in them not otherwise already under other licenses or under copyright by others are dual licensed under the MIT and Apache 2 licenses, Copyright 2025 [Mechanize@feddit.it](https://feddit.it/u/Mechanize). Take your pick.

All the other text of the post © 2025 by Mechanize@feddit.it is licensed under CC BY-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/

[-] Mechanize@feddit.it 151 points 6 months ago

I assumed it was a shitpost, instead it is a real tweet. What a time to be alive.

Jokes aside, the only real reason I can fathom for the collectibles company calling their mother is that the number had been used as the contact number in the registry. I would be surprised if this was some kind of intimidation tactic rather than just miscommunication - in the sense that they probably just wanted to legally intimidate itch's owner, not their immediate family. They are not 2K ^/s^.

718
submitted 10 months ago* (last edited 10 months ago) by Mechanize@feddit.it to c/foss@beehaw.org

Last night Organic Maps was removed from the Play Store without any warnings or additional details due to "not meeting the requirements for the Family Program". Compared to Google Maps and other maps apps rated for 3+ age, there are no ads or in-app purchases in Organic Maps. We have asked for an appeal.

As a temporary workaround for the Google Play issue, you can install the new upcoming Google Play update from this link: https://cdn.organicmaps.app/apk/OrganicMaps-24081605-GooglePlay.apk

The Announcement on various Networks: Fosstodon Post
Twitter Post
Telegram Post

If you don't know what Organic Maps is: it is an alternative to OsmAnd and Google Maps; more info on the official site (link) and GitHub.

Maybe an error? Honestly this is a weird one. I hope we will learn more in the coming hours.

You can still get it on the other channels, like F-Droid or Obtainium. Still, we all know that not being on the Play Store is a heavy sentence for any Android app.

EDITs

  • Added F-Droid link.
  • Fixed Typo in the obtainium link.
[-] Mechanize@feddit.it 195 points 10 months ago

While she has not been named in the police statement about the arrest, it is believed to be Bonnie Spofforth

This, I don't like. If you - the newspaper, the means of information - are not sure about a name you should really refrain from using it.

It would not be the first time people got their lives ruined by some careless journalist because of a namesake or just an error.

It's not that different from "spreading rumors".

That aside, in this case, it is probably a rumor from an inside source. Still. Not a fan.

[-] Mechanize@feddit.it 66 points 10 months ago

The Heroic Games Launcher is (IMHO) by far the best interface to GOG you can have on Linux.

You can find it on the AUR if you use arch, which makes it pretty straightforward to install.

The next version will integrate with the Galaxy API using the comet project, which should make it even better.

The only problem I had with it is that, once upon a time, there was a bug with downloading some games (Cyberpunk 2077, in my case) and I had to compile the git version of Gog-dl and target that in the settings... but the fact that I could even do that is great by itself.

[-] Mechanize@feddit.it 138 points 10 months ago

Red Alert 2 Chronosphere

The Chronosphere was a mass teleportation device developed by the Allies during the Second World War. An improved version was used to decisively end the Third World War and was used further until the end of the Psychic Dominator Disaster. Albert Einstein was a notable contributor to the Chronosphere's design.

Good times.

[-] Mechanize@feddit.it 53 points 10 months ago

The (IMHO) important bits:

TLDR:

Our continuous internal reviews and beta test groups have highlighted areas that we need to focus on more, mainly performance and content

From the FAQ:

Is the game canceled?
No, the game is not canceled.

What happens to pre-orders?
All pre-orders will be refunded in the upcoming weeks. The option to pre-order the game will be removed and the bonus will instead be added to the base game for all

Is there going to be Early Access or Beta Access to the game?
There will not be an early access or extra beta access right now

In the blog there are the steps to how to get the refunds, I'm not copying them in case they change.

As they say, A Delayed Game Is Eventually Good, But a Bad Game Is Bad Forever ^/s?^

[-] Mechanize@feddit.it 57 points 1 year ago

“One of the core requirements here, and this is really important, is for users for this to be opt-in,” says Brouwer. “I can choose whether or not I want to participate in being open to exchanging messages with third parties. This is important, because it could be a big source of spam and scams.”

Let me translate this for you: "We will make users jump through the most cumbersome, frustrating and inefficient hoops we can think of to enable interoperability. And making it default to off will mean people using other apps will need to find other channels to ask for it to be enabled on our users' end, making it worthless.

And don't forget: we will put up a bunch of scary warnings, and only allow going all in, with no middle ground or granularity!"

Great stuff, thank you. I can't wait.

“We don't believe interop chats and WhatsApp chats can evolve at the same pace,” he says, claiming it is “harder to evolve an open network” compared to a closed one.

Ah, so they are going for Apple's approach with iMessage and Android SMS. Cool, cool.

I hope my corporate-to-common translator is broken, because this just sounds bad.

[-] Mechanize@feddit.it 64 points 2 years ago

You can find more information in the post "Madison on why she quit" in this community.

To give a really short and unworthy TLDR: Madison was an employee of LMG and has shared her - honestly abysmal and abusive - experience working there.
Some noteworthy quotes:

I was asked about my sexual history, my boyfriends sexual history, "how I liked to fuck".

I was told that certain issues were "sexual tension" and I should just "take the co-worker out on a coffee date to ease it out"

I was told I was chunky, fat, ugly, stupid. I was called "retarded" I was called a "faggot"

My work was called "dogs--t" I was called "incompetent".

"I think the reason you try to be funny, is because you lack any other skills." smiled then walked away.

I watched co-workers get what I had asked for weeks before they did. It took 2 months to get mine.

Also apparently some managers didn't like me because I "hadn't gotten drunk with them before" Which was said in that haha just jokin (but actually I'm serious) tone

I sincerely invite you to read the whole thread where she shared her story: https://threadreaderapp.com/thread/1691693740254228741.html

It seems that the Reddit admins are currently deleting all threads that link to Madison's experience.

[-] Mechanize@feddit.it 70 points 2 years ago

This is the work of Reddit's (site) admins, not the subreddit moderators. In fact it seems it is getting removed site-wide.

[-] Mechanize@feddit.it 232 points 2 years ago* (last edited 2 years ago)

EDIT: It seems the removal got reverted and the post brought back.


It seems Reddit is removing it from everywhere, even the one in pcmasterrace got nuked. It would be interesting to know the reason, but we all know how transparent Reddit's sitewide moderation policies are.

[-] Mechanize@feddit.it 69 points 2 years ago

I feel that your post is belittling a situation that, as narrated, is straight up mobbing and bullying, only acknowledging it in a small paragraph which I feel boils down to a dismissive "awful but only maybe malicious, probably just lack of oversight", while the rest of your comment tries to find excuses and normalize something that is not normal.

These:

I was asked about my sexual history, my boyfriends sexual history, "how I liked to fuck".

I was told that certain issues were "sexual tension" and I should just "take the co-worker out on a coffee date to ease it out"

I was told I was chunky, fat, ugly, stupid. I was called "retarded" I was called a "faggot"

My work was called "dogs--t" I was called "incompetent".

"I think the reason you try to be funny, is because you lack any other skills." smiled then walked away.

I watched co-workers get what I had asked for weeks before they did. It took 2 months to get mine.

Also apparently some managers didn't like me because I "hadn't gotten drunk with them before" Which was said in that haha just jokin (but actually I'm serious) tone

Are neither normal nor acceptable. For anyone who is in a company where this is commonplace: take a step back and understand that it is not healthy for you. Bad power dynamics are a real thing and their abuse can sometimes feel normal, especially in small businesses that get sudden, explosive growth. And I don't even want to go into her self-harming to get a day off.

You can say it was probably a single person, but the lack of action by management, with phrases like "change your priorities", "put on your big girl pants" and stuff like that, makes it a Company issue, a Company which indirectly accepts and endorses that kind of treatment: their being so against unionizing sincerely takes on a whole other meaning read in this light.

The notebook case is evidence of it all by itself: a small thing that normally wouldn't be anything important, but compounded with the stressful environment it became emotionally distressing. The fact that such a small thing has stayed with her so long should tell you that she was really not in a healthy mental state.

I don't personally care about the whole LTT fiasco; as an uninterested spectator it's fun to watch from the outside and then change channel, a blip in the media world that will most likely blow over in a couple of weeks. But reading how these actions are belittled is really distressing. Bullying is not normal, and it should never be accepted. Ever.

The full thread for whoever missed it: https://threadreaderapp.com/thread/1691693740254228741.html

[-] Mechanize@feddit.it 90 points 2 years ago

Reading the full thread was stressful enough, I don't even want to imagine what it was like to live it.

Now I understand why the company is so against unionizing, all this stuff would pretty quickly explode in their face. It's straight up abuse that should be met by the full force of the legal system.

What the fuck is wrong with people that makes them think that if you have the smallest modicum of power over someone else it's your god-given right to be an asshole, a sexual predator or just generally toxic?

I don't get it.

