LXC Graphics (GPU) Passthrough for Open WebUI

The following is intended as a supplement to this link, which you should treat as a prerequisite: The Best Way to Get Deepseek-R1: By Using an LXC in Proxmox

I recommend Ubuntu 22.04, Jammy Jellyfish, for less drama. I used to prefer Debian, but repo support for it is falling off a cliff.

gpu.sh

#!/bin/bash

if [ -z "$1" ]; then
    echo "Usage: $0 <LXC_CONTAINER_ID>"
    exit 1
fi

LXC_ID="$1"
LXC_CONF="/etc/pve/lxc/$LXC_ID.conf"
GPU_DEVICES=("card0" "renderD128")  # Adjust based on your GPU setup (informational; the bind mount below passes the whole /dev/dri directory)

echo "Stopping LXC container $LXC_ID..."
pct stop "$LXC_ID"

echo "Updating LXC config at $LXC_CONF..."
{
    echo "unprivileged: 1"
    echo "lxc.apparmor.profile: unconfined"
    echo "lxc.cap.drop: "
    echo "lxc.cgroup.devices.allow: c 226:* rwm"
    echo "lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir"
    echo "lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file"
    echo "#lxc.idmap: u 0 100000 65536"
    echo "#lxc.idmap: g 0 100000 65536"
    echo "#lxc.idmap: u 100000 0 1"
    echo "#lxc.idmap: g 100000 0 1"
    echo "#lxc.idmap: g 100001 44 1"
} >> "$LXC_CONF"

echo "Detecting GPU devices..."
ls -l /dev/dri

echo "Setting up UDEV rules for GPU passthrough..."
cat <<EOF > /etc/udev/rules.d/99-gpu-passthrough.rules
KERNEL=="card0", SUBSYSTEM=="drm", MODE="0660", OWNER="100000", GROUP="100000"
KERNEL=="renderD128", SUBSYSTEM=="drm", MODE="0660", OWNER="100000", GROUP="100000"
EOF

echo "Reloading UDEV rules..."
udevadm control --reload-rules
udevadm trigger

echo "Creating media_group for GPU access..."
groupadd -f media_group
usermod -aG media_group root

echo "Creating GPU permission fix script at /usr/local/bin/gpu_permission_fix.sh..."
cat <<EOF > /usr/local/bin/gpu_permission_fix.sh
#!/bin/bash
chown root:media_group /dev/dri/renderD128
chmod 660 /dev/dri/renderD128
EOF
chmod +x /usr/local/bin/gpu_permission_fix.sh

echo "Creating systemd service for automatic GPU permission fix..."
cat <<EOF > /etc/systemd/system/gpu_permission_fix.service
[Unit]
Description=Run GPU permission fix at startup
After=network.target

[Service]
ExecStart=/usr/local/bin/gpu_permission_fix.sh
Restart=no

[Install]
WantedBy=multi-user.target
EOF

systemctl enable gpu_permission_fix.service
systemctl start gpu_permission_fix.service

echo "Starting LXC container $LXC_ID..."
pct start "$LXC_ID"

echo "Applying GPU group inside the container..."
pct exec "$LXC_ID" -- bash -c "
    groupadd -f media_group
    usermod -aG media_group root
    chown root:media_group /dev/dri/renderD128
    chmod 660 /dev/dri/renderD128
"

echo "Verifying GPU access inside the container..."
pct exec "$LXC_ID" -- ls -l /dev/dri

echo "GPU passthrough setup complete! 🎉"
Make the script executable and run it against your container ID:

chmod +x gpu.sh
sudo ./gpu.sh <CONTAINER#>

If the GPU devices inside the LXC container show up as owned by nobody:nogroup, the script is failing to change ownership and permissions because of the restrictions of an unprivileged LXC container.

Fix: Use lxc.idmap to Remap GPU Device Ownership

Since the container is unprivileged, the root inside the container is mapped to an unprivileged user on the Proxmox host, preventing permission changes. We need to explicitly map the GPU devices to the container’s user namespace.
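To confirm how the container's root is mapped, you can inspect the user-namespace mapping (a quick check; container ID 100 and the default Proxmox offset of 100000 are assumed here):

pct exec 100 -- cat /proc/self/uid_map   # a typical unprivileged container prints "0 100000 65536"
grep root /etc/subuid /etc/subgid        # on the host; the default is root:100000:65536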


Step 1: Modify the LXC Configuration

Edit /etc/pve/lxc/100.conf and add the following lines, IF the script failed to do so AND it’s not working (note: on hosts running a pure cgroup v2 hierarchy, the device-allow key is lxc.cgroup2.devices.allow):

unprivileged: 1
lxc.apparmor.profile: unconfined
lxc.cap.drop:
lxc.cgroup.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
# lxc.idmap: u 0 100000 65536
# lxc.idmap: g 0 100000 65536
# lxc.idmap: u 100000 0 1
# lxc.idmap: g 100000 0 1
# lxc.idmap: g 100001 44 1

With these lines in place:

  • the GPU device nodes under /dev/dri (and /dev/kfd) are bind-mounted into the container and permitted by the cgroup device filter.
  • the commented lxc.idmap lines, if uncommented, remap specific host IDs (including GID 44) into the container so that its media_group can reach the GPU.

Step 2: Create a Media Group and Assign GPU Access

On the Proxmox Host, run:

groupadd -g 44 media_group
chown root:media_group /dev/dri/renderD128
chmod 660 /dev/dri/renderD128

This ensures the container’s group 44 (media_group) has access to the GPU. Note that on most Debian-based systems, host and container alike, GID 44 already belongs to the video group; if groupadd refuses the GID, simply reuse the existing video group instead.
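You can verify the result on the host before starting the container (assuming renderD128 is your render node):

ls -l /dev/dri/renderD128   # should show root:media_group and mode crw-rw----
getent group 44             # shows which group name currently owns GID 44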


Step 3: Apply Fix Inside the Container

Start the container:

pct start 100

Then, inside the container (pct exec 100 -- bash), run:

groupadd -g 44 media_group
usermod -aG media_group root
chown root:media_group /dev/dri/renderD128
chmod 660 /dev/dri/renderD128

This will apply the correct permissions.


Step 4: Verify GPU Access

Inside the container, check:

ls -l /dev/dri

It should now show something like:

crw-rw---- 1 root media_group 226, 128 Feb 19 10:55 renderD128

This should fix the permission errors.
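For a quick sanity check of the final ownership and mode, run this inside the container:

stat -c '%U:%G %a' /dev/dri/renderD128   # expect root:media_group 660
id                                       # the current user should list media_group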

Debugging at this Milestone

The error “Failed to run mount hooks” suggests that the container’s mount entry for /dev/dri is invalid. Let’s debug and fix it.
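First, list exactly which device-related lines the config currently contains (container 100 assumed):

grep -nE 'lxc\.(mount\.entry|cgroup.*devices)' /etc/pve/lxc/100.conf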


Step 1: Remove Invalid Entries in 100.conf

Run:

nano /etc/pve/lxc/100.conf

Look for any existing lxc.mount.entry: lines related to /dev/dri and remove them.

Then, replace them with:

lxc.cgroup.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,create=dir,optional

Save and exit.


Step 2: Ensure /dev/dri Exists on the Host

On Proxmox Host, check if /dev/dri exists:

ls -l /dev/dri

If it does not exist, your GPU driver might not be loaded. Try:

modprobe i915   # For Intel GPUs
modprobe amdgpu # For AMD GPUs

Then check again:

ls -l /dev/dri
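If the module loaded, you can make it persistent across reboots and confirm which kernel driver is bound to the GPU:

echo i915 >> /etc/modules          # or amdgpu, depending on your GPU
lspci -k | grep -A3 -Ei 'vga|3d'   # "Kernel driver in use:" should name i915 or amdgpu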

Step 3: Restart and Debug LXC

Try starting the container again:

pct start 100

If it fails, check logs:

journalctl -xe | grep lxc

or

lxc-start -n 100 -F -l DEBUG -o /tmp/lxc-100.log

Then review any errors you find there. :rocket:
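A quick way to pull the relevant lines out of the debug log:

grep -Ei 'error|denied|failed' /tmp/lxc-100.log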

How do I install the drivers inside the LXC?

Step 1: Check GPU Access Inside the LXC

Run this inside your container:

ls -l /dev/dri

You should see the devices, and they should now be accessible by your container’s media_group.


Step 2: Identify Your GPU Type

Inside the LXC container, run:

lspci -nn | grep VGA

or

ls /dev/dri

This helps determine if your GPU is properly passed through.
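Note that some GPUs, notably headless NVIDIA cards, enumerate as a 3D controller rather than VGA, so a broader lspci filter is safer:

lspci -nn | grep -Ei 'vga|3d|display'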


Step 3: Install Drivers

Depending on your GPU type, follow one of the driver installation methods below:

For Intel GPUs (iGPU)

apt update && apt install -y intel-media-va-driver vainfo mesa-utils

Verify installation:

vainfo
glxinfo | grep "OpenGL renderer"

For AMD GPUs

apt update && apt install -y mesa-utils firmware-amd-graphics vainfo

Verify:

glxinfo | grep "OpenGL renderer"
vainfo

For NVIDIA GPUs (If You’re Using a Full GPU Passthrough)

:one: Enable non-free repositories (this step targets Debian; on Ubuntu, the drivers live in the restricted/multiverse components instead)

echo "deb http://deb.debian.org/debian $(lsb_release -cs) main contrib non-free non-free-firmware" | tee -a /etc/apt/sources.list
apt update

:two: Install the NVIDIA Driver

apt install -y nvidia-driver nvidia-smi

:three: Check NVIDIA GPU

nvidia-smi

or

nvidia-settings
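Because an LXC container shares the host kernel, the userspace driver installed inside the container must match the kernel module version loaded on the Proxmox host. A quick check on the host (assuming the host driver is already installed):

cat /proc/driver/nvidia/version
nvidia-smi --query-gpu=driver_version --format=csv,noheader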

Step 4: Add Your User to the GPU Group

If GPU access is still blocked inside the container, add your container user to the media_group inside the LXC:

usermod -aG media_group $USER

Then, log out and back in.
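If you want the new group membership to take effect without logging out, start a subshell with the group applied:

newgrp media_group
id   # media_group should now appear in the group list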


Step 5: Test GPU Acceleration

Run:

glxgears

or

vainfo

If everything works, your container should now have full GPU acceleration! :tada: :rocket:
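For hardware video decode specifically, a simple smoke test (assuming you install ffmpeg first) is to list the acceleration methods it can see:

apt install -y ffmpeg
ffmpeg -hide_banner -hwaccels   # vaapi should appear if /dev/dri is usable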

Version 2.0

Final gpu2.sh Script

#!/bin/bash

# Ensure a container ID is provided
if [ -z "$1" ]; then
    echo "Usage: $0 <LXC_ID>"
    exit 1
fi

LXC_ID=$1
LXC_CONF="/etc/pve/lxc/$LXC_ID.conf"
GPU_GROUP="media_group"
GPU_GID=44  # Fixed GID for consistency

echo "Stopping LXC container $LXC_ID..."
pct stop $LXC_ID

echo "Detecting GPU devices..."
GPU_DEVICES=$(ls -1 /dev/dri | grep -E 'card[0-9]+|renderD[0-9]+')

if [ -z "$GPU_DEVICES" ]; then
    echo "No GPU devices found!"
    exit 1
fi

# Ensure the media_group exists on the host
if ! getent group $GPU_GROUP >/dev/null; then
    echo "Creating group '$GPU_GROUP' with GID $GPU_GID..."
    groupadd -g $GPU_GID $GPU_GROUP
fi

# Update GPU device permissions on the host
echo "Setting permissions for GPU devices..."
for DEVICE in $GPU_DEVICES; do
    chown root:$GPU_GROUP /dev/dri/$DEVICE
    chmod 660 /dev/dri/$DEVICE
done

# Backup LXC config
cp "$LXC_CONF" "$LXC_CONF.bak"

# Define required config lines (the arch/hostname/memory/rootfs/swap values below match my container; adjust them to your own before running)
CONFIG_LINES=(
    "arch: amd64"
    "cores: 8"
    "hostname: openwebui"
    "memory: 16000"
    "onboot: 1"
    "ostype: debian"
    "rootfs: CephPool:vm-$LXC_ID-disk-0,size=100G"
    "swap: 512"
    "tags: proxmox-helper-scripts"
    "unprivileged: 0"
    "lxc.apparmor.profile: unconfined"
    "lxc.cap.drop:"
    "lxc.cgroup.devices.allow: c 226:* rwm"
    "lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir"
    "lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file"
    "#lxc.idmap: u 0 100000 65536"
    "#lxc.idmap: g 0 100000 65536"
    "#lxc.idmap: u 100000 0 1"
    "#lxc.idmap: g 100000 0 1"
    "#lxc.idmap: g 100001 44 1"
)

# Append missing config lines
echo "Ensuring LXC config is up-to-date..."
for LINE in "${CONFIG_LINES[@]}"; do
    grep -qF -- "$LINE" "$LXC_CONF" || echo "$LINE" >> "$LXC_CONF"
done

# Start the container
echo "Starting LXC container $LXC_ID..."
pct start $LXC_ID

# Apply fixes inside the container
echo "Applying GPU group inside the container..."
pct exec $LXC_ID -- bash -c "
    groupadd -g $GPU_GID $GPU_GROUP || true
    usermod -aG $GPU_GROUP root
    for DEVICE in $GPU_DEVICES; do
        chown root:$GPU_GROUP /dev/dri/\$DEVICE
        chmod 660 /dev/dri/\$DEVICE
    done
"

# Verify GPU access inside the container
echo "Verifying GPU access inside the container..."
pct exec $LXC_ID -- ls -l /dev/dri

echo "GPU passthrough setup complete! 🎉"

What This Script Does

:white_check_mark: Stops the LXC container before making changes.
:white_check_mark: Detects and configures GPU devices automatically.
:white_check_mark: Ensures the media_group exists (GID 44) for GPU access.
:white_check_mark: Assigns correct permissions for /dev/dri/* devices on the host.
:white_check_mark: Appends the necessary LXC config lines if missing.
:white_check_mark: Starts the container after applying changes.
:white_check_mark: Configures GPU group inside the container.
:white_check_mark: Verifies GPU access inside the container.


How to Use

sudo ./gpu2.sh 100

This will ensure full GPU passthrough with automatic device detection and correct permissions while injecting the template into your LXC config.
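To confirm the script actually injected the passthrough lines, re-read the container config afterwards:

grep -nE 'lxc\.(cgroup|mount|cap|apparmor)' /etc/pve/lxc/100.conf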

:fire: Now fully automated for Proxmox! :rocket:

Version 3.0 is EXPERIMENTAL and INCOMPLETE
I still need to fix the driver installation inside the container, but for now you can do that step manually; the script works fine up to that point. I also need to make the container unprivileged again and run the services as a dedicated user, for security and proper containerization.

This Script Attempts to be a More Robust All-In-One Installer:

  1. Stops the target LXC container.
  2. Detects GPU devices (/dev/dri entries) and assigns permissions.
  3. Ensures the GPU group exists on the host and inside the container.
  4. Updates the LXC configuration file to allow GPU passthrough.
  5. Restarts the container and applies GPU-related fixes inside it.
  6. Installs necessary system dependencies for GPU usage.
  7. Detects the GPU type and installs the corresponding drivers (NVIDIA, AMD ROCm, or Intel OpenVINO).
  8. Clones or updates Open WebUI inside the container.
  9. Configures GPU-related environment variables dynamically.
  10. Injects GPU settings into Open WebUI.
  11. Runs Open WebUI with GPU acceleration.
  12. Verifies GPU utilization inside the container.

This integrates GPU passthrough into an LXC container while automatically setting up Open WebUI with GPU support. :rocket:

#!/bin/bash

# Ensure a container ID is provided
if [ -z "$1" ]; then
    echo "Usage: $0 <LXC_ID>"
    exit 1
fi

LXC_ID=$1
LXC_CONF="/etc/pve/lxc/$LXC_ID.conf"
GPU_GROUP="media_group"
GPU_GID=44  # Fixed GID for consistency

echo "Stopping LXC container $LXC_ID..."
pct stop $LXC_ID

echo "Detecting GPU devices on the host..."
GPU_DEVICES=$(ls -1 /dev/dri | grep -E 'card[0-9]+|renderD[0-9]+')

if [ -z "$GPU_DEVICES" ]; then
    echo "No GPU devices found!"
    exit 1
fi

# Ensure the media_group exists on the host
if ! getent group $GPU_GROUP >/dev/null; then
    echo "Creating group '$GPU_GROUP' with GID $GPU_GID..."
    groupadd -g $GPU_GID $GPU_GROUP
fi

# Update GPU device permissions on the host
echo "Setting permissions for GPU devices..."
for DEVICE in $GPU_DEVICES; do
    chown root:$GPU_GROUP /dev/dri/$DEVICE
    chmod 660 /dev/dri/$DEVICE
done

# Backup LXC config
cp "$LXC_CONF" "$LXC_CONF.bak"

# Append missing GPU passthrough settings **without modifying existing lines**
echo "Appending necessary GPU passthrough settings..."
{
    grep -qF "lxc.cgroup.devices.allow: c 226:* rwm" "$LXC_CONF" || echo "lxc.cgroup.devices.allow: c 226:* rwm" >> "$LXC_CONF"
    grep -qF "lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir" "$LXC_CONF" || echo "lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir" >> "$LXC_CONF"
    grep -qF "lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file" "$LXC_CONF" || echo "lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file" >> "$LXC_CONF"
} && echo "GPU passthrough settings appended."

# Start the container
echo "Starting LXC container $LXC_ID..."
pct start $LXC_ID

# Apply fixes inside the container
echo "Applying GPU group inside the container..."
pct exec $LXC_ID -- bash -c "
    groupadd -g $GPU_GID $GPU_GROUP || true
    usermod -aG $GPU_GROUP root
    for DEVICE in \$(ls /dev/dri | grep -E 'card[0-9]+|renderD[0-9]+'); do
        chown root:$GPU_GROUP /dev/dri/\$DEVICE
        chmod 660 /dev/dri/\$DEVICE
    done
"

# Install necessary packages inside LXC
echo "Installing system dependencies inside container $LXC_ID..."
pct exec $LXC_ID -- bash -c "
    apt update
    apt install -y git python3 python3-pip wget clinfo vainfo pciutils
"

# Detect GPU type inside LXC using `vainfo` and `/dev/dri` (note: vainfo may not report NVIDIA GPUs; lspci is a more reliable check)
echo "Detecting GPU type inside container..."
GPU_TYPES=()

if pct exec $LXC_ID -- vainfo | grep -i "nvidia"; then
    GPU_TYPES+=("nvidia")
    echo "Installing NVIDIA drivers..."
    pct exec $LXC_ID -- bash -c "
        apt install -y nvidia-driver nvidia-cuda-toolkit
        pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    "
fi

if pct exec $LXC_ID -- vainfo | grep -i "AMD"; then
    GPU_TYPES+=("amd")
    echo "Installing AMD ROCm drivers..."
    pct exec $LXC_ID -- bash -c "
        apt install -y rocm-dev   # requires AMD's ROCm apt repository; it is not in the default Debian/Ubuntu repos
        pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
    "
fi

if pct exec $LXC_ID -- vainfo | grep -i "Intel"; then
    GPU_TYPES+=("intel")
    echo "Installing Intel OpenVINO drivers..."
    pct exec $LXC_ID -- bash -c "
        apt install -y intel-media-va-driver
        pip3 install openvino
    "
fi

# Clone Open WebUI inside LXC
echo "Cloning Open WebUI inside container..."
pct exec $LXC_ID -- bash -c "
    if [ -d 'open-webui' ]; then
        echo 'Updating existing open-webui repository...'
        cd open-webui
        git fetch --all
        git reset --hard origin/main
        git pull
    else
        echo 'Cloning open-webui repository...'
        git clone --depth 1 --branch main https://github.com/open-webui/open-webui.git
        cd open-webui
    fi

    # Install Python dependencies
    if [ -f 'requirements.txt' ]; then
        pip3 install -r requirements.txt
    else
        pip3 install fastapi uvicorn python-multipart jinja2 aiofiles
    fi
"

# Configure GPU environment variables inside the container
echo "Configuring GPU environment variables..."
pct exec $LXC_ID -- bash -c "
    # Write the config next to run.py so 'import gpu_config' can find it
    cat <<EOL > open-webui/gpu_config.py
import os

# GPU types detected on the host, space-separated (e.g. 'nvidia amd')
os.environ['GPU_TYPES'] = '${GPU_TYPES[*]}'
gpu_types = os.environ['GPU_TYPES'].split()

# CUDA_VISIBLE_DEVICES is left unset so all NVIDIA GPUs remain visible;
# set it to a specific index (e.g. '0') only to restrict devices.

if 'amd' in gpu_types:
    os.environ['ROCM_PATH'] = '/opt/rocm'

if 'intel' in gpu_types:
    os.environ['INTEL_OPENVINO_DIR'] = '/opt/intel/openvino'
EOL
"

# Inject GPU settings into Open WebUI
echo "Injecting GPU settings into Open WebUI..."
pct exec $LXC_ID -- bash -c "
    if [ -f 'open-webui/run.py' ]; then
        sed -i '/import os/a import gpu_config' open-webui/run.py
    fi
"

# Run Open WebUI
echo "Starting Open WebUI with GPU support inside container..."
pct exec $LXC_ID -- bash -c "
    cd open-webui
    python3 run.py
"

# Verify GPU usage inside the container
echo "Verifying GPU usage inside the container..."
for gpu_type in "${GPU_TYPES[@]}"; do
    case $gpu_type in
        "nvidia")
            pct exec $LXC_ID -- nvidia-smi || echo "NVIDIA GPU detected but 'nvidia-smi' failed to run."
            ;;
        "amd")
            pct exec $LXC_ID -- clinfo || echo "AMD GPU detected but 'clinfo' failed to run."
            ;;
        "intel")
            pct exec $LXC_ID -- vainfo || echo "Intel GPU detected but 'vainfo' failed to run."
            ;;
    esac
done

echo "GPU passthrough & driver setup inside LXC complete! 🎉"