Local And Distributed AI Redux (Proxmox)

Continued from Proxmox Symposium - Hybrid Cloud LOCAL AI - LLM + SDXL + LXC Containers + Kubernetes Fabric for AMD GPU's

In loving memory of the founder of Proxmox Helper Scripts, tteck/tteckster.

Why is Local and TRULY distributed AI SO IMPORTANT? Glad you asked… Bill? Take it away…

If it was not clear, the importance of TTeck’s work was, and will remain, profound in this space. Scripts ARE the future of TRULY distributed computing. Remain vigilant against those we are developing away from.

SAFETY NOTICE: I wouldn’t personally recommend playing around with MESH wifi. One instance is probably fine, but clusters… who knows? Be careful out there. God bless every one of you. LONG LIVE DISTRIBUTED TECH!

Now, these virtualizations are not necessary, but I consider the creature comforts of a good user experience to be just as important as, if not more important than, our actual LLM. After all, if it’s not fun, why bother? :slight_smile:

Some Great 2.5G and/or 10G Switches - Nice for Ceph

You’re gonna like the way you network, I guarantee it…

https://www.amazon.com/Unmanaged-VIMIN-2-5Gbase-T-10G-SFP/dp/B0DNDS2NRJ

https://www.amazon.com/dp/B0CT2F3ZDM

https://www.amazon.com/Managed-8X10G-SFP-Aggregation-Multi-gig/dp/B0CQJCQ17Q

https://www.amazon.com/s?k=10g+sfp+switch&s=price-asc-rank&qid=1737841282

Ethernet over power!

Power filtering and data? Power filtering? Why yes, thanks for considering my health. And data? Well sure, for a limited number of nodes until the tech improves. Speaking of which, have you heard of PoE (Power over Ethernet)? Or was it EoP (Ethernet over Power)? Which do you think will win the arms race?

https://www.amazon.com/gp/product/B01H74VKZU/

Mmm, single-cable nodes, so clean and so fresh…
https://www.amazon.com/BV-Tech-Switch-Gigabit-Ethernet-uplink/dp/B01MQHD54L/

And nobody seems to be talking about porting PCI Express x16, why is that? Connecting motherboards with a PCIe x16 tap? Insane in the membrane…

https://www.amazon.com/lilila-ree-Graphics-Extension-90-Degree/dp/B0BNBTTPKC

Oopsie poopsie, the cat is out of the bag…

LVM vs. LVM-Thin vs. ZFS vs. Ceph RBD

Differentiate Ceph Pools for SSD and HDD
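
In case you haven’t actually created the two device-class pools yet, here’s a minimal sketch from the shell (the rule names and placement-group counts are my own illustrative picks, not gospel):

ceph osd crush rule create-replicated ssd-rule default host ssd
ceph osd crush rule create-replicated hdd-rule default host hdd
ceph osd pool create ssdpool 128 128 replicated ssd-rule
ceph osd pool create hddpool 128 128 replicated hdd-rule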

Extra useful commands to resolve errors presented by the Ceph summary page:
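
First, to see exactly what the summary page is complaining about:

ceph health detail

The warning that tends to show up after creating pools by hand is POOL_APP_NOT_ENABLED, which these two lines clear up: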

ceph osd pool application enable ssdpool rbd
ceph osd pool application enable hddpool rbd

… then navigate to Datacenter → Storage → Add → RBD for each of the two newly created pools to establish our RBDs, which should now be recognized across our cluster.

Now, I don’t know if it is procedurally correct, but you can now add a CephFS for storing ISOs by clicking a given PVE → Ceph → CephFS → Create CephFS.

You can also navigate to a given PVE → Ceph → Pools → ssdpool → Edit and reduce Size from 3 to 2 to increase usable capacity at the expense of redundancy, or inversely increase it from 3 to 4 for more redundancy at the expense of capacity, depending upon your application.
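
If you prefer the shell for that, the equivalent knobs (using the ssdpool from above; the same redundancy trade-off applies):

ceph osd pool get ssdpool size
ceph osd pool set ssdpool size 2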

I’ve heard it said that 3 replicas are 99.95% reliable while 2 are 98.5% reliable, all else equal. And I’ve heard it said that 2 is underrated in non-production environments, as you can achieve a level of error correction with only 2 replicas thanks to BlueStore’s checksums -
https://www.reddit.com/r/ceph/comments/zkksud/replica_3_vs_replica_2/

Now then, from personal experience I can tell you that if you only have three nodes and one of them goes down, your containers and VMs will crawl. So while my data survived a bad memory stick, for example, I’m personally sticking with 3/2 even for non-production, not only so I don’t lose data but so my HA (Highly Available) instances run strong even when an entire node goes down.

Starting Over: Orphaned Ceph

Wiping an orphaned Ceph disk:

List disks (in the shell of the node hosting the drive):

lsblk

This also provides valuable insights:

fdisk -l

Use this to obtain the drive path for the following command:

fdisk /dev/REPLACEWITHDISK

Delete a partition:

d

and/or

Create a new partition:

n

Now you will be prompted for the partition number (probably just press Enter for the default), the first sector (Enter again), and the last sector (Enter again to use the whole disk, or a size like +100G); follow the text wizard’s directions carefully.
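
For reference, the exchange looks roughly like this (exact wording varies by fdisk version and partition table type):

Command (m for help): n
Partition number (1-128, default 1): <Enter>
First sector (2048-..., default 2048): <Enter>
Last sector, +/-sectors or +/-size{K,M,G,T,P}: <Enter>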

Save your work (PERMANENT!!):

w

Now you can navigate to Disks → Select Disk → Wipe Disk after making sure you’re in the correct node’s control panel. Don’t forget to create something with your new and cleaned partition!
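
Alternatively, Ceph ships a purpose-built scrubber for exactly this job; a one-liner sketch, with the same PERMANENT warning attached (swap in your own device path):

ceph-volume lvm zap /dev/REPLACEWITHDISK --destroy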

M.2 NVMe - Those fast little guys

List Drives

lsblk

Use this to obtain the Ceph device-mapper name you need for:

dmsetup remove ceph-REALLY-LONG-NAME-OF-OLD-CEPH-PARTITION-FROM-LSBLK
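
If that name is too unwieldy to copy out of lsblk, you can list the device-mapper entries directly:

dmsetup ls | grep ceph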

Now you are free to navigate to Disks → Select Disk → Wipe Disk. Don’t forget to do something with your shiny new drive!

Deleting a CephFS (Ceph filesystem):

*Replace NAME below with your CephFS’s name

You can try the simple removal first if you like, but it may be protected both from this command and in the UI (User Interface - the graphical website)…

ceph fs rm NAME --yes-i-really-mean-it

WARNING: DELETES THINGS, USE AT YOUR OWN RISK

… Ergo, you probably need to remove protections:

pveceph stop --service mds.NAME
ceph fs set NAME down true
ceph fs fail NAME
ceph fs set NAME joinable false
ceph fs rm NAME --yes-i-really-mean-it

A more thorough approach (also necessary if you want the underlying data deleted):

umount /mnt/pve/NAME
pveceph stop --service mds.NAME
pveceph mds destroy NAME
pveceph fs destroy NAME --remove-storages --remove-pools
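
To confirm the filesystem is actually gone before chasing leftovers in the UI:

ceph fs ls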

If applicable, navigate to Datacenter → Storage → Select the CephFS → Remove to delete it from the User Interface (UI)

If the pools are still haunting the UI, navigate to NODE → Ceph → Pools → select the corresponding NAME_data and NAME_metadata pools → Destroy

Still having problems? Have you tried turning it off and on again?

reboot

https://docs.ceph.com/en/latest/cephfs/administration/
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_destroy_cephfs

How about a more powerful version of CasaOS with local AI support? Nice…

ZIMA FOR PROXMOX:

There must be a zillion alternatives out there right now…

Is DeepSeek compromised? Probably. Is it faster than ChatGPT? Probably. Keep looking for better and better LLMs with less and less capturability, I always said…

This is THE source! https://huggingface.co/

What about Upstream DNS? Virtual routers, anyone?

Pi-hole (basic DNS and DHCP, no custom upstream DNS) - Installing Pi-Hole on Proxmox – Natural Born Coder

Cloudflared + Pi-hole to dodge that nosy ISP - https://youtu.be/OfcuP01JyOE?si=TRSEbssf6j-MdzBe
… but not that nosy Cloudflare

Local Recursive DNS? Why yes - https://www.crosstalksolutions.com/the-worlds-greatest-pi-hole-and-unbound-tutorial-2023/#Unbound_Setup

Even MOAR Privacy -

MAXIMUM Privacy - I couldn’t find a good Tor + Pi-hole + Unbound + DNSSEC DNS LXC I could trust, so I built my own


Stand-alone Router w/ Native Wireless Support: Docker support - RaspAP Documentation

LXC > Docker (Router): Install Pi-hole on Proxmox and Use OPNsense Unbound DNS as Upstream DNS

I’ll just leave this here for now…

Wonder how many Tiny Core Linux instances a decent homelab could run? Hundreds? Thousands?!

