Currently, I run Unraid and have all of my services’ setup there as docker containers. While this is nice and easy to setup initially, it has some major downsides:

  • It’s fragile. Unraid is prone to bugs/crashes with docker that take down my containers. It’s also not resilient so when things break I have to log in and fiddle.
  • It’s mutable. I can’t use any infrastructure-as-code tools like terraform, and configuration sort of just exist in the UI. I can’t really roll back or recover easily.
  • It’s single-node. Everything is tied to my one big server that runs the NAS, but I’d rather have the NAS as a separate fairly low-power appliance and then have a separate machine to handle things like VMs and containers.

So I’m looking ahead and thinking about what the next iteration of my homelab will look like. While I like unraid for the storage stuff, I’m a little tired of wrangling it into a container orchestrator and hypervisor, and I think this year I’ll split that job out to a dedicated machine. I’m comfortable with, and in fact prefer, IaC over fancy UIs and so would love to be able to use terraform or Pulumi or something like that. I would prefer something multi-node, as I want to be able to tie multiple machines together. And I want something that is fault-tolerant, as I host services for friends and family that currently require a lot of manual intervention to fix when they go down.

So the question is: how do you all do this? Kubernetes, docker-compose, Hashicorp Nomad? Do you run k3s, Harvester, or what? I’d love to get an idea of what people are doing and why, so I can get some ideas as to what I might do.

  • seang96@spgrn.com
    link
    fedilink
    English
    arrow-up
    3
    ·
    11 months ago

    I agree with this, though it’s not the easiest option. Ceph is amazing for storage, I am using mini pc nucs for my cluster and to expand storage I simply plan to add nodes and get the largest sata and m.2 SSD.

    I recently setup velero for automated backups. It even is configured to dump my HA postgrrd DB and then backup the dump folder.

    Other cool experiences I can think of is netdata with Prometheus metrics auto scraping pods by their annotations to pull their statistics in, and nothing is more satisfying then making a service highly available.