I see no one else commented my stack, so I suggest:
Nomad for managing containers if you want something high availability. Essentially the same as k8s but much much much simpler to deploy, learn, and maintain. Perfect for homelabs imo. Most of the concepts of Nomad translate well to k8s if you do want to learn it later. It integrates really well with Terraform too if you are also hoping to learn that, but it’s not a requirement.
NixOS for managing the bare metal. It’s a lot more work to learn than say, Debian, but it is just as stable, and all configuration will be defined as code, down to the bootloader config (no bash scripts!). This makes it super robust. You can also deploy it remotely. Once you grow beyond a handful of nodes it’s important to use a config management tool, and Nix has been by far my favourite so far.
If you really want everything to be infra-as-code, you can manage cloud providers via Terraform too.
For networking I use wireguard, and configure it with NixOS. Specifically, I have a mesh network where every node can reach every node without extra hops. This is a requirement if you don’t want a single point of failure (hub and spoke) to disconnect your entire cluster.
Everything in my setup is defined ‘as-code’, immutable, and multi-node (I have 7 machines) which seems to be what you want, from what you say in your post. I’ll leave my repo here, and I’m happy to answer questions!
–
My opinions on the alternatives:
Docker compose is great but doesn’t scale if you want high availability (ie, have a container be rescheduled on node failure). If you don’t want higher availability, anything more than docker might be overkill.
Ansible and Puppet are alright but are super stateful, and require scripting. If you want immutability you will love Nix/NixOS
k8s works (I use it at work) but is extremely hard to get right, even for well-resourced infra teams. Nomad achieves the same but with the leanings of having come afterwards, and without the history.
I recommend starting with ZeroToNix’s docs and then moving on to nixos.wiki, but here is a minimal, working example that I could deploy to a hetzner VPS that only has nix and ssh installed:
This sets up a machine, configures the usual stuff like the ssh daemon, creates a user, and sets up an nginx server. To deploy it you would run
nixos-rebuild --target-host root@10.0.0.1 switch
. Other tools exist (I use colmena but the idea is the same). Note how easy it was to set up nginx! If I was setting Nomad up, I would just doservices.nomad.enable = true
.As you can see some things you will have to learn (the nix language, what the configs are…) but I think it is worth it.