Nix Self-Hosting in the Age of LLMs

My use case

I recently had to redo my homelab. Homelab might be a very strong word for what is just a Raspberry Pi 4 8GB hooked up to a 2TB storage drive. Regardless, lately it has been crawling along at unacceptable speeds. I don't know if it was system rot or just a buildup of dust, but the services I run at home were becoming slow. I wanted to run maintenance on it, but I was sort of afraid to touch anything I had built.

I based all my work on Helm charts deployed on Kubernetes (k3s, to be exact). I reasoned it was the cutting-edge way to self-host, that it was highly secure, and that Helm made everything declarative. In the end I found it unwieldy, and the Helm charts weren't as declarative and pure as I had hoped. The system would still get stuck mid-deployment, previous configuration options wouldn't always go away, and services needed to be manually deleted. It always felt like a great beast I was trying to wrestle into submission, which is not how I want to feel about a service backing up pictures of my children.

Starting the migration

It was time for a change. I wanted to restart my homelab and make it something very maintainable. This drove me to use two technologies that I had picked up since my last attempt at self-hosting:

The Nix ecosystem: I migrated my own laptop to NixOS over a year ago at this point. I migrated my work setup to Nix and now almost all of my projects use Nix to manage dependencies, builds, and deployments. Now, I had an opportunity to manage my homelab using NixOS and declare everything.
LLMs and Agentic Co-Programming: A lot of the work I did in the past involved painstaking port definitions and reverse-proxy configurations. There was no reason to think this time would be different in this regard, but knowing I could partially rely on some form of coding agent made the process significantly less daunting.

Quick success story

I started by making a migration plan: I wanted to keep relevant data from my old homelab and migrate it into my new setup. Within a day I had backups ready to be ported. Within two more days I had my initial services running, albeit with some bugs. It took another day to iron it all out. Keep in mind that I only worked on this project from 10PM to 11PM, so I consider this quick progress.

Sure enough, it all worked! I backed up all my files and ran my little suite of automations within a few days! It was working faster than ever before, and I felt more in control than ever, even though I delegated 50% of the coding work to a machine. I attribute this to the fact that, since everything was strictly declarative, I could always tear it all down and rebuild it exactly the way I want very easily.

Why Nix is so good, even on its own

Fully declarative

The state of my homelab is set directly from my project. With the exception of some service-specific settings that can't be set from files, everything can be changed by changing a file in the repo, and everything can be inferred from reading the repo itself. This means I am never in the dark about the state of my homelab: I can just read what should be running.
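
To give a rough idea of what "everything can be inferred from reading the repo" looks like, here is a minimal sketch of a NixOS module. The option names are standard NixOS options, but the specific services, container image, and ports are assumptions picked for illustration, not a copy of my actual configuration.

```nix
# Illustrative NixOS module fragment. The services, image, and ports
# below are hypothetical examples, not the real homelab configuration.
{ config, pkgs, ... }:
{
  # Enabling a service is one declarative line. Deleting the line and
  # rebuilding removes the service again.
  services.openssh.enable = true;

  # Only the ports listed here are opened in the firewall.
  networking.firewall.enable = true;
  networking.firewall.allowedTCPPorts = [ 22 443 ];

  # Containers are declared the same way and run as systemd units.
  virtualisation.podman.enable = true;
  virtualisation.oci-containers.backend = "podman";
  virtualisation.oci-containers.containers.example = {
    image = "docker.io/library/nginx:stable";
    # Bind to localhost only, so a reverse proxy decides what is exposed.
    ports = [ "127.0.0.1:8080:80" ];
  };
}
```

Reading a file like this answers "what is running and how is it exposed" without logging into the machine at all.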

Saving tools and commands

Something I find myself doing for every one of my Nix projects is declaring a development shell with special commands for tooling. Every time I find myself writing long commands to do specific tasks (checking the state of backups remotely, remote build and deploy, running commands inside Podman-hosted services), I just write a custom command that gets loaded into the shell.

In the previous version of my homelab, I accomplished this by writing a bunch of bash files. That does work and follows the same basic principle, but the dev shell feels more approachable, and the commands feel more accessible. I find myself using these commands much more within the Nix ecosystem.
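
The dev shell commands described above can be sketched roughly like this. It is a minimal example under stated assumptions: the host name `homelab`, the unit name `backup.service`, and the command names `backup-status` and `deploy` are all hypothetical, and the real commands will differ.

```nix
# flake.nix (sketch). Host name, unit name, and command names are
# hypothetical placeholders.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      # The machine the shell is used from, not the Raspberry Pi itself.
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};

      # One short command instead of a long ssh + systemctl + journalctl line.
      backup-status = pkgs.writeShellScriptBin "backup-status" ''
        ssh homelab 'systemctl status backup.service --no-pager; journalctl -u backup.service -n 20 --no-pager'
      '';

      # Build on the target machine and switch it to the new configuration.
      deploy = pkgs.writeShellScriptBin "deploy" ''
        nixos-rebuild switch --flake .#homelab --target-host homelab --build-host homelab
      '';
    in
    {
      devShells.${system}.default = pkgs.mkShell {
        # Everything listed here is on PATH inside `nix develop`.
        packages = [ pkgs.nixos-rebuild backup-status deploy ];
      };
    };
}
```

After running `nix develop` in the repo, `backup-status` and `deploy` are available as ordinary commands, both to me and to anything else working in that shell.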

The extra benefit of the LLM

Of course, using a coding agent helps with self-hosting tasks in general. This is also true for a Kubernetes-based, Helm-charted repo like my previous setup. But there are some definite benefits to this specific combination.

For starters, the fact that my system is fully declarative means that the LLM knows exactly what is and isn't installed just by looking at the project itself. It doesn't have to hunt around for active processes or guess which config file is being read: it knows the file layout and can infer immediately which services are running and how they are configured. Compare this to bloating your context by first reading a bunch of YAML files in the repo and then running a bunch of commands to check that what is actually deployed matches the YAML.

An additional benefit came along with all the custom commands I created for myself - turns out if they're useful for me, they're useful for the LLM! This allows it to perform routine tasks repeatably and correctly, instead of guessing what the best way to do so is.

As an example, I already have a command that does all the work of checking the state of my backups. The command queries the relevant services and logs, and summarizes everything for me in one output. The LLM, instead of learning about the structure of the backup service every time, can just run the command and look at the output.

Words of caution

Of course LLMs did not solve all my woes. Worse, they introduced new ones.

When setting up the new homelab Claude decided to be pretty lax on security. Network isolation wasn't that important so it could just skip it, file permissions were a bit of a pain to deal with, so why bother… You get the picture.

Needless to say, this was not cutting it. Once I realized this was going on, addressing it with the help of Claude was easy, but I first had to spot these issues myself. Even with LLMs there is no replacement for understanding the code you have.

Another thing is the actual design of the repo itself and how it is organized. Claude can easily spaghettify a codebase if not kept in check. A disorganized codebase makes it harder for Claude itself to work, but more importantly, it makes it harder for me to understand the code I am using, which makes it harder to control Claude. It was critical for me to babysit Claude and give sensible repo design guidance so my code would be kept in check. Just because you're using AI doesn't mean you should let it make messy code.

Even with these gripes, using an LLM saved me loads of time and allowed me to make this migration even though the time I have to spend on this topic is very, very limited.

Conclusion

This is an ongoing exercise in bad sys-admining - but I am enjoying it very much. Everything so far is running smoothly, and within a short amount of time I surpassed the state my homelab was in - and not once did I have to delete and redeploy an ingress configuration. I even managed to set up services that I gave up deploying before.

I will keep using this for now. In the end, I don't want to have to rely on the LLM for every task I want to do with my homelab. Over time, I will simplify and restructure my codebase further to make it easily maintainable by me, a silly old human. That being said, even now, it is in an excellent state, and I am optimistic about the future of my self-hosted services.