# nix-builder-autoscaler

`nix-builder-autoscaler` provides elastic Nix remote builder capacity for
[Guardian Project](https://guardianproject.info) jobs using [buildbot-nix](https://github.com/nix-community/buildbot-nix).

We created it because we don't have the budget for dedicated, always-on build hardware.

Buildbot waits for builder capacity before it starts a
`nix-build` job, and idle builders disappear some time after jobs finish.

The autoscaler launches EC2 Nix builder instances, waits until they are reachable
through Tailscale and HAProxy, hands Buildbot a reservation for a ready slot,
and later drains and terminates unused capacity. It uses EC2 Spot by default and
can use on-demand instances for nested virtualization workloads when configured.

The Buildbot instance has a single-master, single-worker configuration, as upstream buildbot-nix expects. HAProxy presents a single logical nix-builder host to buildbot-nix; this was inspired by [Garnix's yensid](https://web.archive.org/web/20260530230732/https://garnix.io/blog/yensid/).

## Pieces

The project has two main runtime pieces:

1. `agent/`: the autoscaler daemon and `autoscalerctl` CLI.
   The daemon owns the slot database, reservation API, scheduler, EC2 runtime,
   HAProxy binding, health checks, and metrics.

2. `buildbot-ext/`: the Buildbot integration.
   The extension patches Buildbot `*/nix-build` builders with a capacity gate
   step at the beginning and a reservation release step at the end. It also
   lets the Buildbot host send Nix distributed builds through the HAProxy-backed
   builder cluster.

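
To illustrate how the extension wraps `*/nix-build` builders, here is a minimal sketch in plain Python. It does not use the Buildbot API: builders are modelled as a dict of step-name lists, and `wrap_nix_build_steps`, `GATE_STEP`, and `RELEASE_STEP` are hypothetical names chosen for this example, not identifiers from the project.

```python
from fnmatch import fnmatch

# Hypothetical step names, used only for this illustration.
GATE_STEP = "capacity-gate"
RELEASE_STEP = "release-reservation"


def wrap_nix_build_steps(builders: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of `builders` where every builder whose name matches
    `*/nix-build` gets a gate step first and a release step last."""
    patched = {}
    for name, steps in builders.items():
        if fnmatch(name, "*/nix-build"):
            patched[name] = [GATE_STEP, *steps, RELEASE_STEP]
        else:
            # Builders that do not match are copied unchanged.
            patched[name] = list(steps)
    return patched


builders = {
    "example/nix-build": ["nix-build"],
    "example/nix-eval": ["nix-eval"],
}
patched = wrap_nix_build_steps(builders)
print(patched["example/nix-build"])
print(patched["example/nix-eval"])
```

Only the builder matching the pattern is wrapped; the other builder keeps its original steps.
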
The `nix/modules/` directory contains NixOS modules that package and wire these
pieces into hosts:

- `services.nix-builder-autoscaler` runs the daemon and can generate the HAProxy
  slot configuration.
- `services.buildbot-nix.nix-build-autoscaler` installs the Buildbot extension
  and configures Nix remote builder access.

## How it works

Buildbot (via the extension) creates a reservation before a Nix build starts.
The autoscaler assigns that reservation to a ready slot if one exists. If no
ready slot has capacity, the scheduler launches an EC2 instance into an empty
slot, subject to the configured minimum, maximum, warm pool, and timeout
settings.

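
The assign-or-launch decision described above could look roughly like the following sketch. It is not the daemon's scheduler: `Slot`, `assign_or_launch`, `per_slot_capacity`, and `max_active` are assumed names, and the minimum, warm pool, and timeout settings are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    slot_id: str
    state: str = "empty"  # e.g. "empty", "launching", "ready"
    leases: int = 0       # reservations currently assigned to this slot


def assign_or_launch(slots, per_slot_capacity=1, max_active=2):
    """Decide what to do with one new reservation.

    Returns ("assigned", slot), ("launching", slot), or ("pending", None).
    """
    # Prefer a ready slot that still has spare capacity.
    for slot in slots:
        if slot.state == "ready" and slot.leases < per_slot_capacity:
            slot.leases += 1
            return "assigned", slot
    # Otherwise launch into an empty slot, but only below the maximum.
    active = sum(1 for s in slots if s.state != "empty")
    if active < max_active:
        for slot in slots:
            if slot.state == "empty":
                slot.state = "launching"
                return "launching", slot
    # No capacity now and no room to grow: the reservation keeps waiting.
    return "pending", None


slots = [Slot("slot-1", "ready"), Slot("slot-2"), Slot("slot-3")]
results = [assign_or_launch(slots) for _ in range(3)]
for action, slot in results:
    print(action, slot.slot_id if slot else None)
```

With one ready slot and a maximum of two active slots, the first reservation is assigned, the second triggers a launch, and the third stays pending until capacity appears.
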
The reconciler moves each slot through these runtime states:

1. `launching`: EC2 accepted the instance launch.
2. `booting`: the instance is running.
3. `binding`: the daemon found the instance's Tailscale IP and enabled its
   HAProxy backend slot.
4. `ready`: HAProxy health checks pass and Buildbot can use the slot.
5. `draining` or `terminating`: the slot is being removed after release, idle
   timeout, interruption, or failure.

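
The states listed above can be modelled as a small state machine. The sketch below is only an illustration: the state names come from the list, but the `ALLOWED` transition table, the `empty` state for an unused slot, and the `advance` helper are assumptions rather than the daemon's actual rules.

```python
from enum import Enum


class SlotState(Enum):
    EMPTY = "empty"  # assumed name for a slot with no instance
    LAUNCHING = "launching"
    BOOTING = "booting"
    BINDING = "binding"
    READY = "ready"
    DRAINING = "draining"
    TERMINATING = "terminating"


# Assumed transition table: each state may move forward one step, and any
# state with an instance may jump to TERMINATING on failure or interruption.
ALLOWED = {
    SlotState.EMPTY: {SlotState.LAUNCHING},
    SlotState.LAUNCHING: {SlotState.BOOTING, SlotState.TERMINATING},
    SlotState.BOOTING: {SlotState.BINDING, SlotState.TERMINATING},
    SlotState.BINDING: {SlotState.READY, SlotState.TERMINATING},
    SlotState.READY: {SlotState.DRAINING, SlotState.TERMINATING},
    SlotState.DRAINING: {SlotState.TERMINATING},
    SlotState.TERMINATING: {SlotState.EMPTY},
}


def advance(current: SlotState, target: SlotState) -> SlotState:
    """Return `target` if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = SlotState.EMPTY
for nxt in (SlotState.LAUNCHING, SlotState.BOOTING,
            SlotState.BINDING, SlotState.READY):
    state = advance(state, nxt)
print(state.value)
```

Keeping the table explicit makes illegal jumps, such as going from `empty` straight to `ready`, fail loudly instead of silently corrupting slot state.
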
Buildbot waits until the reservation becomes `ready`, then runs the build
through the configured Nix remote builder alias. When the build finishes, the
release step releases the reservation. Idle slots drain and terminate after the
configured cooldowns.

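
The wait, build, release sequence can be sketched as a polling loop. Everything here is hypothetical: `FakeReservations` stands in for the real reservation API, the `pending` state name and the `timeout` and `interval` parameters are assumptions, and releasing in a `finally` block is a design choice of this sketch rather than documented behaviour.

```python
import time


class FakeReservations:
    """Stand-in for the reservation API: reports ready after a few polls."""

    def __init__(self, polls_until_ready=3):
        self._remaining = polls_until_ready
        self.released = False

    def state(self):
        self._remaining -= 1
        return "ready" if self._remaining <= 0 else "pending"

    def release(self):
        self.released = True


def run_with_reservation(api, build, timeout=5.0, interval=0.01):
    """Poll until the reservation is ready, run `build`, always release."""
    deadline = time.monotonic() + timeout
    try:
        while api.state() != "ready":
            if time.monotonic() >= deadline:
                raise TimeoutError("no builder capacity before timeout")
            time.sleep(interval)
        return build()
    finally:
        # Release even if the build or the wait failed, so the slot can
        # become idle and eventually drain.
        api.release()


api = FakeReservations()
result = run_with_reservation(api, lambda: "built")
print(result, api.released)
```

Releasing on every exit path avoids leaving a reservation attached to a slot after a failed build, which would otherwise keep the slot from going idle.
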
## Development

Common checks:

```sh
nix flake check
nix build .#nix-builder-autoscaler
nix build .#buildbot-autoscale-ext
nix fmt
```

Useful local CLI commands against a running daemon:

```sh
autoscalerctl status
autoscalerctl slots
autoscalerctl reservations
autoscalerctl drain <slot-id>
autoscalerctl reconcile-now
```

The daemon listens on `/run/nix-builder-autoscaler/daemon.sock` by default.
NixOS deployments should configure the service modules rather than hand-writing
daemon config files.