In this post I’m going to show how to use Terraform to set up a Buildkite-based CI with your own workers running on GCP. For reference, the complete Terraform configuration for this post is available in this repository.

  • The setup gives you complete control over how fast your workers are.
  • The workers come with Nix pre-installed, so you won’t need to spend time downloading the same Docker container again and again on every push, as usually happens with most cloud CI providers.
  • The workers come with a distributed Nix cache set up, so authors of CI scripts won’t have to bother about caching at all.

Secrets

We are going to need to import two secret resources:

resource "secret_resource" "buildkite_agent_token" {}
resource "secret_resource" "nix_signing_key" {}

To initialize the resources, execute the following from the root directory of your project:

$ terraform import secret_resource.<name> <value>

where:

  • buildkite_agent_token is obtained from the Buildkite site.
  • nix_signing_key can be generated by running:

    nix-store --generate-binary-cache-key <your-key-name> key.private key.public

    The key.private file will contain the value for the signing key. I’ll explain later in the post how to use the contents of the key.public file.
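The import step can be wrapped in a small shell function. This is a minimal sketch, not part of Terraform: import_secret is a hypothetical helper, and token.txt is an assumed file holding the Buildkite agent token. Reading the value from a file keeps the secret itself out of your shell history.

```shell
#!/bin/sh
# import_secret is a hypothetical helper, not part of Terraform.
# It reads the secret value from a file and passes it to `terraform import`.
import_secret() {
  name=$1   # resource name, e.g. nix_signing_key
  file=$2   # file that holds the secret value
  # TERRAFORM can be overridden (for example with "echo") for a dry run.
  "${TERRAFORM:-terraform}" import "secret_resource.$name" "$(cat "$file")"
}

# Usage, from the root directory of the project
# (token.txt is an assumed file name):
#   import_secret buildkite_agent_token token.txt
#   import_secret nix_signing_key key.private
```

Setting TERRAFORM=echo prints the arguments that would be passed, without touching the Terraform state.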

Custom NixOS image

The next step is to use the nixos_image_custom module to create a NixOS image with custom configuration.

resource "google_storage_bucket" "nixos_image" {
  name     = "buildkite-nixos-image-bucket-name"
  location = "EU"
}

module "nixos_image_custom" {
  source      = "git::https://github.com/tweag/terraform-nixos.git//google_image_nixos_custom?ref=40fedb1fae7df5bd7ad9defdd71eb06b7252810f"
  bucket_name = "${google_storage_bucket.nixos_image.name}"
  nixos_config = "${path.module}/nixos-config.nix"
}

The snippet above first creates a bucket nixos_image where the generated image will be uploaded, then it uses the nixos_image_custom module, which handles generation of the image using the configuration from the nixos-config.nix file. The file is assumed to be in the same directory as the Terraform configuration, hence ${path.module}/.

Service account and cache bucket

To control access to different resources we will also need a service account:

resource "google_service_account" "buildkite_agent" {
  account_id   = "buildkite-agent"
  display_name = "Buildkite agent"
}

We can use it to set access permissions for the storage bucket that will contain the Nix cache:

resource "google_storage_bucket" "nix_cache_bucket" {
  name     = "nix-cache-bucket-name"
  location = "EU"
  force_destroy = true
  retention_policy {
    retention_period = 7889238 # three months
  }
}

resource "google_storage_bucket_iam_member" "buildkite_nix_cache_writer" {
  bucket = "${google_storage_bucket.nix_cache_bucket.name}"
  role = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.buildkite_agent.email}"
}

resource "google_storage_bucket_iam_member" "buildkite_nix_cache_reader" {
  bucket = "${google_storage_bucket.nix_cache_bucket.name}"
  role   = "roles/storage.objectViewer"
  member = "allUsers"
}

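The bucket has a retention period of three months. Note that a GCS retention policy only protects objects from being deleted or overwritten until they reach that age; it does not delete anything by itself, so automatically removing objects older than three months would additionally require a lifecycle_rule on the bucket. We give the service account the ability to write to and read from the bucket (the roles/storage.objectAdmin role). The rest of the world gets the ability to read from the bucket (the roles/storage.objectViewer role).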
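For reference, the retention_period value of 7889238 seconds matches a quarter of an average Gregorian year (365.2425 days). The short calculation below is only a sanity check of that number; the variable names are illustrative.

```shell
#!/bin/sh
# An average Gregorian year: 365.2425 days * 86400 seconds per day.
seconds_per_year=31556952

# Three months, taken as a quarter of that year.
retention_period=$(( seconds_per_year / 4 ))

# The same period in whole days (integer division).
retention_days=$(( retention_period / 86400 ))

echo "$retention_period seconds, about $retention_days days"
```

If you want a different period, the same arithmetic gives the value to put into retention_period.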

NixOS configuration

Here is the content of my nixos-config.nix. This NixOS configuration can serve as a starting point for writing your own. The numbered points refer to the notes below.

{ modulesPath, pkgs, ... }:
{
  imports = [
    "${modulesPath}/virtualisation/google-compute-image.nix"
  ];
  virtualisation.googleComputeImage.diskSize = 3000;
  virtualisation.docker.enable = true;

  services = {
    buildkite-agents.agent = {
      enable = true;
      extraConfig = ''
      tags-from-gcp=true
      '';
      tags = {
        os = "nixos";
        nix = "true";
      };
      tokenPath = "/run/keys/buildkite-agent-token"; # (1)
      runtimePackages = with pkgs; [
        bash
        curl
        gcc
        gnutar
        gzip
        ncurses
        nix
        python3
        xz
        # (2) extend as necessary
      ];
    };
    nix-store-gcs-proxy = {
      nix-cache-bucket-name = { # (3)
        address = "localhost:3000";
      };
    };
  };

  nix = {
    binaryCaches = [
      "https://cache.nixos.org/"
      "https://storage.googleapis.com/nix-cache-bucket-name" # (4)
    ];
    binaryCachePublicKeys = [
      "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
      "<insert your public signing key here>" # (5)
    ];
    extraOptions = ''
      post-build-hook = /etc/nix/upload-to-cache.sh # (6)
    '';
  };

  security.sudo.enable = true;
  services.openssh.passwordAuthentication = false;
  security.sudo.wheelNeedsPassword = false;
}

Notes:

  1. This file will be created later by the startup script (see below).
  2. The collection of packages that are available to the Buildkite script can be edited here.
  3. Replace nix-cache-bucket-name with the name of the bucket used for the Nix cache.
  4. As in (3), replace nix-cache-bucket-name in the URL.
  5. Insert the contents of the key.public file you generated earlier.
  6. The file will be created later by the startup script.
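Regarding note (5): a Nix public key has the shape name:base64-data, as the cache.nixos.org entry in the configuration shows. The sketch below is a rough sanity check you could run on the contents of key.public before pasting them. check_public_key is a hypothetical helper, and the expected length of 44 characters assumes a 32-byte Ed25519 public key encoded in base64.

```shell
#!/bin/sh
# check_public_key is a hypothetical helper: it succeeds when its argument
# looks like "<name>:<44 base64 characters>".
check_public_key() {
  key=$1
  case $key in
    *:*) ;;            # must contain a colon
    *) return 1 ;;
  esac
  name=${key%%:*}      # text before the first colon
  data=${key#*:}       # text after the first colon
  # Assumption: 32 key bytes encode to 44 base64 characters.
  [ -n "$name" ] && [ "${#data}" -eq 44 ]
}

# Usage:
#   check_public_key "$(cat key.public)" && echo "looks like a Nix public key"
```

The check only looks at the shape of the string; it does not verify that the key matches your private signing key.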

Compute instances and startup script

The following snippet sets up an instance group manager which controls multiple (3 in this example) Buildkite agents. The numbered points refer to the notes below.

data "template_file" "buildkite_nixos_startup" { # (1)
  template = "${file("${path.module}/files/buildkite_nixos_startup.sh")}"

  vars = {
    buildkite_agent_token = "${secret_resource.buildkite_agent_token.value}"
    nix_signing_key = "${secret_resource.nix_signing_key.value}"
  }
}

resource "google_compute_instance_template" "buildkite_nixos" {
  name_prefix  = "buildkite-nixos-"
  machine_type = "n1-standard-8"

  disk {
    boot         = true
    disk_size_gb = 100
    source_image = "${module.nixos_image_custom.self_link}"
  }

  metadata_startup_script = "${data.template_file.buildkite_nixos_startup.rendered}"

  network_interface {
    network = "default"

    access_config = {}
  }

  metadata {
    enable-oslogin = true
  }

  service_account {
    email = "${google_service_account.buildkite_agent.email}"

    scopes = [
      "compute-ro",
      "logging-write",
      "storage-rw",
    ]
  }

  scheduling {
    automatic_restart   = false
    on_host_maintenance = "TERMINATE"
    preemptible         = true # (2)
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_instance_group_manager" "buildkite_nixos" {
  provider           = "google-beta"
  name               = "buildkite-nixos"
  base_instance_name = "buildkite-nixos"
  target_size        = "3" # (3)
  zone               = "<your-zone>" # (4)

  version {
    name              = "buildkite_nixos"
    instance_template = "${google_compute_instance_template.buildkite_nixos.self_link}"
  }

  update_policy {
    type                  = "PROACTIVE"
    minimal_action        = "REPLACE"
    max_unavailable_fixed = 1
  }
}

Notes:

  1. The file files/buildkite_nixos_startup.sh is shown below.
  2. Because of the remote Nix cache, the nodes can be preemptible (short-lived, never lasting longer than 24 hours), which results in much lower GCP costs.
  3. Changing target_size allows you to scale the system. This is the number of instances that are controlled by the instance group manager.
  4. Insert your desired zone here.

Finally, here is the startup script:

# workaround https://github.com/NixOS/nixpkgs/issues/42344
chown root:keys /run/keys
chmod 750 /run/keys
umask 037
echo "${buildkite_agent_token}" > /run/keys/buildkite-agent-token
chown root:keys /run/keys/buildkite-agent-token
umask 077
echo '${nix_signing_key}' > /run/keys/nix_signing_key
chown root:keys /run/keys/nix_signing_key

cat <<EOF > /etc/nix/upload-to-cache.sh
#!/bin/sh

set -eu
set -f # disable globbing
export IFS=' '

echo "Uploading paths" \$OUT_PATHS
exec nix copy --to http://localhost:3000?secret-key=/run/keys/nix_signing_key \$OUT_PATHS
EOF
chmod +x /etc/nix/upload-to-cache.sh
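The two umask calls at the top of the startup script decide which permissions the key files get. As I read it, the agent token ends up readable by the keys group, while the signing key is readable by its owner only. The sketch below reproduces this in a temporary directory; the file names are illustrative and nothing under /run/keys is touched.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# umask 037 masks write/execute for the group and everything for others,
# so a new file gets mode 640.
( umask 037; echo token > "$tmp/agent-token" )

# umask 077 masks everything for group and others, so a new file gets mode 600.
( umask 077; echo key > "$tmp/signing-key" )

# Keep only the permission column of the ls output.
token_mode=$(ls -l "$tmp/agent-token" | cut -c1-10)
key_mode=$(ls -l "$tmp/signing-key" | cut -c1-10)

echo "agent-token: $token_mode"
echo "signing-key: $key_mode"
rm -rf "$tmp"
```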

The startup script relies on Nix’s post-build hook to upload build results to the cache, so the CI scripts themselves stay free of any caching logic.
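The hook generation can be tried out without Nix or the proxy. The sketch below writes a dry-run variant of the hook through the same kind of unquoted heredoc and then runs it the way Nix would, with the built paths in OUT_PATHS separated by spaces. The store paths are hypothetical, the secret-key parameter is left out, and the hook prints the nix copy command instead of running it.

```shell
#!/bin/sh
tmp=$(mktemp -d)
hook="$tmp/upload-to-cache.sh"

# Unquoted heredoc: \$OUT_PATHS is written out as a literal $OUT_PATHS,
# so it is expanded when the hook runs, not while it is being generated.
cat <<EOF > "$hook"
#!/bin/sh
set -eu
set -f # disable globbing
export IFS=' '
# Dry run: print the command instead of running it.
echo nix copy --to http://localhost:3000 \$OUT_PATHS
EOF

# Run the hook with two hypothetical store paths in OUT_PATHS.
hook_output=$(OUT_PATHS="/nix/store/aaa-foo /nix/store/bbb-bar" sh "$hook")
echo "$hook_output"
rm -rf "$tmp"
```

Without the backslash, the variable would be expanded while the hook file is being written, when OUT_PATHS is still empty.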

Conclusion

The setup allows us to run Nix builds in an environment where Nix tooling is available. It also provides a remote Nix cache that authors of CI scripts do not need to set up or even be aware of. We use this setup on many of Tweag’s projects and have found that both the mental and the performance overhead are minimal. A typical CI script looks like this:

steps:
  - label: Build and test
    command: nix-build -A distributed-closure --no-out-link

When the cache is up to date and nothing has to be rebuilt, a build may finish in literally one second.
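To see whether a particular store path is already in the cache, you can look for its .narinfo file, which a Nix binary cache serves under the hash part of the store path. The sketch below only builds that URL. The store path is hypothetical, the bucket name is the placeholder used throughout this post, and it assumes that the proxy stores objects at the root of the bucket.

```shell
#!/bin/sh
# Placeholder bucket name from this post; replace with your own.
cache_url="https://storage.googleapis.com/nix-cache-bucket-name"

# Hypothetical store path.
store_path="/nix/store/0123456789abcdfghijklmnpqrsvwxyz-hello-2.10"

# Strip the /nix/store/ prefix, then keep everything before the first dash.
base=${store_path#/nix/store/}
hash=${base%%-*}

narinfo_url="${cache_url}/${hash}.narinfo"
echo "$narinfo_url"

# With network access, a successful response means the path is cached:
#   curl -fsI "$narinfo_url"
```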