Deploy a new worker

Workers pull scan tasks from the queue, run the tool commands, and upload results. Bringing new capacity online takes one script: run deploy.sh from a repo checkout, point it at the backend, pass the secret, and choose a scale.

It handles the rest: it authenticates with the bootstrap secret, fetches the registry credentials, auto-picks the right image (pulls the prebuilt one if your architecture matches, otherwise builds locally), and starts N replicas.
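
To make "architecture matches" concrete, here is a minimal sketch of the comparison, modeled on the registry_arch_matches function in the script further down. The sample index JSON and the helper names `archs_in_index` and `arch_matches` are illustrative assumptions, not part of deploy.sh.

```bash
#!/usr/bin/env bash
# Sketch only: list the architectures in a multi-arch image index and check
# whether a given host architecture is among them.

# Print one architecture per line from an image-index JSON document ($1).
archs_in_index() {
  printf '%s\n' "$1" \
    | grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/.*"([^"]*)"$/\1/'
}

# Succeed when the host arch ($2) exactly equals one architecture in the index ($1).
arch_matches() {
  archs_in_index "$1" | grep -qxF "$2"
}

# Hypothetical index with two platform entries (not a real registry response).
sample_index='{"manifests":[{"platform":{"architecture":"amd64","os":"linux"}},{"platform":{"architecture":"arm64","os":"linux"}}]}'

if arch_matches "$sample_index" "arm64"; then
  echo "arm64: pull the prebuilt image"
fi
if ! arch_matches "$sample_index" "riscv64"; then
  echo "riscv64: build locally"
fi
```

In deploy.sh the host value comes from `docker version -f '{{.Server.Arch}}'`, and any failure while probing the registry is treated the same as a mismatch.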

Before you start

You need four things:

The repo. Clone it and work from the scripts dir — a local build (arm Macs, or when the registry is unreachable) needs the full tree, not just the script file:
```bash
git clone git@github.com:Astrixion/BBM.git
cd BBM/scripts/vps-workers
```
Docker installed with the daemon running, plus curl. (Registry pulls also need the registry host added to your Docker daemon's insecure-registries once — the script prints the host and falls back to a local build if a pull fails.)
The backend URL — the http://<host>:3001 of the main BBM backend. Ask the team / ops channel if you don't know it.
The bootstrap secret — the shared WORKER_BOOTSTRAP_SECRET (below).
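
The local prerequisites can be checked up front. The following is a minimal sketch that mirrors the checks deploy.sh performs itself; the helper names `need_cmd` and `preflight` are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: verify the local prerequisites before running deploy.sh.

# Succeed if the named command is on PATH.
need_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report every missing prerequisite; return non-zero if any check fails.
preflight() {
  local ok=0 cmd
  for cmd in docker curl; do
    if ! need_cmd "$cmd"; then
      echo "missing: $cmd" >&2
      ok=1
    fi
  done
  # `docker version` can only report a server arch when the daemon is reachable.
  if need_cmd docker && ! docker version -f '{{.Server.Arch}}' >/dev/null 2>&1; then
    echo "docker is installed but the daemon is not reachable" >&2
    ok=1
  fi
  return "$ok"
}
```

After sourcing it, `preflight && echo "ready to deploy"` gives a quick yes/no. It does not check the backend URL or the secret; those are only validated when deploy.sh contacts the backend.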

The bootstrap secret is held by the dev team

--secret is the shared WORKER_BOOTSTRAP_SECRET the backend enforces (≥32 chars). As an operator you don't create it: request it from the team vault / ops channel and paste it into the command. Keep in mind that a secret passed on the command line is normally saved in shell history. (Admins only: generate one with openssl rand -hex 32 and set it once in the backend's .env.)
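
One way to keep the secret out of shell history is to read it into a variable with echo disabled and sanity-check its length before deploying. This is a sketch: the function names are hypothetical, and the 32-character minimum is the backend's rule as stated above, so the local check is only a convenience.

```bash
#!/usr/bin/env bash
# Sketch only: read the bootstrap secret without echoing it, then check its length.

# Succeed when the secret ($1) is at least 32 characters long.
secret_long_enough() {
  [ "${#1}" -ge 32 ]
}

# Prompt for the secret (input hidden), validate it, then hand off to deploy.sh.
# All arguments are passed through to deploy.sh unchanged.
deploy_with_prompted_secret() {
  local secret
  read -rs -p "bootstrap secret: " secret
  echo
  secret_long_enough "$secret" || { echo "secret is shorter than 32 characters" >&2; return 1; }
  ./deploy.sh --secret "$secret" "$@"
}
```

Usage would look like `deploy_with_prompted_secret --backend-url http://<BACKEND_IP>:3001 --scale 20`. A value produced by `openssl rand -hex 32` is 64 hex characters, so it passes the length check. This only avoids shell history: the secret is still visible in the process argument list while deploy.sh runs.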

Run it

```bash
# Mac/Linux — 20 standard workers
./deploy.sh --backend-url http://<BACKEND_IP>:3001 --secret <SECRET> --scale 20

# service-detection workers
./deploy.sh --backend-url http://<BACKEND_IP>:3001 --secret <SECRET> --type service-detection

# print the plan without running anything
./deploy.sh --backend-url http://<BACKEND_IP>:3001 --secret <SECRET> --dry-run
```

Run --dry-run first to see the resolved plan (image source, image ref, replica count) before committing. A dry run still contacts the backend and probes the registry, but it pulls, builds, and starts nothing. Then drop the flag to launch. After the workers are up, confirm they registered by tailing logs (docker compose -p bbmw-standard logs -f); you should see pulse/heartbeat lines and tasks being picked up. Alternatively, check the workers list in the app.
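
To check the replica count from the shell, you can count the container IDs Compose lists for the project. A sketch, assuming the bbmw-standard project name from above; `count_lines` and `scale_reached` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch only: compare the containers Compose lists against the requested scale.

# Count non-empty lines on stdin (one container ID per line); prints 0 for no input.
count_lines() {
  grep -c . || true
}

# Succeed when project ($1) lists exactly the expected number ($2) of containers.
scale_reached() {
  local listed
  listed="$(docker compose -p "$1" ps -q | count_lines)"
  echo "listed: ${listed} / expected: $2"
  [ "$listed" -eq "$2" ]
}
```

For example, `scale_reached bbmw-standard 20` after a `--scale 20` deploy. A matching count only shows the containers exist; the heartbeat lines in the logs remain the real confirmation that workers registered.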

```powershell
# Windows
.\deploy.ps1 -BackendUrl http://<BACKEND_IP>:3001 -Secret <SECRET> -Scale 20
```

| flag | default | meaning |
| --- | --- | --- |
| `--backend-url` | (required) | main backend, e.g. `http://1.2.3.4:3001` |
| `--secret` | (required) | the `WORKER_BOOTSTRAP_SECRET` |
| `--type` | `standard` | `standard` (recon) or `service-detection` (nmap) |
| `--scale` | `1` | number of worker replicas |
| `--rebuild` | off | skip the registry, always build locally |
| `--dry-run` | off | print the plan, run nothing |
| `--tags` | (empty) | extra routing tags on top of the canonical `vps.standard` / `vps.sd` |
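
The `--tags` value is appended to the canonical tag rather than replacing it. A small sketch of that composition, following the logic of write_env_file in the script below; the function name `worker_tags` is an assumption and the extra tags shown are made up.

```bash
#!/usr/bin/env bash
# Sketch only: how the canonical routing tag and --tags combine.

# $1 = worker type, $2 = extra tags (comma-separated, may be empty).
worker_tags() {
  local canon
  if [ "$1" = "service-detection" ]; then canon="vps.sd"; else canon="vps.standard"; fi
  # ${2:+,$2} expands to ",<extra>" only when extra tags were given.
  printf '%s%s\n' "$canon" "${2:+,$2}"
}

worker_tags standard ""                    # prints vps.standard
worker_tags service-detection "eu,lowmem"  # prints vps.sd,eu,lowmem
```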

Manage a running set:

```bash
docker compose -p bbmw-standard logs -f     # tail (use bbmw-<type>)
docker compose -p bbmw-standard down        # stop + remove
```

The script

This is the deploy.sh script you run (from your scripts/vps-workers checkout above). It's the single source of truth: what you see here is exactly what runs. Reading it is optional; you invoke it with the flags above.

```bash
#!/usr/bin/env bash
#
# Deploy BBM vps workers locally on this machine (Mac/Linux).
#
# Resolves the worker image by architecture: if the private registry holds an
# image whose arch matches this host, it's pulled and run; otherwise the image
# is built locally (native arch) and run. Registry credentials are fetched from
# the backend's POST /api/bootstrap/worker-config endpoint, authenticated with
# WORKER_BOOTSTRAP_SECRET (generate with `openssl rand -hex 32`).
#
#   ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass>
#   ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass> --type service-detection
#   ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass> --scale 20
#   ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass> --rebuild
#   ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass> --dry-run
#
# Windows users: see deploy.ps1 (same flags).
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
WORKER_DIR="$REPO_ROOT/apps/worker"
# Dockerfiles COPY `worker/...`, so the build context is apps/ (worker's parent).
BUILD_CONTEXT="$REPO_ROOT/apps"
COMPOSE_FILE="$SCRIPT_DIR/docker-compose.deploy.yml"
# Per-type env file (set after TYPE is parsed). Each worker type gets its own so
# deploying one type never clobbers another's running/recreated config.
ENV_FILE=""

# ---- defaults -------------------------------------------------------------
BACKEND_URL=""
SECRET=""
TYPE="standard"
SCALE="1"
REBUILD="false"
DRY_RUN="false"
EXTRA_TAGS=""

usage() {
  sed -n '2,16p' "${BASH_SOURCE[0]}" | sed 's/^# \{0,1\}//'
  exit "${1:-0}"
}

die() { echo "error: $*" >&2; exit 1; }

# ---- arg parsing ----------------------------------------------------------
while [[ $# -gt 0 ]]; do
  case "$1" in
    --backend-url)       BACKEND_URL="${2:?}"; shift 2;;
    --secret)            SECRET="${2:?}"; shift 2;;
    --type)              TYPE="${2:?}"; shift 2;;
    --tags)              EXTRA_TAGS="${2:?}"; shift 2;;
    --scale)             SCALE="${2:?}"; shift 2;;
    --rebuild)           REBUILD="true"; shift;;
    --dry-run)           DRY_RUN="true"; shift;;
    -h|--help)           usage 0;;
    *) die "unknown argument: $1 (try --help)";;
  esac
done

# ---- validate -------------------------------------------------------------
[[ -n "$BACKEND_URL" ]] || { echo "error: --backend-url is required" >&2; usage 1; }
[[ -n "$SECRET" ]]      || { echo "error: --secret is required" >&2; usage 1; }
case "$TYPE" in
  standard|service-detection) ;;
  *) die "--type must be 'standard' or 'service-detection' (got '$TYPE')";;
esac
# Per-type env file so the standard and service-detection deploys don't share
# (and clobber) one .env.deploy.
ENV_FILE="$SCRIPT_DIR/.env.deploy.${TYPE}"
command -v docker >/dev/null 2>&1 || die "docker is not installed or not on PATH"
command -v curl   >/dev/null 2>&1 || die "curl is not installed or not on PATH"

LOCAL_IMAGE="${TYPE}:latest"
DOCKERFILE="$WORKER_DIR/dockerfiles/${TYPE}.dockerfile"
[[ -f "$DOCKERFILE" ]] || die "no dockerfile for type '$TYPE' at $DOCKERFILE"

HOST_ARCH="$(docker version -f '{{.Server.Arch}}' 2>/dev/null || true)"
[[ -n "$HOST_ARCH" ]] || die "could not reach the docker daemon (is it running?)"

# ---- bootstrap fetch ------------------------------------------------------
# Pull registry creds (and the rest of the worker config bag) from the backend.
# The container will fetch the same endpoint again at start to populate its
# runtime env; here we only consume the registry.* fields for `docker pull`.
fetch_bootstrap() {
  local url="${BACKEND_URL%/}/api/bootstrap/worker-config"
  local body
  body=$(curl -fsS -X POST -H "Authorization: Bearer ${SECRET}" "$url") \
    || die "bootstrap fetch failed (network or non-2xx) — check --backend-url and --secret"
  printf '%s' "$body"
}

# Flat-key JSON string extractor. The bootstrap response is camelCase System.Text.Json
# output (one level deep), so a single key never appears nested elsewhere and
# the values (registry endpoint/user/htpasswd-friendly password) don't contain
# backslashes or embedded quotes.
json_str() {
  printf '%s' "$1" \
    | grep -oE "\"$2\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -1 \
    | sed -E 's/.*"([^"]*)"$/\1/'
}

# json_str's pipeline fails under pipefail when a key is absent; `|| true`
# keeps set -e from exiting before the explicit checks below can report it.
BOOTSTRAP_JSON="$(fetch_bootstrap)"
REGISTRY_ENDPOINT="$(json_str "$BOOTSTRAP_JSON" registryEndpoint || true)"
REGISTRY_USER="$(json_str "$BOOTSTRAP_JSON" registryUsername || true)"
REGISTRY_PASS="$(json_str "$BOOTSTRAP_JSON" registryPassword || true)"

[[ -n "$REGISTRY_ENDPOINT" ]] || die "bootstrap response missing registryEndpoint"
[[ -n "$REGISTRY_USER" ]]     || die "bootstrap response missing registryUsername"
[[ -n "$REGISTRY_PASS" ]]     || die "bootstrap response missing registryPassword"

REGISTRY_IMAGE="${REGISTRY_ENDPOINT}/${TYPE}:latest"

# ---- image resolution -----------------------------------------------------
# Sets IMAGE_REF and IMAGE_SOURCE (registry|build).
IMAGE_REF=""
IMAGE_SOURCE=""

registry_arch_matches() {
  # Probe the registry HTTP API directly rather than `docker manifest inspect`,
  # which needs a working `docker login` (broken under the OrbStack proxy) and
  # doesn't expose arch for single-arch schema2 manifests anyway. Any failure →
  # return non-zero so we fall back to a local build.
  command -v curl >/dev/null 2>&1 || return 1
  local base="http://${REGISTRY_ENDPOINT}/v2/${TYPE}"
  local auth="${REGISTRY_USER}:${REGISTRY_PASS}"
  local man
  man="$(curl -fsSL -u "$auth" \
      -H 'Accept: application/vnd.oci.image.index.v1+json' \
      -H 'Accept: application/vnd.docker.distribution.manifest.list.v2+json' \
      -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
      -H 'Accept: application/vnd.oci.image.manifest.v1+json' \
      "${base}/manifests/latest" 2>/dev/null)" || return 1
  [[ -n "$man" ]] || return 1

  local archs
  if echo "$man" | grep -q '"manifests"'; then
    # Multi-arch index: architectures are in each platform entry.
    archs="$(echo "$man" | grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/')"
  else
    # Single schema2 manifest: arch lives in the config blob (first digest).
    local cfg
    cfg="$(echo "$man" | grep -o '"digest"[[:space:]]*:[[:space:]]*"sha256:[a-f0-9]*"' | head -1 | grep -o 'sha256:[a-f0-9]*')"
    [[ -n "$cfg" ]] || return 1
    archs="$(curl -fsSL -u "$auth" "${base}/blobs/${cfg}" 2>/dev/null \
      | grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/')"
  fi
  echo "$archs" | grep -qx "$HOST_ARCH"
}

resolve_image() {
  if [[ "$REBUILD" == "true" ]]; then
    echo "==> --rebuild set: building $LOCAL_IMAGE locally for $HOST_ARCH"
    IMAGE_REF="$LOCAL_IMAGE"; IMAGE_SOURCE="build"; return
  fi
  echo "==> Probing registry $REGISTRY_IMAGE for a $HOST_ARCH image..."
  if registry_arch_matches; then
    echo "==> Registry image matches host arch ($HOST_ARCH) — will pull."
    IMAGE_REF="$REGISTRY_IMAGE"; IMAGE_SOURCE="registry"
  else
    echo "==> No matching registry image (arch mismatch or registry unreachable) — will build locally."
    IMAGE_REF="$LOCAL_IMAGE"; IMAGE_SOURCE="build"
  fi
}

ensure_submodules() {
  # The service-detection dockerfile copies apps/service-detection/{data,data-httparchive}
  # — both git submodules. A fresh clone leaves them empty and the docker build
  # fails inside `build-apps-json.py` with "no apps loaded from any source".
  # Init/update only when we're in a git checkout (deploy.sh might be run from
  # a tarball drop where there's nothing to update).
  if [[ -d "$REPO_ROOT/.git" ]] && command -v git >/dev/null 2>&1; then
    echo "==> Ensuring git submodules are initialized..."
    (cd "$REPO_ROOT" && git submodule update --init --recursive --quiet) \
      || echo "!!  submodule update failed; the build may fail if data/ submodules are missing" >&2
  fi
}

build_image() {
  ensure_submodules
  echo "==> Compiling worker TypeScript..."
  bash "$WORKER_DIR/build.sh"
  echo "==> Building docker image $LOCAL_IMAGE ($HOST_ARCH)..."
  docker build -f "$DOCKERFILE" -t "$LOCAL_IMAGE" "$BUILD_CONTEXT"
}

pull_image() {
  echo "==> Logging into registry $REGISTRY_ENDPOINT..."
  echo "$REGISTRY_PASS" | docker login "$REGISTRY_ENDPOINT" -u "$REGISTRY_USER" --password-stdin >/dev/null 2>&1 || true
  echo "==> Pulling $REGISTRY_IMAGE..."
  if ! docker pull "$REGISTRY_IMAGE"; then
    echo "!!  Pull failed despite a compatible manifest (insecure-registries not set?). Falling back to a local build." >&2
    IMAGE_REF="$LOCAL_IMAGE"; IMAGE_SOURCE="build"
    build_image
  fi
}

# ---- env file -------------------------------------------------------------
# Trimmed: only routing + bootstrap inputs. The container's entrypoint re-fetches
# Rabbit/MinIO/registry creds from the backend at start time using BACKEND_SECRET.
write_env_file() {
  local canon_tag="vps.$([ "$TYPE" = "service-detection" ] && echo sd || echo standard)"
  local worker_tags="${canon_tag}${EXTRA_TAGS:+,$EXTRA_TAGS}"
  cat > "$ENV_FILE" <<EOF
# Generated by deploy.sh — regenerated every run, do not edit.
WORKER_TYPE=vps
WORKER_IMAGE=${TYPE}
WORKER_TAGS=${worker_tags}

BACKEND_URL=${BACKEND_URL}
BACKEND_SECRET=${SECRET}

NODE_ENV=production
WORKER_TIMEOUT_MS=7200000
EOF
  # service-detection tuning (overridable via env when invoking deploy.sh).
  # N=3 is the memory-bound concurrency sweet spot; recycle the browser every 40
  # pages; 20s per-page timeout. See docs/superpowers/specs/2026-06-18-service-detection-optimization-design.md
  if [[ "$TYPE" == "service-detection" ]]; then
    cat >> "$ENV_FILE" <<EOF
SD_MAX_TABS=${SD_MAX_TABS:-3}
SD_BROWSER_RECYCLE_EVERY=${SD_BROWSER_RECYCLE_EVERY:-40}
SD_PAGE_TIMEOUT_SECS=${SD_PAGE_TIMEOUT_SECS:-20}
EOF
  fi
  chmod 600 "$ENV_FILE"
}

# ---- plan / execute -------------------------------------------------------
resolve_image
PROJECT="bbmw-${TYPE}"

print_plan() {
  cat <<EOF

Plan
────
  host arch        : ${HOST_ARCH}
  worker type      : ${TYPE}
  scale            : ${SCALE}
  image source     : ${IMAGE_SOURCE}
  image ref        : ${IMAGE_REF}
  backend url      : ${BACKEND_URL}
  bootstrap fetched: ok
  registry         : ${REGISTRY_ENDPOINT}
  env file         : ${ENV_FILE}
  compose          : docker compose -f ${COMPOSE_FILE} -p ${PROJECT} up -d --scale worker=${SCALE}
EOF
}

if [[ "$DRY_RUN" == "true" ]]; then
  print_plan
  echo
  echo "(dry run — nothing executed)"
  exit 0
fi

if [[ "$IMAGE_SOURCE" == "build" ]]; then
  build_image
else
  pull_image
fi

write_env_file

echo "==> Starting ${SCALE}x ${TYPE} worker(s) (project ${PROJECT})..."
WORKER_IMAGE_REF="$IMAGE_REF" WORKER_ENV_FILE="$(basename "$ENV_FILE")" \
  docker compose -f "$COMPOSE_FILE" -p "$PROJECT" up -d --scale "worker=${SCALE}"

echo
echo "✓ ${SCALE}x ${TYPE} worker(s) deployed."
echo "  logs:  docker compose -p ${PROJECT} logs -f"
echo "  stop:  docker compose -p ${PROJECT} down"
```
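
The script's `json_str` helper can be tried in isolation. The sketch below copies the function and feeds it a made-up response; the field values are placeholders, and a real response carries more fields than the three registry ones shown here.

```bash
#!/usr/bin/env bash
# Sketch only: exercise the flat-key JSON extractor from deploy.sh.

# Print the string value of key $2 in the flat JSON document $1 (nothing if absent).
json_str() {
  printf '%s' "$1" \
    | grep -oE "\"$2\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -1 \
    | sed -E 's/.*"([^"]*)"$/\1/'
}

# Placeholder response, not real credentials.
sample='{"registryEndpoint":"registry.example:5000","registryUsername":"worker","registryPassword":"not-a-real-password"}'

json_str "$sample" registryEndpoint   # prints registry.example:5000
json_str "$sample" registryUsername   # prints worker
```

The extractor is deliberately narrow: it relies on values containing no backslashes or embedded quotes, as the script's own comment notes. With `set -o pipefail` active, a missing key also makes the function return non-zero in addition to printing nothing.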

Compose file it drives

```yaml
# Parameterized local deploy of vps workers (any type), driven by deploy.sh /
# deploy.ps1. Both scripts resolve the image (pulled-from-registry OR
# locally-built) into WORKER_IMAGE_REF and regenerate .env.deploy from CLI args
# before invoking:
#
#   docker compose -f docker-compose.deploy.yml -p bbmw-<type> up -d --scale worker=<N>
#
# Workers reach RabbitMQ/MinIO/backend over the VPS public IP, so no external
# bbm-network is needed — the default bridge network is fine.
services:
  worker:
    image: ${WORKER_IMAGE_REF:?run via deploy.sh / deploy.ps1, which sets WORKER_IMAGE_REF}
    # tini as PID 1 reaps reparented Chromium/chrome_crashpad helpers. node (the
    # real PID 1 here) has no SIGCHLD reaper, so without this the service-detection
    # browser children pile up as zombies and eventually exhaust the PID table.
    init: true
    restart: unless-stopped
    # Real /dev/shm for Chromium (service-detection). Default 64MB starves the
    # renderer and contributes to browser wedges; harmless for standard workers.
    shm_size: ${WORKER_SHM_SIZE:-1g}
    # Per-type env file (deploy.sh sets WORKER_ENV_FILE=.env.deploy.<type>) so
    # the standard and service-detection deploys never share/clobber one file.
    env_file:
      - ${WORKER_ENV_FILE:-.env.deploy}
    environment:
      - WORKER_TYPE=vps
    entrypoint: ["/bin/sh", "-c", "WORKER_NAME=$(hostname) exec docker-entrypoint.sh node dist/entrypoint.js"]
```
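
The `${VAR:-default}` and `${VAR:?message}` forms in the Compose file follow the same rules as shell parameter expansion, so their effect can be previewed in a shell. A sketch; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: preview how the Compose file's variable defaults resolve.

# Mirrors `shm_size: ${WORKER_SHM_SIZE:-1g}`: falls back to 1g when unset or empty.
resolved_shm_size() {
  printf '%s\n' "${WORKER_SHM_SIZE:-1g}"
}

# Mirrors `image: ${WORKER_IMAGE_REF:?...}`: fails when the variable is unset or empty.
resolved_image() {
  ( : "${WORKER_IMAGE_REF:?run via deploy.sh / deploy.ps1, which sets WORKER_IMAGE_REF}" ) 2>/dev/null \
    && printf '%s\n' "$WORKER_IMAGE_REF"
}

unset WORKER_SHM_SIZE
resolved_shm_size                      # prints 1g
WORKER_SHM_SIZE=2g resolved_shm_size   # prints 2g
```

Because deploy.sh invokes docker compose from its own environment, setting `WORKER_SHM_SIZE` when calling deploy.sh should carry through to the container, for example `WORKER_SHM_SIZE=2g ./deploy.sh ... --type service-detection`.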

How the worker gets its config

The only secret the script writes to disk is BACKEND_SECRET, stored alongside BACKEND_URL and non-secret routing settings in a chmod 600 .env.deploy.<type> file that is regenerated on every run. The worker container itself calls the bootstrap endpoint at start to fetch its RabbitMQ / MinIO credentials, so those credentials are never baked into an image or committed.
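
For reference, the bootstrap request the script makes (and, per the script's comments, the container repeats at start) can be sketched as below. The endpoint path and the bearer header come from deploy.sh; the function names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: build and send the bootstrap request.

# Join the backend URL ($1) and the endpoint path, tolerating one trailing slash.
bootstrap_url() {
  printf '%s\n' "${1%/}/api/bootstrap/worker-config"
}

# POST to the bootstrap endpoint with the secret ($2) as a bearer token.
# -f makes curl fail on HTTP errors; -sS hides progress but keeps error messages.
fetch_worker_config() {
  curl -fsS -X POST -H "Authorization: Bearer $2" "$(bootstrap_url "$1")"
}

bootstrap_url "http://1.2.3.4:3001/"   # prints http://1.2.3.4:3001/api/bootstrap/worker-config
```

Calling `fetch_worker_config "http://<BACKEND_IP>:3001" "$SECRET"` prints the raw JSON. That output includes the registry credentials, so avoid pasting it into tickets or logs.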
