Deploy a new worker
Workers pull scan tasks from the queue, run the tool commands, and upload results. Bringing new capacity online is one script: copy deploy.sh, point it at the backend, pass the secret, choose a scale.
It handles the rest: it authenticates with the bootstrap secret, fetches the registry credentials, auto-picks the right image (pulls the prebuilt one if your architecture matches, otherwise builds locally), and starts N replicas.
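The architecture check can be tried offline. The sketch below assumes a multi-arch index shaped like a registry manifest response and uses the same grep-based extraction as deploy.sh further down. The sample JSON and the helper name arch_in_index are made up for illustration.

```shell
# arch_in_index <manifest-json> <host-arch>
# Hypothetical helper: succeeds when <host-arch> is listed among the
# "architecture" fields of a manifest index. It mirrors the grep/sed approach
# in deploy.sh and is not a general JSON parser.
arch_in_index() {
  printf '%s\n' "$1" \
    | grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/.*"([^"]*)"$/\1/' \
    | grep -qx "$2"
}

# Illustrative sample only, not real registry output.
SAMPLE_INDEX='{"manifests":[{"platform":{"architecture":"amd64","os":"linux"}},{"platform":{"architecture":"arm64","os":"linux"}}]}'

if arch_in_index "$SAMPLE_INDEX" arm64; then
  echo "arm64: pull from registry"
else
  echo "arm64: build locally"
fi
```

The match is on the whole line (grep -x), so a partial name such as arm does not match arm64.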
Before you start
You need four things:
- The repo. Clone it and work from the scripts dir. A local build (arm Macs, or when the registry is unreachable) needs the full tree, not just the script file:
bash
git clone git@github.com:Astrixion/BBM.git
cd BBM/scripts/vps-workers
- Docker installed with the daemon running, plus curl. (Registry pulls also need the registry host added to your Docker daemon's insecure-registries once; the script prints the host and falls back to a local build if a pull fails.)
- The backend URL: the http://<host>:3001 of the main BBM backend. Ask the team / ops channel if you don't know it.
- The bootstrap secret: the shared WORKER_BOOTSTRAP_SECRET (below).
The bootstrap secret is held by the dev team
--secret is the shared WORKER_BOOTSTRAP_SECRET the backend enforces (≥32 chars). As an operator you don't create it: request it from the team vault / ops channel and paste it in (don't type it at a prompt; it lands in shell history). (Admins only: generate one with openssl rand -hex 32 and set it once in the backend's .env.)
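One way to keep the secret out of shell history is to store it in a private file and pass it through command substitution, so history records only the file path. This is a minimal sketch, assuming a hypothetical file ~/.bbm-worker-secret (mode 600) and a hypothetical helper load_secret. The 32-character floor is the backend rule stated above.

```shell
# load_secret <file>
# Hypothetical helper: prints the secret stored in <file> with whitespace
# stripped, or fails if it is shorter than the 32 characters the backend enforces.
load_secret() {
  secret="$(tr -d '[:space:]' < "$1")" || return 1
  if [ "${#secret}" -lt 32 ]; then
    echo "error: secret in $1 is shorter than 32 characters" >&2
    return 1
  fi
  printf '%s' "$secret"
}

# Usage (the path is an assumption):
#   ./deploy.sh --backend-url http://<BACKEND_IP>:3001 \
#     --secret "$(load_secret ~/.bbm-worker-secret)" --scale 20
```

The expanded value is still visible in the process arguments while the script runs, so this only addresses history, not local process listings.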
Run it
bash
# Mac/Linux — 20 standard workers
./deploy.sh --backend-url http://<BACKEND_IP>:3001 --secret <SECRET> --scale 20
# service-detection workers
./deploy.sh --backend-url http://<BACKEND_IP>:3001 --secret <SECRET> --type service-detection
# print the plan without running anything
./deploy.sh --backend-url http://<BACKEND_IP>:3001 --secret <SECRET> --dry-run

Run --dry-run first to see the resolved plan (host arch, image source and ref, replica count) before committing. Then drop the flag to launch. After it's up, confirm the workers registered by tailing logs (docker compose -p bbmw-standard logs -f), where you should see pulse/heartbeat lines and tasks being picked up, or check the workers list in the app.
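To check the replica count without reading logs, the container IDs reported by docker compose ps can be counted. This is a sketch, assuming the bbmw-<type> project name used above; count_ids is a hypothetical helper that simply counts non-empty lines.

```shell
# count_ids: hypothetical helper, counts non-empty lines on stdin
# (one container ID per line). Prints 0 for empty input.
count_ids() {
  grep -c . || true
}

# Usage on a host with running workers:
#   docker compose -p bbmw-standard ps --status running -q | count_ids
# The number printed should equal the --scale value you deployed with.
```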
powershell
# Windows
.\deploy.ps1 -BackendUrl http://<BACKEND_IP>:3001 -Secret <SECRET> -Scale 20

| flag | default | meaning |
|---|---|---|
| --backend-url | (required) | main backend, e.g. http://1.2.3.4:3001 |
| --secret | (required) | the WORKER_BOOTSTRAP_SECRET |
| --type | standard | standard (recon) or service-detection (nmap) |
| --scale | 1 | number of worker replicas |
| --rebuild | off | skip the registry, always build locally |
| --dry-run | off | print the plan, run nothing |
| --tags | (empty) | extra routing tags on top of the canonical vps.standard / vps.sd |
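How --tags combines with the canonical tag can be seen in isolation. The sketch below restates the logic of write_env_file in the script further down; the function name worker_tags and the sample tags eu and fast are illustrative.

```shell
# worker_tags <type> [extra-tags]
# Prints the WORKER_TAGS value: the canonical tag for the type, followed by
# any comma-separated extras passed via --tags.
worker_tags() {
  if [ "$1" = "service-detection" ]; then
    canon="vps.sd"
  else
    canon="vps.standard"
  fi
  extra="${2:-}"
  printf '%s\n' "${canon}${extra:+,$extra}"
}

worker_tags standard                    # prints vps.standard
worker_tags service-detection eu,fast   # prints vps.sd,eu,fast
```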
Manage a running set:
bash
docker compose -p bbmw-standard logs -f # tail (use bbmw-<type>)
docker compose -p bbmw-standard down # stop + remove

The script
This is the script deploy.sh you run (from your scripts/vps-workers checkout above). It's the single source of truth — what you see here is exactly what runs. Reading it is optional; you invoke it with the flags above.
sh
#!/usr/bin/env bash
#
# Deploy BBM vps workers locally on this machine (Mac/Linux).
#
# Resolves the worker image by architecture: if the private registry holds an
# image whose arch matches this host, it's pulled and run; otherwise the image
# is built locally (native arch) and run. Registry credentials are fetched from
# the backend's POST /api/bootstrap/worker-config endpoint, authenticated with
# WORKER_BOOTSTRAP_SECRET (generate with `openssl rand -hex 32`).
#
# ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass>
# ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass> --type service-detection
# ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass> --scale 20
# ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass> --rebuild
# ./deploy.sh --backend-url http://1.2.3.4:3001 --secret <pass> --dry-run
#
# Windows users: see deploy.ps1 (same flags).
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
WORKER_DIR="$REPO_ROOT/apps/worker"
# Dockerfiles COPY `worker/...`, so the build context is apps/ (worker's parent).
BUILD_CONTEXT="$REPO_ROOT/apps"
COMPOSE_FILE="$SCRIPT_DIR/docker-compose.deploy.yml"
# Per-type env file (set after TYPE is parsed). Each worker type gets its own so
# deploying one type never clobbers another's running/recreated config.
ENV_FILE=""
# ---- defaults -------------------------------------------------------------
BACKEND_URL=""
SECRET=""
TYPE="standard"
SCALE="1"
REBUILD="false"
DRY_RUN="false"
EXTRA_TAGS=""
usage() {
  sed -n '2,16p' "${BASH_SOURCE[0]}" | sed 's/^# \{0,1\}//'
  exit "${1:-0}"
}
die() { echo "error: $*" >&2; exit 1; }
# ---- arg parsing ----------------------------------------------------------
while [[ $# -gt 0 ]]; do
  case "$1" in
    --backend-url) BACKEND_URL="${2:?}"; shift 2;;
    --secret) SECRET="${2:?}"; shift 2;;
    --type) TYPE="${2:?}"; shift 2;;
    --tags) EXTRA_TAGS="${2:?}"; shift 2;;
    --scale) SCALE="${2:?}"; shift 2;;
    --rebuild) REBUILD="true"; shift;;
    --dry-run) DRY_RUN="true"; shift;;
    -h|--help) usage 0;;
    *) die "unknown argument: $1 (try --help)";;
  esac
done
# ---- validate -------------------------------------------------------------
[[ -n "$BACKEND_URL" ]] || { echo "error: --backend-url is required" >&2; usage 1; }
[[ -n "$SECRET" ]] || { echo "error: --secret is required" >&2; usage 1; }
case "$TYPE" in
  standard|service-detection) ;;
  *) die "--type must be 'standard' or 'service-detection' (got '$TYPE')";;
esac
# Per-type env file so the standard and service-detection deploys don't share
# (and clobber) one .env.deploy.
ENV_FILE="$SCRIPT_DIR/.env.deploy.${TYPE}"
command -v docker >/dev/null 2>&1 || die "docker is not installed or not on PATH"
command -v curl >/dev/null 2>&1 || die "curl is not installed or not on PATH"
LOCAL_IMAGE="${TYPE}:latest"
DOCKERFILE="$WORKER_DIR/dockerfiles/${TYPE}.dockerfile"
[[ -f "$DOCKERFILE" ]] || die "no dockerfile for type '$TYPE' at $DOCKERFILE"
HOST_ARCH="$(docker version -f '{{.Server.Arch}}' 2>/dev/null || true)"
[[ -n "$HOST_ARCH" ]] || die "could not reach the docker daemon (is it running?)"
# ---- bootstrap fetch ------------------------------------------------------
# Pull registry creds (and the rest of the worker config bag) from the backend.
# The container will fetch the same endpoint again at start to populate its
# runtime env; here we only consume the registry.* fields for `docker pull`.
fetch_bootstrap() {
  local url="${BACKEND_URL%/}/api/bootstrap/worker-config"
  local body
  body=$(curl -fsS -X POST -H "Authorization: Bearer ${SECRET}" "$url") \
    || die "bootstrap fetch failed (network or non-2xx) — check --backend-url and --secret"
  printf '%s' "$body"
}
# Flat-key JSON string extractor. The bootstrap response is camelCase System.Text.Json
# output (one level deep), so a single key never appears nested elsewhere and
# the values (registry endpoint/user/htpasswd-friendly password) don't contain
# backslashes or embedded quotes.
json_str() {
  printf '%s' "$1" \
    | grep -oE "\"$2\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -1 \
    | sed -E 's/.*"([^"]*)"$/\1/'
}
BOOTSTRAP_JSON="$(fetch_bootstrap)"
REGISTRY_ENDPOINT="$(json_str "$BOOTSTRAP_JSON" registryEndpoint)"
REGISTRY_USER="$(json_str "$BOOTSTRAP_JSON" registryUsername)"
REGISTRY_PASS="$(json_str "$BOOTSTRAP_JSON" registryPassword)"
[[ -n "$REGISTRY_ENDPOINT" ]] || die "bootstrap response missing registryEndpoint"
[[ -n "$REGISTRY_USER" ]] || die "bootstrap response missing registryUsername"
[[ -n "$REGISTRY_PASS" ]] || die "bootstrap response missing registryPassword"
REGISTRY_IMAGE="${REGISTRY_ENDPOINT}/${TYPE}:latest"
# ---- image resolution -----------------------------------------------------
# Sets IMAGE_REF and IMAGE_SOURCE (registry|build).
IMAGE_REF=""
IMAGE_SOURCE=""
registry_arch_matches() {
  # Probe the registry HTTP API directly rather than `docker manifest inspect`,
  # which needs a working `docker login` (broken under the OrbStack proxy) and
  # doesn't expose arch for single-arch schema2 manifests anyway. Any failure →
  # return non-zero so we fall back to a local build.
  command -v curl >/dev/null 2>&1 || return 1
  local base="http://${REGISTRY_ENDPOINT}/v2/${TYPE}"
  local auth="${REGISTRY_USER}:${REGISTRY_PASS}"
  local man
  man="$(curl -fsSL -u "$auth" \
    -H 'Accept: application/vnd.oci.image.index.v1+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.list.v2+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
    -H 'Accept: application/vnd.oci.image.manifest.v1+json' \
    "${base}/manifests/latest" 2>/dev/null)" || return 1
  [[ -n "$man" ]] || return 1
  local archs
  if echo "$man" | grep -q '"manifests"'; then
    # Multi-arch index: architectures are in each platform entry.
    archs="$(echo "$man" | grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/')"
  else
    # Single schema2 manifest: arch lives in the config blob (first digest).
    local cfg
    cfg="$(echo "$man" | grep -o '"digest"[[:space:]]*:[[:space:]]*"sha256:[a-f0-9]*"' | head -1 | grep -o 'sha256:[a-f0-9]*')"
    [[ -n "$cfg" ]] || return 1
    archs="$(curl -fsSL -u "$auth" "${base}/blobs/${cfg}" 2>/dev/null \
      | grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/')"
  fi
  echo "$archs" | grep -qx "$HOST_ARCH"
}
resolve_image() {
  if [[ "$REBUILD" == "true" ]]; then
    echo "==> --rebuild set: building $LOCAL_IMAGE locally for $HOST_ARCH"
    IMAGE_REF="$LOCAL_IMAGE"; IMAGE_SOURCE="build"; return
  fi
  echo "==> Probing registry $REGISTRY_IMAGE for a $HOST_ARCH image..."
  if registry_arch_matches; then
    echo "==> Registry image matches host arch ($HOST_ARCH) — will pull."
    IMAGE_REF="$REGISTRY_IMAGE"; IMAGE_SOURCE="registry"
  else
    echo "==> No matching registry image (arch mismatch or registry unreachable) — will build locally."
    IMAGE_REF="$LOCAL_IMAGE"; IMAGE_SOURCE="build"
  fi
}
ensure_submodules() {
  # The service-detection dockerfile copies apps/service-detection/{data,data-httparchive}
  # — both git submodules. A fresh clone leaves them empty and the docker build
  # fails inside `build-apps-json.py` with "no apps loaded from any source".
  # Init/update only when we're in a git checkout (deploy.sh might be run from
  # a tarball drop where there's nothing to update).
  if [[ -d "$REPO_ROOT/.git" ]] && command -v git >/dev/null 2>&1; then
    echo "==> Ensuring git submodules are initialized..."
    (cd "$REPO_ROOT" && git submodule update --init --recursive --quiet) \
      || echo "!! submodule update failed; the build may fail if data/ submodules are missing" >&2
  fi
}
build_image() {
  ensure_submodules
  echo "==> Compiling worker TypeScript..."
  bash "$WORKER_DIR/build.sh"
  echo "==> Building docker image $LOCAL_IMAGE ($HOST_ARCH)..."
  docker build -f "$DOCKERFILE" -t "$LOCAL_IMAGE" "$BUILD_CONTEXT"
}
pull_image() {
  echo "==> Logging into registry $REGISTRY_ENDPOINT..."
  echo "$REGISTRY_PASS" | docker login "$REGISTRY_ENDPOINT" -u "$REGISTRY_USER" --password-stdin >/dev/null 2>&1 || true
  echo "==> Pulling $REGISTRY_IMAGE..."
  if ! docker pull "$REGISTRY_IMAGE"; then
    echo "!! Pull failed despite a compatible manifest (insecure-registries not set?). Falling back to a local build." >&2
    IMAGE_REF="$LOCAL_IMAGE"; IMAGE_SOURCE="build"
    build_image
  fi
}
# ---- env file -------------------------------------------------------------
# Trimmed: only routing + bootstrap inputs. The container's entrypoint re-fetches
# Rabbit/MinIO/registry creds from the backend at start time using BACKEND_SECRET.
write_env_file() {
  local canon_tag="vps.$([ "$TYPE" = "service-detection" ] && echo sd || echo standard)"
  local worker_tags="${canon_tag}${EXTRA_TAGS:+,$EXTRA_TAGS}"
  cat > "$ENV_FILE" <<EOF
# Generated by deploy.sh — regenerated every run, do not edit.
WORKER_TYPE=vps
WORKER_IMAGE=${TYPE}
WORKER_TAGS=${worker_tags}
BACKEND_URL=${BACKEND_URL}
BACKEND_SECRET=${SECRET}
NODE_ENV=production
WORKER_TIMEOUT_MS=7200000
EOF
  # service-detection tuning (overridable via env when invoking deploy.sh).
  # N=3 is the memory-bound concurrency sweet spot; recycle the browser every 40
  # pages; 20s per-page timeout. See docs/superpowers/specs/2026-06-18-service-detection-optimization-design.md
  if [[ "$TYPE" == "service-detection" ]]; then
    cat >> "$ENV_FILE" <<EOF
SD_MAX_TABS=${SD_MAX_TABS:-3}
SD_BROWSER_RECYCLE_EVERY=${SD_BROWSER_RECYCLE_EVERY:-40}
SD_PAGE_TIMEOUT_SECS=${SD_PAGE_TIMEOUT_SECS:-20}
EOF
  fi
  chmod 600 "$ENV_FILE"
}
# ---- plan / execute -------------------------------------------------------
resolve_image
PROJECT="bbmw-${TYPE}"
print_plan() {
  cat <<EOF
Plan
────
host arch : ${HOST_ARCH}
worker type : ${TYPE}
scale : ${SCALE}
image source : ${IMAGE_SOURCE}
image ref : ${IMAGE_REF}
backend url : ${BACKEND_URL}
bootstrap fetched: ok
registry : ${REGISTRY_ENDPOINT}
env file : ${ENV_FILE}
compose : docker compose -f ${COMPOSE_FILE} -p ${PROJECT} up -d --scale worker=${SCALE}
EOF
}
if [[ "$DRY_RUN" == "true" ]]; then
  print_plan
  echo
  echo "(dry run — nothing executed)"
  exit 0
fi
if [[ "$IMAGE_SOURCE" == "build" ]]; then
  build_image
else
  pull_image
fi
write_env_file
echo "==> Starting ${SCALE}x ${TYPE} worker(s) (project ${PROJECT})..."
WORKER_IMAGE_REF="$IMAGE_REF" WORKER_ENV_FILE="$(basename "$ENV_FILE")" \
  docker compose -f "$COMPOSE_FILE" -p "$PROJECT" up -d --scale "worker=${SCALE}"
echo
echo "✓ ${SCALE}x ${TYPE} worker(s) deployed."
echo " logs: docker compose -p ${PROJECT} logs -f"
echo " stop: docker compose -p ${PROJECT} down"

Compose file it drives
yaml
# Parameterized local deploy of vps workers (any type), driven by deploy.sh /
# deploy.ps1. Both scripts resolve the image (pulled-from-registry OR
# locally-built) into WORKER_IMAGE_REF and regenerate .env.deploy from CLI args
# before invoking:
#
# docker compose -f docker-compose.deploy.yml -p bbmw-<type> up -d --scale worker=<N>
#
# Workers reach RabbitMQ/MinIO/backend over the VPS public IP, so no external
# bbm-network is needed — the default bridge network is fine.
services:
  worker:
    image: ${WORKER_IMAGE_REF:?run via deploy.sh / deploy.ps1, which sets WORKER_IMAGE_REF}
    # tini as PID 1 reaps reparented Chromium/chrome_crashpad helpers. node (the
    # real PID 1 here) has no SIGCHLD reaper, so without this the service-detection
    # browser children pile up as zombies and eventually exhaust the PID table.
    init: true
    restart: unless-stopped
    # Real /dev/shm for Chromium (service-detection). Default 64MB starves the
    # renderer and contributes to browser wedges; harmless for standard workers.
    shm_size: ${WORKER_SHM_SIZE:-1g}
    # Per-type env file (deploy.sh sets WORKER_ENV_FILE=.env.deploy.<type>) so
    # the standard and service-detection deploys never share/clobber one file.
    env_file:
      - ${WORKER_ENV_FILE:-.env.deploy}
    environment:
      - WORKER_TYPE=vps
    entrypoint: ["/bin/sh", "-c", "WORKER_NAME=$(hostname) exec docker-entrypoint.sh node dist/entrypoint.js"]

How the worker gets its config
The script writes only routing and tuning settings plus BACKEND_URL and BACKEND_SECRET into a per-type, chmod 600 .env.deploy.<type> file that it regenerates on every run. The worker container itself calls the bootstrap endpoint at start to fetch its RabbitMQ / MinIO credentials, so secrets are never baked into an image or committed.
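The bootstrap call can be reproduced by hand to check a URL and secret before deploying. This sketch reuses the endpoint, bearer header, and flat-key extraction from deploy.sh. The sample response is invented, and only the registry* field names are confirmed by the script; the names of the RabbitMQ / MinIO fields are not shown here because the source does not state them.

```shell
# json_str <json> <key>: same flat-key extractor as deploy.sh. It only handles
# one-level JSON whose string values contain no quotes or backslashes.
json_str() {
  printf '%s' "$1" \
    | grep -oE "\"$2\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -1 \
    | sed -E 's/.*"([^"]*)"$/\1/'
}

# Live call (needs a reachable backend and a valid secret):
#   BODY="$(curl -fsS -X POST -H "Authorization: Bearer $SECRET" \
#     "$BACKEND_URL/api/bootstrap/worker-config")"

# Invented sample response for an offline try-out.
BODY='{"registryEndpoint":"registry.example:5000","registryUsername":"worker","registryPassword":"not-a-real-password"}'

echo "registry: $(json_str "$BODY" registryEndpoint)"
echo "user:     $(json_str "$BODY" registryUsername)"
```

A key that is absent yields an empty string, which is why deploy.sh checks each extracted value with -n before using it.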