The workflow engine

This is the heart of Sonar. If you understand this page, you understand how scans work.

Workflow vs. scan

  • A WorkflowDefinition is a reusable template — a recipe. It contains steps and the wiring between them. It runs nothing on its own.
  • A Scan is one execution of a workflow. Starting a scan is a single action: the scan immediately enters the Running state and the engine takes over.

Think of the workflow as a class and the scan as an instance.

What a workflow contains

A workflow is a DAG (directed acyclic graph) of steps. Each step defines exactly one shell command, and each task of that step runs the command on a worker.

  subfinder ─┐
  assetfinder├──▶  merge & unique  ──▶  transform → save to DB (domains)
  chaos     ─┘        (fan-in)              (per-item)

A step carries everything needed to run and route one command:

| Field | What it does |
|---|---|
| command | The shell command template, e.g. subfinder -d {PARAM.domain} -o {OUTPUT_FILE}. |
| variables | Named paths/values the command uses (OUTPUT_FILE, INPUT_FILE, …). |
| executionLocation / workerImage / targetTags | Routing: which kind of worker runs it (VPS, GitHub, AWS; standard vs service-detection; tag match). |
| maxConcurrent | A ceiling on how many of this step's tasks run at once. |
| inputSource / inputSql | Optionally feed the command a CSV/JSON of rows from the DB (e.g. "all in-scope domains"). |
| saveToDb / outputTable | If set, the step's parsed output is upserted into that asset table. |
| dependsOn | The edges: which upstream steps must run first, and how (see below). |

Where a step runs (routing). A step doesn't pick a machine directly; it advertises requirements and the engine matches a suitable worker:

  • executionLocation — the kind of host: Vps, GitHub, Aws, or Anywhere.
  • workerImage — which toolbox the worker has: Standard (recon tools like subfinder, puredns, httpx) or ServiceDetection (nmap / service-fingerprinting).
  • targetTags — routing labels (e.g. ["vps.standard"]). A worker announces the tags it serves; a step runs on a worker whose tags include one of the step's targetTags.
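
The matching rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sonar's actual matcher: the Worker and Step classes, their field names, and the can_run / pick_worker helpers are hypothetical. Only the tag-overlap rule comes from the text; treating Anywhere as a wildcard, comparing workerImage by equality, and letting a step with no targetTags match any worker are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worker:
    name: str
    location: str                 # e.g. "Vps", "GitHub", "Aws"
    image: str                    # "Standard" or "ServiceDetection"
    tags: set[str] = field(default_factory=set)


@dataclass
class Step:
    name: str
    execution_location: str       # "Vps", "GitHub", "Aws", or "Anywhere"
    worker_image: str
    target_tags: list[str] = field(default_factory=list)


def can_run(step: Step, worker: Worker) -> bool:
    """True if the worker satisfies the step's advertised requirements."""
    # Assumption: "Anywhere" accepts every host kind.
    location_ok = step.execution_location in ("Anywhere", worker.location)
    # Assumption: the image must match exactly.
    image_ok = step.worker_image == worker.image
    # From the text: the worker's tags must include one of the step's targetTags.
    # Assumption: a step without targetTags places no tag constraint.
    tags_ok = not step.target_tags or any(t in worker.tags for t in step.target_tags)
    return location_ok and image_ok and tags_ok


def pick_worker(step: Step, workers: list[Worker]) -> Optional[Worker]:
    """Return the first suitable worker, or None if nothing matches."""
    return next((w for w in workers if can_run(step, w)), None)
```

A real engine would also weigh load and availability when several workers qualify; the sketch simply takes the first match.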

Secrets and parameters are declared inputs. A Parameter ({PARAM.x}) is a visible targeting value (e.g. a wildcard to scan); a Secret ({SECRETS.x}) is masked and appears in clear text only where it is substituted into the command. A scan supplies concrete values for both at start time.
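
To make the placeholder syntax concrete, here is a minimal sketch of how {PARAM.x} and {SECRETS.x} could be substituted into a command template. The render function, the "****" mask, and the mytool command with its flags are hypothetical and chosen only for illustration; the engine's real substitution logic is not shown in this page.

```python
import re

# Matches {PARAM.name} and {SECRETS.name}; other placeholders such as
# {OUTPUT_FILE} are deliberately left untouched by this sketch.
PLACEHOLDER = re.compile(r"\{(PARAM|SECRETS)\.(\w+)\}")


def render(template: str, params: dict, secrets: dict, mask: bool = False) -> str:
    """Fill PARAM/SECRETS placeholders; with mask=True, hide secret values."""

    def substitute(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        source = params if kind == "PARAM" else secrets
        if name not in source:
            raise KeyError(f"no value supplied for {kind}.{name}")
        if kind == "SECRETS" and mask:
            return "****"  # hypothetical mask string for display purposes
        return str(source[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical tool and flags, used only to show the substitution.
template = "mytool --domain {PARAM.domain} --token {SECRETS.api_token} -o {OUTPUT_FILE}"
params = {"domain": "example.com"}
secrets = {"api_token": "s3cret"}

executed = render(template, params, secrets)             # what a worker would run
displayed = render(template, params, secrets, mask=True)  # what a viewer would see
```

Keeping one template and rendering it twice (once for execution, once for display) is one simple way to guarantee the secret never reaches anything that is shown to a user.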

How one step's output feeds the next. A step writes its result to the path in its {OUTPUT_FILE} variable (or emits it on stdout as {OUTPUT}). When a downstream step depends on it via a Single edge, the engine hands that file to the child as {INPUT_FILE_UPSTREAM} (or the value as {INPUT_UPSTREAM}). So the wiring is: upstream writes {OUTPUT_FILE} → engine → downstream reads {INPUT_FILE_UPSTREAM}. See Variable replacement for the full list.
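
The hand-off can be illustrated with a small sketch. The helper names (wire_single_edge, fill), the file paths, and the example downstream command are hypothetical; only the variable names OUTPUT_FILE, OUTPUT, INPUT_FILE_UPSTREAM, and INPUT_UPSTREAM come from the description above.

```python
from typing import Optional


def wire_single_edge(upstream_variables: dict, upstream_stdout: Optional[str] = None) -> dict:
    """Derive the child's upstream inputs from a finished parent task."""
    downstream = {}
    if "OUTPUT_FILE" in upstream_variables:
        # The parent's output file becomes the child's upstream input file.
        downstream["INPUT_FILE_UPSTREAM"] = upstream_variables["OUTPUT_FILE"]
    if upstream_stdout is not None:
        # The parent's stdout value ({OUTPUT}) becomes the child's {INPUT_UPSTREAM}.
        downstream["INPUT_UPSTREAM"] = upstream_stdout.strip()
    return downstream


def fill(template: str, variables: dict) -> str:
    """Replace each {NAME} placeholder with its value."""
    for name, value in variables.items():
        template = template.replace("{" + name + "}", value)
    return template


# Hypothetical paths for one parent task and its child.
parent_vars = {"OUTPUT_FILE": "/work/a/out.txt"}
child_vars = {"OUTPUT_FILE": "/work/b/out.txt", **wire_single_edge(parent_vars)}

child_command = fill("sort -u {INPUT_FILE_UPSTREAM} -o {OUTPUT_FILE}", child_vars)
```

Here the child deduplicates the parent's file: its command ends up reading /work/a/out.txt and writing /work/b/out.txt.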

The two dependency types (this is the key idea)

An edge from step B back to step A says "B depends on A." How B consumes A comes in two flavors:

  • All (fan-in) — B runs once, after every task of A has finished. Use it to merge: "wait for subfinder and assetfinder and chaos, then dedupe."
  • Single (streaming fan-out) — B runs once per finished task of A, created as each one completes. Use it to process items in parallel: "for each host A found, run a port scan." The upstream step must emit an OUTPUT / OUTPUT_FILE that becomes B's input.

Single is what lets a scan stream: downstream work starts the moment the first upstream item is ready, instead of waiting for the whole stage. Combined with per-file/per-line expansion, one crawl step can fan out into thousands of parallel probes.
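
The difference between the two edge types can be sketched as a pure function that answers one question: given the upstream tasks finished so far, which downstream tasks should exist? The DependencyType enum and downstream_tasks function are hypothetical names for illustration. The sketch also ignores bookkeeping that a real engine needs, such as remembering which downstream tasks it has already created.

```python
from enum import Enum


class DependencyType(Enum):
    ALL = "All"        # fan-in: one downstream task after every upstream task
    SINGLE = "Single"  # streaming fan-out: one downstream task per upstream task


def downstream_tasks(
    dep_type: DependencyType,
    finished_outputs: list[str],
    upstream_done: bool,
) -> list[list[str]]:
    """Return the input set of each downstream task that may exist now.

    finished_outputs: outputs of the upstream tasks that have finished so far.
    upstream_done: True once every upstream task has finished.
    """
    if dep_type is DependencyType.SINGLE:
        # One downstream task per finished upstream task, available immediately.
        return [[output] for output in finished_outputs]
    if upstream_done:
        # A single downstream task that sees every upstream output at once.
        return [list(finished_outputs)]
    # All edge with upstream work still in flight: nothing can start yet.
    return []
```

With two of three upstream tasks finished, a Single edge already yields two downstream tasks, while an All edge yields none until the third finishes.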

Running a scan

Starting a scan (POST /api/scans, or the create_scan MCP tool):

  1. Loads the workflow, validates the DAG (no cycles, dependency rules — see Reliability).
  2. Resolves the parameters you passed (rejects missing-required / unknown names).
  3. Captures a WorkflowSnapshot — a full copy of the steps/commands/edges as they are right now, stored on the scan (jsonb). It gives every scan a faithful record of the graph it was launched with, so its history and DAG always render correctly even after you edit or delete the workflow. (See the note on what the snapshot does and doesn't freeze.)
  4. Inserts the scan as Running. From here the reconciler drives it (next section).
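
The four steps above can be sketched as one function. Everything about the data shapes here is an assumption made for illustration: the workflow is a plain dict, dependsOn is simplified to a list of step names (the real edges also carry a dependency type), and parameters maps each declared name to whether it is required. Cycle detection uses Python's standard graphlib module; the other dependency rules mentioned in step 1 are not modeled.

```python
import copy
from graphlib import CycleError, TopologicalSorter


def start_scan(workflow: dict, supplied: dict) -> dict:
    """Validate, resolve parameters, snapshot, and return a Running scan."""
    # 1. Validate the DAG: every dependency must exist and there must be no cycle.
    graph = {s["name"]: set(s.get("dependsOn", [])) for s in workflow["steps"]}
    for name, deps in graph.items():
        unknown_deps = deps - graph.keys()
        if unknown_deps:
            raise ValueError(f"step {name!r} depends on unknown steps: {sorted(unknown_deps)}")
    try:
        # static_order() is lazy, so consume it to trigger the cycle check.
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"workflow contains a cycle: {exc.args[1]}") from exc

    # 2. Resolve parameters: reject unknown names and missing required ones.
    declared = workflow.get("parameters", {})  # hypothetical shape: name -> required?
    unknown = set(supplied) - set(declared)
    missing = {n for n, required in declared.items() if required} - set(supplied)
    if unknown or missing:
        raise ValueError(f"unknown parameters {sorted(unknown)}, missing {sorted(missing)}")

    # 3. Capture a snapshot: a deep copy, so later edits do not alter this scan.
    snapshot = copy.deepcopy(workflow)

    # 4. The scan starts life as Running.
    return {"status": "Running", "parameters": dict(supplied), "workflowSnapshot": snapshot}
```

The deep copy is the important design point: because the scan owns its own copy of the graph, editing or deleting the workflow afterwards cannot change what the scan's history shows.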

What you watch

  • get_scan / GET /api/scans/{id} — overall status, which steps are done vs running, and a rough ETA.
  • get_scan_tasks_statistics — per-step task counts (pending / running / completed / failed) and the last error.
  • A scan finishes as Completed or, if some tasks failed, CompletedWithErrors — it still completes; failures don't wedge the graph.
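
As a rough illustration of the last point, the final status can be derived from per-step task counts like this. The shape of the statistics dict and the scan_status function are hypothetical, not the actual response format of get_scan_tasks_statistics, and the sketch assumes every task the scan will ever need has already been created.

```python
def scan_status(step_stats: dict) -> str:
    """Derive an overall status from per-step task counts.

    step_stats maps a step name to counts keyed by
    "pending", "running", "completed", and "failed" (missing keys count as 0).
    """
    totals = {"pending": 0, "running": 0, "completed": 0, "failed": 0}
    for counts in step_stats.values():
        for key in totals:
            totals[key] += counts.get(key, 0)

    if totals["pending"] or totals["running"]:
        return "Running"
    # Failed tasks do not block completion; they only change the final label.
    return "CompletedWithErrors" if totals["failed"] else "Completed"
```

Note that failures never produce a separate terminal state in this sketch: once no work is pending or running, the scan is complete either way.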

Next: the mechanisms that make all of this reliable and consistent.