Architecture

Sonar turns a description of a scan into structured knowledge about a target. You author a workflow once; Sonar runs it across many machines, gathers the output, and folds it into a queryable asset database — reliably, and without you babysitting it.

Vocabulary (used throughout)

  • Target / Program — the thing you're scanning: a bug-bounty program (e.g. a HackerOne handle). "Target" and "program" mean the same thing here.
  • Scope — a sub-target within a program (e.g. *.acme.com) that says what's in-bounds.
  • Asset — anything a scan discovers about a target: a domain, IP, port, HTTP path, or detected technology. Assets are the rows in the database. See Asset model.
  • Workflow — a reusable recipe: a DAG (directed acyclic graph — steps with "runs-after" edges and no loops) of scanning steps. A scan is one run of it.
  • Reconciler — the background loop that, every few seconds, inspects each running scan and enqueues whatever work is ready to run. It's the engine; see Reliability for details.
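
To make the "DAG of steps" idea concrete, here is a minimal Python sketch of a workflow expressed as "runs-after" edges, with a check that it contains no loops. The step names and the execution_order helper are hypothetical and chosen for illustration only; this is not Sonar's actual workflow format.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it runs after.
WORKFLOW = {
    "enumerate_subdomains": set(),
    "resolve_dns": {"enumerate_subdomains"},
    "scan_ports": {"resolve_dns"},
    "probe_http": {"scan_ports"},
    "detect_tech": {"probe_http"},
}


def execution_order(workflow: dict[str, set[str]]) -> list[str]:
    """Return one valid run order, or raise ValueError if the graph has a loop."""
    try:
        # static_order() is lazy, so the cycle check happens while list() iterates.
        return list(TopologicalSorter(workflow).static_order())
    except CycleError as exc:
        # graphlib puts the offending cycle in the second element of args.
        raise ValueError(f"workflow is not a DAG: {exc.args[1]}") from exc


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
```

Because this example is a straight chain, only one order is valid. A workflow with parallel branches would have several valid orders, and deciding what actually runs next is the reconciler's job.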

The moving parts

        author / operate
              │
        ┌─────▼──────┐        ┌──────────────┐
        │  Sonar API │◀──────▶│  PostgreSQL  │   programs, scopes, assets,
        │  (backend) │        │              │   workflows, scans, tasks
        └─────┬──────┘        └──────────────┘
              │ enqueue task
        ┌─────▼──────┐
        │  RabbitMQ  │  message queue (task dispatch)
        └─────┬──────┘
              │ pull
        ┌─────▼──────┐        ┌──────────────┐
        │  Workers   │───────▶│    MinIO     │  raw tool output (files)
        │ (a fleet)  │        │  (S3-like)   │
        └─────┬──────┘        └──────┬───────┘
              │ result webhook        │ read output
        ┌─────▼───────────────────────▼───────┐
        │  Sonar API: parse + bulk-upsert into │
        │  the asset tables (domains, ports…)  │
        └──────────────────────────────────────┘
  • Backend (Sonar API) — the brain. A .NET service that stores all structured data in PostgreSQL, exposes the REST API, and runs the background reconciler (the loop that drives each scan forward one step at a time). It also serves the frontend.
  • RabbitMQ — the task queue. The backend enqueues one message per unit of work; workers pull from it. Decoupling dispatch from execution is what lets the fleet scale.
  • Workers — disposable machines (VPSes, GitHub runners, AWS instances) that each pull a task, run a shell command (a scanning tool), upload the output, and report back. They hold no state.
  • MinIO — S3-compatible object storage for the raw tool output (often large files). The DB stores structured facts; MinIO stores the artifacts behind them.
  • Observability — OpenTelemetry → Prometheus / Loki / Tempo / Grafana for metrics, logs, and traces across the whole flow.
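
The hand-off between the backend, the queue, the workers, and object storage can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: an in-process queue.Queue plays RabbitMQ, a dict plays MinIO, a plain callback plays the result webhook, and the message fields and object key layout are hypothetical rather than Sonar's real schema.

```python
import queue
import subprocess
import sys


def enqueue(tasks: queue.Queue, task_id: str, argv: list[str]) -> None:
    """Backend side: publish one message per unit of work."""
    tasks.put({"task_id": task_id, "argv": argv})


def work_once(tasks: queue.Queue, store: dict, report) -> bool:
    """Worker side: pull one task, run it, upload the output, report back.

    Returns False when the queue is empty. Nothing is kept on the worker
    between calls, which is what makes it disposable.
    """
    try:
        task = tasks.get_nowait()
    except queue.Empty:
        return False
    proc = subprocess.run(task["argv"], capture_output=True, check=False)
    key = f"raw/{task['task_id']}.out"      # hypothetical object key layout
    store[key] = proc.stdout                # "upload" the raw tool output
    report({"task_id": task["task_id"],     # "POST" the result to the backend
            "exit_code": proc.returncode,
            "output_key": key})
    return True


def demo():
    tasks, store, results = queue.Queue(), {}, []
    # A tiny Python one-liner stands in for a real scanning tool.
    enqueue(tasks, "t1", [sys.executable, "-c", "print('a.acme.com')"])
    enqueue(tasks, "t2", [sys.executable, "-c", "print('b.acme.com')"])
    while work_once(tasks, store, results.append):
        pass
    return store, results


if __name__ == "__main__":
    store, results = demo()
    print(sorted(store), [r["exit_code"] for r in results])
```

The point of the shape is that enqueue never waits for a worker and work_once never needs to know who produced the task, so adding workers only means adding more callers of work_once.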

The lifecycle of a scan, in one breath

  1. You define a workflow (a DAG of steps) and start a scan of it, optionally scoped to one target.
  2. The backend captures a snapshot of the workflow (for faithful history/display — see the caveat) and marks the scan Running.
  3. Every few seconds the reconciler looks at the scan, works out which steps are ready, and enqueues their tasks to RabbitMQ, up to the fleet's capacity.
  4. Workers pull tasks, run the tool, upload output to MinIO, and POST a result back.
  5. The backend parses the output and bulk-upserts it into the asset tables; that in turn unlocks downstream steps.
  6. When every step is done, the scan is Completed (or CompletedWithErrors).
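
Steps 3 to 6 can be simulated with a small Python sketch of a reconciler tick. It rests on several assumptions that the text above does not confirm: the per-step state names are hypothetical, capacity is modelled as a cap on in-flight tasks, every queued task reports back before the next tick, and a failed step still counts as finished for the purpose of unlocking downstream steps.

```python
from enum import Enum


class StepState(Enum):
    PENDING = "Pending"
    QUEUED = "Queued"
    DONE = "Done"
    FAILED = "Failed"


FINISHED = {StepState.DONE, StepState.FAILED}


def reconcile(deps, state, capacity):
    """One tick: pick ready steps, up to the free capacity, and mark them QUEUED."""
    in_flight = sum(1 for s in state.values() if s is StepState.QUEUED)
    free = max(capacity - in_flight, 0)
    ready = [
        step for step in deps
        if state[step] is StepState.PENDING
        and all(state[d] in FINISHED for d in deps[step])  # assumed unlock rule
    ]
    batch = ready[:free]
    for step in batch:
        state[step] = StepState.QUEUED
    return batch


def scan_status(state):
    """Derive the scan status from the per-step states."""
    if any(s not in FINISHED for s in state.values()):
        return "Running"
    if any(s is StepState.FAILED for s in state.values()):
        return "CompletedWithErrors"
    return "Completed"


def run_scan(deps, capacity, failing=frozenset()):
    """Tick until the scan finishes; return the final status and each tick's batch."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    state = {step: StepState.PENDING for step in deps}
    ticks = []
    while scan_status(state) == "Running":
        batch = reconcile(deps, state, capacity)
        if not batch:
            raise RuntimeError("no step is ready; is there a dependency cycle?")
        ticks.append(batch)
        # Simulation shortcut: every queued task reports back before the next tick.
        for step in batch:
            state[step] = StepState.FAILED if step in failing else StepState.DONE
    return scan_status(state), ticks


# Hypothetical workflow: step -> steps it runs after.
DEPS = {
    "subdomains": [],
    "dns": ["subdomains"],
    "ports": ["dns"],
    "http": ["dns"],
    "tech": ["http"],
}

if __name__ == "__main__":
    print(run_scan(DEPS, capacity=2))
    print(run_scan(DEPS, capacity=1, failing={"ports"}))
```

With capacity=2 the two steps that follow dns are enqueued in the same tick, so the scan finishes in four ticks; with capacity=1 they are serialized and it takes five. Marking any step as failing changes only the final status, which becomes CompletedWithErrors.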

Each of these has a mechanism that makes it safe to interrupt, retry, and re-run — that's the subject of Reliability & consistency. First, the piece you touch most: the workflow engine.