mirra run <scenario.md> [flags]
Reads a Markdown scenario, provisions the mirrors it needs, executes the scenario (either via a test driver or by letting a coding agent run against the session), and evaluates the success criteria. It is the canonical entry point for CI, pre-merge gates, and agent validation runs.

Common invocations

mirra run scenarios/welcome-email.md

How a run works

┌─────────────────────────────────────────────────┐
│  1. mirra run reads scenario.md                 │
│  2. provisions session with declared mirrors    │
│  3. loads seed (fixture / LLM / json)           │
│  4. drives the scenario:                        │
│      · test-suite mode  → runs the test         │
│      · agent mode       → hands prompt to agent │
│  5. captures complete trace                     │
│  6. evaluator grades every `check:` criterion   │
│  7. LLM-judge grades every `judge:` criterion   │
│  8. emits satisfaction score + per-criterion    │
│  9. tears down (ephemeral) or holds (persistent)│
└─────────────────────────────────────────────────┘

Output

Human-readable default:
$ mirra run scenarios/welcome-email.md --runs=3

→ provisioning resend mirror…
✓ session ses_a7k2 ready in 1.8s

→ run 1/3
  ✓ Exactly 1 email is created in the Resend mirror
  ✓ The email's from address is hello@mail.acme.com
  ✓ At least 1 email.sent webhook was received
  ✓ The webhook signature was verified
  ✓ The email subject is appropriate for a welcome message
  ✗ Error handling for bounces is implemented

→ run 2/3
  ✓ …


satisfaction score: 83% (15/18 criteria passed across 3 runs)

per-criterion breakdown:
  ✓ Exactly 1 email is created in the Resend mirror         3/3
  ✓ The email's from address is hello@mail.acme.com          3/3
  ✓ At least 1 email.sent webhook was received               3/3
  ✓ The webhook signature was verified                       3/3
  ✓ The email subject is appropriate for a welcome message   3/3
  ✗ Error handling for bounces is implemented                0/3

dashboard: https://app.mirra.run/runs/run_9xkp2m
Machine-readable with --json:
{
  "runId": "run_9xkp2m",
  "scenario": "scenarios/welcome-email.md",
  "sessionId": "ses_a7k2",
  "runs": 3,
  "satisfactionScore": 0.83,
  "criteria": [
    { "label": "Exactly 1 email is created in the Resend mirror", "kind": "check", "passed": 3, "total": 3 },
    { "label": "Error handling for bounces is implemented",        "kind": "judge", "passed": 0, "total": 3 }
  ],
  "durationMs": 14812,
  "dashboardUrl": "https://app.mirra.run/runs/run_9xkp2m"
}
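For scripting against this output, a minimal Python sketch follows. It assumes only the field names shown in the example above (criteria, label, passed, total); the helper name failing_criteria is illustrative and not part of Mirra. The sample is trimmed to the fields the helper reads.

```python
import json

# Trimmed copy of the example result above; only the fields used here.
SAMPLE = """
{
  "runId": "run_9xkp2m",
  "runs": 3,
  "criteria": [
    {"label": "Exactly 1 email is created in the Resend mirror", "kind": "check", "passed": 3, "total": 3},
    {"label": "Error handling for bounces is implemented", "kind": "judge", "passed": 0, "total": 3}
  ]
}
"""


def failing_criteria(result: dict) -> list[str]:
    """Return the labels of criteria that did not pass in every run."""
    return [c["label"] for c in result["criteria"] if c["passed"] < c["total"]]


result = json.loads(SAMPLE)
for label in failing_criteria(result):
    print(f"failed: {label}")
```

A criterion counts as failing here if it missed in any run, which is stricter than the overall satisfaction score.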

Flags

--runs
integer
default:"1"
Execute the scenario N times in a row. Between runs, all mirrors in the session are reset to their seeded state. The final score is the mean percentage of criteria passed across runs.
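As a worked example of that scoring rule, the Python sketch below averages the per-run pass fraction. It assumes every run grades the same non-empty set of criteria; the function name and the boolean-list input shape are illustrative, not Mirra's internal representation.

```python
def satisfaction_score(runs: list[list[bool]]) -> float:
    """Mean, over runs, of the fraction of criteria passed in each run."""
    if not runs:
        raise ValueError("at least one run is required")
    per_run = [sum(outcomes) / len(outcomes) for outcomes in runs]
    return sum(per_run) / len(per_run)


# Three runs of six criteria each; the last criterion fails every time.
runs = [[True, True, True, True, True, False] for _ in range(3)]
print(f"{satisfaction_score(runs):.1%}")  # prints 83.3%
```

Because each run here grades the same number of criteria, the result equals total passes divided by total checks (15/18).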
--session
string
Run against an existing session instead of provisioning a new one. The session must have all mirrors the scenario requires. Use this when you want multiple scenarios to share warm mirrors.
--agent
enum
Name the agent driving the scenario. Options: claude-code, cursor, copilot, cline, custom. Affects how Mirra hands over the prompt and how the run is logged. Default is custom, meaning you are running a test driver rather than an agent.
--fixture
string
Override the scenario's declared fixture for this run. Uses the same format as the scenario file, for example resend:transactional-busy.
--timeout
duration
default:"60s"
Per-run timeout. A run that exceeds it is terminated and counted as a failure. Raise it for longer scenarios or slow agents.
--fail-below
float
Exit non-zero if the satisfaction score is below the given threshold (0.0–1.0). Use this to gate CI: --fail-below=0.9 means CI fails if fewer than 90% of criteria pass.
--json
boolean
default:"false"
Emit the final result as one JSON line on stdout. Combine with --quiet to suppress the per-run log.
--quiet
boolean
default:"false"
Suppress live progress output. The final summary still prints (or just the JSON line if --json).
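When a script consumes the result, one approach is to read the last non-empty line of stdout as the JSON payload. The sketch below assumes that progress output (when not suppressed by --quiet) also goes to stdout and that the JSON line is printed last; neither is stated above, so treat both as assumptions. parse_result is a hypothetical helper name, and the captured string is simulated.

```python
import json


def parse_result(stdout: str) -> dict:
    """Parse the final non-empty stdout line as the JSON result."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output to parse")
    return json.loads(lines[-1])


# Simulated stdout: one progress line followed by the JSON line.
captured = '-> run 1/1\n{"runId": "run_9xkp2m", "runs": 1, "satisfactionScore": 1.0}\n'
result = parse_result(captured)
print(result["runId"], result["satisfactionScore"])
```

With --json and --quiet combined, stdout should hold only the JSON line, so the same helper works without relying on line order.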

Exit codes

Code  Meaning
0     Every criterion passed, or the satisfaction score was at or above --fail-below.
1     Satisfaction score below --fail-below.
2     A run timed out.
3     A mirror could not be provisioned.
4     Scenario file was invalid.
64    CLI usage error (unknown flag, missing argument).
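In a CI wrapper, these codes can drive different follow-up actions. The Python sketch below maps each documented code to a short description and applies one possible retry policy: retry on timeouts and provisioning failures, never on a low score or an invalid invocation. The policy and the names EXIT_CODES, RETRYABLE, describe_exit, should_retry, and run_gate are assumptions for illustration; only the codes themselves and the flags passed to mirra run come from this page.

```python
import subprocess

# Exit codes as documented in the table above.
EXIT_CODES = {
    0: "all criteria passed, or score met --fail-below",
    1: "satisfaction score below --fail-below",
    2: "a run timed out",
    3: "a mirror could not be provisioned",
    4: "scenario file was invalid",
    64: "CLI usage error",
}

# Assumed policy: only infrastructure-style failures are worth retrying.
RETRYABLE = {2, 3}


def describe_exit(code: int) -> str:
    """Return a short description for a documented exit code."""
    return EXIT_CODES.get(code, f"unknown exit code {code}")


def should_retry(code: int) -> bool:
    """Apply the assumed retry policy to an exit code."""
    return code in RETRYABLE


def run_gate(scenario: str, fail_below: float) -> int:
    """Run a scenario as a CI gate and return the process exit code."""
    cmd = ["mirra", "run", scenario, f"--fail-below={fail_below}", "--quiet"]
    return subprocess.run(cmd).returncode


for code in (0, 1, 2, 64):
    action = "retry" if should_retry(code) else "no retry"
    print(code, describe_exit(code), action)
```

Calling run_gate("scenarios/welcome-email.md", 0.9) requires the mirra CLI on PATH; the loop above only exercises the lookup helpers.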

Driving the scenario

mirra run doesn’t execute code on its own. It needs a driver that turns the scenario’s ## Prompt into actual work against the session. Three common drivers:
Hand the prompt to a coding agent via the MCP server. The agent reads mirror state, fires requests, and writes code that operates against the session. Used for agent evaluation.
See Guide — First scenario for an end-to-end walkthrough.

Where to go next

Scenario format

Every valid section, every config key, every edge case in a scenario.

CI integration

Wire mirra run into GitHub Actions or your CI of choice.