Start a shared local Supabase stack once before all scenarios and reset
the database (drop/recreate public schema + clear migration history) between
each run. This lets agents apply migrations via `supabase db push` against a
real Postgres instance instead of mock shell scripts.
- Add supabase-setup.ts: startSupabase / stopSupabase / resetDB / getKeys
- Update runner.ts to start/stop Supabase and inject keys into process.env
- Update agent.ts to point MCP config at the local Supabase HTTP endpoint
- Update preflight.ts to check supabase CLI availability and Docker socket
- Update scaffold.ts to seed workspace with supabase/config.toml
- Add passThreshold support (test.ts / results.ts / types.ts) for partial pass
- Delete mock shell scripts (mocks/docker, mocks/psql, mocks/supabase)
- Update Dockerfile/docker-compose to mount Docker socket for supabase CLI
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add per-test pass/fail parsing from vitest verbose output, thread prompt
content and individual test results through the runner, and rewrite
uploadToBraintrust with experiment naming (model-variant-timestamp),
granular scores (pass, test_pass_rate, per-test), rich metadata, and
tool-call tracing via experiment.traced(). Also document --force flag
for cached mise tasks and add Braintrust env vars to AGENTS.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>