Commit Graph

10 Commits

Author SHA1 Message Date
Pedro Rodrigues
3c3d1f55ca containerize eval environment with Docker and mock CLIs
Host now only needs Docker + ANTHROPIC_API_KEY to run evals. Adds
multi-stage Dockerfile, mock supabase/docker/psql scripts, entrypoint,
docker-compose for local use, and switches CI to Docker-based execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:22:47 +00:00
Pedro Rodrigues
93a49374de realtime scenario 2026-02-23 10:25:50 +00:00
Pedro Rodrigues
baf94b04e3 load skills through skills CLI 2026-02-20 17:41:41 +00:00
Pedro Rodrigues
ce7eb8b28b simple edge function creation example 2026-02-20 16:54:01 +00:00
Pedro Rodrigues
386b9fbb05 storage workflow 2026-02-20 15:11:07 +00:00
Pedro Rodrigues
e03bc99ebb two more scenarios; claude code cli is now a dependency 2026-02-20 15:02:59 +00:00
Pedro Rodrigues
9a23c6b021 upgrade braintrust to ~v3.0.0 2026-02-19 17:14:27 +00:00
Pedro Rodrigues
e06a567846 workflow evals with one scenario 2026-02-19 17:06:17 +00:00
Pedro Rodrigues
082eac2a01 multi model testing 2026-02-18 13:28:42 +00:00
Pedro Rodrigues
27d7af255d initial skills evals 2026-02-18 12:02:28 +00:00