Pedro Rodrigues
3c3d1f55ca
containerize eval environment with Docker and mock CLIs
Host now only needs Docker + ANTHROPIC_API_KEY to run evals. Adds
multi-stage Dockerfile, mock supabase/docker/psql scripts, entrypoint,
docker-compose for local use, and switches CI to Docker-based execution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:22:47 +00:00
Pedro Rodrigues
93a49374de
realtime scenario
2026-02-23 10:25:50 +00:00
Pedro Rodrigues
baf94b04e3
load skills through skills CLI
2026-02-20 17:41:41 +00:00
Pedro Rodrigues
ce7eb8b28b
simple edge function creation example
2026-02-20 16:54:01 +00:00
Pedro Rodrigues
386b9fbb05
storage workflow
2026-02-20 15:11:07 +00:00
Pedro Rodrigues
e03bc99ebb
two more scenarios; Claude Code CLI is now a dependency
2026-02-20 15:02:59 +00:00
Pedro Rodrigues
9a23c6b021
upgrade braintrust to ~v3.0.0
2026-02-19 17:14:27 +00:00
Pedro Rodrigues
e06a567846
workflow evals with one scenario
2026-02-19 17:06:17 +00:00
Pedro Rodrigues
082eac2a01
multi model testing
2026-02-18 13:28:42 +00:00
Pedro Rodrigues
27d7af255d
initial skills evals
2026-02-18 12:02:28 +00:00