supabase-postgres-best-practices

mirror of https://github.com/supabase/agent-skills.git synced 2026-03-27 10:09:26 +08:00

Author	SHA1	Message	Date
Pedro Rodrigues	2da5cae2ac	feat(evals): enrich Braintrust upload with granular scores and tracing. Add per-test pass/fail parsing from vitest verbose output, thread prompt content and individual test results through the runner, and rewrite uploadToBraintrust with experiment naming (model-variant-timestamp), granular scores (pass, test_pass_rate, per-test), rich metadata, and tool-call tracing via experiment.traced(). Also document --force flag for cached mise tasks and add Braintrust env vars to AGENTS.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 13:26:48 +00:00
Pedro Rodrigues	3c3d1f55ca	containerize eval environment with Docker and mock CLIs. Host now only needs Docker + ANTHROPIC_API_KEY to run evals. Adds multi-stage Dockerfile, mock supabase/docker/psql scripts, entrypoint, docker-compose for local use, and switches CI to Docker-based execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 19:22:47 +00:00
Pedro Rodrigues	93a49374de	realtime scenario	2026-02-23 10:25:50 +00:00
Pedro Rodrigues	baf94b04e3	load skills through skills CLI	2026-02-20 17:41:41 +00:00
Pedro Rodrigues	ce7eb8b28b	simple edge function creation example	2026-02-20 16:54:01 +00:00
Pedro Rodrigues	386b9fbb05	storage workflow	2026-02-20 15:11:07 +00:00
Pedro Rodrigues	e03bc99ebb	more two scenarios and claude code cli is now a dependency	2026-02-20 15:02:59 +00:00
Pedro Rodrigues	9a23c6b021	upgrade braintrust to ~v3.0.0	2026-02-19 17:14:27 +00:00
Pedro Rodrigues	e06a567846	workflow evals with one scenario	2026-02-19 17:06:17 +00:00
Pedro Rodrigues	082eac2a01	multi model testing	2026-02-18 13:28:42 +00:00
Pedro Rodrigues	27d7af255d	initial skills evals	2026-02-18 12:02:28 +00:00

11 Commits