initial skills evals

Pedro Rodrigues
2026-02-18 12:02:28 +00:00
parent 69575f4c87
commit 27d7af255d
17 changed files with 3177 additions and 10 deletions


@@ -25,6 +25,16 @@ skills/
packages/
  skills-build/                  # Generic build system for all skills
  evals/                         # LLM evaluation system for skills
    AGENTS.md                    # Agent guide for developing evals
    CLAUDE.md                    # Symlink to AGENTS.md
    scenarios/
      workflow-scenarios.json    # Handwritten workflow test scenarios
    src/
      cli.ts                     # Entry point
      prompts/                   # Eval and judge prompts
      scorer/                    # Zod schemas and judge execution
      dataset/                   # Test case extraction from skill references
      runner/                    # Eval orchestrator and runners
```
## Commands