Commit Graph

28 Commits

Author SHA1 Message Date
Pedro Rodrigues
e65642b752 remove some braintrust headers 2026-02-25 19:11:56 +00:00
Pedro Rodrigues
9b08864e94 feat(evals): replace mock CLIs with real Supabase instance per eval run
Start a shared local Supabase stack once before all scenarios and reset
the database (drop/recreate public schema + clear migration history) between
each run. This lets agents apply migrations via `supabase db push` against a
real Postgres instance instead of mock shell scripts.

- Add supabase-setup.ts: startSupabase / stopSupabase / resetDB / getKeys
- Update runner.ts to start/stop Supabase and inject keys into process.env
- Update agent.ts to point MCP config at the local Supabase HTTP endpoint
- Update preflight.ts to check supabase CLI availability and Docker socket
- Update scaffold.ts to seed workspace with supabase/config.toml
- Add passThreshold support (test.ts / results.ts / types.ts) for partial pass
- Delete mock shell scripts (mocks/docker, mocks/psql, mocks/supabase)
- Update Dockerfile/docker-compose to mount Docker socket for supabase CLI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 14:39:54 +00:00
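The reset step between runs can be sketched as a list of SQL statements, together with one plausible partial-pass rule. This is a minimal illustration, not the contents of supabase-setup.ts or results.ts. The function names `resetStatements` and `meetsThreshold` are hypothetical. The role grants and the migration-history table `supabase_migrations.schema_migrations` are assumptions about a default local Supabase stack. Reading passThreshold as "fraction of tests that must pass" is also an assumption.

```typescript
// Hypothetical sketch: SQL a resetDB()-style helper could run between eval runs.
// Assumes a default local Supabase stack; the migration-history table name is
// the Supabase CLI default as far as we know, and it must already exist.
function resetStatements(): string[] {
  return [
    // Drop every object in public, then recreate the empty schema.
    "drop schema if exists public cascade;",
    "create schema public;",
    // Restore usage for the standard Supabase roles (assumed defaults).
    "grant usage on schema public to postgres, anon, authenticated, service_role;",
    // Clear migration history so `supabase db push` re-applies migrations from scratch.
    "truncate table supabase_migrations.schema_migrations;",
  ];
}

// Hypothetical partial-pass rule: a scenario passes when the fraction of
// passing tests is at least passThreshold (default 1, i.e. every test passes).
function meetsThreshold(passed: number, total: number, passThreshold = 1): boolean {
  if (total === 0) return false; // no results is never a pass
  return passed / total >= passThreshold;
}
```

Requiring `total > 0` means an eval whose test file produced no results never counts as a pass, even with a threshold of 0.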
Pedro Rodrigues
2da5cae2ac feat(evals): enrich Braintrust upload with granular scores and tracing
Add per-test pass/fail parsing from vitest verbose output, thread prompt
content and individual test results through the runner, and rewrite
uploadToBraintrust with experiment naming (model-variant-timestamp),
granular scores (pass, test_pass_rate, per-test), rich metadata, and
tool-call tracing via experiment.traced(). Also document --force flag
for cached mise tasks and add Braintrust env vars to AGENTS.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 13:26:48 +00:00
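The per-test parsing and score layout described above might look roughly like this. It assumes vitest's verbose reporter prints `✓` for passing and `×` for failing tests, optionally followed by a duration, and that per-file summary lines end in a parenthesised test count. The helper names, the `test: <name>` score keys and the filename-safe timestamp format are hypothetical; the actual upload through the Braintrust SDK is not shown.

```typescript
interface TestResult {
  name: string;
  passed: boolean;
}

// Assumed vitest verbose line shape: "<indent>✓|× <name> [<n>ms]".
const RESULT_LINE = /^\s*([✓×])\s+(.+?)(?:\s+\d+(?:\.\d+)?ms)?\s*$/;
// Assumed per-file summary suffix, e.g. "(2 tests)" or "(2 tests | 1 failed)".
const FILE_SUMMARY = /\(\d+ tests?(?: \| .*)?\)$/;

function parseVitestVerbose(output: string): TestResult[] {
  const results: TestResult[] = [];
  for (const line of output.split("\n")) {
    const match = RESULT_LINE.exec(line);
    // Skip non-result lines and per-file summaries.
    if (match && !FILE_SUMMARY.test(match[2])) {
      results.push({ name: match[2], passed: match[1] === "✓" });
    }
  }
  return results;
}

// Hypothetical score layout: overall pass, pass rate, and one 0/1 score per test.
function buildScores(results: TestResult[]): Record<string, number> {
  const passedCount = results.filter((r) => r.passed).length;
  const scores: Record<string, number> = {
    pass: results.length > 0 && passedCount === results.length ? 1 : 0,
    test_pass_rate: results.length === 0 ? 0 : passedCount / results.length,
  };
  for (const r of results) {
    scores[`test: ${r.name}`] = r.passed ? 1 : 0;
  }
  return scores;
}

// model-variant-timestamp; ":" and "." are replaced to keep the name filename-safe.
function experimentName(model: string, variant: string, at: Date): string {
  const stamp = at.toISOString().replace(/[:.]/g, "-");
  return `${model}-${variant}-${stamp}`;
}
```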
Pedro Rodrigues
3c3d1f55ca containerize eval environment with Docker and mock CLIs
Host now only needs Docker + ANTHROPIC_API_KEY to run evals. Adds
multi-stage Dockerfile, mock supabase/docker/psql scripts, entrypoint,
docker-compose for local use, and switches CI to Docker-based execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:22:47 +00:00
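A preflight check for the two host prerequisites could report every missing item at once instead of failing on the first. In this sketch `preflightProblems` is a hypothetical name, `/var/run/docker.sock` is assumed to be the Docker socket path, and the file-existence check is injectable so the function can be exercised without Docker.

```typescript
import { existsSync } from "node:fs";

// Hypothetical preflight: returns a list of human-readable problems (empty = ready).
function preflightProblems(
  env: Record<string, string | undefined>,
  pathExists: (p: string) => boolean = existsSync,
  dockerSocket = "/var/run/docker.sock", // assumed default socket path
): string[] {
  const problems: string[] = [];
  if (!env.ANTHROPIC_API_KEY) {
    problems.push("ANTHROPIC_API_KEY is not set");
  }
  if (!pathExists(dockerSocket)) {
    problems.push(`Docker socket not found at ${dockerSocket}`);
  }
  return problems;
}
```

A caller might run `preflightProblems(process.env)`, print the returned messages and exit non-zero when the list is non-empty.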
Pedro Rodrigues
93a49374de realtime scenario 2026-02-23 10:25:50 +00:00
Pedro Rodrigues
baf94b04e3 load skills through skills CLI 2026-02-20 17:41:41 +00:00
Pedro Rodrigues
ce7eb8b28b simple edge function creation example 2026-02-20 16:54:01 +00:00
Pedro Rodrigues
386b9fbb05 storage workflow 2026-02-20 15:11:07 +00:00
Pedro Rodrigues
e03bc99ebb two more scenarios; Claude Code CLI is now a dependency 2026-02-20 15:02:59 +00:00
Pedro Rodrigues
9a23c6b021 upgrade braintrust to ~v3.0.0 2026-02-19 17:14:27 +00:00
Pedro Rodrigues
e06a567846 workflow evals with one scenario 2026-02-19 17:06:17 +00:00
Pedro Rodrigues
082eac2a01 multi model testing 2026-02-18 13:28:42 +00:00
Pedro Rodrigues
27d7af255d initial skills evals 2026-02-18 12:02:28 +00:00
Pedro Rodrigues
69575f4c87 remove impact and impactDescription from supabase skills frontmatter header 2026-02-17 12:46:21 +00:00
Pedro Rodrigues
c6f5a2bec0 chore: build system skill body (#42)
* feat: extract SKILL.md body into AGENTS.md with H1 title and Overview section

Build system now parses SKILL.md body to extract H1 heading as the AGENTS.md
title and places remaining content under an Overview section. Adds validation
that SKILL.md body starts with H1, directory name is kebab-case, and name
field matches directory name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: AGENTS.md is now SKILL.md body with frontmatter stripped

Build now generates AGENTS.md by extracting the SKILL.md markdown body
(everything after YAML frontmatter). CLAUDE.md remains a symlink to
AGENTS.md. Removes content generation logic (Structure, Usage, Overview,
Reference Categories, Available References sections) — SKILL.md is the
single source of truth for agent instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add Structure and Usage sections to AGENTS.md, validate H1 title matches directory name

Build now generates AGENTS.md as: H1 Title > Structure > Usage > rest of
SKILL.md body. Validates that SKILL.md body starts with H1 heading and
that the title in kebab-case matches the directory name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:14:49 +00:00
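The SKILL.md rules in this commit (strip frontmatter to get the AGENTS.md body, require a leading H1, kebab-case directory name, `name` field and title both matching the directory) can be sketched as below. This is illustrative only: the function names are hypothetical, the frontmatter regex assumes `---` delimiters at the very top of the file, and the `name` field is passed in rather than parsed from YAML.

```typescript
// Hypothetical sketch of the SKILL.md checks described in the commit above.
const FRONTMATTER = /^---\r?\n[\s\S]*?\r?\n---(?:\r?\n|$)/;
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// AGENTS.md content: everything after the YAML frontmatter.
function stripFrontmatter(markdown: string): string {
  return markdown.replace(FRONTMATTER, "");
}

// "Postgres Best Practices" -> "postgres-best-practices"
function toKebabCase(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// nameField is assumed to be already parsed from the frontmatter.
function validateSkillMd(dirName: string, nameField: string, skillMd: string): string[] {
  const errors: string[] = [];
  if (!KEBAB_CASE.test(dirName)) {
    errors.push(`directory "${dirName}" is not kebab-case`);
  }
  if (nameField !== dirName) {
    errors.push(`name "${nameField}" does not match directory "${dirName}"`);
  }
  const body = stripFrontmatter(skillMd).trimStart();
  const firstLine = body.split("\n", 1)[0].trimEnd();
  const h1 = /^# (.+)$/.exec(firstLine);
  if (!h1) {
    errors.push("SKILL.md body must start with an H1 heading");
  } else if (toKebabCase(h1[1]) !== dirName) {
    errors.push(`H1 "${h1[1]}" does not match directory "${dirName}"`);
  }
  return errors;
}
```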
Pedro Rodrigues
22e466937a fix: require reference files at root of references/ directory (#32)
Remove support for nested subdirectories in references/.
All markdown reference files must now be placed directly in
the references/ directory (e.g., references/auth-signup.md).

- Replace getMarkdownFilesRecursive with getMarkdownFiles (flat)
- Simplify parseAllSections to only read root _sections.md
- Update getReferenceFiles to skip subdirectories
- Keep deprecated getMarkdownFilesRecursive alias for compatibility

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 13:39:59 +00:00
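A flat scan matching the new rule could be as small as this. The names `getMarkdownFiles`, `getReferenceFiles` and the deprecated `getMarkdownFilesRecursive` alias come from the commit message, but the bodies and signatures here are a sketch. Treating files whose names start with `_` (such as `_sections.md`) as non-reference files is an assumption.

```typescript
import { readdirSync } from "node:fs";
import { basename, join } from "node:path";

// List markdown files directly inside dir; subdirectories are ignored.
function getMarkdownFiles(dir: string): string[] {
  return readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && entry.name.endsWith(".md"))
    .map((entry) => join(dir, entry.name))
    .sort();
}

/** @deprecated Nested reference directories are no longer supported; alias kept for compatibility. */
const getMarkdownFilesRecursive = getMarkdownFiles;

// Assumed convention: underscore-prefixed files (e.g. _sections.md) are metadata, not references.
function getReferenceFiles(referencesDir: string): string[] {
  return getMarkdownFiles(referencesDir).filter((p) => !basename(p).startsWith("_"));
}
```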
Pedro Rodrigues
a3b815155c reduce the agents.md file size (#20) 2026-01-27 21:38:29 +00:00
Pedro Rodrigues
5da9a5ee37 feat: add subdirectory support for reference files (#18)
- Add getMarkdownFilesRecursive() for recursive file scanning
- Add parseAllSections() to parse _sections.md from subdirectories
- Add parseSectionsFromFile() helper function
- Update buildSkill() and validateSkill() to use new functions
- Export new functions for use by validate.ts

This allows organizing reference files in product-specific subdirectories
(e.g., references/db/ for database files) while keeping _sections.md in
each subdirectory.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 18:34:52 +00:00
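For comparison, the recursive scan introduced here (and later removed by 22e466937a) could be written as a depth-first walk. Only the name `getMarkdownFilesRecursive` comes from the commit; its signature and the hypothetical `findSectionsFiles` helper, which stands in for locating each directory's `_sections.md`, are assumptions.

```typescript
import { readdirSync } from "node:fs";
import { basename, join } from "node:path";

// Depth-first scan: collects .md files in dir and all of its subdirectories.
function getMarkdownFilesRecursive(dir: string): string[] {
  const files: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...getMarkdownFilesRecursive(full));
    } else if (entry.isFile() && entry.name.endsWith(".md")) {
      files.push(full);
    }
  }
  return files.sort();
}

// Hypothetical helper: every _sections.md found anywhere under dir.
function findSectionsFiles(dir: string): string[] {
  return getMarkdownFilesRecursive(dir).filter((p) => basename(p) === "_sections.md");
}
```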
Pedro Rodrigues
f421451c79 chore: comply with agent skills open standard (#14) 2026-01-26 15:22:35 +00:00
Pedro Rodrigues
bbde7ff5f8 refactor: generic skills build system with auto-discovery (#8)
* refactor: generic skills build system with auto-discovery

- Rename postgres-best-practices-build → skills-build
- Add auto-discovery: scans skills/ for subdirectories with metadata.json
- Build/validate all skills or specific skill with -- argument
- Update root AGENTS.md and CONTRIBUTING.md with new structure
- No configuration needed to add new skills

Usage:
  npm run build                    # Build all skills
  npm run build -- skill-name      # Build specific skill
  npm run validate                 # Validate all skills

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix ci

* more generic impact levels

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 15:56:11 +00:00
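Auto-discovery and the optional `--` argument can be sketched as follows. `discoverSkills` and `selectSkills` are hypothetical names; the only rule taken from the commit is that a skill is a subdirectory of skills/ containing metadata.json. npm forwards arguments after `--` to the script, so the skill name arrives in `process.argv`.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// A skill is any direct subdirectory of skillsDir that contains metadata.json.
function discoverSkills(skillsDir: string): string[] {
  return readdirSync(skillsDir, { withFileTypes: true })
    .filter((e) => e.isDirectory() && existsSync(join(skillsDir, e.name, "metadata.json")))
    .map((e) => e.name)
    .sort();
}

// No argument selects every skill; an unknown name is an error rather than a silent no-op.
function selectSkills(discovered: string[], args: string[]): string[] {
  const requested = args[0];
  if (requested === undefined) return discovered;
  if (!discovered.includes(requested)) {
    throw new Error(`Unknown skill "${requested}". Available: ${discovered.join(", ")}`);
  }
  return [requested];
}
```

A build entry point might call `selectSkills(discoverSkills("skills"), process.argv.slice(2))`. Failing loudly on an unknown name avoids a mistyped skill name silently building nothing.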
Pedro Rodrigues
7b1a65007b automatically reorder sections on agents.md by priority 2026-01-22 08:48:47 +00:00
Pedro Rodrigues
f323d3b601 Add Biome formatter/linter and restore CI workflow (#6)
- Install Biome as the project formatter and linter
- Configure Biome with recommended settings
- Add format, lint, and check scripts to package.json
- Restore CI workflow from git history (commit 0a543e1)
- Extend CI with new Biome job for format and lint checks
- Apply Biome formatting to all TypeScript files
- Fix linting issues (use node: protocol, template literals, forEach pattern)

CI now runs on:
- All pushes to main branch
- All pull requests

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-22 08:28:49 +00:00
Pedro Rodrigues
221215b707 final review 2026-01-21 16:53:34 +00:00
Pedro Rodrigues
663d784e24 remove extract tests 2026-01-21 16:44:38 +00:00
Pedro Rodrigues
c63a5fa509 second review 2026-01-21 16:32:52 +00:00
Pedro Rodrigues
8df22b058d chore: bump version to 1.0.0 (#4)
Update version from 0.1.0 to 1.0.0 across all files:
- skills/postgres-best-practices/AGENTS.md
- skills/postgres-best-practices/SKILL.md
- skills/postgres-best-practices/metadata.json
- packages/postgres-best-practices-build/package.json
- packages/postgres-best-practices-build/package-lock.json
- packages/postgres-best-practices-build/src/build.ts

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 15:40:53 +00:00
Pedro Rodrigues
06a4e68d30 Rename postgresql to postgres (keeping only existing files) 2026-01-19 19:31:34 +00:00
Pedro Rodrigues
0a543e1b4a Initial setup: PostgreSQL best practices repository
Skeleton structure for Supabase PostgreSQL experts to add performance
optimization rules. Modeled after Vercel's react-best-practices-build.

Includes:
- Build system (parser, validator, builder)
- Skill manifest and metadata
- Rule templates and writing guidelines
- CI workflow for validation
- Getting started guide for Postgres team

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 09:52:32 +07:00