| title | impact | impactDescription | tags |
|---|---|---|---|
| Optimize Vector Search Performance | CRITICAL | 2-10x latency reduction with proper tuning | performance, pre-warming, compute, batch, monitoring |
## Optimize Vector Search Performance
Vector search is RAM-bound: queries are fast when the index fits in memory and slow when index pages must be read from disk. Proper tuning and compute sizing are critical.
### 1. Undersized Compute for Vector Workload
Vector indexes must fit in RAM for optimal performance.
**Incorrect:**

```sql
-- Free tier (0.5GB RAM) with 100K 1536-dim vectors
-- Symptoms: high disk reads, slow queries
select count(*) from documents; -- Returns 100000
```
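**Correct:**

```sql
-- Check buffer cache hit ratio for the table (heap) and its indexes
select round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 2) as hit_ratio,
       round(100.0 * idx_blks_hit / nullif(idx_blks_hit + idx_blks_read, 0), 2) as index_hit_ratio
from pg_statio_user_tables where relname = 'documents';
-- If < 95%, upgrade compute or reduce data
```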
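A hit ratio shows the symptom. To check whether the index can fit in memory at all, a rough sketch that lists the on-disk size of each index on `documents` next to the `shared_buffers` setting. The operating system's page cache also holds index pages, so treat this as an approximate comparison, not an exact fit test:

```sql
-- On-disk size of every index on the documents table
select indexrelname as index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_stat_user_indexes
where relname = 'documents';

-- Postgres's own buffer cache; compare it with the HNSW index size above
show shared_buffers;
```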
### 2. Building Index During Peak Traffic
A plain `create index` blocks writes to the table until the build finishes, which can take a long time for a large HNSW index.
**Incorrect:**

```sql
-- Blocks inserts, updates, and deletes until the build finishes
create index on documents using hnsw (embedding vector_cosine_ops);
```
**Correct:**

```sql
-- Reads and writes continue during the build; it takes longer
-- and cannot run inside a transaction block
create index concurrently on documents using hnsw (embedding vector_cosine_ops);
```
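A concurrent build that fails or is cancelled leaves an invalid index behind: queries ignore it, but it still adds overhead to every write. A sketch for watching the build from a second session and checking the result afterwards (`pg_stat_progress_create_index` requires PostgreSQL 12 or later):

```sql
-- From a second session: progress of the running index build
select phase,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) as pct_done
from pg_stat_progress_create_index;

-- After the build: indisvalid = false means the build did not complete;
-- drop that index and run the concurrent build again
select c.relname as index_name, i.indisvalid
from pg_index i
join pg_class c on c.oid = i.indexrelid
where i.indrelid = to_regclass('documents');
```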
### Compute Sizing
Approximate capacity for 1536-dimension vectors with HNSW index:
| Plan | RAM | Vectors (1536d) |
|---|---|---|
| Nano (Free) | 0.5GB | Limited — index may swap |
| Micro | 1GB | ~15K |
| Small | 2GB | ~50K |
| Medium | 4GB | ~100K |
| Large | 8GB | ~225K |
See the compute sizing guide for detailed benchmarks.
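The table follows from simple arithmetic. pgvector stores each `vector` as 4 bytes per dimension plus 8 bytes, and an HNSW index keeps its own copy of every vector plus neighbor links. A back-of-the-envelope floor for the raw vector bytes alone, with the row count and dimension as placeholder inputs to substitute:

```sql
-- Lower bound only: raw vector storage (4 bytes per dimension + 8 bytes each).
-- HNSW neighbor links and the table's own copy of the data come on top.
-- 100000 rows and 1536 dimensions are placeholder inputs.
select pg_size_pretty(100000::bigint * (4 * 1536 + 8)) as raw_vector_bytes;
```

With these inputs it reports about 587 MB, which already exceeds the Nano plan's 0.5GB before the graph or the table itself is counted.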
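### Index Pre-Warming

```sql
-- pg_prewarm is an extension; enable it once
create extension if not exists pg_prewarm;
-- Load the index into shared buffers before production traffic
-- (documents_embedding_idx is the default name Postgres gives the index above)
select pg_prewarm('documents_embedding_idx');
-- Run 10K-50K warm-up queries before benchmarking
```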
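To confirm the pre-warm took effect, a sketch using the `pg_buffercache` extension (assumed to be available on your plan) that counts how much of the index currently sits in shared buffers. It cannot see the operating system's page cache, so a low number here does not by itself prove the index is cold:

```sql
-- Assumes pg_buffercache can be enabled on this project
create extension if not exists pg_buffercache;

-- Cached vs. total size of the HNSW index in this database's shared buffers
select c.relname,
       count(*) as buffers,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) as cached,
       pg_size_pretty(pg_relation_size(c.oid)) as total
from pg_buffercache b
join pg_class c
  on b.relfilenode = pg_relation_filenode(c.oid)
 and b.reldatabase in (0, (select oid from pg_database
                           where datname = current_database()))
where c.relname = 'documents_embedding_idx'
group by c.relname, c.oid;
```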
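### Index Build Settings

Set these in the session that builds the index. Keep `maintenance_work_mem` well below the instance's total RAM, and lower it on smaller plans.

```sql
set maintenance_work_mem = '4GB';          -- builds are much faster when the HNSW graph fits here
set max_parallel_maintenance_workers = 4;  -- parallel workers for the index build
set statement_timeout = '0';               -- disable the timeout for long builds
```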
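Parallel build workers are drawn from the `max_worker_processes` and `max_parallel_workers` pools, so raising `max_parallel_maintenance_workers` past those limits does not yield more workers. A quick check of the current values:

```sql
-- Current caps that bound a parallel index build
select name, setting, unit
from pg_settings
where name in ('max_worker_processes', 'max_parallel_workers',
               'max_parallel_maintenance_workers', 'maintenance_work_mem')
order by name;
```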
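### Query Monitoring

```sql
-- Vector queries with the highest total execution time
-- (<-> L2 distance, <=> cosine distance, <#> negative inner product)
select substring(query, 1, 80) as query, calls,
       round(mean_exec_time::numeric, 2) as avg_ms,
       round(total_exec_time::numeric, 2) as total_ms
from pg_stat_statements
where query like '%<->%' or query like '%<=>%' or query like '%<#>%'
order by total_exec_time desc limit 10;
```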
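Once a slow query is identified, `explain (analyze, buffers)` shows whether it used the HNSW index and how many pages came from shared buffers (`shared hit`) versus outside them (`read`, which the OS page cache may still serve). A sketch that borrows an existing row's embedding as the query vector, a pattern taken from the pgvector README:

```sql
-- Expect an Index Scan on documents_embedding_idx; large "read" counts
-- in the Buffers lines suggest the index is not staying in memory
explain (analyze, buffers)
select *
from documents
order by embedding <=> (select embedding from documents limit 1)
limit 10;
```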
### Related
- index-hnsw.md - HNSW parameters
- Docs - Production guide
- Docs - Compute sizing