# /prompt-tune
Improve a prompt scientifically, not by vibes.
## Usage
```
/prompt-tune src/lib/prompts/extractor.ts
/prompt-tune --interactive   # paste a prompt + eval cases inline
```
## Workflow
### 1. Define the eval set

Provide either:
- 5-15 input/expected-output pairs (JSONL; example below)
- A scoring criterion ("does the output contain a valid JSON object with field X?")
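
A minimal eval file might look like this, assuming the extractor pulls a vendor and total from receipt text (field names and cases are illustrative, not a required schema):

```jsonl
{"input": "Invoice #4821 from Acme Corp, total $1,250.00", "expected": {"vendor": "Acme Corp", "total": 1250.00}}
{"input": "Receipt - Blue Bottle Coffee - $6.75", "expected": {"vendor": "Blue Bottle Coffee", "total": 6.75}}
```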
### 2. Baseline run
- Run the current prompt against all cases (see the runner sketch below)
- Record pass/fail + cost per call
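
A minimal sketch of that loop, assuming the Anthropic TypeScript SDK; `scoreOutput`, the per-token prices, and the `EvalCase` shape are placeholders, not part of the command:

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Placeholder per-token prices (USD); substitute the current Haiku rates.
const INPUT_PRICE = 1e-6;
const OUTPUT_PRICE = 5e-6;

interface EvalCase { input: string; expected: unknown; }
interface CaseResult { pass: boolean; cost: number; latencyMs: number; }

// Placeholder scorer: pass = the output parses to exactly the expected JSON.
function scoreOutput(output: string, expected: unknown): boolean {
  try { return JSON.stringify(JSON.parse(output)) === JSON.stringify(expected); }
  catch { return false; }
}

async function runCase(prompt: string, c: EvalCase): Promise<CaseResult> {
  const start = Date.now();
  const msg = await client.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 1024,
    system: prompt,
    messages: [{ role: "user", content: c.input }],
  });
  const first = msg.content[0];
  const text = first && first.type === "text" ? first.text : "";
  return {
    pass: scoreOutput(text, c.expected),
    cost: msg.usage.input_tokens * INPUT_PRICE + msg.usage.output_tokens * OUTPUT_PRICE,
    latencyMs: Date.now() - start,
  };
}
```

The same runner is then reused unchanged for every variant, so cost and latency stay comparable across runs.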
### 3. Generate variants
Create 3-5 modified prompts using:
- Better instruction phrasing
- Few-shot examples (one is sketched after this list)
- Chain-of-thought prefix
- Output format constraints
- System role tightening
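
For instance, the few-shot variant can be generated mechanically from the baseline prompt (the shot-formatting convention here is one arbitrary choice among many):

```ts
interface EvalCase { input: string; expected: unknown; }

// A variant is just the baseline prompt plus a transformation.
// Few-shot: append worked input/output pairs to the baseline.
function withFewShot(basePrompt: string, shots: EvalCase[]): string {
  const rendered = shots
    .map((s) => `Input: ${s.input}\nOutput: ${JSON.stringify(s.expected)}`)
    .join("\n\n");
  return `${basePrompt}\n\nExamples:\n\n${rendered}`;
}
```

If the shots are drawn from the eval set, hold those cases out of the scoring run so the variant is never graded on its own examples.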
### 4. Compare
Side-by-side: pass rate, cost, latency for each variant.
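
One way to aggregate per-case results into that view, reusing the hypothetical `CaseResult` shape from the runner sketch:

```ts
interface CaseResult { pass: boolean; cost: number; latencyMs: number; }
interface VariantSummary { passRate: number; costPerCall: number; avgLatencyMs: number; }

function summarize(results: CaseResult[]): VariantSummary {
  const n = results.length;
  return {
    passRate: results.filter((r) => r.pass).length / n,
    costPerCall: results.reduce((sum, r) => sum + r.cost, 0) / n,
    avgLatencyMs: results.reduce((sum, r) => sum + r.latencyMs, 0) / n,
  };
}
```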
### 5. Recommend
Surface the variant with the best pass-rate-to-cost ratio. Show the diff vs baseline.
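
Read literally, "best pass-rate-to-cost ratio" is a single division; a sketch (the tooling may well weight these differently):

```ts
// VariantSummary as defined in the previous sketch.
// Score each variant by passRate / costPerCall and keep the maximum.
function pickWinner(summaries: Map<string, VariantSummary>): string | undefined {
  let winner: string | undefined;
  let best = -Infinity;
  for (const [name, s] of summaries) {
    const ratio = s.passRate / s.costPerCall;
    if (ratio > best) { best = ratio; winner = name; }
  }
  return winner;
}
```

Note that the raw ratio lets a very cheap, mediocre variant beat an expensive, accurate one; in practice you would likely gate on pass rate first and use cost only as a tie-breaker.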
## Output
```
Baseline:                     7/12 passing · $0.012/call · avg 1.8s
Variant 2 (added few-shot):  11/12 passing · $0.014/call · avg 2.1s  ✓ best
Variant 3 (stricter system):  9/12 passing · $0.011/call · avg 1.6s
```
The command then writes the winning variant back to the file with a `// tuned 2026-04-25` comment.
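
So the rewritten file ends up annotated something like this (the export name and file shape are illustrative):

```ts
// tuned 2026-04-25
export const EXTRACTOR_PROMPT = `...winning variant text...`;
```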
## Rules
- Use `claude-haiku-4-5-20251001` for tuning runs unless quality demands Sonnet
- Cap evaluation at 50 runs per session to control cost
- Save the eval set as `<prompt-file>.eval.jsonl` for future regression checks