# /prompt-tune
Improve a prompt scientifically, not by vibes.
## Usage
```
/prompt-tune src/lib/prompts/extractor.ts
/prompt-tune --interactive   # paste a prompt + eval cases inline
```
## Workflow
### 1. Define the eval set

Provide either:
- 5-15 input/expected-output pairs (JSONL; example below)
- A scoring criterion ("does the output contain a valid JSON object with field X?")
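
A minimal eval file might look like this, assuming the extractor pulls a vendor and total from receipt text (field names and cases are illustrative, not a required schema):

```jsonl
{"input": "Invoice #4821 from Acme Corp, total $1,250.00", "expected": {"vendor": "Acme Corp", "total": 1250.00}}
{"input": "Receipt - Blue Bottle Coffee - $6.75", "expected": {"vendor": "Blue Bottle Coffee", "total": 6.75}}
```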
### 2. Baseline run
- Run the current prompt against all cases (see the runner sketch below)
- Record pass/fail + cost per call
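
A minimal sketch of that loop, assuming the Anthropic TypeScript SDK; `scoreOutput`, the per-token prices, and the `EvalCase` shape are placeholders, not part of the command:

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Placeholder per-token prices (USD); substitute the current Haiku rates.
const INPUT_PRICE = 1e-6;
const OUTPUT_PRICE = 5e-6;

interface EvalCase { input: string; expected: unknown; }
interface CaseResult { pass: boolean; cost: number; latencyMs: number; }

// Placeholder scorer: pass = the output parses to exactly the expected JSON.
function scoreOutput(output: string, expected: unknown): boolean {
  try { return JSON.stringify(JSON.parse(output)) === JSON.stringify(expected); }
  catch { return false; }
}

async function runCase(prompt: string, c: EvalCase): Promise<CaseResult> {
  const start = Date.now();
  const msg = await client.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 1024,
    system: prompt,
    messages: [{ role: "user", content: c.input }],
  });
  const first = msg.content[0];
  const text = first && first.type === "text" ? first.text : "";
  return {
    pass: scoreOutput(text, c.expected),
    cost: msg.usage.input_tokens * INPUT_PRICE + msg.usage.output_tokens * OUTPUT_PRICE,
    latencyMs: Date.now() - start,
  };
}
```

The same runner is then reused unchanged for every variant, so cost and latency stay comparable across runs.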
### 3. Generate variants
Create 3-5 modified prompts using:
- Better instruction phrasing
- Few-shot examples (one is sketched after this list)
- Chain-of-thought prefix
- Output format constraints
- System role tightening
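
For instance, the few-shot variant can be generated mechanically from the baseline prompt (the shot-formatting convention here is one arbitrary choice among many):

```ts
interface EvalCase { input: string; expected: unknown; }

// A variant is just the baseline prompt plus a transformation.
// Few-shot: append worked input/output pairs to the baseline.
function withFewShot(basePrompt: string, shots: EvalCase[]): string {
  const rendered = shots
    .map((s) => `Input: ${s.input}\nOutput: ${JSON.stringify(s.expected)}`)
    .join("\n\n");
  return `${basePrompt}\n\nExamples:\n\n${rendered}`;
}
```

If the shots are drawn from the eval set, hold those cases out of the scoring run so the variant is never graded on its own examples.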
### 4. Compare
Side-by-side: pass rate, cost, latency for each variant.
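
One way to aggregate per-case results into that view, reusing the hypothetical `CaseResult` shape from the runner sketch:

```ts
interface CaseResult { pass: boolean; cost: number; latencyMs: number; }
interface VariantSummary { passRate: number; costPerCall: number; avgLatencyMs: number; }

function summarize(results: CaseResult[]): VariantSummary {
  const n = results.length;
  return {
    passRate: results.filter((r) => r.pass).length / n,
    costPerCall: results.reduce((sum, r) => sum + r.cost, 0) / n,
    avgLatencyMs: results.reduce((sum, r) => sum + r.latencyMs, 0) / n,
  };
}
```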
### 5. Recommend
Surface the variant with the best pass-rate-to-cost ratio. Show the diff vs baseline.
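
Read literally, "best pass-rate-to-cost ratio" is a single division; a sketch (the tooling may well weight these differently):

```ts
// VariantSummary as defined in the previous sketch.
// Score each variant by passRate / costPerCall and keep the maximum.
function pickWinner(summaries: Map<string, VariantSummary>): string | undefined {
  let winner: string | undefined;
  let best = -Infinity;
  for (const [name, s] of summaries) {
    const ratio = s.passRate / s.costPerCall;
    if (ratio > best) { best = ratio; winner = name; }
  }
  return winner;
}
```

Note that the raw ratio lets a very cheap, mediocre variant beat an expensive, accurate one; in practice you would likely gate on pass rate first and use cost only as a tie-breaker.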
## Output
```
Baseline:                     7/12 passing · $0.012/call · avg 1.8s
Variant 2 (added few-shot):  11/12 passing · $0.014/call · avg 2.1s  ✓ best
Variant 3 (stricter system):  9/12 passing · $0.011/call · avg 1.6s
```
The command then writes the winning variant back to the file with a `// tuned 2026-04-25` comment.
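
So the rewritten file ends up annotated something like this (the export name and file shape are illustrative):

```ts
// tuned 2026-04-25
export const EXTRACTOR_PROMPT = `...winning variant text...`;
```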
## Rules
- Use `claude-haiku-4-5-20251001` for tuning runs unless quality demands Sonnet
- Cap evaluation at 50 runs per session to control cost
- Save the eval set as `<prompt-file>.eval.jsonl` for future regression checks