// a developer's journal of learning AI-assisted development, one broken project at a time

Claude Told Me to Use a Cheaper Claude for the Boring Parts

Give it a VIN, get back everything that's wrong with the car. Like Carfax, but with AI doing the research instead of an insurance database.

The architectural insight that made the whole thing commercially viable: use the expensive AI for thinking and the cheap AI for reading. Opus for everything costs $15-20 per report. The tiered approach: under $5. Sell for $7.99. Not great margins, but better than negative margins.
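The unit economics above as a quick sanity check, using the post's numbers (the midpoint and the "under $5" ceiling are my fill-ins for the ranges):

```python
# Gross margin per report at $7.99, for each architecture's model cost.
PRICE = 7.99

def margin(cost_per_report: float) -> float:
    """Per-report gross margin at a given model cost."""
    return round(PRICE - cost_per_report, 2)

opus_only = margin(17.50)  # midpoint of the $15-20 range: deeply negative
tiered = margin(5.00)      # the "under $5" ceiling: thin but positive

print(opus_only, tiered)  # → -9.51 2.99
```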


The Two-Claude Architecture

Stage 1 (Haiku):  Vehicle Identity    → Decode VIN, map generation, identify forums
Stage 2 (Haiku):  Official Sources    → NHTSA complaints, recalls, TSBs
Stage 3 (Haiku):  Community Research  → Forum threads, Reddit, owner reports
Stage 4 (Opus):   Synthesis           → Correlate findings, assign confidence, generate report

Stages 1-3 are data extraction: pattern matching, not reasoning. Haiku is fast, cheap, and perfect for it. Stage 4 is judgment: Is this forum complaint about the same issue as that NHTSA recall? Does this Reddit post about "weird transmission behavior at 30K" match the TSB about harsh shifting? Should the confidence be "high" (3+ independent sources) or "moderate" (2 sources, different contexts)?

That's where you pay for the expensive model.
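The stage-to-model routing can be sketched as a lookup plus one dispatch function. A minimal sketch assuming the Anthropic Python SDK; the model IDs and function names are my stand-ins, not the project's actual code:

```python
# Route each pipeline stage to the cheapest model that can handle it.
# Stages 1-3 are extraction (pattern matching); stage 4 is synthesis (judgment).
STAGE_MODELS = {
    1: "claude-3-haiku-20240307",  # identity: decode VIN, map generation
    2: "claude-3-haiku-20240307",  # official sources: NHTSA, recalls, TSBs
    3: "claude-3-haiku-20240307",  # community: forums, Reddit, owner reports
    4: "claude-3-opus-20240229",   # synthesis: correlate, assign confidence
}

def model_for_stage(stage: int) -> str:
    """Return the model assigned to a pipeline stage."""
    return STAGE_MODELS[stage]

def run_stage(client, stage: int, prompt: str) -> str:
    """Send one pipeline stage to its assigned model and return the text."""
    response = client.messages.create(
        model=model_for_stage(stage),
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```

The routing table is the whole trick: swapping a stage's model is a one-line change, so you can measure whether Haiku's output degrades before committing to the cheaper tier.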

For a 10th-gen Honda Civic, Haiku extracted hundreds of NHTSA complaints and forum threads. Opus synthesized it into a report with confidence ratings and specific inspection advice. Using Opus for all four stages would have cost 3-4x more and been no better at the extraction work.


Version Your Prompts

This project introduced a structure I've used on every project since:

.claude/
├── agents/
│   ├── vehicle-researcher.md
│   └── vehicle-synthesizer.md
└── prompts/
    ├── research-stage1-identity.md
    ├── research-stage2-official.md
    ├── research-stage3-community.md
    └── research-stage4-synthesis.md

Each stage has its own prompt file. The researcher handles stages 1-3 (what to look for, where to look, how to structure findings). The synthesizer handles stage 4 (how to correlate, what confidence thresholds to apply).

Run the same prompts on a different vehicle, get the same structured output. The pipeline is reproducible because the prompts are version-controlled.
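Loading a stage's versioned prompt is then just a file read keyed off the layout above; a sketch where the helper and constant names are mine:

```python
from pathlib import Path

# Directory and filenames from the repo layout above.
PROMPT_DIR = Path(".claude/prompts")

STAGE_PROMPTS = {
    1: "research-stage1-identity.md",
    2: "research-stage2-official.md",
    3: "research-stage3-community.md",
    4: "research-stage4-synthesis.md",
}

def load_prompt(stage: int) -> str:
    """Read the version-controlled prompt for a pipeline stage."""
    return (PROMPT_DIR / STAGE_PROMPTS[stage]).read_text()
```

Because the prompt text lives in git rather than in someone's chat history, a change to stage 4's confidence thresholds shows up as a diff you can review and revert.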

Yes, this is obvious. Version-control your prompts like you version-control your code. I didn't do it for the first 8 projects. Ad-hoc prompting produces ad-hoc results. Versioned prompts produce consistent results. Obvious advice I ignored until I couldn't.


The Takeaway

Cheap models for extraction, expensive models for reasoning. Versioned prompts for reproducibility. These sound obvious written down. I arrived at both after building it wrong first: Opus for everything ($15-20/report, negative margins) and ad-hoc prompts that produced different results every run.

