Speed vs Smarts. Sonnet Executes. Opus Architects. The Cost Math Decides.
Sonnet 4.6 costs 5x less than Opus 4.7. Sonnet handles 80% of tasks — code gen, refactors, debugging, test writing, docs. Opus 1M-context wins when you need 50+ files in scope, ambiguous briefs, or a single wrong line breaks prod. The heuristic is clean: Sonnet for execution (write the code), Opus for architecture (design the system). Once you internalise the cost trade-off — $3 input / $15 output for Sonnet vs $15 / $75 for Opus — routing becomes mechanical. Start every task in Sonnet. Escalate to Opus when you hit ceiling. This is the production strategy that works.
The Real Cost Math
Opus 4.7 with 1M context hits $15 per million input tokens, $75 per million output. Sonnet 4.6 is $3 input, $15 output. A typical 10k-token refactor costs you $0.15 in Sonnet, $0.30 in Opus. The delta compounds fast. Run 100 refactors per week — you're spending $15/week on Sonnet, $30/week on Opus. At annual scale, that's $780 vs $1,560. But the story isn't about penny-pinching. It's about whether the extra $780 buys you material quality gain. Usually it doesn't.
Sonnet's ceiling is single-file or shallow-dependency work. A 15-file refactor where you're renaming a variable across all of them? Sonnet can handle 3–4 files at a time in context, then you iterate. Takes 20 minutes with context-aware iteration. Opus can ingest all 15 at once, reason about the coupling, and hand you a coherent plan in 2 minutes. The time savings matter when you're rearchitecting. The cost delta ($0.45 vs $2.25 for that task) doesn't. Choose based on scope, not budget paranoia.
Four Task Categories: Verdict Per Class
Refactors (variable rename, extract function, folder structure)
Sonnet wins. You're usually touching 1–4 files. Sonnet sees the delta, suggests the change. If it misses a reference, you ask it to find imports. 3–4 bounces, done. Cost: $0.10–0.50. Opus is overkill unless the refactor spans 20+ files with hidden dependencies. Even then, the speed gain (Opus sees it in one pass) is maybe 5 minutes, and the code quality is identical. Verdict: **Sonnet default.**
Code generation (new feature, new component, new endpoint)
Sonnet by default, Opus on ambiguity. If you brief "build a React form with validation using Zod", Sonnet ships it correctly 85% of the time. If you brief "build a form that also syncs to Supabase, handles real-time collaboration, and integrates with our permission system", and you haven't fully designed the permission layer, Opus is worth the cost because it will ask clarifying questions and propose the architecture before writing a line. Sonnet will write a form and hope it fits. Verdict: **Sonnet for specs, Opus for speculation.**
Debugging (why is the CI failing, why does the button not click)
Sonnet usually closes bugs in 1–2 bounces. "The login flow is hanging. Here's the error." You paste the stack trace, Sonnet narrows to the async logic, finds the missing await. If the bug is architectural (race condition across 3 systems, timing-dependent failure that needs deep context of the whole flow), Opus is worth summoning. But 95% of prod bugs are shallow — wrong API endpoint, missing environment variable, off-by-one in a loop. Sonnet finds those. Verdict: **Sonnet default, Opus on inexplicable failures.**
Planning (design a system, spec multi-service architecture)
Opus. You're not writing code; you're reasoning about 10–20 loosely-coupled decisions. Authentication strategy, database schema, event flow, cache invalidation, deployment topology. Sonnet can do it, but it thinks linearly — each suggestion stands alone. Opus reasons about trade-offs holistically: "if we choose Redis for the cache, the eviction policy affects the refresh strategy, which affects the schema design, which affects the permission layer." That coherence is worth 5x cost when you're 3 weeks from shipping. A $10–15 planning chat saves $50k in rework. Verdict: **Opus always.**
When 1M Context Actually Matters
Opus's killer feature is the 1M token window. But you won't hit it often. A typical codebase file (300–500 lines) is 3k–4k tokens. Even a hefty 50-file project runs 150k–200k tokens. You're inside Sonnet's context window most of the time. The 1M window matters when: (1) you're porting a 30k-line monolith and need every file visible at once to find all coupling points, (2) you're building a system from a 100-page spec document and you want the entire brief in context so nothing gets misread, (3) you're auditing a codebase for security and you need to see call graphs across the whole tree at once. These are rare. Maybe 5–10% of tasks. Verdict: **Don't assume 1M is default. It's a bonus you unlock on edge cases.**
The Routing Strategy
Start every task in Sonnet. Open a fresh chat, paste the brief, let it cook. If the response is vague, contradictory, or it says "I don't have enough context", escalate to Opus. If it nails it, ship it. The boundary is usually obvious after 2–3 interactions — you know whether Sonnet is hitting a wall or cruising. Don't get precious about it. Switching to Opus is a single click; the cost delta per task is negligible. What matters is shipping on time with correct code. Sonnet will do that 80% of the time. Opus will do it 99% of the time. Pay for the nine when it matters. Don't pay for it reflexively.
Six FAQs
Should I use Sonnet in production APIs?
Yes. Sonnet is reliable, fast, and perfectly fine for real-time code generation, API responses, and chatbots. The cost is low enough that serving Sonnet to 1,000 users costs less than serving Opus to 100. Use Opus in production only when the task genuinely needs it (e.g. a complex diagnosis requiring deep codebase reasoning). Route by task, not by user tier.
Does Sonnet make worse decisions than Opus?
Not on well-scoped tasks. Sonnet's ceiling is context depth and nuance across multiple files. On a single file, single concern, Sonnet reasons just as well. The delta appears when you need "reasoning about reasoning" — meta-analysis, trade-off weighing, cross-system dependencies. For mechanical code generation, they're peers. For architecture, Opus pulls ahead.
Can I use cached context to make Opus cheaper?
Yes. Prompt caching (available on both models) costs 90% less for repeated context reads. If you're iterating on the same file all day, cache it on your first Opus call ($0.30 in tokens), then every subsequent call reads the cache for $0.03. After 10 reads, caching breaks even. For long-running sessions on a single project, caching is mandatory. Set `cache_control: {"type": "ephemeral"}` on system prompts and file context.
What if Sonnet is wrong? Do I waste time debugging?
Sometimes. If Sonnet hallucinates an API endpoint that doesn't exist, you catch it in code review or testing. The time to debug is usually <15 minutes — you notice the error, paste it back, Sonnet fixes it. The total cost is still $0.30 instead of $1.50. The time loss is real but small. Weight it against the cost savings. For mission-critical code, start with Opus; for iteration-friendly work, absorb the 15-minute cost.
Should I always use Opus for code review?
Code review is usually shallow (10–20 files, well-typed TypeScript, clear intent). Sonnet catches 90% of bugs and style issues. Use Opus when you're reviewing a refactor that touches the entire codebase, or when a Sonnet review missed something (feedback loop). Don't assume Opus = better review; assume it = more thorough at higher cost.
How do I know when to escalate to Opus?
Red flags: (1) Sonnet asks for the same context twice, (2) Sonnet says "I need more information to reason about X", (3) Sonnet's suggestion feels locally correct but you suspect it breaks something else, (4) the task involves 20+ files or a 50+ page spec. Green flags: (1) you're writing a single component, (2) the task is "debug this specific error", (3) you know exactly what the code should do. Default to Sonnet, escalate when uncertain.
The Bottom Line
Sonnet 4.6 is the workhorse. Fast, cheap, ships 80% of tasks without friction. Opus 4.7 is the architect — slower, pricier, indispensable when the code design is hard or ambiguous. The production strategy is mechanical: route by task scope, not by paranoia. Use Sonnet until it hits a wall, then escalate. The cost math ($3–15 vs $15–75 per million tokens) and the speed difference (seconds vs minutes) make this decision automatic once you've built the mental model. Start Sonnet. Ship Sonnet. Keep Opus in your back pocket. That's the play.
Building a system that routes between models? Start with Sonnet for the scaffolding, escalate to Opus for the architecture. See how we approach Claude Code vs Cursor for more on production AI tooling, or check Aidxn Design services for guidance on integrating AI into your product.