I’m watching a pattern emerge on my team and I think it’s happening everywhere. Someone gets access to Claude Code, builds something impressive in a day, and then when they hit the hard part, the part that requires actual design thinking, they don’t solve it. They route around it. They add an API call to Claude and let the model figure it out at runtime.
The app works. Sort of. But somewhere along the way, we stopped engineering and started outsourcing the thinking to a token budget.
The pattern
Here’s how it usually goes. Someone needs to build an internal tool. They use Claude Code to scaffold the app. Great. Frontend, backend, database, deployment. Fast, efficient, genuinely impressive. Then they get to the part of the app that’s actually hard. The business logic. The part where you have to make decisions about how things work.
And instead of designing a solution, they write a prompt.
Need to categorize incoming data? Don’t build a rules engine or a classification system. Just send each item to Claude and ask it to categorize it. Need to search a knowledge base? Don’t implement semantic search with embeddings and vector indexes. Just dump the whole knowledge base into a prompt and ask Claude to find the answer. Need to prioritize a queue? Don’t define priority logic. Just describe the situation to Claude and ask what’s most important.
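To make the first case concrete, here's what the non-LLM version of categorization can look like. This is a minimal sketch with made-up categories and keyword rules, not anyone's production logic; a real rules engine would be richer, but the shape is the point: deterministic, testable, and free per call.

```python
# Illustrative categories and keyword rules -- assumptions for the example,
# not a real product's taxonomy.
CATEGORY_RULES = {
    "billing": ("invoice", "refund", "charge", "payment"),
    "bug": ("error", "crash", "broken", "exception"),
}

def categorize(text: str) -> str:
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_RULES.items():
        if any(word in lowered for word in keywords):
            return category
    return "other"  # fall through when no rule matches
```

No API call, no tokens, and when it miscategorizes something you can see exactly which rule fired and fix it.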
The app becomes a thin wrapper around LLM calls. Claude Code built the shell. Claude’s API does all the actual work inside it. Every request, every user interaction, every decision: another API call, another batch of tokens burned.
Why this happens
I get it. I really do. When you’re building with Claude Code, you’re already in a mode where AI is solving problems for you. The tool is right there. It’s good at reasoning. It can handle ambiguity. And the hard parts of your app are exactly the parts that involve reasoning and ambiguity. So the instinct is natural: just let the AI handle it.
It’s also fast. Writing a prompt that says “given this support ticket, determine the priority” takes five minutes. Designing a priority scoring system with weighted criteria, edge case handling, and a rules engine takes a day. When you’re moving at AI-assisted speed, slowing down to engineer a proper solution feels like going backwards.
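For the record, the "day of work" version doesn't have to be exotic. Here's a sketch of a weighted priority score; the criteria names and weights are illustrative assumptions, not a real triage policy:

```python
# Hypothetical criteria and weights -- tune these to your own triage policy.
WEIGHTS = {"customer_tier": 3.0, "is_outage": 5.0, "age_hours": 0.1}

def priority_score(ticket: dict) -> float:
    """Higher score = handle sooner. Runs in microseconds, costs nothing per call."""
    score = WEIGHTS["customer_tier"] * ticket.get("customer_tier", 0)
    score += WEIGHTS["is_outage"] * (1 if ticket.get("is_outage") else 0)
    score += WEIGHTS["age_hours"] * ticket.get("age_hours", 0)
    return score
```

The day of work isn't the function; it's deciding on the criteria and weights. But that decision is exactly the design thinking the prompt version skips.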
But fast and correct aren’t the same thing. And fast and efficient definitely aren’t.
The knowledge base example
This is the one I keep seeing. A team needs to make their internal documentation searchable. Good problem. Real need. So they build a knowledge base. They collect docs from Confluence, Google Drive, Slack, wherever. They store it all somewhere.
Now for the search part. There are two ways to do this.
Option A: Build semantic search. Generate embeddings for your documents. Store them in a vector database. When someone searches, embed their query, find the nearest matches, return the relevant chunks. It’s deterministic, it’s fast, it costs almost nothing per query, and it scales. The technology exists. Plenty of open source tools, plenty of managed services. This is an engineering problem with well-understood solutions.
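Stripped to its core, Option A's retrieval step looks like this. The `embed()` below is a toy bag-of-words stand-in for a real embedding model, so the example runs with nothing but the standard library; the nearest-neighbor loop is what a vector database does efficiently at scale.

```python
import math

# Toy vocabulary -- a real system would use a learned embedding model instead.
VOCAB = ["deploy", "rollback", "invoice", "refund", "vpn", "login"]

def embed(text: str) -> list[float]:
    """Stand-in for an embedding model: count vocabulary terms in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query: str, docs: list[str]) -> str:
    """Return the document whose embedding is nearest the query's."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))
```

In production you'd embed documents once at ingest and let a vector index handle the nearest-neighbor lookup. The query-time cost is one embedding call and an index lookup, not a full LLM pass over your corpus.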
Option B: When someone searches, grab a bunch of documents, stuff them into a prompt, and ask Claude to find the answer. Every. Single. Query.
Option B works. Sometimes well. But you’re burning tokens on every search. You’re paying for Claude to do what a vector database does for fractions of a penny. You’re limited by context windows. You’re slower. And you’ve built something that gets more expensive as it gets more popular, the exact opposite of what you want from infrastructure.
I see teams choose Option B not because they evaluated both approaches and decided the tradeoffs were worth it. They choose it because it was the first thing that worked and they never considered that a non-LLM solution existed. The hammer is so good that everything looks like a nail.
Not every problem is a reasoning problem
LLMs are extraordinary at reasoning, synthesis, and handling ambiguity. That’s what they’re built for. But not every problem in your app requires reasoning, synthesis, or ambiguity.
Some problems are search problems. Use search infrastructure.
Some problems are classification problems with known categories. Use rules or a lightweight model.
Some problems are data transformation problems. Write a function.
Some problems are filtering and sorting problems. Write a query.
Some problems are workflow problems. Use a state machine.
The LLM is the most powerful tool in the box. But using it for everything is like using a CNC machine to hammer a nail. It’ll work, but you’ve introduced cost, latency, unpredictability, and complexity that a simpler tool wouldn’t have.
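The workflow case is the easiest to show. Here's a sketch with illustrative ticket states: the legal transitions are plain data, so an invalid move fails loudly instead of depending on a model's judgment on any given day.

```python
# Illustrative states for the example -- define your own for your workflow.
TRANSITIONS = {
    "new": {"triaged"},
    "triaged": {"in_progress", "closed"},
    "in_progress": {"resolved"},
    "resolved": {"closed", "in_progress"},  # reopen if verification fails
    "closed": set(),
}

def advance(state: str, target: str) -> str:
    """Move to the target state, or raise if the transition is illegal."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target
```

Fifteen lines, zero tokens, and the set of possible behaviors is the set you wrote down.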
The cost adds up
This isn’t just an architecture opinion. It’s a math problem.
An LLM call for a simple classification might cost a fraction of a cent. That sounds cheap until you’re doing it ten thousand times a day. Now multiply by every feature in your app that “just asks Claude.” Your knowledge base search, your ticket categorization, your priority scoring, your content summarization, your input validation, all of them making API calls, all of them burning tokens.
A deterministic function that does the same classification costs nothing per call. A vector search costs a tiny fraction of what an LLM call costs. A rules engine runs in microseconds, not seconds.
The individual calls feel cheap. The aggregate is not. And when your app is successful and traffic grows, the problem gets worse, not better. LLM costs scale linearly with usage. Proper infrastructure has a very different curve: a fixed cost to build, then a marginal cost per request that rounds to zero.
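A quick back-of-envelope makes the point. The per-call prices below are illustrative assumptions for the example, not quotes from any provider's price sheet:

```python
# Assumed prices -- plug in your own from your provider's actual pricing.
LLM_COST_PER_CALL = 0.003        # roughly a third of a cent per classification
VECTOR_COST_PER_QUERY = 0.00002  # managed vector search, rough order of magnitude

def monthly_cost(per_call: float, calls_per_day: int, days: int = 30) -> float:
    return per_call * calls_per_day * days

llm = monthly_cost(LLM_COST_PER_CALL, 10_000)      # about $900/month
vec = monthly_cost(VECTOR_COST_PER_QUERY, 10_000)  # about $6/month
```

Two orders of magnitude, for one feature, at modest traffic. Now multiply by every feature that "just asks Claude."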
When you actually need the LLM
To be clear: there are absolutely times when you need an LLM in your app. If the task genuinely requires:
- Understanding natural language intent that can’t be reduced to patterns
- Synthesizing information from multiple sources into coherent prose
- Handling truly open-ended user input where you can’t anticipate the shape of the request
- Making judgment calls that require weighing qualitative factors
Then yes, use the LLM. That’s what it’s for.
But be honest about whether your feature actually requires those capabilities or whether you’re using the LLM because you didn’t want to design the solution yourself. “Have Claude figure it out” is not an architecture. It’s the absence of one.
The Claude Code irony
Here’s the thing that makes this tricky: Claude Code is so good at building apps that it can build you the wrong architecture in record time. You can go from idea to deployed app in a day, and the app can be well-structured, well-tested, and completely over-reliant on LLM calls because nobody stopped to ask “does this feature actually need AI?”
The tool that helps you build faster can also help you skip the thinking faster. And the thinking is the part that matters.
Using Claude Code to build your app? Great. That’s what it’s for. Using Claude Code to build an app that then uses Claude’s API as a substitute for engineering the hard parts? That’s the trap.
The question to ask
Before you add an LLM call to a feature, ask: “What would this look like if we solved it without AI?”
Sometimes the answer is “we can’t, the problem genuinely requires language understanding.” Great, use the LLM.
Sometimes the answer is “we could, but it would take way longer to build.” Fair, and maybe the LLM is the right pragmatic choice for now. But put it on the list as technical debt.
And sometimes the answer is “oh, there’s a database query that does this” or “a ten-line function handles this” or “there’s an open source library for exactly this.” In those cases, you don’t need the LLM. You need to slow down for ten minutes and think about the problem.
The best apps I’ve seen built with Claude Code are the ones where AI helped build the infrastructure, the UI, the plumbing, and the hard parts were engineered with actual design thinking. The LLM calls are surgical: used where they’re genuinely needed, not sprinkled everywhere because they were easy.
Build with AI. Just don’t build AI where you need engineering.