The LLM Reality Check: Why AI Can't Replace Strategic Software Development

First published on Tuesday, 22 July 2025


Building an MVP with LLMs over the past few weeks has revealed two fundamental limitations that define exactly how AI fits into software development workflows: non-trivial tasks fail about 40% of the time regardless of prompting quality, and code quality correlates directly with how often other developers have solved the same problem online.

These aren't tooling problems or prompting problems. They're fundamental constraints that determine the economic boundary of LLM utility in software development.

The 40% Barrier: Where Consistency Breaks Down

Simple tasks work brilliantly. Complex tasks fail predictably. But the most frustrating category is medium-complexity tasks that should work but don't.

Tell an LLM to use TypeScript domain models with Zustand stores? It builds a pure JavaScript API instead. Specify SCSS with BEM methodology? It adds CSS modules for fun. Request a specific architectural pattern? It rebuilds Redux from scratch because that's what it "knows."
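To make that first failure concrete, here's roughly what "TypeScript domain models with Zustand stores" asks for. This sketch is mine, not the MVP's code, and it hand-rolls a tiny store rather than importing the real Zustand package so it stays self-contained; the point is the typed shape an LLM tends to abandon in favor of plain JavaScript.

```typescript
// Hypothetical sketch: a typed domain model plus a Zustand-style store.
// createStore stands in for Zustand's vanilla store (getState/setState).

interface Todo {
  id: number;
  title: string;
  done: boolean;
}

interface TodoState {
  todos: Todo[];
  nextId: number;
}

// Minimal store: holds state and applies shallow patches, like Zustand's set().
function createStore<T extends object>(initial: T) {
  let state = initial;
  return {
    getState: () => state,
    setState: (patch: Partial<T>) => {
      state = { ...state, ...patch };
    },
  };
}

const todoStore = createStore<TodoState>({ todos: [], nextId: 1 });

// Domain logic lives in typed functions over the store, not in untyped JS.
function addTodo(title: string): void {
  const { todos, nextId } = todoStore.getState();
  todoStore.setState({
    todos: [...todos, { id: nextId, title, done: false }],
    nextId: nextId + 1,
  });
}

function toggleTodo(id: number): void {
  const { todos } = todoStore.getState();
  todoStore.setState({
    todos: todos.map(t => (t.id === id ? { ...t, done: !t.done } : t)),
  });
}

addTodo("write spec");
toggleTodo(1);
console.log(todoStore.getState().todos);
```

A pure-JavaScript rewrite of this runs just as well, which is exactly why the regression is easy to miss in review: the compiler checks the domain model only if the types survive.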

The pattern is consistent: about 60% success rate on anything beyond basic patterns, dropping to 40% or worse for domain-specific logic. Unlike junior developers who learn from mistakes and improve over time, LLMs regenerate the same categories of mistakes indefinitely.

The Internet Popularity Contest

This failure rate isn't random—it follows predictable patterns based on training data. Want to build a todo app? LLMs will give you production-ready code with 98% success rate. The internet is littered with todo tutorials, GitHub repositories, and blog posts.

Need to implement domain-specific business logic for your unique market niche? Prepare for chaos.

The more novel your problem, the worse LLMs perform:

  • Common problems: Lightning-fast, high-quality solutions

  • Slightly modified common problems: Decent results with some cleanup needed

  • Domain-specific challenges: Mediocre code that requires significant refactoring

  • Genuinely novel approaches: Complete rewrites necessary

The Enterprise Reality

Most enterprise software development involves problems that don't exist in Stack Overflow answers:

  • Legacy system integration with undocumented APIs

  • Business rules that exist nowhere else in the world

  • Compliance requirements specific to your industry

  • Custom workflows that evolved over years of operational experience

None of this exists in public repositories. LLMs trained on public internet data can't help with the problems that actually matter to most businesses.

The Supervision Tax

These limitations create a specific cost structure for LLM-assisted development:

  • Simple tasks: 10x productivity gain (basically free)

  • Medium tasks: 1-2x productivity gain (accounting for review and fixes)

  • Complex tasks: Break-even or productivity loss (more time fixing than building from scratch)

The supervision overhead scales poorly. You can't solve this by adding more LLMs—unlike human teams, LLMs don't learn from each other's mistakes, develop shared understanding of codebases, or build institutional knowledge about business requirements.

The Strategic Insight: Your Moat Is What LLMs Can't Build

Here's the key realization: if an LLM can build your core feature perfectly, that feature probably isn't a sustainable competitive advantage. Your differentiation lies in the problems only you face, the solutions only you've developed, the domain knowledge only you possess.

LLMs "democratize" commodity software development. They can't democratize domain expertise, novel solutions, or unique market insight.

Working Within the Constraints

The 40% barrier and internet bias suggest specific strategies for LLM-assisted development:

Use LLMs for undifferentiated heavy lifting: Authentication, database schemas, API clients, common patterns where human time is expensive and LLM errors are cheap to fix.

Handle core business logic yourself: The logic that makes your product unique, novel approaches that haven't been documented publicly, and architecture decisions that require domain expertise.

Design for LLM limitations: Break complex tasks into simple, verifiable pieces. Invest saved time in high-value activities like user research and strategic technical choices.
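As an illustration of that decomposition, here's a hypothetical TypeScript sketch. The pricing rules and function names are invented; the point is that each piece is small enough to sit in the high-success-rate zone, while the composition that encodes the actual business rule stays in human hands.

```typescript
// Hypothetical sketch: instead of prompting "build the pricing engine",
// ask the LLM for small, individually verifiable functions like these.

// Trivial, common-pattern piece: safe to delegate.
function applyVolumeDiscount(unitPrice: number, quantity: number): number {
  const rate = quantity >= 100 ? 0.9 : 1.0; // invented discount rule
  return unitPrice * quantity * rate;
}

// Another trivial piece, verifiable in isolation.
function addTax(subtotal: number, taxRate: number): number {
  return subtotal * (1 + taxRate);
}

// The composition encodes the business rule: this is where you stay in control.
function totalPrice(unitPrice: number, quantity: number, taxRate: number): number {
  return addTax(applyVolumeDiscount(unitPrice, quantity), taxRate);
}

console.log(totalPrice(10, 100, 0.2));
```

Each helper can be reviewed or unit-tested in seconds, so an LLM mistake in one of them is cheap to catch; the same mistake buried in a monolithic "pricing engine" prompt is not.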

The Assistant Model, Not Replacement

The 40% error rate on complex tasks means LLMs work best as coding assistants rather than autonomous developers. They excel at generating starting points for complex implementations and handling routine modifications to existing code. They struggle with making architectural decisions, understanding implicit business requirements, and maintaining consistency across large codebases.

Improvements in LLM capability haven't significantly moved the 40% barrier. Better models generate more sophisticated code, but they fail on complex tasks at roughly the same rate. This suggests the limitation isn't just training data or model size—it's something fundamental about the nature of complex software development that requires human intuition and strategic thinking.

The Economic Boundary

The 40% barrier and internet popularity bias together define the economic boundary of LLM utility in software development. Below this complexity threshold, LLMs provide massive productivity gains. Above it, human expertise remains essential.

Understanding this boundary helps you invest LLM assistance where it provides the highest return while ensuring human developers focus on the problems that actually require human insight.

The future of development isn't humans vs LLMs. It's humans sometimes assisted by LLMs, each working on the problems they're best equipped to solve, with humans firmly in control of the strategic, novel, and business-critical decisions that create lasting competitive advantage.
