With large language models, the failures hide in plain sight: hallucinated financials in board presentations, misaligned customer responses bleeding conversion rates, or proprietary data leaking through poorly configured prompts. Yet enterprises are racing to deploy these systems anyway, betting billions that fine-tuning will transform generic AI into competitive advantage. They're mostly losing that bet.
The $10,000 Question Nobody's Asking
Here's what OpenAI and Anthropic won't tell you: Their models are bleeding cash. Anthropic expects to lose $3 billion in 2025 despite hitting $5 billion in annualized revenue. OpenAI burns through even more. The math is brutal—these companies lose money on every API call, subsidizing enterprise experiments with venture capital.
The dirty secret? Most enterprises using off-the-shelf models are doing the same thing—hemorrhaging value while pretending AI is transforming their business.
LLMs don't reason about your business. They pattern-match against their training data, with no inherent understanding of your context. Ask GPT-4 or Claude about geopolitical risk assessment, and you'll get Wikipedia-level analysis unless you build the scaffolding yourself.
Consider a basic enterprise query: "How large is the threat from Russia compared to China?"
Without context, you get garbage. With context, you need to specify:
- Economic parameters: Russia's $2.1 trillion GDP versus China's $19.7 trillion
- Military dimensions: China's 2.2 million active personnel versus Russia's 1.3 million
- Alliance structures: NATO's 32 members and $1.3 trillion defense budget
- Resource leverage: Russia's energy dependency versus China's 60% control of rare earth production
This isn't AI intelligence—it's manual knowledge engineering. The same tedious work enterprises thought they were escaping.
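That "knowledge engineering" step can be made concrete. Here's a minimal sketch of what the scaffolding looks like in practice: the analyst, not the model, supplies the structured context. The figures come straight from the bullet list above; the function name and prompt wording are illustrative, not any vendor's API.

```python
# Analyst-curated facts the model won't reliably supply on its own.
# Figures are the article's example parameters.
CONTEXT_FACTS = {
    "Economic parameters": "Russia's $2.1 trillion GDP versus China's $19.7 trillion",
    "Military dimensions": "China's 2.2 million active personnel versus Russia's 1.3 million",
    "Alliance structures": "NATO's 32 members and $1.3 trillion defense budget",
    "Resource leverage": "Russia's energy dependency versus China's 60% control of rare earth production",
}

def build_prompt(question: str, facts: dict[str, str]) -> str:
    """Prepend curated facts so the model grounds its answer in them."""
    lines = ["Use only the context below to answer.", "", "Context:"]
    lines += [f"- {topic}: {detail}" for topic, detail in facts.items()]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)

prompt = build_prompt(
    "How large is the threat from Russia compared to China?",
    CONTEXT_FACTS,
)
print(prompt)
```

Every line of that context dictionary is work a human did. The model just rephrases it.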
The Fine-Tuning Gold Rush
Fine-tuning promises salvation. Feed your model proprietary data, the pitch goes, and watch accuracy improve 20-50% on domain-specific tasks.
The economics are seductive. A fine-tuned model for fraud detection can save millions. Healthcare diagnostics can reduce error rates. Customer service can actually solve problems instead of frustrating users.
But here's what the vendors won't emphasize: Fine-tuning costs range from $10,000 for a proof of concept to millions for production systems. You need clean data—lots of it. You need MLOps infrastructure. You need talent that commands $500,000+ salaries.
Translation: Before you can train the model, you need to rebuild your company around it.
The Anysphere Exception
There's one bright spot in this mess: Anysphere's Cursor, which hit $500 million in annual recurring revenue by actually solving a problem developers have. It doesn't promise to revolutionize coding—it just makes writing code faster with smart completions and edits.
The lesson? Narrow, practical AI applications work.
Grand enterprise transformation doesn't. Microsoft seems to understand this. They're hedging their OpenAI bet by integrating Anthropic's Claude into Office 365. Not because they're abandoning OpenAI—the companies remain deeply intertwined—but because different models excel at different tasks.
This multi-model future is where enterprises should focus. Not on training one perfect model, but on orchestrating many specialized ones.
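Orchestrating many specialized models can be as simple as a routing layer in front of them. A minimal sketch, with an entirely illustrative routing table (the model identifiers are placeholders, not real API names):

```python
# Route each task type to the model that handles it best, with a
# general-purpose fallback. The table below is illustrative only.
ROUTING_TABLE = {
    "code_completion": "code-specialist",   # the narrow, Cursor-style win
    "long_document_qa": "claude-class",     # e.g. the Office 365 hedge
    "general_chat": "gpt-class",
}

DEFAULT_MODEL = "gpt-class"

def route(task_type: str) -> str:
    """Pick a model for the task; unknown tasks fall back to the default."""
    return ROUTING_TABLE.get(task_type, DEFAULT_MODEL)

print(route("code_completion"))
```

The point isn't the three-line function; it's that the routing table, not any single model's weights, becomes the asset you maintain.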
The Reality Check
Census data shows AI adoption among U.S. firms doubled to 9.7% by August 2025. In the Information sector, it's 25%. In food services, 2.5%. The pattern is clear: AI works where digital infrastructure already exists, where workflows are already structured, where data is already clean.
For everyone else, the puppy training continues. Except unlike puppies, these models never quite learn to stop making messes—they just get better at hiding them in plausible-sounding responses.
The winners won't be companies that fine-tune best. They'll be companies that recognize fine-tuning's limits and build around them. That means:
- Accepting narrow AI wins over transformation fantasies
- Investing in data infrastructure before model training
- Using multiple models for different tasks
- Measuring actual ROI, not ARR projections
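That last point is arithmetic, not strategy. A back-of-the-envelope check, with entirely hypothetical numbers — plug in your own measured savings and all-in costs before believing any ARR slide:

```python
def roi(annual_savings: float, annual_cost: float) -> float:
    """Return ROI as a fraction: (savings - cost) / cost."""
    return (annual_savings - annual_cost) / annual_cost

# Hypothetical: $1.2M in measured savings against $2M all-in cost
# (fine-tuning, MLOps infrastructure, salaries) is a negative return.
print(f"{roi(1_200_000, 2_000_000):.0%}")  # -40%
```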
The alternative? Spending millions to discover what puppy owners learn for free: Some messes are inevitable.
The question is how long you'll have to wait before the puppy is properly trained.