I Wish AI Would Get Boring

I want to live in a world where AI is as boring as a Postgres database.
Nobody gets worked up about Postgres anymore. The last time I talked about it was telling a colleague I liked the new native UUID v7 support in version 18. Before that, maybe a year or two ago, I had a conversation about geospatial vector stuff compared to SQL Server. That's it. That's the level of excitement a mature, useful technology generates: occasional shop talk among people who use it.
When databases first emerged, they were revolutionary. A way to store massive amounts of data with fast retrieval, robust backup tooling, query languages. That was genuinely exciting because it was new. Now it's just how you store data. Nobody thinks about it. Same with S3. I remember when cloud file storage first launched and it felt like magic. Now it's just where you put files. Kind of boring.
That's exactly where I want AI to be.
What Boring Actually Means
Boring AI means deterministic deployment patterns. It means you know what you're going to get when you deploy it, the same way you know what you're going to get when you set up a REST API or schedule a cron job.
Right now, AI gets treated like a magic box. You throw a problem at it and hope for the best. That's the opposite of boring. Boring infrastructure has predictable behavior. You configure it, you deploy it, it does the thing. When it fails, it fails in ways you anticipated and can debug.
A REST API either returns the data or throws an error you can trace. A cron job either runs on schedule or it doesn't. The failure modes are known. You build around them.
Current AI deployments don't work that way. The same prompt can produce different outputs. The failure modes are unpredictable. You can't write a unit test that will reliably catch when it hallucinates. That's not boring. That's chaos dressed up as innovation.
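The contrast can be made concrete. Here's a minimal sketch (hypothetical response fields, not any particular vendor's API) of the kind of deterministic check you can write around model output, and the kind you can't:

```python
import json

def validate_response(raw: str, required_keys: set[str]) -> dict:
    """Deterministic checks we CAN make on model output:
    it parses, and the expected fields are present."""
    data = json.loads(raw)  # raises on malformed output
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

# What this catches: structural failures -- malformed JSON, missing fields.
# What it cannot catch: a response that is well-formed but factually wrong.
# A hallucinated answer passes every structural check:
plausible_nonsense = validate_response(
    '{"summary": "entirely invented", "source": "doc-3"}',
    {"summary", "source"},
)
```

Structural validation gets you halfway to boring. The other half, verifying the content itself, is where the unpredictability still lives.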
The Current Reality
Instead of boring, we have this: I'm in meetings trying not to be a wet blanket while clients explain how they're going to completely replace complex human judgment with AI. And the thing is, people still get annoyed when you try to inject some reality. They're like, what do you mean AI can't completely replace human judgment? And the answer is: it just can't. It likely never will, at least with current technology.
The serious researchers are coming around to this. Many of the big names are acknowledging that we've taken the LLM approach about as far as it's going to go. More training data, better techniques, bigger models, a trillion parameters. None of that is going to get us dramatically closer to the promises being made. We're approaching an asymptotic plateau. GPT-3 to GPT-4 to GPT-5: the improvements are noticeable but showing diminishing returns. Meanwhile, training costs and energy requirements are increasing by orders of magnitude.
The latest hype cycle around AI personal assistants and agents is a perfect example. Claude's computer use features. OpenAI's offerings. Whatever Musk is doing with Grok. Lots of excitement, lots of promises. And underneath it all, the same fundamental limitations that haven't been solved.
Pilot Purgatory Is a Data Problem
There's a stat floating around that 95% of generative AI pilots fail to deliver measurable ROI. I don't know if that number is precise, but the phenomenon is real. Companies are stuck in what people call "pilot purgatory."
I'm watching it happen right now. One client is coming up on a year of proofs of concept with no working applications. No real connection to their actual data. The concepts are fine. The problem is they never spent time on the actual work: cleaning their data to the point where AI can use it effectively.
The ROI isn't in the model. It's in the data engineering.
This is the dirty secret of "AI transformation." Boring AI is actually just excellent data hygiene with a language interface on top. The companies that will succeed aren't the ones with the most sophisticated models. They're the ones who did the unglamorous work of structuring their data, establishing quality pipelines, and building validation layers.
Garbage in, garbage out. That's always been true. It's still true.
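What the unglamorous work looks like in practice is mundane: a validation layer that decides which records are allowed to reach the model at all. A minimal sketch, with invented field names standing in for whatever the client's data actually holds:

```python
from dataclasses import dataclass

@dataclass
class Record:
    customer_id: str
    amount: float
    notes: str

def clean(rows: list[dict]) -> tuple[list[Record], list[dict]]:
    """Split raw rows into usable records and rejects.
    The boring work: this runs before any model sees the data."""
    good, bad = [], []
    for row in rows:
        cid = str(row.get("customer_id", "")).strip()
        notes = str(row.get("notes", "")).strip()
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            bad.append(row)
            continue
        if not cid or amount < 0:
            bad.append(row)
            continue
        good.append(Record(cid, amount, notes))
    return good, bad
```

Nothing about this is sophisticated. That's the point: the reject pile is visible, countable, and fixable, which is more than you can say for a model quietly choking on dirty input.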
Because most clients haven't done the data work, my job in meetings isn't just engineering. It's psychological de-escalation.
I don't tell them their ideas won't work. That's a dead end. As soon as you say that, their brains shut off and they don't want to talk to you anymore. Instead, I ask questions. Walk me through the process. Tell me about the current points of human judgment. What kind of training do the people doing this work have? What kind of experience?
You get them talking. You hope that as they're explaining the complexity of their own processes, they make the connection themselves. Then you can prime the conversation: the AI struggles with nuanced things, hallucinations are a real problem, context engineering matters. But you have to be tactful. You can't just announce that their vision is unrealistic.
The Path of Resistance
I recently did a Python-to-C port of some sophisticated code. Heavy math: Fourier transforms, wave propagation, finite differences. Serious engineering calculations, not business logic.
The complexity was beyond what Claude Code could manage on its own. LLM utility is inversely proportional to the entropy of the problem space. When the logic becomes non-standard, when there's no common pattern in the training data to fall back on, the AI defaults to hallucinated averages. It gives you something that looks right but isn't.
At a certain level of complexity, these tools start deflecting off the goal. They want to get you to X-prime, which is close to X but not X. Because getting to X is hard, and when they can't figure out the way, they try to change where you're going.
There are two ways to reach a hard destination. One is to put in the work, drive forward, keep course correcting, examine what's going wrong. The other is to go somewhere easier. The tool always wants to do the second thing when it hits real complexity. It starts in the right direction, then peels off when things get hard. You have to drag it back into the path of resistance.
The only way to keep it on track is better validation harnesses. Unit tests. Property-based testing. Ways for the AI to verify its own output against known constraints. Without those guardrails, you're just hoping the deflection doesn't happen.
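A validation harness doesn't need a fancy framework. For Fourier-heavy code like this port, one usable invariant is Parseval's theorem: total energy must match between the time and frequency domains, no matter what input you throw at the routine. A stdlib-only sketch, with a naive DFT standing in for the ported function:

```python
import cmath
import math
import random

def dft(xs):
    """Naive DFT, standing in for the routine under test."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(xs))
            for k in range(n)]

def check_parseval(trials=100, n=16, tol=1e-9):
    """Property check: Parseval's theorem must hold for ANY input.
    sum(|x|^2) == (1/N) * sum(|X|^2) -- a constraint the output has
    to satisfy regardless of how the code was written."""
    rng = random.Random(0)  # seeded for reproducible runs
    for _ in range(trials):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        time_energy = sum(abs(x) ** 2 for x in xs)
        freq_energy = sum(abs(X) ** 2 for X in dft(xs)) / n
        if abs(time_energy - freq_energy) > tol:
            return False
    return True
```

Random inputs against a mathematical invariant is exactly the kind of guardrail that catches an X-prime answer: code that compiles, runs, and looks plausible but doesn't conserve energy fails this check immediately.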
The port took me about a week. Another team had done a similar port into a different language for a different use case. It took them a month, and they had assistance too. Same tools. The difference was knowing when to build those validation layers and how to recognize when the tool was drifting toward X-prime.
Even at five times the current token cost, that port would have been worth it. It's like running a backhoe: a skilled operator makes their boss money, and someone who can't use the machine effectively doesn't. That's just how the world works.
The Power of Specialized, Narrow-Scope Tools
The magic box approach fails. The opposite approach works.
I have a tool that helps me write these blog posts. It uses Whisper to transcribe my voice so I can dictate through an interview process where I gather my thoughts. Then a well-crafted prompt, tuned to my voice and informed by real research into blog writing and interview techniques, translates that into a draft.
It does one thing. It does it well. It strips out my verbal tics and tangents, structures things coherently, and produces something that sounds like me. That's it. No AGI. No replacing human judgment. Just a narrow tool with a specific purpose and predictable behavior.
That's what boring AI looks like. The tool does its thing. It adds value. Nobody's impressed by it except the person using it.
I'm working with a friend on an app that follows the same philosophy: gathering data, synthesizing it, making recommendations. Completely within the realm of possibility. A narrow scope. A defined problem. What's holding it back? Nothing dramatic. Requirements gathering, stakeholder buy-in, data privacy concerns to navigate. The only barrier is standard project execution.
The specialized tool is the antidote to the magic box. It's also the only version of AI that actually ships.
The Decades View
There's a paper called "AI as Normal Technology" arguing that AI will transform society like electricity did, but over decades, not months.
We're in what you might call the Irruption Phase. Chaos, hype, wild promises, money flooding in faster than value can be created. Every transformative technology goes through this. The useful part comes later, in the Synergy Phase, when the technology becomes boring infrastructure that people build real things on top of.
The Irruption Phase is valuation-led. The Synergy Phase is utility-led. We're deep in the first one. The second one is where the actual value gets created.
Electricity went through the same arc. So did the internet. The loud phase gets all the attention. The quiet phase creates all the value.
Why Boring Probably Won't Happen Soon
The economic incentives are directly opposed to AI becoming boring.
The whole AI ecosystem has some characteristics of a bubble. The chip manufacturers are funding the startups. The money is somewhat incestuous. As soon as the hype stops inflating the bubble, or even just slows down, the cascading effects begin. Individual investors become disenchanted, try to pull money out, and the whole thing starts unwinding.
If the story isn't "AGI is around the corner" and "software engineers will be obsolete by the end of the year" (third year running), if Grok 6 isn't going to be Skynet, then what justifies the valuations? Approaching-trillion-dollar net worths depend on the hype continuing.
Without debt-fueled expansion, token costs would probably go up. For legitimate uses, that's probably fine. But a lot of what's happening right now only makes sense if you believe the transformative promises.
The Competitive Advantage of Boredom
Here's the thing about bubbles: they pop. And when they do, the people left standing are the ones who were building something real.
The boring makers will survive. The ones focused on unit economics instead of valuation multiples. The ones who spent their time on data engineering instead of pitch decks. The ones who built validation harnesses and understood the actual capabilities and limitations of the tools.
When the hype money dries up and token costs reflect real economics, the magic box deployments will collapse. The deterministic, well-tested, properly-engineered implementations will keep running.
I'm both a user and a maker. I built the tool I use for writing. I use coding assistants daily. I'll probably keep thinking about AI even if it becomes boring, because I'm building tools for others to use. Hammer manufacturers talk about hammers. That's the job.
But I'm building boring hammers. Tools with predictable behavior and known failure modes. Tools that do one thing well. Tools that will still make sense when the Irruption Phase ends and the real work begins.
That's the bet. Boring wins.