Things I Did Wrong Building AI Agents (So You Don't Have To)

Christopher George

This is Part 2 of a four-part series on building AI agents for a small business. Part 1 covered the 700 hours and what I learned at the highest level. This piece goes under the hood, into the specific mistakes that cost me the most.


Every honest builder has a list like this. Mine is long, and it is the most useful thing I can hand another operator right now. If you have read enough AI success stories to feel suspicious, this post is for you.

Over four months and roughly 700 hours, I attempted three separate multi-agent deployments. Each iteration improved on the last. Neither of the first two survived. Below are the five mistakes I kept repeating until I learned to recognize them in advance.


1. I Built a Platform Before I Solved a Problem

My first agent framework was OpenClaw. I ran more than two hundred agents on it simultaneously, each with a distinct role: research, drafting, analysis, ops, customer support, content planning, and more. Claude Code once observed that what I had built was less a company than a small government. It was being honest.

The mistake was obvious in hindsight. I had not deployed two hundred agents to solve two hundred distinct business problems. I had deployed them because I could, and because the concept of a complete autonomous operation was compelling enough to outrun my planning. No individual agent had a clearly scoped mandate, a measurable outcome, or a defined escalation path. They were capable in isolation and chaotic in aggregate.

When something went wrong — and something always went wrong — I could not reason about which agent had caused it, because too many were acting on overlapping context at the same time. Debugging a two-hundred-agent system is not like debugging a program. It is closer to debugging a company whose org chart you wrote over a weekend.

The lesson: Start with one problem worth solving and one agent worth trusting to solve it. Earn the right to scale.
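For readers who want something concrete: one way to hold yourself to a scoped mandate, a measurable outcome, and an escalation path is to write all three down before an agent ever runs. The sketch below is a minimal illustration in Python, not taken from OpenClaw, Hermes, or any other framework. `AgentSpec`, `missing_fields`, and the example agent are hypothetical names, and the one-sentence check is deliberately crude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical one-page brief for a single agent, written before it runs."""

    name: str
    mandate: str          # the one business problem this agent owns, in one sentence
    success_metric: str   # how you will know it is working
    escalation_path: str  # who or what takes over when the agent is stuck


def missing_fields(spec: AgentSpec) -> list[str]:
    """List what is still undefined; an empty list means the brief is complete."""
    problems = [
        field
        for field in ("mandate", "success_metric", "escalation_path")
        if not getattr(spec, field).strip()
    ]
    # Crude scope check: a period left after dropping the final one suggests
    # the mandate runs past one sentence. Abbreviations like "e.g." will trip it.
    if "." in spec.mandate.strip().rstrip("."):
        problems.append("mandate is longer than one sentence")
    return problems


support = AgentSpec(
    name="support-triage",
    mandate="Sort incoming support emails into refund, shipping, or other.",
    success_metric="Share of emails a human has to re-sort each week",
    escalation_path="",
)
print(missing_fields(support))  # ['escalation_path']
```

An agent whose brief cannot pass a check this simple is probably not ready to run alongside others.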


2. I Confused Capability with Readiness

OpenClaw was a promising prototype framework. It was not a production platform. I knew this. I deployed at production scale anyway, because the framework could technically handle what I asked of it for short stretches, and I interpreted "technically possible" as "reliably possible."

They were not the same thing. Systems that work for three hours in a controlled demo often fail at seventy-two hours under real load. Frameworks that can spawn a hundred agents in benchmarks often cannot keep them coordinated under hours of conflicting context. By the time I understood the distinction, I had already built an operation on top of a foundation that could not hold it.

This is a trap specific to moving fast in a young industry. Most AI tooling you will use today is a prototype that looks like a product. If you treat a prototype as a product, the prototype does not become a product — your business absorbs the gap.

The lesson: When a tool is young, assume the ceiling is lower than the marketing suggests. Build in a margin, and be ready to replace the foundation when it stops serving you. When the second OpenClaw deployment was winding down, I scrapped my planned third deployment and moved to Hermes Agents by NousResearch — a more secure and robust framework. That single change fixed most of what I had been fighting.


3. I Planned Without Accounting for Tokens

For most of the first two deployments, I was not thinking clearly about cost.

To put this in numbers: a typical heavy user runs 2 to 5 million tokens per month. I was running approximately 780 million per month and climbing toward one billion before I forced myself to stop. I was not reckless in any single decision. I was reckless in a thousand small decisions that each looked reasonable in isolation.

Every time I used an LLM to do something a deterministic script could have done, I was paying a premium for reasoning I did not need. Every time I sent a long conversation through an agent when a short function call would have served, I was paying for context that did not improve the outcome. Every time I let an agent handle a recurring data task because it was faster to set up than proper automation, I was renting a solution I should have owned.

The shift that finally corrected this was simple to articulate and harder to practice. Before asking an AI to do something, I now ask: is this actually reasoning, or is it pattern-matching that a script would handle for a fraction of the cost? Most of the time, it is the latter.

The lesson: Treat LLM calls like the most expensive labor in your business, because they are. Use scripts and deterministic automation for the work that does not need judgment. Reserve the model for the work that does.
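To make the script-versus-model question concrete, here is a minimal sketch of the kind of recurring data task that does not need judgment: totaling orders per day from a sales export. The CSV layout, `SAMPLE_EXPORT`, and `daily_totals` are hypothetical. The point is that a short deterministic script gives the same answer on every run and spends no tokens doing it.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical export format: one row per order, with a date and an order total.
SAMPLE_EXPORT = """date,order_id,total
2025-03-01,1001,19.99
2025-03-01,1002,5.01
2025-03-02,1003,42.00
"""


def daily_totals(csv_text: str) -> dict[str, Decimal]:
    """Sum order totals per day. Same input, same output, no model call."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Decimal avoids float rounding surprises on currency amounts.
        totals[row["date"]] += Decimal(row["total"])
    return dict(totals)


print(daily_totals(SAMPLE_EXPORT))
# {'2025-03-01': Decimal('25.00'), '2025-03-02': Decimal('42.00')}
```

If a task can be written this way, it is pattern-matching, and the model is better spent elsewhere.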


4. I Tried to Instruct Without Understanding

AI coding is a double-edged sword. The upside is that you can build things without learning to program. The downside is that you can also fail to build things without learning to program, and you may not be able to tell the difference.

When my early deployments crashed repeatedly on the same categories of problems, I assumed the tools were deficient. Some of the time they were. Much of the time, the real issue was that I did not know enough about how software works to describe what I needed with the precision the model required. If you cannot articulate the shape of a problem, the most capable AI in the world cannot solve it for you.

This is not a case for a four-year CS degree. It is a case for knowing just enough to ask the right question. What a function is. What a database does. Why state matters. How a file system is organized. These are not advanced concepts — they are the vocabulary required to brief an engineering partner, which is what working with AI actually is.

I learned by doing, by asking the AI to explain what it was writing, and by pausing when I felt confused instead of pushing through. The hours I spent on that kind of slow, deliberate learning paid back more than the hours I spent generating code.

The lesson: Invest enough time in fundamentals to describe the problem accurately. AI compounds on clarity.


5. I Paid Annually in a Monthly Industry

One of the more mundane mistakes I made was paying for a full year of Cursor early on, right after loving the first free month. It felt like a deal. It was not.

The AI tooling market in 2025 and 2026 does not reward annual commitments. Products I was enthusiastic about six months ago have since been replaced by better options, have pivoted into different product categories, or have priced themselves out of reach for a small business. Annual contracts at this stage of the market are bets against your own learning curve.

This extends beyond software. Any infrastructure decision you make right now should assume that something better will exist in six months. Build in the optionality to switch. Keep your data and your configuration in formats you can move. Do not tie yourself to a single provider with pricing structures that punish migration. A later article in this series covers this in more detail; the short version is that the tool is almost never the durable asset.

The lesson: Pay monthly. Stay portable. Treat the tool as rented.
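As a small illustration of staying portable, assuming a hypothetical agent whose settings you control: keep the configuration in a plain JSON file you own, so moving to another tool means reading one file rather than rebuilding everything from a vendor dashboard. The field names, tool names, and model placeholder below are made up for the example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical agent settings. The point is that they live in a plain file you own,
# not only inside one vendor's interface.
agent_config = {
    "name": "support-triage",
    "model": "any-provider/any-model",  # placeholder: switching providers is a one-line edit
    "system_prompt": "Sort incoming support emails into refund, shipping, or other.",
    "tools": ["read_inbox", "apply_label"],  # hypothetical tool names
}


def export_config(config: dict, path: Path) -> None:
    """Write the config as human-readable JSON that any tool or script can parse."""
    path.write_text(json.dumps(config, indent=2, sort_keys=True), encoding="utf-8")


def import_config(path: Path) -> dict:
    """Read the config back; nothing here depends on the tool that first used it."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "support-triage.json"
    export_config(agent_config, target)
    assert import_config(target) == agent_config  # the round trip loses nothing
```

JSON is only one choice; the same idea holds for YAML, Markdown, or any plain-text format a future tool can read.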


Why These Mistakes Were Still Worth Making

I do not regret the failed deployments, and I would hesitate to tell another operator to avoid every mistake I made.

Each of the three OpenClaw deployments taught me something I could not have learned any other way. The first showed me that ambition without scope produces chaos. The second showed me where the real ceiling of a prototype framework was. The third, which I scrapped before launching, was the first time I had enough context to say "this is the wrong foundation" before spending another three months discovering it.

The mistake is not failing. The mistake is failing without extracting the insight. The operators I see struggling with AI right now tend to fall into one of two camps: the ones who refuse to try anything ambitious, and the ones who try everything at once and never stop long enough to ask what the last attempt actually revealed. Neither camp is building durable systems.

The useful posture is somewhere between the two. Move fast enough to encounter real limits. Slow down enough to understand them. Rebuild with a better foundation before you rebuild at larger scale.


A Short Inventory

If you want a checklist version of the above to use on your next deployment:

  • Have I identified the specific business problem this agent is solving? Can I name it in one sentence?
  • Am I treating this framework as production when its documentation describes it as a prototype?
  • What does a well-scoped LLM call look like here, and what would a deterministic script replace?
  • Can I articulate the problem I am asking the AI to solve in plain language?
  • If I had to migrate off this tool in six months, what would survive the move?

If any answer is unclear, the build is not ready. That is not a reason to stop — it is a reason to plan.


Coming Up

The next piece in this series flips this around. Given everything I did wrong, what would I actually do differently if I were starting AI for a small business from scratch today? That is Part 3: A Smarter Path.


Series navigation:

  • Part 1: What 700 Hours of Building AI Agents Taught Me About Small Business
  • Part 2: Things I Did Wrong Building AI Agents (So You Don’t Have To) (this piece)
  • Part 3: A Smarter Path — What I’d Do Differently Starting AI from Scratch
  • Part 4: Your AI’s Memory Will Outlive Your Tools

Christopher George is the founder of Marching Dogs and House of Carts. He runs a fleet of AI agents across seven machines for his businesses and has spent more than 700 hours building, breaking, and rebuilding AI systems without a computer science degree.