Every Rule Is a Bug Report: Why a CLAUDE.md File Will Not Save You
A Note on Expertise
I'm not writing as an "expert" or claiming to have all the answers. I'm a builder sharing my journey on what worked, what didn't, and what I learned along the way. The tech landscape changes constantly, and with AI tools now available, the traditional notion of "expertise" is evolving. Take what resonates, verify what matters to you, and forge your own path. This is simply my experience, offered in the hope it helps fellow builders.
It was five in the afternoon. I was in the middle of a working session. Marketing strategy in one window, a security fix in another, a contractor review in a third. The agent I was working with finished a response and added a line at the bottom.
"You have done a great job today. It is getting late. Let us continue this tomorrow."
I stopped and stared at the screen. It was five in the afternoon. I had six more hours of work ahead of me. Why was the agent trying to wrap up my day?
The answer is that the agent had no idea what time it was. The agent had no idea what I had already done that day or what I still planned to do. The agent was pattern-matching on conversation length and pre-trained politeness habits, generating a closing statement because closing statements are what long conversations usually end with.
That moment taught me something important. Rules are not enough.
The CLAUDE.md myth
If you have spent any time in the AI development world, you have probably seen videos and posts telling you that the answer to getting AI to behave correctly is a good CLAUDE.md file. Or a prompt library. Or a system prompt. Write down all your rules, put them in the right place, and the agent will follow them.
I wish this were true. I have written the most comprehensive CLAUDE.md file you can imagine for my main project. It covers tone, structure, technical constraints, communication patterns, things to do, things never to do. It is thorough.
And still, I saw the same problems happening. The agent would try to wrap up the session while I was in the middle of work. Em dashes would appear in the output when my rules clearly said no em dashes. The agent would tell me I had eight commits ready to push when I had zero, and apologise when I pointed it out.
Apologies are cheap. After the tenth time the agent apologised for fabricating state that I had to verify myself, I stopped being understanding and started being annoyed.
Why rules alone do not work
Here is what I eventually figured out. Rules in a file work about forty to fifty percent of the time.
That is not zero, and it is not nothing. Writing rules does help. But it does not solve the problem. The reason is structural. The agent reads the rules at the start of the session, often skims them, and then has to remember to apply them at the right moment, under pressure, while also doing everything else you asked for.
Humans forget rules too. That is why humans build systems. We do not rely on every employee to remember every company policy. We build workflows, checks, and processes that make the correct behaviour the default. The rule becomes "press this button to submit," not "remember to fill out form 47B correctly." The infrastructure carries the rule.
The same principle applies to AI agents. A rule in a document is a hope that the agent will remember. An infrastructure is a guarantee that the correct information is in front of the agent when it matters.
The temporal awareness example
Let me walk through how I fixed one specific problem to make this concrete.
Take the wrap-up problem I opened this post with. My first fix was a rule. I added a line to my configuration that said, in effect, "do not try to end the session just because the conversation is long. Check the actual time and user signals first." Simple rule. The agent read it. The agent still sometimes did it anyway.
The actual fix was infrastructure. I built a small piece of the system that injects the current time into every session brief. It also injects a user state indicator based on explicit signals. Are they working? Are they ending the day? This is computed from real data, not from the agent's intuition.
The agent now reads this at the start of every response and cannot miss it. The time is there. The user state is there. The closing signal is either present or absent. The agent does not have to remember the rule. The data makes the right answer obvious.
Since that infrastructure went in, the wrap-up problem is gone. Not because the rule got stricter. Because the data changed.
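A minimal sketch of what that injection could look like. This is not my actual implementation; the function names and the closing-signal list are hypothetical, and in a real system the user state would come from richer explicit signals than a keyword match.

```python
from datetime import datetime

# Hypothetical end-of-day signals. In a real system these come from
# explicit user statements, not from the agent's intuition.
CLOSING_SIGNALS = {"wrapping up", "calling it a day", "see you tomorrow"}

def detect_user_state(recent_messages):
    """Classify user state from explicit signals in recent messages."""
    text = " ".join(recent_messages).lower()
    if any(signal in text for signal in CLOSING_SIGNALS):
        return "ending the day"
    return "working"

def build_session_brief(recent_messages, now=None):
    """Put the real time and user state in front of the agent,
    where it cannot be missed or misremembered."""
    now = now or datetime.now()
    state = detect_user_state(recent_messages)
    return (
        f"Current local time: {now.strftime('%H:%M, %A %d %B %Y')}\n"
        f"User state: {state}\n"
    )

# Five in the afternoon, user still mid-task: no wrap-up warranted.
brief = build_session_brief(["fix the auth bug next"],
                            now=datetime(2025, 1, 6, 17, 0))
print(brief)
```

The point is that the brief is computed from real data every session, so the rule "check the time first" never has to be remembered at all.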
A rule in the identity layer is a bug report
That realisation became a principle I now apply everywhere.
When I find myself adding a rule to the agent's configuration, I treat that as a signal that something is missing from the infrastructure. The rule is a bug report. It is the agent telling me, "I keep getting this wrong because I do not have the information I need."
The fix is rarely to write a better rule. The fix is usually to build the infrastructure that makes the right answer automatic.
Rules without infrastructure are wishful thinking. They depend on the agent remembering to apply them at generation time, which pre-trained patterns will override. The only rules that survive and work are the ones paired with something that makes the correct information unavoidable.
The examples I have lived through
A few rules I have added over the last year, and the infrastructure that eventually replaced them.
Rule: "Check git state before claiming commits exist." This one failed for months. The agent would tell me commits had been pushed when nothing was pushed. The fix was adding a live count of "commits ahead of origin" to the codebase audit the agent reads. Now the agent cannot claim a push succeeded when the data says otherwise.
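The "commits ahead of origin" count is simple to compute from git itself. A sketch of the idea, with hypothetical function names; the key design choice is returning an explicit "unknown" rather than a guessable default when git cannot answer.

```python
import subprocess

def commits_ahead_of_origin(branch="main"):
    """Count local commits that origin does not have yet.
    Returns None if the repo or remote is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-list", "--count", f"origin/{branch}..{branch}"],
            capture_output=True, text=True, check=True,
        )
        return int(out.stdout.strip())
    except (subprocess.CalledProcessError, FileNotFoundError, ValueError):
        return None

def audit_line(branch="main"):
    """The line injected into the codebase audit the agent reads."""
    ahead = commits_ahead_of_origin(branch)
    if ahead is None:
        return "Git state: unknown (verify manually before claiming pushes)"
    return f"Git state: {ahead} commit(s) ahead of origin/{branch}"
```

With this line in front of it, the agent cannot claim eight commits are ready when the data says zero.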
Rule: "Do not repeat information the user already shared." Partially worked. The real fix was building a memory system that makes prior context available in every session. The agent stops repeating because it knows what has already been said.
Rule: "Verify before asking the user to do something they may have already done." Failed constantly. The fix was building a hook that detects state-change claims in user messages and injects a verification prompt into the agent's context. The agent now sees "user says they pushed, verify with git status first" before it has a chance to ask them to push again.
In each case, the rule was the first attempt. The rule failed at forty to fifty percent. The infrastructure took the same behaviour up to eighty or ninety percent, which is the realistic ceiling.
The forty percent floor and the ninety percent ceiling
I want to be honest about the numbers here. I have never seen an agent at one hundred percent compliance with anything, rule or infrastructure. There will always be edge cases, weird context combinations, and moments where the agent gets it wrong.
But there is a meaningful gap between a rule on its own and a rule with infrastructure. A well-written rule, by itself, gets you to about forty or fifty percent. The same rule, paired with infrastructure that surfaces the right data at the right time, gets you to eighty or ninety.
If you are building anything real with AI, that is the difference between "mostly works but I have to double check everything" and "reliable enough to trust with production work."
What this means in practice
If you are writing rules for an agent and they are not working, stop adding more rules. Take the ones that keep failing and ask yourself: what infrastructure would make this rule unnecessary?
Some examples of what that looks like:
Instead of "remember to cite sources," build a retrieval system that returns sources alongside content.
Instead of "do not hallucinate API responses," build a tool that lets the agent actually call the API and read the response.
Instead of "respect the user's current workflow state," build a state detector that tells the agent what the user is doing.
Instead of "check the current time before assuming temporal context," inject the current time into the session brief every time.
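The API case deserves a concrete sketch, because it is the clearest swap of rule for infrastructure. This is a generic illustration, not any particular framework's tool interface; the tool returns the real response, or an explicit error, so there is nothing left to hallucinate.

```python
import json
import urllib.request

def call_api(url, timeout=10):
    """A tool the agent can invoke: fetch the real API response
    instead of generating a plausible-looking one."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return {"ok": True, "status": resp.status,
                    "body": json.loads(resp.read().decode())}
    except Exception as exc:
        # Surface the failure explicitly rather than let the agent guess.
        return {"ok": False, "error": str(exc)}

# A generic tool schema of the kind most agent frameworks expect.
TOOL_SPEC = {
    "name": "call_api",
    "description": "Fetch a URL and return the actual JSON response.",
    "parameters": {"url": {"type": "string"}},
}
```

Note the error path: even a failed call produces grounded data the agent can report, which is the whole point of replacing the rule with a tool.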
The pattern is the same in each case. The rule describes a behaviour you want. The infrastructure makes that behaviour the default by removing the need to remember.
The rule I am most proud of eliminating
My favourite story about this principle is also the simplest. For a long time, I had a rule that said something like, "do not claim a feature is complete without testing it end-to-end." The agent would still claim things were complete after nothing more than a passing type check or lint run.
The rule failed because "complete" is ambiguous. The agent had no way to distinguish between "code compiles" and "feature works in a real scenario."
The fix was not a better rule. The fix was a distinction built into the system prompt itself: "built and compiling" versus "feature verified." Two separate statuses the agent could use, with explicit guidance that a type check is not a verification. Once those two states existed as real concepts the agent could reach for, the over-claiming stopped.
The rule became unnecessary. The vocabulary did the work.
That is the direction I want everything to go. Rules that exist because the infrastructure is missing should, over time, be replaced by infrastructure that makes the rule irrelevant. Rule count should trend down as the system matures, not up.
What I would tell someone starting
Write rules as you notice problems. That is fine. That is the first step.
Watch which rules keep failing. The ones that fail repeatedly are telling you something.
Build the infrastructure to replace them. Every time you turn a failing rule into a structural fix, the system gets more reliable.
Accept that one hundred percent is not coming. Aim for eighty to ninety and be honest about the gap.
Stop trusting CLAUDE.md to save you. Use it for context and structure, but do not expect it to enforce behaviour. The agent will read it, acknowledge it, and then do whatever pre-training tells it to do.
This is part of a series about building products as a solo founder. Earlier posts cover my personal journey, why I built Havnwright, the authentication pattern, what building with AI actually looks like, the memory system I built, shipping a native mobile app, and how I think about content as a solo founder. More coming.
About the Author
Alireza Elahi is a solo founder building products that solve real problems. Currently working on Havnwright, Publishora, and the Founder Knowledge Graph.