You Don’t Have a Prompt Problem—You Have a Memory Problem
Marcia Coulter
4/29/2026 · 2 min read
Most teams working with AI assume they have a prompt problem.
If the output is off:
tweak the prompt
add constraints
add examples
Repeat until it “works.”
And sometimes it does.
But then you start seeing things like:
The same prompt behaves differently across sessions
A fix works, then quietly stops working
Outputs drift as you expand the system
Teammates can’t reproduce what you’re seeing
At that point, it’s worth asking:
Is this really a prompt problem?
Where It Starts to Break
Take something simple: a support ticket classifier.
You define categories. You write a prompt. You refine it as edge cases come in.
It feels like normal engineering work.
Until you hit conflicts.
A ticket mentions two categories.
You add a rule to prioritize one.
That works—until a slightly different phrasing shows up.
Then it flips again.
So you add another rule.
Now you have rules layered on top of rules…
but no way to tell:
which rule actually applied
whether the model is following them consistently
or why the same input produces different results later
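To make that concrete, here's a minimal sketch of how the layering tends to look in code. The categories, rules, and helper are hypothetical, not a recommended design:

```python
# A minimal sketch of rule layering in a classifier prompt.
# Category names and rules are hypothetical.

BASE_PROMPT = """Classify the support ticket into exactly one category:
billing, bug, feature_request, account.

Ticket: {ticket}
Category:"""

# Each conflict fix becomes one more instruction bolted onto the prompt.
RULES = [
    "If a ticket mentions both billing and a bug, choose billing.",
    "Unless the bug blocks payment; then choose bug.",
    "If the user asks for a refund, always choose billing.",
]

def build_prompt(ticket: str) -> str:
    # The model sees all rules at once; nothing records which rule,
    # if any, it actually applied to a given ticket.
    return "\n".join(RULES) + "\n\n" + BASE_PROMPT.format(ticket=ticket)
```

Every rule exists only as text inside the prompt, so "which rule won" is never an observable fact anywhere in your system.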
The Debugging Problem
At some point, you try to debug it properly.
You log inputs and outputs.
Maybe you store a few examples.
But that only gets you so far.
Because what you don’t have is:
the assumptions the model used
the constraints it actually applied
the path it took to reach a decision
So when something breaks, you’re stuck with:
“It worked before. Now it doesn’t.”
And no clear reason why.
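Here's what that logging typically captures, continuing the sketch above (`call_model` is a hypothetical stand-in for whatever LLM API you use):

```python
import json
from datetime import datetime, timezone

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM API you call."""
    raise NotImplementedError

def classify_and_log(ticket: str, path: str = "classifications.jsonl") -> str:
    category = call_model(build_prompt(ticket))  # build_prompt from above
    # The record captures input and output, but nothing about which rule
    # fired, what the model assumed, or how a conflict was resolved.
    # When behavior changes later, this log can't explain why.
    with open(path, "a") as f:
        f.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "ticket": ticket,
            "category": category,
        }) + "\n")
    return category
```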
Prompts Don’t Give You State
In traditional systems, you don’t rely on instructions alone.
You rely on state:
decisions persist
behavior is traceable
results are reproducible
With most AI systems, none of that is guaranteed.
You have:
prompts
responses
But not the reasoning in between.
And that’s the problem.
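You can see the gap in miniature: two identical calls share nothing, so the decision made in the first isn't state the second can see. (Same hypothetical helpers as above.)

```python
ticket = "I was double-charged and now the app crashes on login."
prompt = build_prompt(ticket)

first = call_model(prompt)   # e.g. "billing"
second = call_model(prompt)  # nothing forces this to agree with first;
                             # the earlier decision was never persisted
```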
What This Leads To
Without persistent reasoning:
Fixes don’t reliably stick
Edge cases reappear
Behavior changes across sessions
Debugging becomes guesswork
You end up re-solving the same problems over and over.
The Shift
At some point, the question changes.
Instead of:
“What’s the right prompt?”
It becomes:
“What would it take for this system to remember how it got here?”
Not the output.
The reasoning.
What’s Missing
If you wanted to stabilize this, you’d need to capture things like:
the rules you introduced
the assumptions behind them
how conflicts were resolved
what changed over time
And you’d need to be able to:
revisit that
inspect it
reuse it
extend it
Across sessions.
Across people.
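As one possible shape for that record, here's a sketch of a persisted decision log entry. The field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """One reasoning step, persisted so it can be revisited and reused."""
    rule: str                       # the rule that was introduced
    assumption: str                 # why it was believed to be safe
    conflict_resolved: str          # the conflict it was meant to settle
    introduced_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    superseded_by: str | None = None  # how the rule changed over time

# Example: the payment-blocking fix from earlier, recorded instead of lost.
record = DecisionRecord(
    rule="If the bug blocks payment, choose bug over billing.",
    assumption="Payment-blocking bugs need engineering before refunds.",
    conflict_resolved="billing vs. bug when a ticket mentions both",
)
```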
Where This Points
That’s not a prompt tweak.
That’s a missing layer.
A layer where reasoning doesn’t disappear after each interaction—
where it can be carried forward, inspected, and reused.
In other words:
You don't have a prompt problem. You have a memory problem.