You Don’t Have a Prompt Problem—You Have a Memory Problem
Marcia Coulter
4/29/2026 · 2 min read
Most teams working with AI assume they have a prompt problem.
If the output is off:
tweak the prompt
add constraints
add examples
Repeat until it “works.”
And sometimes it does.
But then you start seeing things like:
The same prompt behaves differently across sessions
A fix works, then quietly stops working
Outputs drift as you expand the system
Teammates can’t reproduce what you’re seeing
At that point, it’s worth asking:
Is this really a prompt problem?
Where It Starts to Break
Take something simple: a support ticket classifier.
You define categories. You write a prompt. You refine it as edge cases come in.
It feels like normal engineering work.
Until you hit conflicts.
A ticket mentions two categories.
You add a rule to prioritize one.
That works—until a slightly different phrasing shows up.
Then it flips again.
So you add another rule.
Now you have rules layered on top of rules…
but no way to tell:
which rule actually applied
whether the model is following them consistently
or why the same input produces different results later
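To make that concrete, here's a minimal sketch of how the layering tends to look in code. The categories, rules, and helper are hypothetical, not a recommended design:

```python
# A minimal sketch of rule layering in a classifier prompt.
# Category names and rules are hypothetical.

BASE_PROMPT = """Classify the support ticket into exactly one category:
billing, bug, feature_request, account.

Ticket: {ticket}
Category:"""

# Each conflict fix becomes one more instruction bolted onto the prompt.
RULES = [
    "If a ticket mentions both billing and a bug, choose billing.",
    "Unless the bug blocks payment; then choose bug.",
    "If the user asks for a refund, always choose billing.",
]

def build_prompt(ticket: str) -> str:
    # The model sees all rules at once; nothing records which rule,
    # if any, it actually applied to a given ticket.
    return "\n".join(RULES) + "\n\n" + BASE_PROMPT.format(ticket=ticket)
```

Every rule exists only as text inside the prompt, so "which rule won" is never an observable fact anywhere in your system.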
The Debugging Problem
At some point, you try to debug it properly.
You log inputs and outputs.
Maybe you store a few examples.
But that only gets you so far.
Because what you don’t have is:
the assumptions the model used
the constraints it actually applied
the path it took to reach a decision
So when something breaks, you’re stuck with:
“It worked before. Now it doesn’t.”
And no clear reason why.
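Here's what that logging typically captures, continuing the sketch above (`call_model` is a hypothetical stand-in for whatever LLM API you use):

```python
import json
from datetime import datetime, timezone

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM API you call."""
    raise NotImplementedError

def classify_and_log(ticket: str, path: str = "classifications.jsonl") -> str:
    category = call_model(build_prompt(ticket))  # build_prompt from above
    # The record captures input and output, but nothing about which rule
    # fired, what the model assumed, or how a conflict was resolved.
    # When behavior changes later, this log can't explain why.
    with open(path, "a") as f:
        f.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "ticket": ticket,
            "category": category,
        }) + "\n")
    return category
```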
Prompts Don’t Give You State
In traditional systems, you don’t rely on instructions alone.
You rely on state:
decisions persist
behavior is traceable
results are reproducible
With most AI systems, none of that is guaranteed.
You have:
prompts
responses
But not the reasoning in between.
And that’s the problem.
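You can see the gap in miniature: two identical calls share nothing, so the decision made in the first isn't state the second can see. (Same hypothetical helpers as above.)

```python
ticket = "I was double-charged and now the app crashes on login."
prompt = build_prompt(ticket)

first = call_model(prompt)   # e.g. "billing"
second = call_model(prompt)  # nothing forces this to agree with first;
                             # the earlier decision was never persisted
```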
What This Leads To
Without persistent reasoning:
Fixes don’t reliably stick
Edge cases reappear
Behavior changes across sessions
Debugging becomes guesswork
You end up re-solving the same problems over and over.
The Shift
At some point, the question changes.
Instead of:
“What’s the right prompt?”
It becomes:
“What would it take for this system to remember how it got here?”
Not the output.
The reasoning.
What’s Missing
If you wanted to stabilize this, you’d need to capture things like:
the rules you introduced
the assumptions behind them
how conflicts were resolved
what changed over time
And you’d need to be able to:
revisit that
inspect it
reuse it
extend it
Across sessions.
Across people.
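As one possible shape for that record, here's a sketch of a persisted decision log entry. The field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """One reasoning step, persisted so it can be revisited and reused."""
    rule: str                       # the rule that was introduced
    assumption: str                 # why it was believed to be safe
    conflict_resolved: str          # the conflict it was meant to settle
    introduced_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    superseded_by: str | None = None  # how the rule changed over time

# Example: the payment-blocking fix from earlier, recorded instead of lost.
record = DecisionRecord(
    rule="If the bug blocks payment, choose bug over billing.",
    assumption="Payment-blocking bugs need engineering before refunds.",
    conflict_resolved="billing vs. bug when a ticket mentions both",
)
```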
Where This Points
That’s not a prompt tweak.
That’s a missing layer.
A layer where reasoning doesn’t disappear after each interaction—
where it can be carried forward, inspected, and reused.
In other words:
You don't have a prompt problem. You have a memory problem.