Day 4: Memory Is a Wish. Hooks Are the Law.

Day 3 was the dead-code outage. Same day, a quieter incident taught a harder lesson — the kind that doesn't make a customer angry, just slowly erodes your trust in your own system.

I shipped a feature that added three columns to a database table. The migration succeeded. The new columns saved correctly. The read path looked clean. A paying user clicked the feature, and nothing happened.

The Bug

The function fetching the user's profile didn't do SELECT *. It listed specific columns (id, email, name, role, ...), and the list hadn't been updated to include the three new ones. Every query returned the user's profile minus the very fields the new feature needed.

Tests didn't catch it. The unit tests for that function used fixtures that mocked the column list. The fixtures hadn't been updated either. Every test passed. The deploy went green. The customer's click did nothing.
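A minimal reproduction of that failure, as a sketch rather than the production code. It assumes SQLite, and the three new column names (plan, seats, locale) are invented placeholders, since the real ones aren't named here. The stale test stands in for the mocked fixtures: it encodes the old column shape, so it agrees with the bug.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts by column name
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, role TEXT)"
)
conn.execute(
    "INSERT INTO users (id, email, name, role) "
    "VALUES (1, 'ada@example.com', 'Ada', 'admin')"
)

# The migration. These three column names are hypothetical placeholders.
NEW_COLUMNS = ("plan", "seats", "locale")
for column in NEW_COLUMNS:
    conn.execute(f"ALTER TABLE users ADD COLUMN {column} TEXT")

# The write path works: the new columns are saved correctly.
conn.execute("UPDATE users SET plan = 'pro', seats = '5', locale = 'en' WHERE id = 1")


def fetch_profile(conn, user_id):
    # Manual column list, written before the migration and never updated.
    row = conn.execute(
        "SELECT id, email, name, role FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return dict(row)


def test_fetch_profile():
    # Stale fixture: it encodes the old column shape, so it agrees with the bug.
    expected_columns = {"id", "email", "name", "role"}
    assert set(fetch_profile(conn, 1)) == expected_columns


test_fetch_profile()  # passes, so the deploy goes green
profile = fetch_profile(conn, 1)
missing = [column for column in NEW_COLUMNS if column not in profile]
print(missing)  # every new column is absent from the read
```

Nothing in this run raises an error. The data is in the table, the test is green, and the caller still never sees the new fields.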

The Part That Hurt

This exact anti-pattern — manual column lists silently dropping new fields — had a name in my system. It had been graduated to feedback memory three weeks earlier, after an export bug burned half a day. The memory file said, almost verbatim:

feedback: when a query lists columns manually,
and the schema gains new columns, the read silently drops them.
Prefer SELECT *, or return the row object directly.
Reason: caused export bug, 4 hours debugging on 2026-03-30.

The AI loaded that file at session start. It said it knew. The session log even showed the file being read. And then the AI wrote new code with a manual column list and never connected the two.

Forty Minutes

From user report to deploy: forty minutes. The fix was three lines — switch the read to return the full row object, update the two tests that mocked the column shape, push. A standard production cycle.
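The shape of that fix, sketched under the same kind of assumptions: SQLite, a users table, and a hypothetical new column called plan. The read returns the whole row, so it no longer needs to know the column list. Deriving the expected columns from the schema is my illustration of a test that can't go stale, not necessarily how the two real tests were updated.

```python
import sqlite3
from typing import Optional


def fetch_profile(conn: sqlite3.Connection, user_id: int) -> Optional[dict]:
    """Return the full users row as a dict, whatever columns the table has."""
    cursor = conn.cursor()
    cursor.row_factory = sqlite3.Row  # name-addressable rows for this cursor only
    row = cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
    return dict(row) if row is not None else None


# Demo setup: the original four columns, then a migration adding a new one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, role TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'ada@example.com', 'Ada', 'admin')")
conn.execute("ALTER TABLE users ADD COLUMN plan TEXT")  # hypothetical new column
conn.execute("UPDATE users SET plan = 'pro' WHERE id = 1")

# The expected shape comes from the live schema, not a hand-written fixture.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
schema_columns = {info[1] for info in conn.execute("PRAGMA table_info(users)")}

profile = fetch_profile(conn, 1)
assert set(profile) == schema_columns  # new columns are picked up automatically
print(profile["plan"])
```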

But that's not the story. The story is the question I asked after the fix shipped:

"Why didn't the memory file fire?"

How Memory Files Actually Behave

When you tell an AI "remember this for next time," here's what happens, mechanically:

1. The text goes into a file (a memory, a CLAUDE.md, a system prompt addendum; same idea).
2. At session start, that file is loaded into the conversation context. The AI reads it like it reads any other context.
3. The AI then does whatever the next prompt says. The memory is passive: it's available to be recalled if the AI happens to associate the current task with it.
4. In a long session, the conversation context grows. The memory file's relevance gets diluted by everything that came after. The association fades.

That's the failure mode. The rule was loaded. The AI even "knew" it. But knowing isn't the same as triggering when the matching pattern shows up in the current code.
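A toy model of that dilution, and only a toy. It treats the context as a list of messages with the memory file loaded once at the front, and uses word share as a crude stand-in for "relevance." Nothing here describes how any real model weighs its context. The function names and the filler turn are made up for illustration.

```python
from typing import List

MEMORY_FILE = (
    "feedback: when a query lists columns manually, "
    "and the schema gains new columns, the read silently drops them. "
    "Prefer SELECT *, or return the row object directly."
)


def build_context(memory: str, turns: List[str]) -> List[str]:
    """The memory is loaded once, at the front; every later turn lands after it."""
    return [memory] + turns


def memory_share(context: List[str]) -> float:
    """Fraction of the context, by word count, that is the memory file (context[0])."""
    words = [len(message.split()) for message in context]
    return words[0] / sum(words)


# A stand-in for one conversational turn of about 90 words.
turn = "edit the profile endpoint and rerun the failing tests " * 10

for turn_count in (1, 20, 400):
    context = build_context(MEMORY_FILE, [turn] * turn_count)
    print(turn_count, f"{memory_share(context):.3%}")
```

The rule never leaves the context. Its share just keeps shrinking, and nothing in the loop ever re-raises it when a matching pattern shows up.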

Memory in AI is like a sticky note on the inside of your skull. You can see it whenever you happen to look up. But you have to happen to look up.

What Enforcement Looks Like

The fix that survived this incident wasn't a stronger memory file. It was a different kind of artifact entirely: a hook, a tiny script that runs every time the AI edits a file, before the edit lands.

The hook's logic is dumb on purpose:

on every edit to a *.py file:
  1. read the diff
  2. scan for: "SELECT column1, column2"  (manual lists)
  3. cross-check against feedback memory keywords
  4. if match → inject warning into AI's next prompt:
     "STOP. You're about to write the manual-column-list
      anti-pattern. Refer to feedback memory before continuing."

The hook doesn't care if the AI is in a long session, a short session, or just woke up. It doesn't care if the AI associated the pattern with the memory. It fires when the pattern appears, deterministically, before the bug can be written.
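The four steps above can be sketched roughly like this. It's written in Python purely for illustration (my hooks are JavaScript), and everything named here is hypothetical: check_edit, the keyword list, and the regex are my stand-ins, and how the warning is injected into the next prompt depends on the tool, so that wiring is left out.

```python
import re
from typing import Optional

# An explicit list of two or more columns after SELECT; "SELECT *" does not match.
MANUAL_COLUMN_LIST = re.compile(
    r"\bSELECT\s+(?!\*)[\w.]+\s*(?:,\s*[\w.]+\s*)+\bFROM\b", re.IGNORECASE
)

# Hypothetical keywords tying this pattern to its feedback-memory entry.
MEMORY_KEYWORDS = ("lists columns manually", "silently drops")

WARNING = (
    "STOP. You're about to write the manual-column-list anti-pattern. "
    "Refer to feedback memory before continuing."
)


def added_lines(diff: str) -> str:
    """Join the lines a unified diff adds, skipping the '+++' file header."""
    return " ".join(
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )


def check_edit(path: str, diff: str, memory_text: str) -> Optional[str]:
    """Return the warning to inject, or None if the edit is clean or out of scope."""
    if not path.endswith(".py"):
        return None  # step 0: only Python files are in scope
    if not MANUAL_COLUMN_LIST.search(added_lines(diff)):
        return None  # steps 1-2: no manual column list in the added lines
    if not any(keyword in memory_text.lower() for keyword in MEMORY_KEYWORDS):
        return None  # step 3: no matching feedback-memory entry
    return WARNING  # step 4


memory = "feedback: when a query lists columns manually, the read silently drops them."
bad_diff = "\n".join([
    "--- a/profile.py",
    "+++ b/profile.py",
    "@@ -1,1 +1,2 @@",
    " def fetch_profile(conn, user_id):",
    '+    return conn.execute("SELECT id, email, name, role FROM users WHERE id = ?", (user_id,))',
])
print(check_edit("profile.py", bad_diff, memory))
```

A regex like this will miss queries built by string concatenation or an ORM, and it only sees the added lines of one diff. That's the "within scope" caveat: inside its pattern the check is mechanical, outside it the hook sees nothing.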

The Spectrum

After this incident I drew the spectrum out explicitly:

Layer 1 · Prompt-level rule (in CLAUDE.md)

Loaded once at session start. Fades under context pressure. Reliability: ~85%. Free.

Layer 2 · Feedback memory (graduated rule)

Persists across sessions. Still loaded passively. Same fading. Reliability: ~85% — looks higher because it's persisted, but the trigger mechanism is identical to Layer 1.

Layer 3 · Skill / phase-triggered guidance

Fires on specific commands or phases (e.g. "before committing"). Reliability: ~95%. Bound to phase, not file-level pattern.

Layer 4 · Hook (mechanical enforcement)

Triggered by file events, not by the AI "remembering." Reliability: ~100% within scope. Cost: a few lines of JavaScript per rule.

The lesson isn't that Layer 4 is "the right one" — most rules don't justify a hook. The lesson is that graduating a rule to memory is graduation, not enforcement. If a rule absolutely cannot fail, it has to descend further down the stack until something mechanical catches it.
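To make those percentages concrete, here is the same spectrum as a small data structure, with the arithmetic for how many matching edits would slip through per hundred. The reliability values are the rough figures quoted above, not measurements, and the Layer class and expected_misses helper are just illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    number: int
    name: str
    trigger: str
    reliability: float  # rough estimate from the spectrum above


LAYERS = (
    Layer(1, "Prompt-level rule", "loaded once at session start", 0.85),
    Layer(2, "Feedback memory", "loaded passively, persists across sessions", 0.85),
    Layer(3, "Skill / phase-triggered guidance", "fires on specific commands or phases", 0.95),
    Layer(4, "Hook", "fires on file events, within its scope", 1.00),
)


def expected_misses(layer: Layer, opportunities: int) -> int:
    """How many of `opportunities` matching edits the layer is expected to miss."""
    return round((1 - layer.reliability) * opportunities)


for layer in LAYERS:
    print(f"Layer {layer.number} ({layer.name}): {expected_misses(layer, 100)} misses per 100")
```

Layers 1 and 2 come out identical, which is the point: persisting a rule changes where it is stored, not how it is triggered.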

Without SOP, With SOP

Without SOP

Write a memory file: "remember not to do X." Believe the AI when it says it loaded the file. Re-fix the same bug three times across three months. Eventually conclude AI just isn't reliable enough.

With SOP

After the second instance, ask: "why didn't the memory fire?" Trace the actual mechanism. Descend the rule down the stack, in this case from memory straight to a hook. Same bug never makes it past git add again.

The Real Lesson

Every AI workflow promises "memory" as a feature. Most of them mean "a text file we load at session start." That's real, useful, and not the same thing as enforcement.

When I sell methodology now, I make this distinction explicit. The cheap version of an AI partner — the version where you write CLAUDE.md, save preferences, and call it done — works most of the time. The version that survives production pressure is different. It has memory, and skills, and hooks. Three different artifacts, three different reliability profiles. You don't need all three for every rule. You need to know which layer each rule lives at, and why.

If you can't answer "what layer does this rule live at?" for an anti-pattern that has bitten you twice — write the hook. Forty minutes to debug, ten minutes to write the hook, zero future incidents.

Next: Day 5 — an open-source AI agent framework hit 89K GitHub stars promising "self-evolving" intelligence. The temptation was to install it. The 5-dimension comparison was cheaper. Three ideas worth porting. Three worth rejecting on sight.