LLM Reasoning Playbook

Anti-Pattern Gallery

Concrete before/after failures mapped to rubric violations. A living gallery you extend from real lessons.

A living note. It collects real before-and-after failures, the kind of detail that actually changes how you build, whereas the Module 1 "when not to use it" sections stay general. Each case points to the rubric rule it breaks (see Reasoning-Framework-Eval-Rubrics). The gallery is seeded with the most common failures; add your own from the lessons in Reasoning-Framework-Decision-Log.

Each case has: the framework · what you see (symptom) · ❌ a bad example · why it happens · ✅ the fix · which rubric rule it breaks.


Seed cases

1. CoT — Hallucination laundering

wrong answer. The clean writing makes it look trustworthy.

1867, therefore X." (The 1847 date was made up.)

facts, so a false fact gets a polished argument built on top of it.

Chain-of-Verification pass on the facts. Never trust an unchecked factual claim from Chain of Thought.
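The fact check described above could be wired in roughly as follows. This is a minimal sketch, not a full Chain-of-Verification implementation: `verify_claims`, `chain_is_trustworthy`, and the `EVIDENCE` table are hypothetical names invented for illustration, and `lookup` stands in for whatever trusted source you have (search, a database, an independent model call).

```python
from typing import Callable, Optional

# Hypothetical interface: returns True (supported), False (contradicted),
# or None when the trusted source has no evidence either way.
Lookup = Callable[[str], Optional[bool]]


def verify_claims(claims: list[str], lookup: Lookup) -> dict[str, str]:
    """Label every factual claim pulled out of a reasoning chain."""
    labels = {}
    for claim in claims:
        verdict = lookup(claim)
        if verdict is True:
            labels[claim] = "supported"
        elif verdict is False:
            labels[claim] = "contradicted"
        else:
            labels[claim] = "unverified"
    return labels


def chain_is_trustworthy(labels: dict[str, str]) -> bool:
    # An unverified claim counts against the chain, not for it.
    return all(label == "supported" for label in labels.values())


# Toy evidence table, invented for this example.
EVIDENCE = {"claim A": True, "claim B": False}
labels = verify_claims(["claim A", "claim B", "claim C"], EVIDENCE.get)
print(labels, chain_is_trustworthy(labels))
```

The design choice worth keeping is that "unverified" blocks the answer just like "contradicted" does; a polished argument built on an unchecked fact is exactly the failure this case describes.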

path — even though the search looked thorough.

and the search just spreads wide and expensive.

good paths from bad ones, the branching costs a lot and buys nothing ("hallucinatory backtracking").

partial answers? If not, don't use Tree of Thoughts — the branching adds nothing.
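One way to answer that question before paying for a tree search is to run the evaluator on a few partial states you already know are good or bad. The sketch below assumes a hypothetical `score` function returning values in [0, 1]; the `min_gap` threshold and the toy states are illustrative, not recommended values.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

State = Tuple[int, int]  # toy partial state: (computed value, claimed value)


def evaluator_separates(
    score: Callable[[State], float],
    good: Sequence[State],
    bad: Sequence[State],
    min_gap: float = 0.2,
) -> bool:
    """True if known-good partial states score clearly above known-bad ones."""
    good_mean = mean([score(s) for s in good])
    bad_mean = mean([score(s) for s in bad])
    return good_mean - bad_mean >= min_gap


good_states = [(2 + 2, 4), (3 * 3, 9)]
bad_states = [(2 + 2, 5), (3 * 3, 10)]


def blind(state: State) -> float:
    # Evaluator that cannot tell paths apart.
    return 0.5


def informed(state: State) -> float:
    # Evaluator that actually checks the partial result.
    return 1.0 if state[0] == state[1] else 0.0


print(evaluator_separates(blind, good_states, bad_states))
print(evaluator_separates(informed, good_states, bad_states))
```

If the check fails, as it does for `blind` here, branching has no signal to follow and a single chain is the cheaper choice.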

3. Self-Refine — Correcting a correct answer

pass "fixed" it into a wrong one.

that was never there.

things worse: the model has no ground truth, so its critique is just more guessing.

interpreter, retrieval). No checker → skip the refine pass.
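The "no checker, no refine" rule can be enforced in code rather than left to the prompt. A minimal sketch, assuming hypothetical `refine` and `checker` callables; `maybe_refine` is not part of any library.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def maybe_refine(
    answer: T,
    refine: Callable[[T], T],
    checker: Optional[Callable[[T], bool]] = None,
    max_rounds: int = 3,
) -> T:
    """Refine only while an external checker says the answer is wrong."""
    if checker is None:
        # No ground truth available: a critique would be more guessing.
        return answer
    for _ in range(max_rounds):
        if checker(answer):
            # Already correct: leave it alone.
            return answer
        answer = refine(answer)
    return answer


# Toy stand-ins: the "checker" plays the role of tests or an interpreter.
def is_four(x: int) -> bool:
    return x == 4


def bump(x: int) -> int:
    return x + 1


print(maybe_refine(4, bump, is_four))  # correct answer is returned untouched
print(maybe_refine(2, bump, is_four))  # wrong answer is refined until the check passes
print(maybe_refine(7, bump))           # no checker, so no refine pass at all
```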

4. ReAct — Fabricated Observation

agent reasons over that made-up result.

search actually returned nothing.

even when the tool gave nothing back, so it invents an observation to keep going.

or a stop token right after the action). When a result is empty, force an explicit "no result — retry or replan" branch.
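Both halves of that fix fit in a few lines of glue code. The sketch below assumes the common `Thought / Action / Observation` text format and a hypothetical tool whose result is empty (`None`, `""`, or `[]`) when it finds nothing. When your model API supports stop sequences, passing the marker there is preferable; the string cut here only mimics it after the fact.

```python
OBSERVATION_MARKER = "Observation:"


def cut_at_observation(model_output: str) -> str:
    """Discard anything the model wrote after its action, including a made-up observation."""
    return model_output.split(OBSERVATION_MARKER, 1)[0].rstrip()


def real_observation(tool_result) -> str:
    """Build the observation from the tool's actual return value only."""
    if not tool_result:
        # Explicit empty-result branch so the agent cannot paper over it.
        return f"{OBSERVATION_MARKER} no result. Retry with a different query or replan."
    return f"{OBSERVATION_MARKER} {tool_result}"


raw = "Thought: look it up\nAction: search[foo]\nObservation: foo is 42"
step = cut_at_observation(raw)
print(step)
print(real_observation([]))
```

Note that `if not tool_result` also treats `0` as empty; if a tool can legitimately return zero, test for `None` and empty containers explicitly instead.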

5. Self-Consistency — Voting for a systematic error

the shared mistake with extra confidence.

the same way every time, the majority is the mistake.

just the temperature); add a checker or a Step-Back reframe to break the shared bias.
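The checker half of this fix can be sketched as a filter applied before the vote. The function name `vote` and the divisibility constraint are hypothetical; in practice the checker would be a unit test, a constraint from the problem statement, or a tool call. Prompt variation is not shown here.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def vote(
    answers: Sequence[Hashable],
    checker: Optional[Callable[[Hashable], bool]] = None,
) -> Optional[Hashable]:
    """Majority vote, optionally restricted to answers that pass an external check."""
    if checker is not None:
        answers = [a for a in answers if checker(a)]
    if not answers:
        return None  # nothing survived the check: escalate instead of guessing
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Toy samples: three runs share the same mistake, one run is right.
samples = [12, 12, 12, 15]


def divisible_by_five(x: int) -> bool:
    # Stand-in for a constraint taken from the problem itself.
    return x % 5 == 0


print(vote(samples))
print(vote(samples, divisible_by_five))
```

Without the checker the shared mistake wins three to one, which is the failure this case describes.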

6. Thread of Thought — Lost mid-document evidence

6.

vanished.

are too short to bring the fact back into view.

supporting line" for any claim the answer leans on.
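A quoted supporting line is only useful if you confirm it really appears in the source. A minimal sketch of that check, assuming exact matching after whitespace and case normalization; `quote_is_grounded` is a hypothetical helper, and paraphrased quotes would need fuzzier matching than this.

```python
def _normalize(text: str) -> str:
    # Collapse whitespace and case so line wrapping does not cause false misses.
    return " ".join(text.split()).lower()


def quote_is_grounded(quote: str, document: str) -> bool:
    """True if the quoted supporting line occurs in the document."""
    needle = _normalize(quote)
    return bool(needle) and needle in _normalize(document)


# Toy document, invented for this example.
doc = "Intro paragraph.\nThe fee is  12 EUR per month.\nClosing paragraph."

print(quote_is_grounded("the fee is 12 EUR per month.", doc))
print(quote_is_grounded("The fee is 15 EUR per month.", doc))
```

Reject or re-ask for any claim whose quote fails this check, including the case where the model supplies no quote at all.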

7. PAL / PoT — Right code, wrong setup

drop (should be 0.95).

translation of the problem gives you a confidently exact wrong number.

to the right quantity?) before trusting the interpreter's output.
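As an illustration of checking the setup rather than the arithmetic, the sketch below assumes the bad example is a 5% drop encoded as a multiplication by 0.05 instead of 0.95; that reading, the function names, and the numbers are assumptions made for this example. The check restates the problem independently (the amount removed must equal the stated percentage of the original) and compares.

```python
import math


def after_drop_wrong(price: float, pct: float) -> float:
    # Mis-translation: this is the size of the drop, not the price after it.
    return price * (pct / 100)


def after_drop(price: float, pct: float) -> float:
    return price * (1 - pct / 100)


def setup_is_consistent(before: float, after: float, pct: float) -> bool:
    """The amount removed should equal pct percent of the original value."""
    return math.isclose(before - after, before * pct / 100)


price, pct = 200.0, 5.0
print(setup_is_consistent(price, after_drop_wrong(price, pct), pct))
print(setup_is_consistent(price, after_drop(price, pct), pct))
```

Both functions run without error and return exact-looking numbers, which is why the interpreter alone cannot catch the first one.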


Add a case (template)

### N. {{Framework}} — {{short symptom name}}
- **Symptom.** ...
- **❌ Bad.** ...
- **Why.** ...
- **✅ Fix.** ...
- **Violates:** <rubric dimension(s)>