What we tried
An agent sits on ls20 · level 2 for 184 turns and never escapes. It loops one action, misreads the pixel grid, and never abstracts to the discrete code it must set. We ran the four parts of our skill pipeline against that one stuck state. Here is each experiment, the finding, and the agent's own words.
Pattern is already right. Two cyclers set the rest — COLOR @(29,45) +1 mod 4, ROTATION @(49,10) +1 mod 4 — and over-cycling wraps past the target. The agent never names this register; every component below either talks around it or commits a skill that ignores it.
Why it stays stuck
Four moments from the verbatim trace. Turn numbers are real; this is a sequence. The marked phrase is the misread that keeps the agent on the wrong model.
ACTION1 repeated ×184 — fixation, never breaks
It never reaches the register on the dial. It reads the depleting fuel bar as a code grid, reframes a code puzzle as movement, and analyzes pixels instead of setting two discrete fields. The four components below each attack one part of that failure.
Can a theorist abduce the hidden rule?
We elicited a world-model at the stuck state — passive description, predict-then-verify, per-field counterfactual, and an abductive prompt ("you keep dying — name the hidden constraint"). The abduction is fluent and plausibly wrong:
Every elicited model still carries the stencil story from level 1. The skill it commits — even with decode reasoning ON — never escapes that frame:
World-model content helps only as re-instantiable facts, never as injected prose. As a free-text skill on the lift battery it is significantly negative on both the home and reseeded frames lift −0.17 / −0.16. The carrier matters: a world model must persist as a structured invariant, not a paragraph the actor reads inline.
Will parallel lenses find a different model?
The miner runs a three-lens tournament — explore, exploit, divergent — and keeps the winner. We ran it with no world-model and with one. Exploit wins both times; the divergent lens that should break frame collapses under the world-model:
More lenses do not buy a new model. All lenses converge on the same stencil skill. The miner recognizes the blocker but stops at a plan-shape — "BFS to determine allowed placements" at turn 149 — instead of an ordered, executable route. Breadth without a true alternative just re-confirms the wrong frame.
Does the commit gate catch a useless skill?
The judge commits on a counterfactual axis: advantage(ΔIG) × grounding ≥ 0.70. Here is what it committed, and what it scored it:
The judge commits a wrong stencil skill at 0.88 with zero usefulness. Discovery-lift is ΔP = 0.00 for every arm at both frames — the no-skill baseline reconstructs the cycler mechanic at the same rate. The gate's confidence is real; the skill's usefulness is not. The default novel-state axis is also unreliable cross-judge 0.80 vs 0.10, so only counterfactual and observed-blocker axes are trusted.
Can the agent even read the field it must set?
We asked the actor to name the wrong field across 12 reps, under three inputs. The result is the root cause beneath everything above:
Field recovery jumps 0/12 → 7/12 only when handed the symbolic code-state Fisher p = 0.0046; a bare rendered grid does nothing p = 1.0. The failure is missing input, not missing cognition.
And direction matters as much as content. A skill aimed toward a transferable principle survives a target reseed; a world-model surfaced as text actively hurts:
survives target reseed
harmful on both frames
The agent is blind to the register, not bad at reasoning over it. Give it the decoded color/rotation state and it pins the field immediately. The fix is a symbolic code-state readout plus skills that point toward a re-usable principle — not more prose to read.
What remains
Across the real escape runs the four executability switches were default-off, and the SkillOpt self-refine loop raises its own in-loop judge without raising the objective lift 3 rounds flat, p = 0.24. The probes replay the 184-turn prefix deterministically and reach L2, but applied_real = 0 · fixation_broken = 0 · exit = NOT_FINISHED. The full escape — switches on, alpha fixes in, code-state surfaced — has not yet been demonstrated.
Two cycler presses on COLOR and three on ROTATION, counted from the current state — never blind repetition. That is the abstraction the agent never reaches.