Research log · ls20 · level 2 dead-run ls20_dream_opt · prefix 184 turns · replay-deterministic

What we are still checking

An agent stalls on one puzzle. It loops on a single action, misreads the pixel grid, and never reaches the discrete code it must set. We replay that stuck memory and ablate the pipeline to find out why — and what is left to fix.

The state the agent is blind to
Code register · current → required   (· = never set)
pattern    5 → 5   correct
color      · → 1   blind, never set
rotation   · → 3   blind, never set

Ground truth: pattern 5 is already correct. color must be stepped to 1 with the cycler at (29,45), which advances it by +1 mod 4; rotation must be stepped to 3 with the cycler at (49,10), also +1 mod 4. Over-cycling wraps past the target. The agent never represents these three numbers at all.
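The ground-truth arithmetic is small enough to write out. This is a minimal sketch that assumes each cover advances a field by exactly +1 mod 4, as stated above. The helper names and the starting values (0) are hypothetical, because the log does not record where color and rotation start.

```python
N_STATES = 4  # "+1 mod 4" cycler, per the ground-truth note

def covers_needed(current: int, target: int, n: int = N_STATES) -> int:
    # Minimum covers to step a +1-mod-n cycler from current to target.
    return (target - current) % n

def after_covers(current: int, covers: int, n: int = N_STATES) -> int:
    # Field value after `covers` activations; extra covers wrap around.
    return (current + covers) % n

# Required values from the register table; starting values are ASSUMED (0) for illustration.
required = {"color": 1, "rotation": 3}
assumed_start = {"color": 0, "rotation": 0}

plan = {f: covers_needed(assumed_start[f], required[f]) for f in required}
print(plan)                 # {'color': 1, 'rotation': 3}
print(after_covers(0, 4))   # 0: one cover too many for rotation wraps past 3 back to 0
```

Because the step is fixed, the number of covers is determined entirely by the current value. This is exactly what the agent cannot compute while it has no representation of the register.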


It reads the wrong picture

Level 2 is a 64×64 frame. To the right is a fuel bar that depletes as the agent moves, and a code-setter that the agent calls a stencil. The agent mistakes the depleting fuel bar for a code grid and treats the whole puzzle as a movement problem.

ls20 · L2 · 64×64 · the frame at the point of fixation

Agent reasoning, verbatim machine output:
  turn 7: "each action takes up 2 yellow cells from left to right, and after 4 moves, the first 4 columns turn gray"
  turn 33: "Investigating movement issues... bounding remains at x34, but maybe the second/right is blocked"
  turn 127: def pattern5(fr): ar=fr.grid_np ... rows.append(...)   (tries to read the lock from pixels)
  turn 149: "Figuring out the reset issue... something burned... reset to the initial"
ACTION4 repeated · fixation streak ×184

The trace runs 184 turns and never sets a discrete register. It loops on ACTION4, narrating the pixels instead of cycling the color/rotation activators.
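A fixation streak can be measured directly from the action trace. This is a minimal sketch that assumes the streak is the longest run of a single repeated action. That is our reading of "fixation streak ×184", not a definition taken from the harness. `longest_streak`, `is_fixated`, and the toy trace are hypothetical.

```python
from itertools import groupby
from typing import Optional

def longest_streak(actions: list[str]) -> tuple[Optional[str], int]:
    # Longest run of one action repeated back to back, e.g. ACTION4 x N.
    best_action: Optional[str] = None
    best_len = 0
    for action, run in groupby(actions):
        length = sum(1 for _ in run)
        if length > best_len:
            best_action, best_len = action, length
    return best_action, best_len

def is_fixated(actions: list[str], threshold: int) -> bool:
    # Flag a trace whose longest single-action run reaches the (hypothetical) threshold.
    return longest_streak(actions)[1] >= threshold

toy_trace = ["ACTION1", "ACTION4", "ACTION4", "ACTION4", "ACTION2"]  # made-up trace
print(longest_streak(toy_trace))  # ('ACTION4', 3)
```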


What we are doing about it

Method

We replay the stuck memory and run short-turn ablations across the four components (World Model, Miner, Judge, Use), varying each component's options to find out why the agent fails. These are short replays, not full episodes.

What each component told us so far
World Model · theorist

The abductive hypothesis is plausible but wrong: "a single hidden constraint — the yellow checkpoint/refill tiles act like a reset trap that reverts the key and code state." With decode on, the committed skill finally names the real move: "count covers from the current state and stop on the first state that opens the lock; if the activator cycles, prefer the minimum additional covers." Still framed around the stencil, never the three-number register.
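The committed skill's rule amounts to a bounded search over one cycling activator. This is a minimal sketch that assumes the +1 mod 4 cycle from the ground truth. `opens` is a hypothetical predicate that reports whether a given cycler value opens the lock.

```python
from typing import Callable, Optional

N_STATES = 4  # +1 mod 4 cycler, per the ground-truth note

def min_covers_to_open(state: int, opens: Callable[[int], bool], n: int = N_STATES) -> Optional[int]:
    # Count covers from the current state: try 0, 1, ..., n-1 additional covers
    # and stop at the first resulting state that opens the lock.
    # Since the activator cycles mod n, k >= n covers is equivalent to k % n,
    # so the first hit is also the minimum number of additional covers.
    for k in range(n):
        if opens((state + k) % n):
            return k
    return None  # no value of this cycler opens the lock

# Hypothetical lock that opens when this activator reads 3.
print(min_covers_to_open(1, lambda v: v == 3))  # 2
```

The search operates on one activator at a time. Nothing in it names the (pattern, color, rotation) register, which is the same framing gap noted above.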

Miner · 3-lens tournament

Three lenses mine in parallel: explore 0.79, exploit 0.88 (wins), divergent 0.74. Under the world-model arm, explore collapses to 0.21 and exploit (0.86) carries the tournament. Every lens converges on the same stencil model, so lens diversity does not break the misread.
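Here is a sketch of tournament selection and the convergence problem. It assumes the tournament simply keeps the highest-scoring lens. The log implies this ("exploit 0.88 wins") but does not specify the rule. `run_tournament` and `distinct_models` are hypothetical helpers.

```python
def run_tournament(scores: dict[str, float]) -> str:
    # Keep the highest-scoring lens; max() returns the first lens on ties.
    return max(scores, key=scores.get)

def distinct_models(proposals: dict[str, str]) -> int:
    # How many different world models the lenses actually proposed.
    return len(set(proposals.values()))

print(run_tournament({"explore": 0.79, "exploit": 0.88, "divergent": 0.74}))  # exploit
print(distinct_models({"explore": "stencil", "exploit": "stencil", "divergent": "stencil"}))  # 1
```

A winner-take-all pick cannot help when `distinct_models` returns 1. Every lens proposes the stencil model, so that model wins regardless of which lens scores highest.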

Judge · counterfactual commit

The judge commits when combined = advantage(ΔIG) × grounding clears 0.7; here combined = 0.88 ≥ 0.7 → COMMIT. But discovery-lift is ΔP = 0.00 for every arm: the no-skill baseline finds the cycler at the same rate. So the judge commits a wrong stencil skill at 0.88 that adds nothing to discovery. (Eval axis: novel-state cross-judge 0.80 vs 0.10.)
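The failure becomes visible once the commit rule and discovery-lift are written side by side. This is a minimal sketch that assumes ΔP is the difference in cycler-discovery rates with and without the skill, as the text describes. The function names and the counts in the usage lines are hypothetical.

```python
COMMIT_BAR = 0.7  # threshold quoted in the log

def combined_score(advantage_dig: float, grounding: float) -> float:
    # combined = advantage(ΔIG) × grounding
    return advantage_dig * grounding

def judge_commits(combined: float, bar: float = COMMIT_BAR) -> bool:
    # Current rule: commit whenever the combined score clears the bar.
    # Nothing here measures whether the skill actually helps discovery.
    return combined >= bar

def discovery_lift(found_with: int, runs_with: int, found_without: int, runs_without: int) -> float:
    # ΔP: how much more often the cycler is found with the skill than without it.
    return found_with / runs_with - found_without / runs_without

# Made-up counts: the cycler is found equally often with and without the skill.
print(judge_commits(0.88))           # True: the skill is committed
print(discovery_lift(3, 12, 3, 12))  # 0.0: yet it adds no discovery-lift
```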

Use · observability (root)

The code field is legible only when the agent is handed the decoded state: the correct field is identified in 0/12 cases from text, 0/12 from the raw grid, and 7/12 from the decoded state (Fisher p = 0.0046). Skill direction also matters: a skill pointing toward the principle helps (ΔP +0.36, and the gain survives a target reseed), while world-model-as-text hurts (ΔP −0.17).
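The reported p-value can be checked from the counts. Below is a dependency-free sketch of a two-sided Fisher exact test. It assumes the log compared the decoded state (7/12 correct) against text or the raw grid (0/12 correct each, so both comparisons produce the same 2×2 table). If SciPy is available, `scipy.stats.fisher_exact` computes the same quantity.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    # Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]:
    # sum the hypergeometric probabilities of every table with the same
    # margins that is no more likely than the observed one.
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(k) for k in range(lo, hi + 1)]
    # Small relative tolerance so ties in probability are not lost to rounding.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))

# Rows: decoded state (7 correct, 5 not), text or raw grid (0 correct, 12 not).
print(round(fisher_exact_two_sided(7, 5, 0, 12), 4))  # 0.0046
```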


Options we are sweeping

Each component × each option, with status. The sweep is cheap: short-turn replays over the recorded prefix, not new episodes.

Component | Option | What it tests | Status
World Model | none / hypothesis / worldmodel | Does a theorist arm escape the stencil model, or just elaborate it? | done
World Model | decode ON / OFF | Does feeding the decoded register change the committed skill? | running
Miner | explore / exploit / divergent | Can lens diversity break convergence on one wrong model? | done
Miner | register-targeted lens | A lens that mines for the discrete code, not the pixels. | planned
Judge | grounding × advantage (0.7 bar) | Current bar: commits plausible skills at 0.88 with ΔP 0. | done
Judge | mechanism-correctness gate | Reward discovery-lift (ΔP); reject plausibility-only commits. | planned
Use | text / raw grid / decoded state | Which observation makes the code field legible? Decoded only. | done
Use | skill direction: toward / as-text | Toward-principle ΔP +0.36 vs world-model-as-text ΔP −0.17. | done
Use | decode primitive in the act loop | Expose (pattern, color, rotation) to the actor each turn. | running
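Here is one way "expose (pattern, color, rotation) to the actor each turn" might look. This is a minimal sketch. `CodeRegister`, `render_register`, `actor_observation`, and `register_solved` are hypothetical names. The decoder that fills the register from the frame is assumed to exist and is not shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CodeRegister:
    # Hypothetical container for the decoded state; None means "never set".
    pattern: Optional[int] = None
    color: Optional[int] = None
    rotation: Optional[int] = None

TARGET = (5, 1, 3)  # required (pattern, color, rotation) from the register table

def render_register(reg: CodeRegister) -> str:
    # One line the actor sees every turn instead of decoding pixels itself.
    def show(v: Optional[int]) -> str:
        return "unset" if v is None else str(v)
    return f"register: pattern={show(reg.pattern)} color={show(reg.color)} rotation={show(reg.rotation)}"

def actor_observation(frame_text: str, reg: CodeRegister) -> str:
    # Prepend the decoded register to whatever frame description the actor already gets.
    return render_register(reg) + "\n" + frame_text

def register_solved(reg: CodeRegister) -> bool:
    # Harness-side check: has the register reached the target code?
    return (reg.pattern, reg.color, reg.rotation) == TARGET

print(render_register(CodeRegister(pattern=5)))  # register: pattern=5 color=unset rotation=unset
```

In this sketch the target check stays on the harness side. The actor sees only current values, so an ablation still tests whether it can work out the required code on its own.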

Roadmap

  1. Fix the judge

    Reward mechanism-correctness, not plausibility. The 0.88 / ΔP 0 commit is the canonical failure: a wrong stencil skill passes the bar while adding zero discovery-lift. Gate on ΔP.

  2. Full-pipeline escape

    With the decode primitive exposed and the judge gated, run the four components end-to-end from the stuck prefix and check whether the register reaches (5,1,3) and the level clears.

  3. Carry findings forward

    Feed what each ablation learned back into memory so the learning loop keeps the toward-principle skill and drops the world-model-as-text one.

  4. Decode primitive to production

    Promote the decoded-state observation from the ablation harness into the live act loop, so the agent never has to read the register off pixels again.
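Roadmap step 1 comes down to adding a ΔP term to the commit condition. This is a minimal sketch that assumes the existing 0.7 bar is kept and that a skill must also show strictly positive discovery-lift. `gated_commit` and its `min_lift` parameter are hypothetical, and the log does not yet fix a lift threshold.

```python
def gated_commit(combined: float, delta_p: float, bar: float = 0.7, min_lift: float = 0.0) -> bool:
    # Commit only if the skill clears the existing plausibility bar AND
    # beats the no-skill baseline on discovery (ΔP strictly above min_lift).
    return combined >= bar and delta_p > min_lift

print(gated_commit(0.88, 0.00))   # False: the canonical 0.88 / ΔP 0 commit is rejected
print(gated_commit(0.95, -0.17))  # False: a harmful skill fails however plausible it looks
```

The comparison is strict (`>`), so a skill with ΔP exactly 0 never passes, whatever its combined score.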

Deep dives