Research log · ls20 · level 2 dead-run ls20_dream_opt · prefix 184 turns · replay-deterministic

What we are still checking

An agent stalls on one puzzle. It loops on a single action, misreads the pixel grid, and never reaches the discrete code it must set. We replay that stuck memory and ablate the pipeline to find out why — and what is left to fix.

The state the agent is blind to

Code register · current → required (✓ = correct, · = blind, never set)

pattern 5→5

color ·→1

rotation ·→3

Ground truth: pattern 5 is already right. color needs the cycler at (29,45), which adds 1 mod 4 per activation, stepped to 1; rotation needs the cycler at (49,10), also +1 mod 4, stepped to 3. Over-cycling wraps past the target. The agent never represents these three numbers at all.
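The cycler arithmetic above can be sketched in a few lines. This is a minimal illustration, not the environment's code: the names `step` and `presses_needed` are hypothetical, and the starting value of 0 in the usage lines is an assumption, since the log records only that the agent never set color or rotation.

```python
MODULUS = 4  # each cycler advances its field by +1 mod 4, per the ground truth

def step(value: int, modulus: int = MODULUS) -> int:
    """One activation of a cycler: advance the field by 1, wrapping at the modulus."""
    return (value + 1) % modulus

def presses_needed(current: int, target: int, modulus: int = MODULUS) -> int:
    """Minimum number of activations to move a field from current to target."""
    return (target - current) % modulus

# Assumed starting value 0 for both fields (illustration only).
color_presses = presses_needed(0, 1)
rotation_presses = presses_needed(0, 3)

# Over-cycling: one press past the rotation target wraps the field away from it.
overshoot = step(3)
```

Because the field wraps, an actor that cannot read the register has no way to know when to stop pressing, which is why the count has to come from the decoded state rather than from the pixels.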

It reads the wrong picture

Level 2 is a 64×64 frame. To the right is a fuel bar that depletes as the agent moves, and a code-setter the agent calls a stencil. The agent treats the depleting fuel as a code grid, and the whole puzzle as movement.

ls20 · L2 · 64×64 · the frame at the point of fixation

agent reasoning · verbatim machine output · 7

each action takes up 2 yellow cells from left to right, and after 4 moves, the first 4 columns turn gray

agent reasoning · verbatim · 33

Investigating movement issues... bounding remains at x34, but maybe the second/right is blocked

agent reasoning · verbatim · 127

def pattern5(fr): ar=fr.grid_np ... rows.append(...) · tries to read the lock from pixels

agent reasoning · verbatim · 149

Figuring out the reset issue... something burned... reset to the initial

ACTION4 repeated · fixation streak ×184

The trace runs 184 turns and never sets a discrete register. It loops on ACTION4, narrating the pixels instead of cycling the color/rotation activators.

What we are doing about it

Method

We replay the stuck memory and run short-turn ablations across the four components (World Model, Miner, Judge, Use), varying the options of each to find why the run fails. These are short replays, not full episodes.

What each component told us so far

World Model · theorist

The abductive hypothesis is plausible but wrong: "a single hidden constraint — the yellow checkpoint/refill tiles act like a reset trap that reverts the key and code state." With decode on, the committed skill finally names the real move: "count covers from the current state and stop on the first state that opens the lock; if the activator cycles, prefer the minimum additional covers." Still framed around the stencil, never the three-number register.

Miner · 3-lens tournament

Three lenses mine in parallel: explore 0.79, exploit 0.88 (wins), divergent 0.74. Under the world-model arm, explore collapses to 0.21 and exploit (0.86) carries. Every lens converges on the same stencil model; diversity does not break the misread.

Judge · counterfactual commit

combined = advantage(ΔIG) × grounding = 0.88 ≥ 0.7 → COMMIT. But discovery-lift ΔP = 0.00 for every arm: the no-skill baseline finds the cycler at the same rate. The judge commits a wrong stencil skill at 0.88 with zero usefulness. (Eval axis: novel-state cross-judge 0.80 vs 0.10.)
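The commit rule and its blind spot can be written out as a short sketch. The function names, the `min_lift` threshold of 0.0, and the positive ΔP used in the last line are assumptions for illustration; the log gives only the combined score (0.88), the bar (0.7), and ΔP = 0.00.

```python
def combined_score(advantage: float, grounding: float) -> float:
    """combined = advantage(ΔIG) × grounding, as stated in the log."""
    return advantage * grounding

def current_judge(combined: float, bar: float = 0.7) -> bool:
    """Current rule: commit whenever the combined score clears the bar."""
    return combined >= bar

def gated_judge(combined: float, delta_p: float,
                bar: float = 0.7, min_lift: float = 0.0) -> bool:
    """Hypothetical gate: also require positive discovery-lift (ΔP)."""
    return combined >= bar and delta_p > min_lift

# The observed commit: 0.88 combined, ΔP = 0.00.
commits_today = current_judge(0.88)
commits_when_gated = gated_judge(0.88, delta_p=0.00)

# A skill with the same score but some lift (0.10 is a made-up value) would still pass.
useful_skill_passes = gated_judge(0.88, delta_p=0.10)
```

Under this sketch the stencil skill is committed by the current rule and rejected by the gated one, while a skill that actually raises the discovery rate is unaffected.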

Use · observability (root)

The field is only legible when handed the decoded state: correct field 0/12 from text, 0/12 from raw grid, 7/12 from decoded state (Fisher p=0.0046). Skill direction matters: toward the principle helps (ΔP +0.36, survives target reseed); world-model-as-text is harmful (ΔP −0.17).
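The reported p-value can be reproduced with a two-sided Fisher exact test on the 7/12 versus 0/12 counts. The sketch below uses only the standard library; it assumes the comparison is decoded state against one of the 0/12 arms (text and raw grid give the same table), and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x successes in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Decoded state: 7 correct, 5 incorrect. Text (or raw grid): 0 correct, 12 incorrect.
p = fisher_exact_two_sided(7, 5, 0, 12)
print(round(p, 4))  # 0.0046
```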

Options we are sweeping

Each component × each option, with status. The sweep is cheap: short-turn replays over the recorded prefix, not new episodes.

Component	Option	What it tests	Status
World Model	none / hypothesis / worldmodel	Does a theorist arm escape the stencil model, or just elaborate it?	done
World Model	decode ON / OFF	Does feeding the decoded register change the committed skill?	running
Miner	explore / exploit / divergent	Can lens diversity break convergence on one wrong model?	done
Miner	register-targeted lens	A lens that mines for the discrete code, not the pixels.	planned
Judge	grounding × advantage (0.7 bar)	Current bar: commits plausible skills at 0.88 with ΔP 0.	done
Judge	mechanism-correctness gate	Reward discovery-lift (ΔP), reject plausibility-only commits.	planned
Use	text / raw grid / decoded state	Which observation makes the code field legible? Decoded only.	done
Use	skill direction · toward / as-text	Toward-principle +0.36 vs world-model-as-text −0.17.	done
Use	decode primitive in the act loop	Expose (pattern,color,rotation) to the actor each turn.	running

Roadmap

Fix the judge · Judge

Reward mechanism-correctness, not plausibility. The 0.88 / ΔP 0 commit is the canonical failure: a wrong stencil skill passes the bar while adding zero discovery-lift. Gate on ΔP.

Full-pipeline escape

With the decode primitive exposed and the judge gated, run the four components end-to-end from the stuck prefix and check whether the register reaches (5,1,3) and the level clears.

Carry findings forward

Feed what each ablation learned back into memory so the learning loop keeps the toward-principle skill and drops the world-model-as-text one.

Decode primitive to production

Promote the decoded-state observation from the ablation harness into the live act loop, so the agent never has to read the register off pixels again.
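One way the decoded observation might be handed to the actor each turn is as a small typed record rendered into a single text line. This is a sketch only: the `Register` class, the `render` function, and the use of `None` for a field that has not been read are all hypothetical, and how the values are decoded from the frame is outside its scope.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Register:
    """The three-number code; None marks a field that has not been read."""
    pattern: Optional[int]
    color: Optional[int]
    rotation: Optional[int]

# Required values taken from the ground truth in this log.
REQUIRED = Register(pattern=5, color=1, rotation=3)

def render(current: Register, required: Register = REQUIRED) -> str:
    """Format the register as 'field current→required', marking fields already correct."""
    parts = []
    for name in ("pattern", "color", "rotation"):
        cur = getattr(current, name)
        req = getattr(required, name)
        shown = "·" if cur is None else str(cur)
        mark = " ✓" if cur == req else ""
        parts.append(f"{name} {shown}→{req}{mark}")
    return " | ".join(parts)

line = render(Register(pattern=5, color=None, rotation=None))
print(line)  # pattern 5→5 ✓ | color ·→1 | rotation ·→3
```

Rendering the register in the same current → required notation used in this log keeps the actor's view and the analyst's view of the state identical.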

Deep dives

→ the dead run (636 steps) → vc33 seq-1 explorer → ft09 trace explorer
