Ecosystem Update - 2026-05-19

May 19, 2026 · curated by Chad Simon · 24 items reviewed

Highlights

Official Codex 0.131.0 is now stable and was safely applied; local CLI now reports codex-cli 0.131.0
The highest-value research signal is scope control: "Overeager Coding Agents" directly targets out-of-scope agent actions under permissive permission models
Community source changes were quiet today

Quick Wins (implemented today)

Stable Codex 0.131.0 upgrade Codex-md

github.com/openai/codex

Upgraded from 0.130.0 to the stable 0.131.0 release and smoke-tested the harness

New Tools, Skills & Patterns

Scope-expansion canary for permissive agent runs

arXiv

Overeager Coding Agents - Add a small OverEager-style task-eval to catch agents deleting, rewriting, or touching files outside the task envelope
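
A minimal sketch of what such a canary could check, assuming the harness can already produce the list of paths an agent touched during a run. The function name `out_of_scope` and the idea of an "envelope" expressed as allowed directory prefixes are illustrative assumptions, not the paper's method or a Codex API.

```python
from pathlib import PurePosixPath


def out_of_scope(touched, envelope):
    """Return touched paths that fall outside every allowed envelope root.

    touched:  iterable of repo-relative POSIX paths the agent modified.
    envelope: iterable of repo-relative directories or files the task allows.
    Both inputs are assumptions about what the local harness can provide.
    """
    allowed = [PurePosixPath(p) for p in envelope]
    violations = []
    for raw in touched:
        path = PurePosixPath(raw)
        # PurePosixPath does not resolve "..", so treat any traversal as a violation.
        if ".." in path.parts:
            violations.append(raw)
            continue
        inside = any(path == root or root in path.parents for root in allowed)
        if not inside:
            violations.append(raw)
    return violations
```

A canary run would fail whenever the returned list is non-empty. Matching on path components rather than string prefixes keeps `src/app2/` from being accepted when only `src/app/` is allowed.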
Stable plugin-hooks trust review hook

developers.openai.com

Codex hooks docs, Codex 0.131.0 release - plugin_hooks is stable in 0.131.0 but should stay disabled until bundled hooks from enabled plugins are audited and trust-reviewed
Built-in doctor adapter hook

github.com/openai/codex

Codex 0.131.0 release - Fold codex doctor --json into the local runtime-doctor report so stale rollouts, terminal warnings, and update status are captured without replacing the Codex-owned doctor
Overdue rollout cleanup policy Codex-md

codex doctor local output - Built-in doctor reports 9,788 active rollout files using 7.08 GB; design a non-destructive cleanup/report-only mode before deleting runtime state
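
A report-only pass could look roughly like the sketch below. It only reads file metadata and never deletes or modifies anything. The function name `rollout_report`, the age threshold, and the returned keys are hypothetical; the rollout directory is left to the caller because its location is not stated here.

```python
import time
from pathlib import Path


def rollout_report(root, max_age_days, now=None):
    """Summarize files under root without deleting or modifying anything.

    A file counts as stale when its modification time is older than
    max_age_days. The threshold is a hypothetical policy knob.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    report = {"total_files": 0, "total_bytes": 0, "stale_files": 0, "stale_bytes": 0}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        report["total_files"] += 1
        report["total_bytes"] += st.st_size
        if st.st_mtime < cutoff:
            report["stale_files"] += 1
            report["stale_bytes"] += st.st_size
    return report
```

Comparing `stale_bytes` against `total_bytes` over a few cycles would give the cleanup policy a measured baseline before any deletion is considered.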
Reversa-style operational-spec intake skill

arXiv

Reversa paper, sandeco/reversa - Current rlm-scan maps architecture, but legacy-system conversion could benefit from traceable claims, confidence tags, and explicit gap preservation
Observation-contract evals

arXiv

ContractBench - Add deterministic byte-preservation and expiry-window tests for tool artifacts such as presigned URLs, OAuth state, and upload tokens
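
One way such a deterministic test could be shaped, as a sketch: record the exact bytes a tool returned together with an issue time and a lifetime, then check what the agent later relays. The `Artifact` type, `check_artifact`, and the example URL are illustrative assumptions and are not taken from ContractBench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    value: bytes        # exact bytes returned by the tool
    issued_at: float    # epoch seconds when the tool returned it
    ttl_seconds: float  # how long the artifact stays valid


def check_artifact(original, relayed, used_at):
    """Return a list of contract problems; an empty list means the contract held."""
    problems = []
    # Byte preservation: re-encoding, trimming, or "tidying" a token breaks it.
    if relayed != original.value:
        problems.append("bytes_changed")
    # Expiry window: the artifact must be used while it is still valid.
    expires_at = original.issued_at + original.ttl_seconds
    if not (original.issued_at <= used_at < expires_at):
        problems.append("outside_expiry_window")
    return problems
```

Because the clock is passed in as `used_at`, the test stays deterministic and needs no sleeping or real network calls.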
Runtime-structured retry accounting

arXiv

Runtime-Structured Task Decomposition - Compare current packet retry behavior against subtask-local retry accounting to reduce reruns after downstream validation failures
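
For the comparison, subtask-local accounting might be prototyped along these lines. This is one possible reading of the idea, not the paper's algorithm: failures are charged to the subtask that failed, and only that subtask plus its transitive dependents are rerun. `RetryLedger`, `rerun_set`, and the dependency-map shape are hypothetical.

```python
from collections import Counter


class RetryLedger:
    """Track retry attempts per subtask instead of per whole packet."""

    def __init__(self, per_subtask_budget):
        self.budget = per_subtask_budget
        self.attempts = Counter()

    def record_failure(self, subtask_id):
        """Charge one failure to a subtask; return True if it may be retried."""
        self.attempts[subtask_id] += 1
        return self.attempts[subtask_id] <= self.budget


def rerun_set(failed, depends_on):
    """Return the failed subtask plus everything that transitively depends on it.

    depends_on maps each subtask to the list of subtasks it consumes.
    """
    rerun = {failed}
    changed = True
    while changed:
        changed = False
        for task, upstream in depends_on.items():
            if task not in rerun and rerun.intersection(upstream):
                rerun.add(task)
                changed = True
    return rerun
```

Counting how many subtasks land in `rerun_set` versus the full packet size gives a simple number to compare against current packet-level retries.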
Skill boundary compiler spike skill

arXiv

SkillSmith - Evaluate whether frequently used skills can expose smaller boundary-guided runtime interfaces without adding another orchestration layer

Research Worth Reading

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

arXiv

- Directly relevant to approval_policy = "never" and broad filesystem access; suggests measuring inferred task boundaries, not only prompt-declared scope
Same Signal, Different Semantics

arXiv

- Warns that trajectory metrics can flip meaning across agent frameworks, so local evals should validate metrics per Codex route rather than importing generic rules
CommitDistill

arXiv

- Keep as design reference
Verify-Gated Completion as Admission Control

arXiv

- Reinforces current planning-gate closure discipline: completion claims should be admitted by read-only verification evidence, not assistant confidence
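
The admission-control framing can be illustrated with a small gate, sketched under the assumption that each verification step is recorded as a named check with a pass flag and a read-only flag. `Evidence` and `admit_completion` are illustrative names, not part of the paper or of Codex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    check: str       # name of the verification step
    passed: bool     # whether the step succeeded
    read_only: bool  # whether the step could have modified state


def admit_completion(required_checks, evidence):
    """Admit a completion claim only when every required check has passing,
    read-only evidence. Returns (admitted, missing_checks)."""
    passing = {e.check for e in evidence if e.passed and e.read_only}
    missing = sorted(set(required_checks) - passing)
    return (not missing, missing)
```

The assistant's own confidence never enters the decision; a claim with no matching evidence is simply not admitted.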
PROTEA

arXiv

- Useful UI pattern for multi-agent workflow debugging: score intermediate nodes, localize bottlenecks, then rerun targeted revisions
RoadmapBench

arXiv

- Good long-horizon benchmark shape for roadmap execution; median task size is far beyond normal Quick Win scope
ContractBench

arXiv

- Gives a concrete benchmark family for temporal validity and byte integrity of tool outputs
SkillSmith

arXiv

- Strong conceptual fit for reducing skill context overhead, but implementation should wait for a local skill-usage trace

Considered, Not Adopting

Items reviewed and explicitly declined this cycle, with the reason. Curation discipline matters more than coverage.

Upgrade to 0.132.0-alpha.1 - rejected because it is a pre-release and global runtime policy forbids enabling under-development features globally without rollback notes and validation
Enable plugin_hooks automatically - rejected even though it is stable in 0.131.0; plugin-bundled hooks are executable code and require trust review first
Enable network_proxy globally - rejected because it remains experimental and would alter sandboxed networking posture without a current need
Enable mentions_v2 globally - rejected because it remains experimental and the current @/mention workflow is not blocked
Wholesale install Reversa or Claude toolkit plugins - rejected because external skills/hooks must be audited and adapted into Codex-owned surfaces before execution
Delete rollout/state files as a Quick Win - rejected because cleanup touches runtime state; report-only measurement is safe, deletion needs an explicit cleanup policy
Edit AGENTS.md as a Quick Win - rejected by the ecosystem-update hard limit; constitutional policy changes require explicit direction
