
Ecosystem Update - 2026-05-19

May 19, 2026 · curated by Chad Simon · 24 items reviewed

Highlights

  • Official Codex 0.131.0 is now stable and was safely applied; local CLI now reports codex-cli 0.131.0
  • The highest-value research signal is scope control: "Overeager Coding Agents" directly targets out-of-scope agent actions under permissive permission models
  • Community sources were quiet today

Quick Wins (implemented today)

  • Stable Codex 0.131.0 upgrade Codex-md
    Upgrade from 0.130.0 to the stable 0.131.0 release and smoke-test the harness
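The smoke test in this Quick Win could start with a version check like the one below. It is a sketch, not the harness itself. It assumes `codex --version` prints a line of the form `codex-cli 0.131.0`, matching what the local CLI reports in the Highlights. `parse_codex_version` and `smoke_check` are hypothetical helper names. The check also rejects any pre-release suffix, which guards against accidentally installing an alpha build.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Version the Quick Win upgraded to (taken from this digest).
EXPECTED = (0, 131, 0)


def parse_codex_version(output: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) from output like 'codex-cli 0.131.0'.

    Raises ValueError for unrecognized output or any pre-release suffix
    (e.g. '-alpha.1'), so an alpha install fails the smoke test loudly.
    """
    match = re.search(r"codex-cli\s+(\d+)\.(\d+)\.(\d+)(\S*)", output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    if match.group(4):
        raise ValueError(f"pre-release build detected: {match.group(0)}")
    major, minor, patch = (int(match.group(i)) for i in range(1, 4))
    return major, minor, patch


def smoke_check(expected: tuple[int, int, int] = EXPECTED) -> bool:
    """Run `codex --version` (assumed flag) and compare against `expected`."""
    exe = shutil.which("codex")
    if exe is None:
        return False  # CLI not on PATH: treat as a failed smoke test
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, check=False, timeout=30
        )
        return parse_codex_version(result.stdout) == expected
    except (ValueError, subprocess.TimeoutExpired):
        return False


if __name__ == "__main__":
    print("codex smoke check passed:", smoke_check())
```

Rejecting pre-releases in the parser, rather than in a separate rule, keeps the stable-only policy in one place.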

New Tools, Skills & Patterns

  • Scope-expansion canary for permissive agent runs
    Overeager Coding Agents - Add a small OverEager-style task eval to catch agents deleting, rewriting, or touching files outside the task envelope
  • Stable plugin-hooks trust review hook
    Codex hooks docs, Codex 0.131.0 release - plugin_hooks is stable in 0.131.0 but should stay disabled until bundled hooks from enabled plugins are audited and trust-reviewed
  • Built-in doctor adapter hook
    Codex 0.131.0 release - Fold codex doctor --json into the local runtime-doctor report so stale rollouts, terminal warnings, and update status are captured without replacing the Codex-owned doctor
  • Overdue rollout cleanup policy Codex-md
    codex doctor local output - The built-in doctor reports 9,788 active rollout files using 7.08 GB; design a non-destructive, report-only cleanup mode before deleting any runtime state
  • Reversa-style operational-spec intake skill
    Reversa paper, sandeco/reversa - Current rlm-scan maps architecture, but legacy-system conversion could benefit from traceable claims, confidence tags, and explicit gap preservation
  • Observation-contract evals
    ContractBench - Add deterministic byte-preservation and expiry-window tests for tool artifacts such as presigned URLs, OAuth state, and upload tokens
  • Runtime-structured retry accounting
    Runtime-Structured Task Decomposition - Compare current packet retry behavior against subtask-local retry accounting to reduce reruns after downstream validation failures
  • Skill boundary compiler spike skill
    SkillSmith - Evaluate whether frequently used skills can expose smaller boundary-guided runtime interfaces without adding another orchestration layer
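Here is a minimal sketch of the scope-expansion canary. It rests on two assumptions. First, the task envelope is written as a list of glob patterns, which is a hypothetical convention that the Overeager paper does not prescribe. Second, the files an agent touched are read from `git status --porcelain` after the run. Any change outside the envelope is reported, including deletions, renames, and untracked files. Nothing is blocked. `parse_porcelain` and `out_of_scope` are illustrative names.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def parse_porcelain(output: str) -> list[tuple[str, str]]:
    """Turn `git status --porcelain` (v1) output into (kind, path) pairs.

    kind is 'D' (deleted), 'R' (one side of a rename), '?' (untracked)
    or 'M' (any other change). Quoted paths are not unquoted here.
    """
    changes: list[tuple[str, str]] = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        status, path = line[:2], line[3:]
        if " -> " in path:
            # A rename touches both the old and the new location.
            old, new = path.split(" -> ", 1)
            changes += [("R", old), ("R", new)]
        elif status == "??":
            changes.append(("?", path))
        elif "D" in status:
            changes.append(("D", path))
        else:
            changes.append(("M", path))
    return changes


def out_of_scope(
    changes: list[tuple[str, str]], envelope: list[str]
) -> list[tuple[str, str]]:
    """Return the changes whose path matches none of the envelope globs.

    fnmatch's '*' also matches '/', so 'src/*' covers nested paths.
    """
    return [
        (kind, path)
        for kind, path in changes
        if not any(fnmatchcase(path, pattern) for pattern in envelope)
    ]


if __name__ == "__main__":
    sample = "\n".join([" M src/app.py", " D docs/old.md", "?? scratch/notes.txt"])
    for kind, path in out_of_scope(parse_porcelain(sample), ["src/*", "tests/*"]):
        print(f"out-of-scope {kind}: {path}")
```

Because the canary only reports, it fits the "small task eval" framing. It surfaces scope drift under `approval_policy = "never"` without adding another permission layer.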

Research Worth Reading

  • Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks
    - Directly relevant to approval_policy = "never" and broad filesystem access; suggests measuring inferred task boundaries, not only prompt-declared scope
  • Same Signal, Different Semantics
    - Warns that trajectory metrics can flip meaning across agent frameworks, so local evals should validate metrics per Codex route rather than importing generic rules
  • CommitDistill
    - Keep as a design reference
  • Verify-Gated Completion as Admission Control
    - Reinforces current planning-gate closure discipline: completion claims should be admitted by read-only verification evidence, not assistant confidence
  • PROTEA
    - Useful UI pattern for multi-agent workflow debugging: score intermediate nodes, localize bottlenecks, then rerun targeted revisions
  • RoadmapBench
    - Good long-horizon benchmark shape for roadmap execution; median task size is far beyond normal Quick Win scope
  • ContractBench
    - Gives a concrete benchmark family for temporal validity and byte integrity of tool outputs
  • SkillSmith
    - Strong conceptual fit for reducing skill context overhead, but implementation should wait for a local skill-usage trace
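The ContractBench-style observation-contract evals described above reduce to two deterministic checks per artifact: are the bytes unchanged, and is the artifact still inside its validity window? The sketch assumes a hypothetical `ToolArtifact` record holding three things: the exact bytes a tool returned, when they were issued, and their time-to-live. ContractBench's own harness and data model may differ. The URL below is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ToolArtifact:
    """Hypothetical record of a tool output with a validity window."""
    value: bytes         # exact bytes the tool returned (e.g. a presigned URL)
    issued_at: datetime  # when the tool produced it
    ttl: timedelta       # how long it stays valid


def check_contract(
    artifact: ToolArtifact,
    forwarded: bytes,
    used_at: datetime,
    margin: timedelta = timedelta(0),
) -> list[str]:
    """Return the violated observation contracts (an empty list means pass)."""
    failures: list[str] = []
    # Byte preservation: re-encoding, trimming, or URL-decoding by the agent
    # changes the bytes and fails this check.
    if forwarded != artifact.value:
        failures.append("bytes-altered")
    # Validity window: use must fall between issue time and
    # issued_at + ttl, minus an optional safety margin.
    deadline = artifact.issued_at + artifact.ttl - margin
    if not (artifact.issued_at <= used_at <= deadline):
        failures.append("outside-validity-window")
    return failures


if __name__ == "__main__":
    t0 = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
    url = ToolArtifact(b"https://uploads.example.com/obj?sig=abc%2Fdef", t0, timedelta(minutes=15))
    # An agent that URL-decodes the signature breaks byte preservation.
    print(check_contract(url, b"https://uploads.example.com/obj?sig=abc/def", t0 + timedelta(minutes=5)))
```

Both checks are pure functions of recorded values, so the eval stays deterministic: the clock is passed in as `used_at` rather than read at runtime.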

Considered, Not Adopting

Items reviewed and explicitly declined this cycle, each with its reason. Curation discipline matters more than coverage.

  • Upgrade to 0.132.0-alpha.1 - rejected because it is a pre-release, and the runtime policy forbids enabling under-development features globally without rollback notes and validation
  • Enable plugin_hooks automatically - rejected even though it is stable in 0.131.0; plugin-bundled hooks are executable code and require trust review first
  • Enable network_proxy globally - rejected because it remains experimental and would alter the sandboxed networking posture without a current need
  • Enable mentions_v2 globally - rejected because it remains experimental and the current @/mention workflow is not blocked
  • Install Reversa or Claude toolkit plugins wholesale - rejected because external skills and hooks must be audited and adapted into Codex-owned surfaces before they run
  • Delete rollout/state files as a Quick Win - rejected because cleanup touches runtime state; report-only measurement is safe, but deletion needs an explicit cleanup policy
  • Edit AGENTS.md as a Quick Win - rejected by the ecosystem-update hard limit; constitutional policy changes require explicit direction
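The rollout-cleanup decision above treats report-only measurement as safe. That measurement can be sketched as a read-only walk over the rollout directory. The sketch makes three assumptions:

- Rollouts are stored as `rollout-*.jsonl` files under `~/.codex/sessions`.
- A hypothetical 30-day threshold marks a file as stale.
- `measure_rollouts` and `RolloutReport` are illustrative names.

Check the path and file pattern against the local layout. Compare the totals with `codex doctor --json` rather than assuming they reproduce its 9,788-file, 7.08 GB figure.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator, Optional

SECONDS_PER_DAY = 86_400


@dataclass
class RolloutReport:
    files: int = 0
    total_bytes: int = 0
    stale_files: int = 0
    stale_bytes: int = 0


def summarize(entries: Iterable[tuple[int, float]], cutoff: float) -> RolloutReport:
    """Aggregate (size, mtime) pairs; files modified before `cutoff` are stale."""
    report = RolloutReport()
    for size, mtime in entries:
        report.files += 1
        report.total_bytes += size
        if mtime < cutoff:
            report.stale_files += 1
            report.stale_bytes += size
    return report


def _iter_entries(root: Path, pattern: str) -> Iterator[tuple[int, float]]:
    # Read-only: only stat() is called; nothing is written, moved, or removed.
    for path in root.rglob(pattern):
        try:
            if path.is_file():
                st = path.stat()
                yield st.st_size, st.st_mtime
        except FileNotFoundError:
            continue  # file vanished mid-walk; skip it


def measure_rollouts(
    root: Path,
    max_age_days: float = 30.0,  # hypothetical staleness threshold
    pattern: str = "rollout-*.jsonl",  # assumed file naming
    now: Optional[float] = None,
) -> RolloutReport:
    """Report rollout counts and sizes under `root` without deleting anything."""
    now = time.time() if now is None else now
    if not root.is_dir():
        return RolloutReport()
    return summarize(_iter_entries(root, pattern), now - max_age_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    r = measure_rollouts(Path.home() / ".codex" / "sessions")  # assumed location
    print(f"{r.files} rollout files, {r.total_bytes / 1e9:.2f} GB; "
          f"{r.stale_files} older than 30 days ({r.stale_bytes / 1e9:.2f} GB)")
```

Separating the pure `summarize` step from the filesystem walk keeps the staleness policy testable. It also means a future deletion mode can reuse the same report as its dry run.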

Sources Reviewed
