Ecosystem Update - 2026-05-19

May 19, 2026 · generated by the ecosystem-update Claude Skill

TL;DR

Official Codex 0.131.0 is now stable and was safely applied; the local CLI now reports codex-cli 0.131.0.
The highest-value research signal is scope control: "Overeager Coding Agents" directly targets out-of-scope agent actions under permissive permission models.
Community source changes were quiet today; most recommendations reinforce the existing Codex setup: hooks, role-scoped agents, verification gates, omni-mem, and compact policy.

Quick Wins

Item	Source	Type	Impact	Effort	Action
Stable Codex `0.131.0` upgrade	openai/codex release, npm `@openai/codex`	Codex-md	3	1	Upgrade from `0.130.0` to the stable `0.131.0` release and smoke-test the harness.

Auto-Implemented

Backed up config.toml, hooks.json, and all current agent TOMLs under /Users/chadsimon/.codex/backups/2026-05-19/.
Ran codex update, which executed npm install -g @openai/codex; codex --version now reports codex-cli 0.131.0.
Verified config.toml parses with tomllib and hooks.json parses with python3 -m json.tool.
Verified python3 ~/.codex/bin/codex_config_posture.py --mode warn reports Codex config posture ok.
Verified codex --profile conservative-auto-review --version and feature listing load cleanly; plugin_hooks is now stable but remains disabled.
Ran python3 ~/.codex/bin/codex-runtime-doctor; it passed with one warning for 3 stale temp trusted project entries.
Ran the new built-in codex doctor; it confirmed runtime 0.131.0, npm install consistency, healthy state DBs, MCP config, provider reachability, and current update status. It exits non-zero in this non-interactive Codex tool environment because TERM=dumb and stdio is not attached to a real terminal.
Targeted tests passed: python3 -m unittest test_codex_config_posture, python3 -m unittest test_codex_agentops_contract, and python3 -m unittest test_codex_task_manager.
Running timeout 45s python3 -m unittest test_auto_runtime_common timed out after partial progress, and broad unittest discovery also hung; both are recorded as a pre-existing harness risk requiring follow-up.

Build Queue

Scope-expansion canary for permissive agent runs (research) - Overeager Coding Agents - Add a small OverEager-style task-eval to catch agents deleting, rewriting, or touching files outside the task envelope.
Stable plugin-hooks trust review (hook) - Codex hooks docs, Codex 0.131.0 release - plugin_hooks is stable in 0.131.0 but should stay disabled until bundled hooks from enabled plugins are audited and trust-reviewed.
Built-in doctor adapter (hook) - Codex 0.131.0 release - Fold codex doctor --json into the local runtime-doctor report so stale rollouts, terminal warnings, and update status are captured without replacing the Codex-owned doctor.
Overdue rollout cleanup policy (Codex-md) - codex doctor local output - Built-in doctor reports 9,788 active rollout files using 7.08 GB; design a non-destructive cleanup/report-only mode before deleting runtime state.
Reversa-style operational-spec intake (skill) - Reversa paper, sandeco/reversa - Current rlm-scan maps architecture, but legacy-system conversion could benefit from traceable claims, confidence tags, and explicit gap preservation.
Observation-contract evals (research) - ContractBench - Add deterministic byte-preservation and expiry-window tests for tool artifacts such as presigned URLs, OAuth state, and upload tokens.
Runtime-structured retry accounting (research) - Runtime-Structured Task Decomposition - Compare current packet retry behavior against subtask-local retry accounting to reduce reruns after downstream validation failures.
Skill boundary compiler spike (skill) - SkillSmith - Evaluate whether frequently used skills can expose smaller boundary-guided runtime interfaces without adding another orchestration layer.
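
One possible shape for the scope-expansion canary queued above is a before/after file-hash comparison around an agent run. The sketch below is an illustration under stated assumptions, not the method from the Overeager Coding Agents paper: snapshot, out_of_scope_changes, and the glob-based task envelope are hypothetical names chosen for this example.

```python
import hashlib
from fnmatch import fnmatchcase
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def out_of_scope_changes(
    before: dict[str, str],
    after: dict[str, str],
    allowed_globs: list[str],
) -> list[str]:
    """Return sorted paths added, deleted, or modified outside the task envelope.

    Note: in fnmatch patterns "*" also matches "/", so "src/*" covers
    everything below src/, including nested directories.
    """
    changed = {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }
    return sorted(
        path
        for path in changed
        if not any(fnmatchcase(path, glob) for glob in allowed_globs)
    )
```

A canary task would call snapshot before and after the agent run and fail if out_of_scope_changes returns anything. Hashing every file is only practical for small fixture repositories, which is likely enough for a task-eval of this size.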

Research

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks - Directly relevant to approval_policy = "never" and broad filesystem access; suggests measuring inferred task boundaries, not only prompt-declared scope.
Same Signal, Different Semantics - Warns that trajectory metrics can flip meaning across agent frameworks, so local evals should validate metrics per Codex route rather than importing generic rules.
CommitDistill - Local-only typed memory from git history overlaps with omni-mem/rlm-scan goals, but reported headline lift is weak; keep as design reference.
Verify-Gated Completion as Admission Control - Reinforces current planning-gate closure discipline: completion claims should be admitted by read-only verification evidence, not assistant confidence.
PROTEA - Useful UI pattern for multi-agent workflow debugging: score intermediate nodes, localize bottlenecks, then rerun targeted revisions.
RoadmapBench - Good long-horizon benchmark shape for roadmap execution; median task size is far beyond normal Quick Win scope.
ContractBench - Gives a concrete benchmark family for temporal validity and byte integrity of tool outputs.
SkillSmith - Strong conceptual fit for reducing skill context overhead, but implementation should wait for a local skill-usage trace.
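
The byte-integrity and temporal-validity checks mentioned for ContractBench can be made deterministic by passing the clock in as a value. The following is a rough sketch only: Artifact, bytes_preserved, within_window, and violations are hypothetical names, and nothing here reflects the benchmark's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A tool output that must be relayed unchanged and used before it expires."""

    value: bytes
    issued_at: float  # seconds since the epoch, supplied by the test
    ttl_seconds: float


def bytes_preserved(original: Artifact, relayed: bytes) -> bool:
    """True only if the relayed bytes are identical to what the tool produced."""
    return relayed == original.value


def within_window(artifact: Artifact, used_at: float) -> bool:
    """True if used_at falls in the half-open window [issued_at, issued_at + ttl)."""
    return artifact.issued_at <= used_at < artifact.issued_at + artifact.ttl_seconds


def violations(original: Artifact, relayed: bytes, used_at: float) -> list[str]:
    """Collect contract violations for one use of an artifact."""
    problems: list[str] = []
    if not bytes_preserved(original, relayed):
        problems.append("bytes_changed")
    if not within_window(original, used_at):
        problems.append("outside_validity_window")
    return problems
```

Comparing raw bytes instead of decoded strings is deliberate: a presigned URL whose percent-encoding was normalized in transit may look equivalent but no longer match its signature.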

Already Have

gpt-5.5 power-user default, approval_policy = "never", sandbox_mode = "danger-full-access", prompt telemetry off, live web search, schema-linked config.toml.
features.hooks = true, features.plugins = true, features.goals = true, features.prevent_idle_sleep = true, features.plugin_hooks = false, destructive app tools disabled by default.
OpenAI developer docs MCP, omni-mem MCP, Stitch and Kickstarter MCP entries.
Browser/Chrome/Computer Use/Documents/Spreadsheets/Presentations/Gmail/OpenAI Developers plugins.
Bash PreToolUse guard, Bash PostToolUse verification ledger and failure-context hooks, SessionStart repo-context and config-posture hooks, Stop and PreCompact omni-mem hooks.
Read-only explorer/planner/reviewer/python-reviewer/typescript-reviewer/validator agents, scoped worker and chad-twin agents, bounded agent thread/depth/runtime caps.
profiles.review, profiles.conservative, profiles.conservative-auto-review, global placeholder pokegen disable.
session-recall, rlm-scan, planning-gate, auto, drive, go, codex-security, security-audit, codex-runtime-doctor, what-would-chad-do.
Stable codex-cli 0.131.0.

Rejected

Upgrade to 0.132.0-alpha.1 - rejected because it is a pre-release and global runtime policy forbids enabling under-development features globally without rollback notes and validation.
Enable plugin_hooks automatically - rejected even though it is stable in 0.131.0; plugin-bundled hooks are executable code and require trust review first.
Enable network_proxy globally - rejected because it remains experimental and would alter sandboxed networking posture without a current need.
Enable mentions_v2 globally - rejected because it remains experimental and the current @/mention workflow is not blocked.
Wholesale install Reversa or Claude toolkit plugins - rejected because external skills/hooks must be audited and adapted into Codex-owned surfaces before execution.
Delete rollout/state files as a Quick Win - rejected because cleanup touches runtime state; report-only measurement is safe, deletion needs an explicit cleanup policy.
Edit AGENTS.md as a Quick Win - rejected by the ecosystem-update hard limit; constitutional policy changes require explicit direction.
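
The report-only measurement described as safe above can be as small as a file count and a byte total. The sketch below is a minimal illustration: the function name measure is hypothetical, and the directory to scan is left as a parameter because this report does not state where rollout files are stored.

```python
from pathlib import Path


def measure(root: Path) -> dict[str, int]:
    """Count regular files under root and sum their sizes. Never deletes anything."""
    count = 0
    total = 0
    for path in root.rglob("*"):
        # Skip symlinks so linked files are not counted twice.
        if path.is_file() and not path.is_symlink():
            count += 1
            total += path.stat().st_size
    return {"files": count, "bytes": total}
```

Keeping measurement in its own function, with no deletion code path at all, makes it harder for a later cleanup feature to be enabled by accident.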

Sources checked:
https://github.com/hesreallyhim/awesome-claude-code
https://howborisusesclaudecode.com/
https://github.com/shanraisshan/codex-cli-best-practice
https://github.com/rohitg00/awesome-claude-code-toolkit
https://developers.openai.com/codex/config-reference
https://developers.openai.com/codex/hooks
https://github.com/openai/codex/releases
https://github.com/openai/codex/releases/tag/rust-v0.131.0
https://www.npmjs.com/package/@openai/codex
https://arxiv.org/list/cs.SE/recent
https://arxiv.org/list/cs.AI/recent
https://arxiv.org/abs/2605.18583
https://arxiv.org/abs/2605.18684
https://arxiv.org/abs/2605.18332
https://arxiv.org/abs/2605.18284
https://arxiv.org/abs/2605.17998
https://arxiv.org/abs/2605.18032
https://arxiv.org/abs/2605.17281
https://arxiv.org/abs/2605.15846
https://arxiv.org/abs/2605.15425
https://arxiv.org/abs/2605.15215
https://github.com/sandeco/reversa

Tier 2 fetched: yes.
Tier 3 fetched: yes; explicit daily crawl requested.
omni-mem write: saved memory 5ff6eb29-742e-4a97-85c8-9397c64c3d0f.
Run at: 2026-05-19T10:39:57Z