Ecosystem Update - 2026-05-19

May 19, 2026 · generated by the ecosystem-update Claude Skill

TL;DR

  • Codex 0.131.0 is now the official stable release and was applied safely; the local CLI now reports codex-cli 0.131.0.
  • The highest-value research signal is scope control: "Overeager Coding Agents" directly targets out-of-scope agent actions under permissive permission models.
  • Community sources were quiet today; most recommendations reinforce the existing Codex setup: hooks, role-scoped agents, verification gates, omni-mem, and compact policy.

Quick Wins

| Item | Source | Type | Impact | Effort | Action |
| --- | --- | --- | --- | --- | --- |
| Stable Codex 0.131.0 upgrade | openai/codex release, npm @openai/codex | Codex-md | 3 | 1 | Upgrade from 0.130.0 to the stable 0.131.0 release and smoke-test the harness. |

Auto-Implemented

  • Backed up config.toml, hooks.json, and all current agent TOMLs under /Users/chadsimon/.codex/backups/2026-05-19/.
  • Ran codex update, which executed npm install -g @openai/codex; codex --version now reports codex-cli 0.131.0.
  • Verified config.toml parses with tomllib and hooks.json parses with python3 -m json.tool.
  • Verified python3 ~/.codex/bin/codex_config_posture.py --mode warn reports Codex config posture ok.
  • Verified that codex --profile conservative-auto-review --version and the feature listing both load cleanly; plugin_hooks is now stable but remains disabled.
  • Ran python3 ~/.codex/bin/codex-runtime-doctor; it passed with one warning for 3 stale temp trusted project entries.
  • Ran the new built-in codex doctor; it confirmed runtime 0.131.0, npm install consistency, healthy state DBs, MCP config, provider reachability, and current update status. It exits non-zero in this non-interactive Codex tool environment because TERM=dumb and stdio is not attached to a real terminal.
  • Targeted tests passed: python3 -m unittest test_codex_config_posture, python3 -m unittest test_codex_agentops_contract, and python3 -m unittest test_codex_task_manager.
  • timeout 45s python3 -m unittest test_auto_runtime_common timed out after partial progress, and broad unittest discovery also hung; both are recorded as a pre-existing harness risk that needs follow-up.

Build Queue

  • Scope-expansion canary for permissive agent runs (research) - Overeager Coding Agents - Add a small OverEager-style task-eval to catch agents deleting, rewriting, or touching files outside the task envelope.
  • Stable plugin-hooks trust review (hook) - Codex hooks docs, Codex 0.131.0 release - plugin_hooks is stable in 0.131.0 but should stay disabled until bundled hooks from enabled plugins are audited and trust-reviewed.
  • Built-in doctor adapter (hook) - Codex 0.131.0 release - Fold codex doctor --json into the local runtime-doctor report so stale rollouts, terminal warnings, and update status are captured without replacing the Codex-owned doctor.
  • Overdue rollout cleanup policy (Codex-md) - codex doctor local output - The built-in doctor reports 9,788 active rollout files using 7.08 GB; design a non-destructive, report-only cleanup mode before deleting any runtime state.
  • Reversa-style operational-spec intake (skill) - Reversa paper, sandeco/reversa - Current rlm-scan maps architecture, but legacy-system conversion could benefit from traceable claims, confidence tags, and explicit gap preservation.
  • Observation-contract evals (research) - ContractBench - Add deterministic byte-preservation and expiry-window tests for tool artifacts such as presigned URLs, OAuth state, and upload tokens.
  • Runtime-structured retry accounting (research) - Runtime-Structured Task Decomposition - Compare current packet retry behavior against subtask-local retry accounting to reduce reruns after downstream validation failures.
  • Skill boundary compiler spike (skill) - SkillSmith - Evaluate whether frequently used skills can expose smaller boundary-guided runtime interfaces without adding another orchestration layer.
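
The scope-expansion canary at the top of this queue could start as a before/after snapshot diff. The sketch below is one possible shape under stated assumptions, not the benchmark from the Overeager Coding Agents paper: snapshot and out_of_scope_changes are hypothetical names, and the task envelope is assumed to be expressible as a list of glob patterns.

```python
import hashlib
from fnmatch import fnmatchcase
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def out_of_scope_changes(
    before: dict[str, str],
    after: dict[str, str],
    allowed_globs: list[str],
) -> list[str]:
    """Return created, deleted, or modified paths that match no allowed glob."""
    changed = {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)  # covers add, delete, and edit
    }
    return sorted(
        path
        for path in changed
        if not any(fnmatchcase(path, pattern) for pattern in allowed_globs)
    )


if __name__ == "__main__":
    # Illustrative digests only; a real run would call snapshot() on the
    # workspace before and after the agent executes the task.
    before = {"src/app.py": "aaa", "README.md": "bbb"}
    after = {"src/app.py": "ccc", "docs/notes.md": "ddd"}
    print(out_of_scope_changes(before, after, ["src/*"]))
```

Note that fnmatchcase lets `*` match across `/`, so `src/*` covers nested files too. A stricter envelope would need explicit per-file patterns.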

Research

  • Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks - Directly relevant to approval_policy = "never" and broad filesystem access; suggests measuring inferred task boundaries, not only prompt-declared scope.
  • Same Signal, Different Semantics - Warns that trajectory metrics can flip meaning across agent frameworks, so local evals should validate metrics per Codex route rather than importing generic rules.
  • CommitDistill - Local-only typed memory from git history overlaps with omni-mem/rlm-scan goals, but the reported headline lift is weak; keep it as a design reference.
  • Verify-Gated Completion as Admission Control - Reinforces current planning-gate closure discipline: completion claims should be admitted by read-only verification evidence, not assistant confidence.
  • PROTEA - Useful UI pattern for multi-agent workflow debugging: score intermediate nodes, localize bottlenecks, then rerun targeted revisions.
  • RoadmapBench - Good long-horizon benchmark shape for roadmap execution; median task size is far beyond normal Quick Win scope.
  • ContractBench - Gives a concrete benchmark family for temporal validity and byte integrity of tool outputs.
  • SkillSmith - Strong conceptual fit for reducing skill context overhead, but implementation should wait for a local skill-usage trace.
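
The temporal-validity and byte-integrity properties noted under ContractBench can be expressed as two deterministic checks. This is a local sketch and does not reproduce ContractBench itself: the Observation record, the check_use function, and the violation labels are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Observation:
    """A tool artifact as first observed (hypothetical record shape)."""
    value: bytes
    observed_at: datetime
    ttl: timedelta

    @property
    def digest(self) -> str:
        # Storing a digest lets a ledger compare artifacts without keeping secrets.
        return hashlib.sha256(self.value).hexdigest()


def check_use(obs: Observation, used_value: bytes, used_at: datetime) -> list[str]:
    """Return the contract violations for one later use of the artifact."""
    violations = []
    if hashlib.sha256(used_value).hexdigest() != obs.digest:
        violations.append("bytes-altered")  # any re-encoding or trimming counts
    if used_at >= obs.observed_at + obs.ttl:
        violations.append("expired")  # used at or after the validity window closed
    return violations


if __name__ == "__main__":
    seen = datetime(2026, 5, 19, 10, 0, tzinfo=timezone.utc)
    # Made-up URL and TTL, for illustration only.
    obs = Observation(b"https://example.invalid/upload?sig=abc%2F", seen, timedelta(minutes=15))
    print(check_use(obs, obs.value, seen + timedelta(minutes=5)))
    print(check_use(obs, obs.value.replace(b"%2F", b"/"), seen + timedelta(minutes=20)))
```

The second call models a URL-decoded copy of a presigned URL reused after the window has closed, which trips both checks.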

Already Have

  • gpt-5.5 power-user default, approval_policy = "never", sandbox_mode = "danger-full-access", prompt telemetry off, live web search, schema-linked config.toml.
  • features.hooks = true, features.plugins = true, features.goals = true, features.prevent_idle_sleep = true, features.plugin_hooks = false, destructive app tools disabled by default.
  • OpenAI developer docs MCP, omni-mem MCP, Stitch and Kickstarter MCP entries.
  • Browser/Chrome/Computer Use/Documents/Spreadsheets/Presentations/Gmail/OpenAI Developers plugins.
  • Bash PreToolUse guard, Bash PostToolUse verification ledger and failure-context hooks, SessionStart repo-context and config-posture hooks, Stop and PreCompact omni-mem hooks.
  • Read-only explorer/planner/reviewer/python-reviewer/typescript-reviewer/validator agents, scoped worker and chad-twin agents, bounded agent thread/depth/runtime caps.
  • profiles.review, profiles.conservative, profiles.conservative-auto-review, global placeholder pokegen disable.
  • session-recall, rlm-scan, planning-gate, auto, drive, go, codex-security, security-audit, codex-runtime-doctor, what-would-chad-do.
  • Stable codex-cli 0.131.0.

Rejected

  • Upgrade to 0.132.0-alpha.1 - rejected because it is a pre-release and global runtime policy forbids enabling under-development features globally without rollback notes and validation.
  • Enable plugin_hooks automatically - rejected even though it is stable in 0.131.0; plugin-bundled hooks are executable code and require trust review first.
  • Enable network_proxy globally - rejected because it remains experimental and would alter sandboxed networking posture without a current need.
  • Enable mentions_v2 globally - rejected because it remains experimental and the current @/mention workflow is not blocked.
  • Wholesale install Reversa or Claude toolkit plugins - rejected because external skills/hooks must be audited and adapted into Codex-owned surfaces before execution.
  • Delete rollout/state files as a Quick Win - rejected because cleanup touches runtime state; report-only measurement is safe, deletion needs an explicit cleanup policy.
  • Edit AGENTS.md as a Quick Win - rejected by the ecosystem-update hard limit; constitutional policy changes require explicit direction.
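
The report-only measurement that the rollout rejection treats as safe can be kept strictly read-only. The sketch below assumes the caller supplies the rollout directory, since its location is not stated here. measure and format_report are hypothetical names, and reporting in decimal gigabytes is an assumption about how the doctor's figure is computed.

```python
from pathlib import Path


def measure(root: Path) -> tuple[int, int]:
    """Count regular files under root and sum their sizes; nothing is modified."""
    count = 0
    total_bytes = 0
    for path in root.rglob("*"):
        # Skip symlinks so linked files are not counted or followed.
        if path.is_file() and not path.is_symlink():
            count += 1
            total_bytes += path.stat().st_size
    return count, total_bytes


def format_report(count: int, total_bytes: int) -> str:
    """Render a one-line summary using decimal gigabytes (an assumption)."""
    return f"{count:,} files, {total_bytes / 1e9:.2f} GB"
```

Because the functions only stat files, they can run on every digest without touching runtime state; deletion would remain a separate step gated by an explicit cleanup policy.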

Sources checked: https://github.com/hesreallyhim/awesome-claude-code, https://howborisusesclaudecode.com/, https://github.com/shanraisshan/codex-cli-best-practice, https://github.com/rohitg00/awesome-claude-code-toolkit, https://developers.openai.com/codex/config-reference, https://developers.openai.com/codex/hooks, https://github.com/openai/codex/releases, https://github.com/openai/codex/releases/tag/rust-v0.131.0, https://www.npmjs.com/package/@openai/codex, https://arxiv.org/list/cs.SE/recent, https://arxiv.org/list/cs.AI/recent, https://arxiv.org/abs/2605.18583, https://arxiv.org/abs/2605.18684, https://arxiv.org/abs/2605.18332, https://arxiv.org/abs/2605.18284, https://arxiv.org/abs/2605.17998, https://arxiv.org/abs/2605.18032, https://arxiv.org/abs/2605.17281, https://arxiv.org/abs/2605.15846, https://arxiv.org/abs/2605.15425, https://arxiv.org/abs/2605.15215, https://github.com/sandeco/reversa

Tier 2 fetched: yes.
Tier 3 fetched: yes; explicit daily crawl requested.
omni-mem write: saved memory 5ff6eb29-742e-4a97-85c8-9397c64c3d0f.
Run at: 2026-05-19T10:39:57Z
