Ecosystem Update - 2026-05-12

May 12, 2026 · generated by the ecosystem-update Claude Skill

TL;DR

  • Auto-implemented one safe harness Quick Win: upgraded the local Codex CLI from 0.128.0 to the latest stable 0.130.0; rejected today's 0.131.0-alpha.* builds.
  • The strongest new signal is evaluation quality for real runtimes: WildClawBench, constraint-drift safety, and agentic fuzzing all point toward bounded harness evals rather than new orchestration.
  • The current setup already has the high-value primitives: GPT-5.5, live web search, hooks, omni-mem lifecycle hooks, read-only reviewer agents, plugin support, OpenAI docs MCP, and Browser/Gmail/Documents/Presentations/Spreadsheets plugins.

Quick Wins

Item | Source | Type | Impact | Effort | Action
Stable Codex 0.130.0 upgrade and smoke test | https://github.com/openai/codex/releases/tag/rust-v0.130.0 | Codex-md | 3 | 1 | Auto-upgrade from installed @openai/codex@0.128.0 to stable 0.130.0; verify CLI, config, hooks, and guard behavior

Auto-Implemented

  • Backed up config.toml, hooks.json, and all /Users/chadsimon/.codex/agents/*.toml to /Users/chadsimon/.codex/backups/2026-05-12/.
  • Upgraded the npm-installed Codex CLI package with npm install -g @openai/codex@0.130.0.
  • Verified codex --version now reports codex-cli 0.130.0.
  • Verified /Users/chadsimon/.codex/hooks.json parses with python3 -m json.tool.
  • Verified /Users/chadsimon/.codex/config.toml and all agent TOMLs parse with Python tomllib.
  • Smoke-tested the existing Bash safety hook with a benign command payload; the live guard also blocked a destructive git reset --hard probe before execution, as intended.

Build Queue

  • WildClawBench-style native runtime eval intake (research) - https://arxiv.org/abs/2605.10912 - Add a small benchmark packet type for long-horizon, native-runtime tasks only if it can reuse the existing auto/task-eval harness rather than adding a new benchmark service.
  • Constraint-drift regression check (research) - https://arxiv.org/abs/2605.10481 - Convert the paper's safety-maintenance framing into a lightweight R3/R4 review rubric for scope leakage, authority drift, and missing evidence across subagent messages.
  • Agentic fuzzing spike for bug-miner (research) - https://arxiv.org/abs/2605.10074 - Evaluate whether the existing bug-miner skill can seed historical bug classes into bounded repro tasks before adding any new fuzzing scripts.
  • Pi-Serini lexical retrieval baseline (research) - https://arxiv.org/abs/2605.10848 - Compare rg/BM25-style retrieval against omni-mem/semantic retrieval for deep-research tasks before assuming heavier RAG is useful.
  • Codex 0.131 stable release watch (Codex-md) - https://github.com/openai/codex/releases - Today's 0.131.0-alpha.6 through 0.131.0-alpha.9 releases are active, but not stable; revisit once a non-prerelease tag lands.
  • Plugin-hook behavior review after 0.130 (hook) - https://github.com/openai/codex/releases/tag/rust-v0.130.0 - Plugin details now show bundled hooks, but automatic plugin-hook loading is still a separate risk surface; review before enabling plugin_hooks or trusting marketplace hooks.
  • Configurable OTEL trace metadata pass (mcp) - https://github.com/openai/codex/releases/tag/rust-v0.130.0 - 0.130 adds richer OTEL metadata support; map it to the existing local OTEL endpoints only if a concrete debugging workflow needs it.
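
For the lexical retrieval baseline item above, a BM25 scorer is small enough to write inline before reaching for a library. The following is a rough sketch of standard Okapi BM25 with a Lucene-style smoothed IDF; the tokenizer, the k1 and b defaults, and the function names are illustrative assumptions, not anything taken from the Pi-Serini paper or from omni-mem.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; always positive, even for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[term] + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Running the same queries through this scorer and through the semantic path, then comparing the top results by hand, would be one cheap way to make the comparison concrete before adding heavier retrieval.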

Research

Already Have

  • Codex-owned AGENTS.md contract, model = "gpt-5.5", review_model = "gpt-5.4", approval_policy = "never", sandbox_mode = "danger-full-access", prompt telemetry off, web_search = "live", codex_hooks = true, goals = true, plugin support.
  • OpenAI developer docs MCP, supports_parallel_tool_calls = true for the docs MCP, omni-mem MCP and lifecycle hooks.
  • SessionStart cached repo-context hook, Stop omni-mem save hook, PreCompact omni-mem hook, Bash PreToolUse safety guard, Bash PostToolUse verification and failure-context hooks.
  • Read-only explorer/planner/reviewer/validator agents, Python and TypeScript reviewer agents, workspace-write worker and chad-twin agents, agent concurrency caps.
  • Official/bundled plugin marketplaces, Browser/Gmail/Documents/Presentations/Spreadsheets plugins.
  • skill-audit, session-recall, auto, drive, govern, planning-gate, rlm-scan, memory-adaptation.
  • npm-managed Codex CLI, current backup discipline, and prior ecosystem state dedupe.

Rejected

  • Upgrade to 0.131.0-alpha.9 - rejected: it is a prerelease published today; stable 0.130.0 is the safe target for the harness.
  • Enable plugin_hooks blindly - rejected: plugin hook visibility in 0.130 is useful, but automatic hook execution from plugins needs an explicit trust review.
  • Wholesale import from awesome-claude-code or oh-my-skills - rejected: useful cross-agent ideas must be copied or rewritten into Codex-owned skills after audit, not installed wholesale.
  • Native Codex memories as an immediate replacement - rejected: the current contract makes omni-mem the default memory system; native memories remain a pilot decision.
  • Add new daemon/orchestration layers for ecosystem crawling - rejected: WebFetch/WebSearch plus the existing report/state file are sufficient for the daily loop.
  • Policy doc edits as Quick Wins - rejected: ~/.codex/AGENTS.md and /Users/chadsimon/AGENTS.md are constitutional policy docs and require explicit direction.
  • Deploying the website - rejected per user instruction; the wrapper will render and deploy after this run finishes.

Sources checked:

  • https://github.com/hesreallyhim/awesome-claude-code
  • https://howborisusesclaudecode.com/
  • https://github.com/shanraisshan/codex-cli-best-practice
  • https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-hooks.md
  • https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-subagents.md
  • https://developers.openai.com/codex/config-reference
  • https://developers.openai.com/codex/subagents
  • https://github.com/openai/codex/releases
  • https://github.com/openai/codex/releases/tag/rust-v0.130.0
  • https://arxiv.org/search/?searchtype=all&query=LLM+agent+coding&order=-announced_date_first
  • web search: "Codex new hooks agents skills site:github.com 2026"
  • web search: "arxiv.org LLM agent coding autonomous 2026 site:arxiv.org"

Tier 2 fetched: yes
Tier 3 fetched: no - skipped because the last Tier 3 run was 2026-05-08T15:37:21Z, inside the 7-day window
omni-mem: available; run summary saved
Run at: 2026-05-12T10:31:21Z
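
The Tier 3 skip above comes down to a timestamp comparison. A minimal sketch follows, assuming the window is measured from the last Tier 3 run to the current run and that a fetch becomes due once the full 7 days have elapsed; the function names and the choice of >= at the boundary are assumptions, not the skill's documented behavior.

```python
from datetime import datetime, timedelta, timezone

TIER3_WINDOW = timedelta(days=7)


def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp of the form 2026-05-12T10:31:21Z as timezone-aware UTC."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def should_fetch_tier3(last_run: str, now: str, window: timedelta = TIER3_WINDOW) -> bool:
    """True once at least `window` has elapsed since the last Tier 3 run."""
    return parse_utc(now) - parse_utc(last_run) >= window
```

For this run, the gap between 2026-05-08T15:37:21Z and 2026-05-12T10:31:21Z is 3 days, 18 hours, and 54 minutes, so should_fetch_tier3 returns False and Tier 3 is skipped.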
