Ecosystem Update - 2026-05-15
TL;DR
- One safe harness Quick Win was implemented:
features.codex_hookswas renamed to the canonicalfeatures.hooksin~/.codex/config.toml. - Today's Codex hook signal is about missing upstream lifecycle surfaces, especially
ProviderError/TurnError; local implementation has to wait for a real event. - Today's arXiv signal reinforces the existing runtime posture: prefer direct corpus interaction with
rg, evaluate long-horizon adaptation explicitly, and use pairwise verifier aggregation before trusting parallel agent output.
Quick Wins
| Item | Source | Type | Impact | Effort | Action |
|---|---|---|---|---|---|
| Canonical hooks feature flag | OpenAI config reference, openai/codex#22148 | hook | 2 | 1 | Replace deprecated features.codex_hooks = true with features.hooks = true in ~/.codex/config.toml. |
Auto-Implemented
- Updated
/Users/chadsimon/.codex/config.tomlto use[features].hooks = true. - Backed up
config.toml,hooks.json, and current agent TOMLs under/Users/chadsimon/.codex/backups/2026-05-15/. - Verified
config.tomlparses withtomllib,hooks.jsonparses withpython3 -m json.tool,codex --versionreportscodex-cli 0.130.0, andpython3 ~/.codex/bin/codex-runtime-doctorexits 0.
Build Queue
- ProviderError / TurnError hook watcher (hook) - openai/codex#22774 - Track the upstream proposal for provider/API failure hooks, then wire recovery/notification/backoff only after Codex emits a stable event. Current runtime cannot implement this safely because no such hook event exists.
- Plugin hooks trust review gate (hook) - OpenAI plugin hooks docs - Before enabling
features.plugin_hooks, audit installed plugin hook manifests and commands. This is useful, but not a Quick Win because plugin hooks can execute bundled commands from marketplace/plugin packages. - FutureSim-style autonomy eval intake (research) - FutureSim - Add a lightweight evaluation story for long-horizon adaptation after knowledge cutoff, tied to the existing task-eval/autonomy harness rather than a new service.
- Bradley-Terry parallel candidate selector spike (research) - OpenDeepThink - Evaluate pairwise ranking of parallel agent outputs for objectively testable tasks before adopting as a verifier pattern.
Research
- Is Grep All You Need? How Agent Harnesses Reshape Agentic Search - Directly supports the current
rg-first search posture and warns that harness/tool-output presentation changes retrieval quality. - FutureSim: Replaying World Events to Evaluate Adaptive Agents - Useful benchmark pattern for testing whether autonomous runtime changes improve adaptation over time instead of only static task completion.
- OpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregation - Candidate verifier pattern for ranking parallel agent outputs with pairwise comparisons instead of a single noisy pointwise judge.
- Articraft: An Agentic System for Scalable Articulated 3D Asset Generation - Domain-specific, but its restricted workspace, domain SDK, tests, and structured feedback loop map cleanly to harness design.
Already Have
gpt-5.5 power-user default, approval_policy = "never", sandbox_mode = "danger-full-access", prompt telemetry off, canonical hooks = true, Bash PreToolUse safety guard, Bash PostToolUse verification ledger, Bash failure-context hook, SessionStart cached repo-context preflight, Stop omni-mem save hook, PreCompact omni-mem hook, OpenAI developer docs MCP, omni-mem MCP, Stitch MCP, Browser plugin, Computer Use plugin, Documents plugin, Spreadsheets plugin, Presentations plugin, Gmail plugin, read-only reviewer agents, read-only explorer/planner/validator agents, bounded agent max_threads = 3, what-would-chad-do, go, drive, auto, planning-gate, codex-security, security-audit, rlm-scan, session-recall, skill-audit, build-backlog, skill-creator, skill-installer, direct rg/file-search posture, and runtime doctor coverage.
Rejected
- Enable plugin hooks blindly - rejected because plugin hooks execute bundled lifecycle commands; requires a trust review gate and explicit enablement.
- Add new community hook scripts - rejected by the skill hard limit: no new hook script can be wired unless the script already exists and the target file is explicitly identified.
- Wholesale import from Claude/CodeAlive/awesome skill catalogs - rejected because Codex-owned runtime surfaces must not depend on Claude-owned layouts or unaudited outside skills.
- Native Codex memories flip-on - rejected as a Quick Win because the current runtime intentionally uses omni-mem; native memories remains a pilot decision, not a default switch.
- UserPromptSubmit prompt telemetry hooks - rejected because global prompt telemetry is opt-in and the current policy forbids logging user prompts by default.
- SessionEnd workaround hook - rejected because the upstream issue was closed as not planned and local heuristics would be brittle.
- Default worktree isolation for all subagents - rejected because current agent fan-out is deliberately bounded; automatic worktree orchestration would add complexity without a proven recurring failure.
Sources checked: https://github.com/hesreallyhim/awesome-claude-code, https://howborisusesclaudecode.com/, https://github.com/shanraisshan/codex-cli-best-practice, https://arxiv.org/search/?searchtype=all&query=LLM+agent+coding&order=-announced_date_first, https://arxiv.org/api/query?search_query=all:agent%20AND%20all:grep&sortBy=submittedDate&sortOrder=descending&max_results=5, https://github.com/openai/codex/issues/22774, https://github.com/openai/codex/issues/22148, https://developers.openai.com/codex/config-reference, https://developers.openai.com/codex/config-schema.json Tier 2 fetched: yes Tier 3 fetched: no; last weekly run was 2026-05-08T15:37:21Z, under 7 days at run time. Targeted OpenAI config docs/schema were checked only to validate the Quick Win. Run at: 2026-05-15T10:34:01Z