Ecosystem Update - 2026-05-15
Highlights
- One safe harness Quick Win was implemented:
features.codex_hookswas renamed to the canonicalfeatures.hooksin~/.codex/config.toml - Today's Codex hook signal is about missing upstream lifecycle surfaces, especially
ProviderError/TurnError; local implementation has to wait for a real event - Today's arXiv signal reinforces the existing runtime posture: prefer direct corpus interaction with
rg, evaluate long-horizon adaptation explicitly, and use pairwise verifier aggregation before trusting parallel agent output
Quick Wins (implemented today)
-
Canonical hooks feature flag hookReplace deprecated
features.codex_hooks = truewithfeatures.hooks = truein~/.codex/config.toml
New Tools, Skills & Patterns
-
ProviderError / TurnError hook watcher hookopenai/codex#22774 - Track the upstream proposal for provider/API failure hooks, then wire recovery/notification/backoff only after Codex emits a stable event. Current runtime cannot implement this safely because no such hook event exists
-
Plugin hooks trust review gate hookOpenAI plugin hooks docs - Before enabling
features.plugin_hooks, audit installed plugin hook manifests and commands. This is useful, but not a Quick Win because plugin hooks can execute bundled commands from marketplace/plugin packages -
FutureSim-style autonomy eval intakeFutureSim - Add a lightweight evaluation story for long-horizon adaptation after knowledge cutoff, tied to the existing task-eval/autonomy harness rather than a new service
-
Bradley-Terry parallel candidate selector spikeOpenDeepThink - Evaluate pairwise ranking of parallel agent outputs for objectively testable tasks before adopting as a verifier pattern
Research Worth Reading
-
Is Grep All You Need? How Agent Harnesses Reshape Agentic Search- Directly supports the current
rg-first search posture and warns that harness/tool-output presentation changes retrieval quality -
FutureSim: Replaying World Events to Evaluate Adaptive Agents- Useful benchmark pattern for testing whether autonomous runtime changes improve adaptation over time instead of only static task completion
-
OpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregation- Candidate verifier pattern for ranking parallel agent outputs with pairwise comparisons instead of a single noisy pointwise judge
-
Articraft: An Agentic System for Scalable Articulated 3D Asset Generation- Domain-specific, but its restricted workspace, domain SDK, tests, and structured feedback loop map cleanly to harness design
Considered, Not Adopting
Items reviewed and explicitly declined this cycle, with the reason. Curation discipline matters more than coverage.
- Enable plugin hooks blindly — - rejected because plugin hooks execute bundled lifecycle commands; requires a trust review gate and explicit enablement
- Add new community hook scripts — - rejected by the skill hard limit: no new hook script can be wired unless the script already exists and the target file is explicitly identified
- Wholesale import from Claude/CodeAlive/awesome skill catalogs — - rejected because Codex-owned runtime surfaces must not depend on Claude-owned layouts or unaudited outside skills
- Native Codex memories flip-on — native memories remains a pilot decision, not a default switch
- UserPromptSubmit prompt telemetry hooks — - rejected because global prompt telemetry is opt-in and the current policy forbids logging user prompts by default
- SessionEnd workaround hook — - rejected because the upstream issue was closed as not planned and local heuristics would be brittle
- Default worktree isolation for all subagents — - rejected because current agent fan-out is deliberately bounded; automatic worktree orchestration would add complexity without a proven recurring failure