Ecosystem Update - 2026-05-15

May 15, 2026 · curated by Chad Simon · 16 items reviewed

Highlights

One safe harness Quick Win was implemented: features.codex_hooks was renamed to the canonical features.hooks in ~/.codex/config.toml
Today's Codex hook signal is about missing upstream lifecycle surfaces, especially ProviderError / TurnError; local implementation has to wait for a real event
Today's arXiv signal reinforces the existing runtime posture: prefer direct corpus interaction with rg, evaluate long-horizon adaptation explicitly, and use pairwise verifier aggregation before trusting parallel agent output

Canonical hooks feature flag hook

developers.openai.com

Replace deprecated features.codex_hooks = true with features.hooks = true in ~/.codex/config.toml

ProviderError / TurnError hook watcher hook

github.com/openai/codex

openai/codex#22774 - Track the upstream proposal for provider/API failure hooks, then wire recovery/notification/backoff only after Codex emits a stable event. Current runtime cannot implement this safely because no such hook event exists
Plugin hooks trust review gate hook

developers.openai.com

OpenAI plugin hooks docs - Before enabling features.plugin_hooks, audit installed plugin hook manifests and commands. This is useful, but not a Quick Win because plugin hooks can execute bundled commands from marketplace/plugin packages
FutureSim-style autonomy eval intake

arXiv

FutureSim - Add a lightweight evaluation story for long-horizon adaptation after knowledge cutoff, tied to the existing task-eval/autonomy harness rather than a new service
Bradley-Terry parallel candidate selector spike

arXiv

OpenDeepThink - Evaluate pairwise ranking of parallel agent outputs for objectively testable tasks before adopting as a verifier pattern

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

arXiv

- Directly supports the current rg-first search posture and warns that harness/tool-output presentation changes retrieval quality
FutureSim: Replaying World Events to Evaluate Adaptive Agents

arXiv

- Useful benchmark pattern for testing whether autonomous runtime changes improve adaptation over time instead of only static task completion
OpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregation

arXiv

- Candidate verifier pattern for ranking parallel agent outputs with pairwise comparisons instead of a single noisy pointwise judge
Articraft: An Agentic System for Scalable Articulated 3D Asset Generation

arXiv

- Domain-specific, but its restricted workspace, domain SDK, tests, and structured feedback loop map cleanly to harness design

Items reviewed and explicitly declined this cycle, with the reason. Curation discipline matters more than coverage.

Enable plugin hooks blindly — - rejected because plugin hooks execute bundled lifecycle commands; requires a trust review gate and explicit enablement
Add new community hook scripts — - rejected by the skill hard limit: no new hook script can be wired unless the script already exists and the target file is explicitly identified
Wholesale import from Claude/CodeAlive/awesome skill catalogs — - rejected because Codex-owned runtime surfaces must not depend on Claude-owned layouts or unaudited outside skills
Native Codex memories flip-on — native memories remains a pilot decision, not a default switch
UserPromptSubmit prompt telemetry hooks — - rejected because global prompt telemetry is opt-in and the current policy forbids logging user prompts by default
SessionEnd workaround hook — - rejected because the upstream issue was closed as not planned and local heuristics would be brittle
Default worktree isolation for all subagents — - rejected because current agent fan-out is deliberately bounded; automatic worktree orchestration would add complexity without a proven recurring failure