Ecosystem Update - 2026-05-10
Highlights
- One safe Quick Win was implemented: the read-only OpenAI developer docs MCP server now opts into parallel tool calls
- Today's strongest new research signals concern constraint drift, least-privilege skills, and judge-policy invariance; all three map to harness evaluator or audit work, not immediate runtime rewrites
Quick Wins (implemented today)
- OpenAI developer docs MCP parallel calls (mcp; auto-implemented) - Set supports_parallel_tool_calls = true for [mcp_servers.openaiDeveloperDocs]
New Tools, Skills & Patterns
- PreToolUse/PostToolUse Bash policy gates (hook; https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-hooks.md) - Current hooks cover SessionStart, Stop, and PreCompact, but no Bash tool guard exists. Worth building only after a small allow/deny policy backed by an existing script has been defined; the skill's hard limit blocks adding hook wiring before that script exists
- Agent-scoped MCP allowlists (mcp; https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-mcp.md) - The current config exposes many MCP servers globally. A future pass should inventory actual tool use and restrict high-risk servers to the specific agents that need them instead of granting blanket access
- Constraint decay regression check (https://arxiv.org/abs/2605.06445) - Add a small eval that catches architecture, database, and interface constraints drifting during backend/codegen tasks
- SkillScope-style least-privilege audit skill (https://arxiv.org/abs/2605.05868) - Extend the existing codex-skill-audit --strict workflow so imported skills declare their expected tool, file, and network access before they are trusted
- Judge policy invariance check (https://arxiv.org/abs/2605.06161) - Add a validator/evaluator variant that perturbs the wording of the review policy and flags verdicts that change on the same artifact
- Maintenance score for agent-generated code (https://arxiv.org/abs/2605.06464) - Fold maintainability signals into evaluate or refactor rather than adding a new service
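For the Bash policy gate item, the allow/deny decision itself can be a small pure function that a hook script would call. This is a minimal sketch under stated assumptions: the deny lists, the Decision type, and check_bash_command are hypothetical placeholders, not a vetted policy, and the sketch does not model how Codex passes the command to a hook.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


# Hypothetical placeholder policy; the real lists should live in the existing
# script the hook would call, not in this sketch.
DENY_PROGRAMS = {"sudo", "mkfs", "dd"}
DENY_GIT = {("push", "--force"), ("push", "-f"), ("reset", "--hard")}
# Fail closed on chaining, pipes, and substitution instead of parsing sub-commands.
SHELL_CONTROL = (";", "&", "|", "`", "$(", "\n")


def check_bash_command(command: str) -> Decision:
    """Decide whether a proposed Bash command may run under the sketch policy."""
    if any(token in command for token in SHELL_CONTROL):
        return Decision(False, "compound or substituted command needs review")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return Decision(False, f"unparseable command: {exc}")
    if not argv:
        return Decision(False, "empty command")
    program, args = argv[0], argv[1:]
    if program in DENY_PROGRAMS:
        return Decision(False, f"{program} is denied")
    if program == "rm":
        # Collect short flags whether bundled (-rf, -fr) or separate (-r -f).
        short = set("".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--")).lower())
        if {"r", "f"} <= short:
            return Decision(False, "recursive forced delete")
    if program == "git" and args:
        for sub, flag in DENY_GIT:
            if args[0] == sub and flag in args[1:]:
                return Decision(False, f"git {sub} {flag}")
    return Decision(True, "no deny rule matched")


print(check_bash_command("rm -rf build"))
```

Denying any command with shell control characters is deliberately conservative; long-form flags such as rm --recursive --force are not covered and would need their own rule.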
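The agent-scoped MCP allowlist item starts with an inventory of actual tool use. A minimal sketch, assuming tool-call logs can be reduced to (agent, server) pairs; the log shape, agent names, and propose_allowlists are hypothetical.

```python
from collections import defaultdict


def propose_allowlists(usage, exposed_servers):
    """Derive per-agent MCP allowlists from observed (agent, server) tool calls.

    Returns (allowlists, unused): allowlists maps each agent to the sorted
    servers it actually called; unused lists exposed servers nobody called.
    """
    per_agent = defaultdict(set)
    used = set()
    for agent, server in usage:
        # Calls to servers no longer in the config (e.g. retired ones) are ignored.
        if server in exposed_servers:
            per_agent[agent].add(server)
            used.add(server)
    allowlists = {agent: sorted(servers) for agent, servers in per_agent.items()}
    return allowlists, sorted(set(exposed_servers) - used)


exposed = {"github", "slack", "notion", "openaiDeveloperDocs"}
usage = [("reviewer", "github"), ("builder", "github"), ("builder", "openaiDeveloperDocs")]
print(propose_allowlists(usage, exposed))
```

The output is a proposal for review, not an automatic config change; high-risk servers that show up in only one agent's list are the obvious first candidates to scope down.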
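The constraint decay regression check could begin as presence/absence rules run over the generated files. This is a sketch assuming constraints can be expressed as regular expressions scoped by file glob; the Constraint type, the example rules, and the file paths are hypothetical, and the cited paper's own evaluation method may differ.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Constraint:
    name: str
    pattern: str          # regular expression searched in matching files
    must_match: bool      # True: pattern must appear; False: it must not appear
    path_glob: str = "*"  # which generated files the constraint applies to


# Hypothetical examples of architecture, database, and interface constraints.
CONSTRAINTS = [
    Constraint("db: PostgreSQL only", r"\bimport sqlite3\b", must_match=False),
    Constraint("interface: get_user signature", r"^def get_user\(user_id: int\)",
               must_match=True, path_glob="app/users.py"),
    Constraint("architecture: handlers avoid raw SQL", r"\bSELECT\b",
               must_match=False, path_glob="app/handlers/*"),
]


def constraint_violations(files: dict[str, str], constraints: list[Constraint]) -> list[str]:
    """Return the names of constraints the generated files break, in rule order."""
    violations = []
    for c in constraints:
        scoped = [text for path, text in files.items() if fnmatchcase(path, c.path_glob)]
        found = any(re.search(c.pattern, text, flags=re.MULTILINE) for text in scoped)
        if found != c.must_match:
            violations.append(c.name)
    return violations
```

Running the same rules after each agent step turns drift into a concrete diff: a rule that passed on the baseline and fails later is the regression to report.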
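For the SkillScope-style audit item, the core comparison is declared versus observed access. A minimal sketch, assuming an imported skill ships a manifest of tools, file globs, and network hosts, and that observed access comes from a trial run; SkillManifest and audit_skill are hypothetical and not part of codex-skill-audit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class SkillManifest:
    """Hypothetical least-privilege declaration for one imported skill."""
    name: str
    tools: set[str] = field(default_factory=set)
    file_paths: set[str] = field(default_factory=set)  # glob patterns
    network_hosts: set[str] = field(default_factory=set)


def audit_skill(manifest: SkillManifest, observed_tools: set[str],
                observed_files: set[str], observed_hosts: set[str]) -> list[str]:
    """List every observed access the manifest did not declare."""
    findings = [f"undeclared tool: {t}" for t in sorted(observed_tools - manifest.tools)]
    for path in sorted(observed_files):
        # Note: in fnmatch, "*" also matches "/", so "docs/*" covers subdirectories.
        if not any(fnmatchcase(path, pattern) for pattern in manifest.file_paths):
            findings.append(f"undeclared file access: {path}")
    findings += [f"undeclared network host: {h}" for h in sorted(observed_hosts - manifest.network_hosts)]
    return findings
```

An empty findings list is the "trust" precondition; any finding should block import until the manifest or the skill changes.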
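The judge policy invariance check reduces to running one artifact past several rewordings of the same policy and measuring agreement. A minimal sketch, assuming a judge is any callable from (policy text, artifact) to a verdict label; that signature and policy_invariance are hypothetical, not an existing validator API.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

# Assumed judge shape: (policy text, artifact) -> verdict label such as "pass"/"fail".
Judge = Callable[[str, str], str]


def policy_invariance(judge: Judge, artifact: str, policy_variants: list[str]) -> dict[str, object]:
    """Score how stable a judge's verdict is across equivalent policy wordings."""
    if not policy_variants:
        raise ValueError("need at least one policy variant")
    verdicts = [judge(policy, artifact) for policy in policy_variants]
    counts = Counter(verdicts)
    _, majority_count = counts.most_common(1)[0]
    return {
        "verdicts": verdicts,
        "agreement": majority_count / len(verdicts),  # 1.0 means fully invariant
        "stable": len(counts) == 1,
    }
```

In use, the variants would be meaning-preserving paraphrases of the review policy; any artifact with stable set to False marks a verdict that should not be trusted without a human look.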
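A maintenance score folded into evaluate or refactor could start from cheap static signals. The sketch below is a swapped-in heuristic, not the cited study's metric: it flags Python functions that are long or branch-heavy, counting branch nodes as a rough proxy for cyclomatic complexity. The thresholds and maintainability_flags are hypothetical.

```python
import ast

# Node types counted as branch points: a rough proxy for cyclomatic
# complexity, not an exact McCabe count.
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.Try,
                ast.BoolOp, ast.IfExp, ast.comprehension)


def maintainability_flags(source: str, max_lines: int = 40, max_branches: int = 8) -> list[str]:
    """Flag functions that exceed hypothetical length or branching thresholds."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            # ast.walk(node) also descends into nested functions, so their
            # branches count toward the enclosing function as well.
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            if length > max_lines:
                flags.append(f"{node.name}: {length} lines > {max_lines}")
            if branches > max_branches:
                flags.append(f"{node.name}: {branches} branch points > {max_branches}")
    return flags
```

Emitting flags rather than a single number keeps the output actionable for a refactor pass, which can target the named functions directly.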
Research Worth Reading
- MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems - Relevant to route-manifest and agent prompt tuning, but should remain research until a bounded benchmark shows measurable lift
- Constraint Decay: The Fragility of LLM Agents in Backend Code Generation - Directly relevant to planning-gate and acceptance checks for architecture-heavy work
- SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills - Directly relevant to trust decisions about outside skills and to the existing strict skill audit
- Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges - Useful for reviewer/evaluator stability checks before trusting automated verdicts
- To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study - Supports adding maintainability gates to post-build evaluation rather than relying on correctness gates alone
- VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? - An interesting study of a multi-agent loop, but not directly actionable for the local harness today
Considered, Not Adopting
Items reviewed and explicitly declined this cycle, with the reason. Curation discipline matters more than coverage.
- Wholesale import from awesome-claude-code - rejected: the repo is currently in the middle of an index rebuild, and the Codex contract requires copying or rewriting useful Claude behavior into Codex-owned surfaces rather than depending on it at runtime
- New daemon or orchestration layer for hooks - rejected: overengineered. The current hooks.json, existing scripts, and Codex hook primitives are sufficient until a concrete recurring failure proves otherwise
- Enable native Codex memories immediately
- Add broad GitHub/Slack/Notion/MCP connectors from community skill directories - rejected: no evidence that the current CLI, web, and installed plugin surfaces are insufficient for today's workflow
- Policy-doc edits as Quick Wins - rejected by the skill's hard limit: any AGENTS.md contract changes must be explicit, user-directed work