Ecosystem Update - 2026-05-10

May 10, 2026 · curated by Chad Simon · 18 items reviewed

Highlights

One safe Quick Win was implemented: the read-only OpenAI developer docs MCP server now opts into parallel tool calls
Today's strongest new research signals are about constraint drift, skill least privilege, and judge-policy invariance; all map to the harness as evaluator or audit work, not immediate runtime rewrites

Quick Wins (implemented today)

OpenAI developer docs MCP parallel calls [mcp]

github.com/shanraisshan/codex-cli-best-practice

Auto-implemented supports_parallel_tool_calls = true for [mcp_servers.openaiDeveloperDocs]

New Tools, Skills & Patterns

PreToolUse/PostToolUse Bash policy gates [hook]

github.com/shanraisshan/codex-cli-best-practice

https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-hooks.md - Current hooks cover SessionStart, Stop, and PreCompact, but no Bash tool guard exists. Worth building only after defining a small existing-script-backed allow/deny policy; the skill hard limit blocks adding hook wiring before the script exists
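
The "small allow/deny policy" could start as a standalone script with no hook wiring at all. The sketch below is one possible shape, not the harness's actual policy: the program allowlist, the deny patterns, and the function name decide are all hypothetical, and it makes no assumption about the payload format Codex hooks send or expect.

```python
import re
import shlex

# Hypothetical policy data: placeholders to be replaced by a reviewed list.
ALLOWED_PROGRAMS = {"ls", "cat", "git", "rg"}
DENY_PATTERNS = [
    # Shell chaining, substitution, and redirection defeat a first-token check.
    re.compile(r"[;&|`$<>]"),
    re.compile(r"\bgit\s+push\b.*--force"),
]


def decide(command: str) -> tuple[str, str]:
    """Return ("allow" or "deny", reason) for one Bash command string."""
    for pattern in DENY_PATTERNS:
        if pattern.search(command):
            return "deny", f"matches deny pattern {pattern.pattern!r}"
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "deny", "command could not be parsed"
    if not tokens:
        return "deny", "empty command"
    if tokens[0] not in ALLOWED_PROGRAMS:
        return "deny", f"program {tokens[0]!r} is not on the allowlist"
    return "allow", "program is on the allowlist"


print(decide("git status"))
print(decide("ls; rm -rf /"))
```

Deny rules run before the allowlist so that an allowed program cannot be used as a prefix for a chained command. Anything unparseable is denied rather than passed through.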

Agent-scoped MCP allowlists [mcp]

github.com/shanraisshan/codex-cli-best-practice

https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-mcp.md - Current config exposes many MCP servers globally. A future pass should inventory actual tool use and narrow high-risk servers to specific agents instead of blanket access
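
The inventory step can be sketched as a small aggregation over observed tool calls. This assumes call logs can be reduced to (agent, server) pairs and that the set of high-risk servers is maintained by hand; the function name propose_allowlists and every agent and server name below are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def propose_allowlists(
    calls: Iterable[tuple[str, str]], high_risk: set[str]
) -> dict[str, list[str]]:
    """Map each high-risk server to the agents observed using it."""
    agents_by_server: dict[str, set[str]] = defaultdict(set)
    for agent, server in calls:
        if server in high_risk:
            agents_by_server[server].add(agent)
    # Servers with no observed use get an empty allowlist, which marks them
    # as candidates for removal rather than blanket access.
    return {server: sorted(agents_by_server[server]) for server in sorted(high_risk)}


observed_calls = [
    ("researcher", "browser"),
    ("builder", "shell"),
    ("researcher", "docs"),
    ("reviewer", "shell"),
]
print(propose_allowlists(observed_calls, {"shell", "browser", "deploy"}))
```

The output is a proposal for human review, not a config change: observed use shows who relied on a server, not who should be allowed to.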

Constraint decay regression check

arXiv

https://arxiv.org/abs/2605.06445 - Add a small eval that catches architecture, database, and interface constraints drifting during backend/codegen tasks
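
A small eval of this kind might replay snapshots of the generated code and report the first step at which a previously satisfied constraint stops holding. The sketch below is an assumption about how such a check could look, not the paper's method; the two constraints, the snapshot contents, and the name find_decay are hypothetical.

```python
from typing import Callable

Constraint = Callable[[str], bool]

# Hypothetical constraints: one database rule and one interface rule.
CONSTRAINTS: dict[str, Constraint] = {
    "database_is_not_sqlite": lambda src: "import sqlite3" not in src,
    "health_route_present": lambda src: '"/health"' in src,
}


def find_decay(
    snapshots: list[str], constraints: dict[str, Constraint]
) -> dict[str, int]:
    """Return the step index where each once-satisfied constraint first fails."""
    decayed: dict[str, int] = {}
    for name, check in constraints.items():
        held_before = False
        for step, source in enumerate(snapshots):
            if check(source):
                held_before = True
            elif held_before:
                decayed[name] = step
                break
    return decayed


snapshots = [
    'import psycopg\nROUTES = ["/health", "/users"]\n',
    'import psycopg\nimport sqlite3\nROUTES = ["/health", "/users"]\n',
    'import sqlite3\nROUTES = ["/users"]\n',
]
print(find_decay(snapshots, CONSTRAINTS))
```

Only constraints that held at some earlier step are reported, so the check separates drift during a task from constraints that were never met in the first place.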

SkillScope-style least-privilege audit skill

arXiv

https://arxiv.org/abs/2605.05868 - Extend the existing codex-skill-audit --strict workflow with declared tool/file/network expectations for imported skills before trust
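
At its core, a declared-expectations check is a set difference between what a skill says it needs and what it was observed using. The sketch below illustrates that idea only; it does not describe how codex-skill-audit --strict works today, and the Permissions type, the audit function, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permissions:
    """Tool, file, and network access, either declared or observed."""

    tools: frozenset[str] = frozenset()
    paths: frozenset[str] = frozenset()
    hosts: frozenset[str] = frozenset()


def audit(declared: Permissions, observed: Permissions) -> dict[str, list[str]]:
    """List everything the skill used that it did not declare."""
    excess = {
        "tools": sorted(observed.tools - declared.tools),
        "paths": sorted(observed.paths - declared.paths),
        "hosts": sorted(observed.hosts - declared.hosts),
    }
    # Keep only the categories that actually have violations.
    return {kind: items for kind, items in excess.items() if items}


declared = Permissions(tools=frozenset({"read_file"}))
observed = Permissions(
    tools=frozenset({"read_file", "bash"}),
    hosts=frozenset({"example.com"}),
)
print(audit(declared, observed))
```

An empty result means the skill stayed within its declaration; any non-empty result is a reason to withhold trust until the declaration or the skill is fixed.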

Judge policy invariance check

arXiv

https://arxiv.org/abs/2605.06161 - Add a validator/evaluator variant that perturbs review policy wording and flags unstable verdicts on the same artifact
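
The invariance check reduces to running the same artifact through reworded versions of one policy and comparing verdicts. In the sketch below the judge is a deliberately brittle toy function standing in for a real model call; toy_judge, check_invariance, and the policy wordings are hypothetical.

```python
from typing import Callable

# A judge takes (policy_text, artifact) and returns a verdict label.
Judge = Callable[[str, str], str]


def check_invariance(
    judge: Judge, policy_variants: list[str], artifact: str
) -> tuple[bool, dict[str, str]]:
    """Return (is_stable, verdict per policy wording) for one artifact."""
    verdicts = {policy: judge(policy, artifact) for policy in policy_variants}
    is_stable = len(set(verdicts.values())) <= 1
    return is_stable, verdicts


def toy_judge(policy: str, artifact: str) -> str:
    """Toy stand-in for an LLM judge: only enforces policies that say 'must'."""
    if "must" in policy and "TODO" in artifact:
        return "fail"
    return "pass"


variants = [
    "Code must not contain TODO markers.",
    "Code should be free of TODO markers.",
]
stable, verdicts = check_invariance(toy_judge, variants, "x = 1  # TODO: validate")
print(stable, verdicts)
```

Both wordings express the same rule, so a split verdict points at the judge rather than the artifact. With a real model, each variant would likely need several samples to separate wording sensitivity from ordinary sampling noise.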

Maintenance score for agent-generated code

arXiv

https://arxiv.org/abs/2605.06464 - Fold maintainability signals into evaluate or refactor rather than adding a new service
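
Maintainability signals can be computed as a plain function call inside an existing evaluation step, with no new service. The sketch below uses Python's ast module; the choice of signals (function length, missing docstrings, branch count) is an illustrative assumption, not the metric set from the paper, and maintainability_signals is a hypothetical name.

```python
import ast


def maintainability_signals(source: str) -> dict[str, int]:
    """Compute a few cheap structural signals for one Python source string."""
    tree = ast.parse(source)
    functions = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    longest = max((fn.end_lineno - fn.lineno + 1 for fn in functions), default=0)
    undocumented = sum(1 for fn in functions if ast.get_docstring(fn) is None)
    branches = sum(
        isinstance(node, (ast.If, ast.For, ast.While, ast.Try))
        for node in ast.walk(tree)
    )
    return {
        "functions": len(functions),
        "longest_function_lines": longest,
        "undocumented_functions": undocumented,
        "branch_nodes": branches,
    }


SAMPLE = '''
def documented(x):
    """Double x."""
    return x * 2

def undocumented(x):
    if x > 0:
        for i in range(x):
            x += i
    return x
'''
print(maintainability_signals(SAMPLE))
```

Thresholds are left out on purpose: the signals are more useful as a trend across builds than as a fixed pass/fail gate until real baselines exist.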

Research Worth Reading

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

arXiv

Relevant to route-manifest and agent prompt tuning, but should stay research until a bounded benchmark proves lift

Constraint Decay: The Fragility of LLM Agents in Backend Code Generation

arXiv

Directly relevant to planning-gate and acceptance checks for architecture-heavy work

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

arXiv

Directly relevant to outside-skill trust decisions and the existing strict skill audit

Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges

arXiv

Useful for reviewer/evaluator stability checks before trusting automated verdicts

To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study

arXiv

Supports adding maintainability gates to post-build evaluation instead of only correctness gates

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

arXiv

Interesting multi-agent loop study, but not directly actionable for the local harness today

Considered, Not Adopting

Items reviewed and explicitly declined this cycle, with the reason. Curation discipline matters more than coverage.

Wholesale import from awesome-claude-code - rejected: the repo is currently in an index rebuild state, and the Codex contract requires copying or rewriting useful Claude behavior into Codex-owned surfaces rather than depending on it at runtime
New daemon or orchestration layer for hooks - rejected: overengineered. Current hooks.json, existing scripts, and Codex hook primitives are sufficient until a concrete recurring failure proves otherwise
Enable native Codex memories immediately
Add broad GitHub/Slack/Notion/MCP connectors from community skill directories - rejected: no proof that current CLI, web, and installed plugin surfaces are insufficient for today's workflow
Policy-doc edits as Quick Wins - rejected by skill hard limit. Any AGENTS.md contract changes must be explicit user-directed work
