Ecosystem Update - 2026-05-10
TL;DR
- One safe Quick Win was implemented: the read-only OpenAI developer docs MCP server now opts into parallel tool calls.
- Today's strongest new research signals are about constraint drift, skill least privilege, and judge-policy invariance; all map to the harness as evaluator or audit work, not immediate runtime rewrites.
- The current setup already covers the high-value community patterns: subagents, read-only reviewers, hooks, omni-mem lifecycle hooks, skill audit, plugins, browser/Gmail/docs tooling, and autonomous runtime wrappers.
Quick Wins
| Item | Source | Type | Impact | Effort | Action |
|---|---|---|---|---|---|
| OpenAI developer docs MCP parallel calls | https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-mcp.md and https://developers.openai.com/codex/config-schema.json | mcp | 2 | 1 | Auto-implemented supports_parallel_tool_calls = true for [mcp_servers.openaiDeveloperDocs] |
Auto-Implemented
- Updated `/Users/chadsimon/.codex/config.toml` so the official docs MCP server can run independent read-only tool calls in parallel.
- Backed up `config.toml`, `hooks.json`, and all agent TOMLs to `/Users/chadsimon/.codex/backups/2026-05-10/` before editing.
- Verified `config.toml`, `hooks.json`, and all `/Users/chadsimon/.codex/agents/*.toml` parse after the change.
Build Queue
- PreToolUse/PostToolUse Bash policy gates (hook) - https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-hooks.md - Current hooks cover `SessionStart`, `Stop`, and `PreCompact`, but no Bash tool guard exists. Worth building only after defining a small existing-script-backed allow/deny policy; the skill hard limit blocks adding hook wiring before the script exists.
- Agent-scoped MCP allowlists (mcp) - https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-mcp.md - Current config exposes many MCP servers globally. A future pass should inventory actual tool use and narrow high-risk servers to specific agents instead of blanket access.
- Constraint decay regression check (research) - https://arxiv.org/abs/2605.06445 - Add a small eval that catches architecture, database, and interface constraints drifting during backend/codegen tasks.
- SkillScope-style least-privilege audit (skill) - https://arxiv.org/abs/2605.05868 - Extend the existing `codex-skill-audit --strict` workflow with declared tool/file/network expectations for imported skills before trust.
- Judge policy invariance check (research) - https://arxiv.org/abs/2605.06161 - Add a validator/evaluator variant that perturbs review policy wording and flags unstable verdicts on the same artifact.
- Maintenance score for agent-generated code (research) - https://arxiv.org/abs/2605.06464 - Fold maintainability signals into `evaluate` or `refactor` rather than adding a new service.
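
One possible shape for the allow/deny policy script that the Bash policy gate item is waiting on, sketched in Python. Everything here is an assumption for illustration: the `BashPolicy` fields, the example program list, the deny patterns, and the three-way `allow`/`deny`/`ask` result. It does not model the Codex hook payload or the hook wiring, only the decision function such a script could wrap.

```python
import re
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class BashPolicy:
    # Hypothetical defaults; the real allow/deny lists are still to be defined.
    allowed_programs: frozenset[str] = frozenset({"git", "ls", "rg", "cat"})
    denied_patterns: tuple[str, ...] = (
        r"\brm\s+-\w*r",               # recursive delete via short flags
        r"\bgit\s+push\b.*--force",    # forced push
    )


# Characters that chain, substitute, or redirect commands.
_COMPOSITION = re.compile(r"[;&|`$<>()]")


def decide(command: str, policy: BashPolicy = BashPolicy()) -> str:
    """Return "deny", "allow", or "ask" for a single Bash command string."""
    # Deny patterns win over everything else.
    for pattern in policy.denied_patterns:
        if re.search(pattern, command):
            return "deny"

    # Composed commands are too hard to reason about here; escalate.
    if _COMPOSITION.search(command):
        return "ask"

    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return "ask"

    if tokens and tokens[0] in policy.allowed_programs:
        return "allow"
    return "ask"
```

The patterns are illustrative, not exhaustive. Defaulting to `ask` rather than `allow` keeps the gate conservative: a command passes without review only when it is simple and its program is explicitly listed.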
Research
- MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems - Relevant to route-manifest and agent prompt tuning, but should stay research until a bounded benchmark proves lift.
- Constraint Decay: The Fragility of LLM Agents in Backend Code Generation - Directly relevant to planning-gate and acceptance checks for architecture-heavy work.
- SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills - Directly relevant to outside-skill trust decisions and the existing strict skill audit.
- Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges - Useful for reviewer/evaluator stability checks before trusting automated verdicts.
- To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study - Supports adding maintainability gates to post-build evaluation instead of only correctness gates.
- VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? - Interesting multi-agent loop study, but not directly actionable for the local harness today.
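
The policy-invariance idea can be prototyped locally before any evaluator changes. The sketch below is an assumption about how such a check might look in this harness, not the paper's method: the `Judge` callable signature, the `invariance_report` name, and the report fields are all hypothetical. It runs one artifact past the same judge under several rewordings of a review policy and reports whether the verdict stays put.

```python
from collections import Counter
from typing import Callable

# Hypothetical judge interface: (policy_text, artifact) -> verdict label.
Judge = Callable[[str, str], str]


def invariance_report(judge: Judge, artifact: str, policy_variants: list[str]) -> dict:
    """Judge one artifact under each policy wording and summarize agreement."""
    if not policy_variants:
        raise ValueError("need at least one policy variant")

    verdicts = [judge(policy, artifact) for policy in policy_variants]
    counts = Counter(verdicts)
    majority, majority_count = counts.most_common(1)[0]
    return {
        "verdicts": verdicts,
        "majority": majority,
        "agreement": majority_count / len(verdicts),
        "stable": len(counts) == 1,  # every wording produced the same verdict
    }


def keyword_judge(policy: str, artifact: str) -> str:
    """Toy judge that only reacts to the word "must"; a stand-in for an LLM call."""
    return "fail" if "must" in policy and "eval(" in artifact else "pass"


report = invariance_report(
    keyword_judge,
    "x = eval(user_input)",
    [
        "Code must not call eval.",
        "Code should avoid calling eval.",
        "Calling eval is not allowed.",
    ],
)
```

With the toy judge, `report["stable"]` is `False` because only the first wording triggers a `fail`. An unstable result like this is the signal the Build Queue item would flag before an automated verdict is trusted.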
Already Have
Codex-owned AGENTS.md contract, model = "gpt-5.5", approval_policy = "never", sandbox_mode = "danger-full-access", prompt telemetry off, web_search = "live", codex_hooks = true, goals = true, plugins enabled, OpenAI developer docs MCP, omni-mem MCP and lifecycle hooks, SessionStart cached repo context hook, Stop memory save hook, PreCompact memory hook, read-only explorer/planner/reviewer/validator agents, Python and TypeScript reviewers, workspace-write worker and chad-twin agents, agent concurrency caps, official/bundled plugin marketplaces, Browser/Gmail/Documents/Presentations/Spreadsheets plugins, skill-audit, session-recall, auto, drive, govern, planning-gate, rlm-scan, memory-adaptation, and prior ecosystem state dedupe.
Rejected
- Wholesale import from awesome-claude-code - rejected: the repo is currently in an index rebuild state and the Codex contract requires copying or rewriting useful Claude behavior into Codex-owned surfaces, not runtime dependence.
- New daemon or orchestration layer for hooks - rejected: overengineered. Current `hooks.json`, existing scripts, and Codex hook primitives are sufficient until a concrete recurring failure proves otherwise.
- Enable native Codex memories immediately - rejected: already evaluated previously and conflicts with the current omni-mem default memory workflow unless there is an explicit pilot plan.
- Add broad GitHub/Slack/Notion/MCP connectors from community skill directories - rejected: no proof that current CLI, web, and installed plugin surfaces are insufficient for today's workflow.
- Policy-doc edits as Quick Wins - rejected by skill hard limit. Any `AGENTS.md` contract changes must be explicit user-directed work.
Sources checked: https://github.com/hesreallyhim/awesome-claude-code, https://howborisusesclaudecode.com/, https://github.com/shanraisshan/codex-cli-best-practice, https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-hooks.md, https://github.com/shanraisshan/codex-cli-best-practice/blob/main/best-practice/codex-mcp.md, https://arxiv.org/search/?searchtype=all&query=LLM+agent+coding&order=-announced_date_first, https://developers.openai.com/codex/config-schema.json, web search: "Codex new hooks agents skills site:github.com 2026"
Tier 2 fetched: yes
Tier 3 fetched: no - skipped because the last Tier 3 run was 2026-05-08T15:37:21Z, inside the 7-day window
omni-mem: available; run summary saved
Run at: 2026-05-10T10:33:44Z
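
The Tier 3 skip follows from simple timestamp arithmetic on the two times recorded above. A minimal sketch, assuming the window is measured from the last run's timestamp; the helper names `parse_utc` and `tier3_due` are hypothetical:

```python
from datetime import datetime, timedelta


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def tier3_due(last_run: str, now: str, window: timedelta = timedelta(days=7)) -> bool:
    """True once at least `window` has elapsed since the last Tier 3 run."""
    return parse_utc(now) - parse_utc(last_run) >= window


# Timestamps taken from this run's footer.
elapsed = parse_utc("2026-05-10T10:33:44Z") - parse_utc("2026-05-08T15:37:21Z")
due = tier3_due("2026-05-08T15:37:21Z", "2026-05-10T10:33:44Z")
```

Here `elapsed` is 1 day, 18:56:23, well inside the 7-day window, so `due` is `False` and the fetch is skipped.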