Ecosystem Update - 2026-05-20
TL;DR
- Official Codex `0.132.0` shipped today and was safely applied; local CLI now reports `codex-cli 0.132.0`.
- The strongest local hardening win was profile-scoped: conservative profiles now disable login-shell semantics without touching the power-user default.
- Today's research signal is that harness quality matters as much as model quality: clean code reduces agent operating cost, and enterprise SaaS failures cluster around setup/integration before business logic.
Quick Wins
| Item | Source | Type | Impact | Effort | Action |
|---|---|---|---|---|---|
| Stable Codex 0.132.0 upgrade | openai/codex release, npm @openai/codex | Codex-md | 3 | 1 | Upgrade from 0.131.0 to the stable 0.132.0 release and smoke-test the harness. |
| Conservative profile login-shell hardening | Codex config reference, codex-cli-best-practice | Codex-md | 2 | 1 | Add allow_login_shell = false only to opt-in conservative profiles. |
Auto-Implemented
- Backed up `config.toml`, `hooks.json`, and all current agent TOMLs under `/Users/chadsimon/.codex/backups/2026-05-20/`.
- Ran `codex update`, which executed `npm install -g @openai/codex`; `codex --version` now reports `codex-cli 0.132.0`.
- Updated `/Users/chadsimon/.codex/config.toml` so `[profiles.conservative]` and `[profiles.conservative-auto-review]` set `allow_login_shell = false`.
- Verified `config.toml` parses with `tomllib` and `hooks.json` parses with `python3 -m json.tool`.
- Verified base, `conservative`, and `conservative-auto-review` profiles with `TERM=xterm-256color codex --strict-config doctor --summary --ascii`; all returned `13 ok | 1 idle | 2 notes | 0 warn | 0 fail`.
- Verified posture and local harness tests: `python3 ~/.codex/bin/codex_config_posture.py --mode warn` passed, and `PYTHONPATH=/Users/chadsimon/.codex/bin python3 -m unittest test_codex_config_posture test_codex_agentops_contract test_codex_task_manager` passed 31 tests.
- `python3 ~/.codex/bin/auto_runtime.py contract-check` still reports one pre-existing hard issue: `MEMORY_CITATIONS_REQUIRED` for `memory.records.objective:init`. This is unrelated to today's config edit and CLI update, but should be cleared before relying on AgentOps closure reports.
Build Queue
- `codex exec resume --output-schema` adapter (Codex-md) - Codex 0.132.0 release - Existing local searches found `output_schema` use in `autoconfig`, but no resumed automation path. Add schema-preserving resume support where long-running `codex exec` jobs need structured closure after resumption.
- Permission prompt miner for conservative profiles (Codex-md) - How Boris Uses Claude Code, Codex config reference - Boris's `/fewer-permission-prompts` pattern maps to Codex `on-request` profiles, but should be implemented as a report-only analyzer over local approvals/rules before changing allowlists.
- Agent-view style session control plane intake (agent-pattern) - How Boris Uses Claude Code - Codex has subagents, goals, and session state, but no first-class local "all sessions by status" view. Evaluate whether existing `codex doctor --json`, state DBs, and goal state are enough before adding a new dashboard.
- Notify skill legacy path cleanup (skill) - How Boris Uses Claude Code notifications - Current `notify` skill docs still reference legacy `~/.Codex/bin/notify_done.sh`; update the skill body in an explicit skill-maintenance pass, not as a Quick Win.
- SaaS setup/integration failure eval (research) - SaaSBench - Add a small harness eval that fails agents for premature closure during environment setup, dependency wiring, or service integration, not just business-logic tests.
- Agent monitor evasion regression (research) - SLEIGHT-Bench - Current reviewer barriers are useful, but monitor-evasion examples suggest adding targeted prompts/tests for state manipulation, ambiguous user intent, and covert deployment/exfiltration patterns.
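One possible shape for the report-only permission prompt miner is sketched below. Everything here is an assumption for illustration: the record format (`command` and `decision` fields), the function names, and the approval threshold are hypothetical and do not reflect a real Codex approval log. The sketch only returns candidates and never edits rules or allowlists.

```python
import shlex
from collections import Counter
from collections.abc import Iterable


def command_prefix(command: str, depth: int = 2) -> str:
    """Reduce a shell command to its first `depth` tokens, e.g. 'git status'."""
    return " ".join(shlex.split(command)[:depth])


def mine_prompt_candidates(
    records: Iterable[dict[str, str]], min_approvals: int = 3
) -> list[tuple[str, int]]:
    """Report prefixes approved at least `min_approvals` times and never denied.

    Report-only: the result is data for a human to review.
    """
    approved: Counter[str] = Counter()
    denied: set[str] = set()
    for record in records:  # hypothetical record shape: {"command", "decision"}
        prefix = command_prefix(record["command"])
        if record["decision"] == "approved":
            approved[prefix] += 1
        else:
            denied.add(prefix)
    return sorted(
        (prefix, count)
        for prefix, count in approved.items()
        if count >= min_approvals and prefix not in denied
    )


# Illustrative records only; not taken from any real session.
SAMPLE_RECORDS = [
    {"command": "git status", "decision": "approved"},
    {"command": "git status --short", "decision": "approved"},
    {"command": "git status", "decision": "approved"},
    {"command": "rm -rf build", "decision": "approved"},
    {"command": "rm -rf build", "decision": "approved"},
    {"command": "rm -rf build", "decision": "approved"},
    {"command": "rm -rf dist", "decision": "denied"},
    {"command": "npm test", "decision": "approved"},
]

if __name__ == "__main__":
    for prefix, count in mine_prompt_candidates(SAMPLE_RECORDS):
        print(f"{prefix}: approved {count} times, never denied")
```

Excluding any prefix that was ever denied keeps the report conservative: a single denial is treated as evidence that the command family still needs a prompt.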
Research
- Does Code Cleanliness Affect Coding Agents? - New May 19 paper: cleaner code did not change pass rate in the reported minimal-pair setup, but reduced tokens and file revisits, making refactor discipline a direct harness-efficiency lever.
- SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors - Relevant to reviewer/validator barriers because it tests whether monitoring agents catch covert harmful objectives across full transcripts.
- SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering - Strong fit for `planning-gate` and `/auto`: the paper reports that most failures happen before deep business logic, reinforcing setup and integration checks.
- 1GC-7RC: One Graphic Card - Seven Research Challenges! - Useful shape for bounded GPU/task-budget evals; relevant to local Forge/autoconfig work but not a Quick Win.
Already Have
gpt-5.5 power-user default, approval_policy = "never", sandbox_mode = "danger-full-access", prompt telemetry off, live web search, schema-linked config.toml, features.hooks = true, features.plugins = true, features.goals = true, features.prevent_idle_sleep = true, features.plugin_hooks = false, destructive app tools disabled by default, OpenAI developer docs MCP, omni-mem MCP, Stitch and Kickstarter MCP entries, Browser/Chrome/Computer Use/Documents/Spreadsheets/Presentations/Gmail/OpenAI Developers plugins, Bash PreToolUse guard, Bash PostToolUse verification ledger and failure-context hooks, SessionStart repo-context and config-posture hooks, Stop and PreCompact omni-mem hooks, read-only explorer/planner/reviewer/python-reviewer/typescript-reviewer/validator agents, scoped worker and chad-twin agents, bounded agent thread/depth/runtime caps, profiles.review, profiles.conservative, profiles.conservative-auto-review, conservative profile login-shell suppression, global placeholder pokegen disable, session-recall, rlm-scan, planning-gate, auto, drive, go, codex-security, security-audit, codex-runtime-doctor, what-would-chad-do, and stable codex-cli 0.132.0.
Rejected
- Enable `service_tier = "priority"` globally - rejected because it changes cost/performance posture globally; use explicit `/fast` or per-session selection instead.
- Enable native Codex memories automatically - rejected again because local policy currently uses omni-mem as the default memory system and `features.memories = false` is intentional.
- Enable `plugin_hooks` automatically - rejected because plugin-bundled hooks are executable code and still need trust review before global enablement.
- Wholesale install Boris/Thariq Claude skills - rejected because they target `~/.claude` and Claude-specific command surfaces; adapt useful patterns into Codex-owned skills after audit.
- Clone Agent View as a new daemon now - rejected as overengineering until existing Codex session/goal state proves insufficient for a read-only status report.
- Edit `AGENTS.md` as a Quick Win - rejected by the ecosystem-update hard limit; constitutional policy changes require explicit direction.
Sources checked: https://github.com/hesreallyhim/awesome-claude-code, https://howborisusesclaudecode.com/, https://github.com/shanraisshan/codex-cli-best-practice, https://developers.openai.com/codex/config-reference, https://github.com/openai/codex/releases, https://github.com/openai/codex/releases/tag/rust-v0.132.0, https://www.npmjs.com/package/@openai/codex, https://arxiv.org/search/?searchtype=all&query=LLM+agent+coding&order=-announced_date_first, https://arxiv.org/abs/2605.20049, https://arxiv.org/abs/2605.16626, https://arxiv.org/abs/2605.17526, https://arxiv.org/abs/2605.17046
Tier 2 fetched: yes.
Tier 3 fetched: yes; explicit daily crawl requested.
omni-mem write: saved memory a155327a-17d7-4b92-b7e4-87289b50991b.
Run at: 2026-05-20T10:33:24Z