Ecosystem Update - 2026-05-20
Highlights
- Official Codex 0.132.0 shipped today and was safely applied; the local CLI now reports codex-cli 0.132.0
- The strongest local hardening win was profile-scoped: conservative profiles now disable login-shell semantics without touching the power-user default
- Today's research signal is that harness quality matters as much as model quality: clean code reduces agent operating cost, and enterprise SaaS failures cluster around setup/integration before business logic
Quick Wins (implemented today)
- Stable Codex 0.132.0 upgrade (Codex-md) - Upgrade from 0.131.0 to the stable 0.132.0 release and smoke-test the harness
- Conservative profile login-shell hardening (Codex-md) - Add allow_login_shell = false only to opt-in conservative profiles
New Tools, Skills & Patterns
- codex exec resume --output-schema adapter (Codex-md; Codex 0.132.0 release) - Existing local searches found output_schema use in autoconfig, but no resumed automation path. Add schema-preserving resume support where long-running codex exec jobs need structured closure after resumption
- Permission prompt miner for conservative profiles (Codex-md)
- Agent-view style session control plane intake (agent-pattern; How Boris Uses Claude Code) - Codex has subagents, goals, and session state, but no first-class local "all sessions by status" view. Evaluate whether existing codex doctor --json, state DBs, and goal state are enough before adding a new dashboard
- Notify skill legacy path cleanup (skill) - Update the skill body in an explicit skill-maintenance pass, not as a Quick Win
- SaaS setup/integration failure eval (SaaSBench) - Add a small harness eval that fails agents for premature closure during environment setup, dependency wiring, or service integration, not just business-logic tests
- Agent monitor evasion regression (SLEIGHT-Bench) - Current reviewer barriers are useful, but monitor-evasion examples suggest adding targeted prompts/tests for state manipulation, ambiguous user intent, and covert deployment/exfiltration patterns
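The setup/integration eval proposed in this list could start as a simple check over per-phase results. This is a minimal sketch under stated assumptions: the phase names, the PhaseResult record, and the premature_closures function are hypothetical and do not come from SaaSBench or Codex; a real harness would derive phase outcomes from its own logs.

```python
from dataclasses import dataclass

# Phases that must succeed before the agent may declare the task done.
# The names are illustrative assumptions, not SaaSBench categories.
REQUIRED_PHASES = ("environment_setup", "dependency_wiring", "service_integration")


@dataclass(frozen=True)
class PhaseResult:
    phase: str
    passed: bool


def premature_closures(results: list[PhaseResult], claimed_done: bool) -> list[str]:
    """List required phases that did not pass although the agent claimed completion."""
    if not claimed_done:
        return []  # no closure claimed, so nothing can be premature
    passed = {r.phase for r in results if r.passed}
    # A phase that failed or never ran counts against the closure claim.
    return [phase for phase in REQUIRED_PHASES if phase not in passed]


if __name__ == "__main__":
    run = [
        PhaseResult("environment_setup", True),
        PhaseResult("dependency_wiring", False),
        PhaseResult("business_logic", True),
    ]
    failures = premature_closures(run, claimed_done=True)
    print("FAIL" if failures else "PASS", failures)
```

Treating a missing phase the same as a failed one is a deliberate choice here: an agent that skips service integration and still reports success is exactly the premature closure the eval is meant to catch.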
Research Worth Reading
- Does Code Cleanliness Affect Coding Agents? - New May 19 paper: cleaner code did not change pass rate in the reported minimal-pair setup, but reduced tokens and file revisits, making refactor discipline a direct harness-efficiency lever
- SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors - Relevant to reviewer/validator barriers because it tests whether monitoring agents catch covert harmful objectives across full transcripts
- SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering - Strong fit for planning-gate and /auto: the paper reports that most failures happen before deep business logic, reinforcing setup and integration checks
- 1GC-7RC: One Graphic Card - Seven Research Challenges! - Useful shape for bounded GPU/task-budget evals; relevant to local Forge/autoconfig work but not a Quick Win
Considered, Not Adopting
Items reviewed and explicitly declined this cycle, with the reason. Curation discipline matters more than coverage.
- Enable service_tier = "priority" globally - rejected because it changes cost/performance posture globally; use explicit /fast or per-session selection instead
- Enable native Codex memories automatically
- Enable plugin_hooks automatically - rejected because plugin-bundled hooks are executable code and still need trust review before global enablement
- Wholesale install Boris/Thariq Claude skills - rejected because they target ~/.claude and Claude-specific command surfaces; adapt useful patterns into Codex-owned skills after audit
- Clone Agent View as a new daemon now - rejected as overengineering until existing Codex session/goal state proves insufficient for a read-only status report
- Edit AGENTS.md as a Quick Win - rejected by the ecosystem-update hard limit; constitutional policy changes require explicit direction