Ecosystem Update - 2026-05-27
Highlights
- One safe Quick Win was implemented: restored the official Codex config schema header in ~/.codex/config.toml, clearing the local config-posture hard violation
- Tier 1 community sources had no new GitHub commits since yesterday's run; their useful patterns are already covered locally or require repo-specific design
- Today's strongest research signals are SEC-bench Pro and Verus-SpecGym: both reinforce evidence-backed closure and intent/spec validation rather than new global automation
Quick Wins (implemented today)
- Config schema header repair (Codex-md) - Restore #:schema https://developers.openai.com/codex/config-schema.json as the first line of ~/.codex/config.toml
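
A minimal sketch of how the header repair above could be checked and applied to config text. The function names are hypothetical, and dropping a misplaced copy of the header is an assumption about the desired behavior, not something the Codex tooling is known to do:

```python
# Header text taken from the report; everything else here is illustrative.
SCHEMA_HEADER = "#:schema https://developers.openai.com/codex/config-schema.json"


def has_schema_header(text: str) -> bool:
    """Return True when the first line of the config text is the schema header."""
    first_line = text.split("\n", 1)[0]
    return first_line.strip() == SCHEMA_HEADER


def restore_schema_header(text: str) -> str:
    """Return config text with the schema header as its first line.

    Assumption: a copy of the header found on any later line is misplaced,
    so it is dropped to avoid duplicating the directive.
    """
    if has_schema_header(text):
        return text
    kept = [line for line in text.splitlines() if line.strip() != SCHEMA_HEADER]
    return "\n".join([SCHEMA_HEADER, *kept]) + "\n"
```

The repair is idempotent: running it on text that already starts with the header returns the text unchanged, which keeps it safe to re-run on each cycle.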
New Tools, Skills & Patterns
- SEC-bench Pro security-eval adapter https://arxiv.org/abs/2605.26548v1 - Long-horizon software-security tasks map cleanly to codex-security, security-audit, and AgentOps closure gates. Build only a thin intake/eval adapter if the benchmark artifacts are reproducible enough to run locally
- Verus-SpecGym intent/spec validation intake https://arxiv.org/abs/2605.26457v1 - The paper targets a real verification failure mode: generated formal specs can be machine-checkable while not matching user intent. Fold this into planning-gate/eval design for formal or contract-heavy tasks before adding any new runtime hook
- RepoMirage-style repository perturbation checks https://arxiv.org/abs/2605.26177v1 - Useful for future rlm-scan and reviewer evals because it tests whether agents use repository structure robustly instead of overfitting path/name cues
- Conservative auto-review profile naming cleanup (Codex-md) https://developers.openai.com/codex/config-reference#configtoml - Current approvals_reviewer = "guardian_subagent" is schema-accepted as a legacy alias, but docs prefer auto_review; normalize later with a focused profile compatibility check
Research Worth Reading
- Helicase: Uncertainty-Guided Supply Chain Knowledge Graph Construction with Autonomous Multi-Agent LLMs - Relevant to source synthesis and confidence accounting, but a knowledge-graph layer would be too heavy for today's harness without a concrete repeated failure
- SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks? - Good candidate benchmark for security-agent closure quality and long-horizon exploit/patch workflows
- Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization - Directly relevant to checking whether formal acceptance criteria preserve user intent, not just verifier success
- RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations - Supports future evals for cached repo context and codebase-localization robustness
Considered, Not Adopting
Items reviewed and explicitly declined this cycle, with the reason. Curation discipline matters more than coverage.
- Wholesale import from Awesome Claude Code, Boris workflows, or community Codex skill catalogs - rejected: outside skills require strict audit and the local skill library already covers the recurring workflows
- Global auto-format PostToolUse hook - rejected as a Quick Win: the skill forbids adding hooks that require new scripts, and formatting is repo-specific
- Enable native Codex memories
- Default worktree isolation for all agents - rejected: useful for selected large migrations, but a global behavior change would add coordination overhead without a proven current bottleneck
- Build a Helicase-style source knowledge graph service - rejected: current WebFetch/search plus report/state files satisfy today's source synthesis needs
- Normalize guardian_subagent to auto_review immediately - rejected as a Quick Win: the official schema still accepts the legacy value for compatibility, so this is cleanup rather than a hard defect