Ecosystem Update — 2026-04-13
TL;DR
- Semi-formal reasoning paper shows that structured prompting (premises → trace → conclusion) improves code reasoning without executing code, by about 10 points on patch equivalence (78→88%); directly applicable to our reviewer agents
- Toolkit explosion: awesome-claude-code-toolkit now at 135+ agents, 400k+ skills via SkillKit — but signal-to-noise is dropping; most items are domain plugins or duplicate orchestrators
- New hook patterns worth stealing:
  - `cc-safe-setup` bundles 6 safety hooks in a one-command install
  - `claude-code-hooks` repo has 15 battle-tested hooks from 160+ hours of autonomous operation
  - `obey` has 17 lifecycle hooks for rule enforcement
- Tier 3 fetched this cycle (8 days since last); Tier 2 fetched (4 days since last)
Quick Wins
| Item | Source | Type | Impact | Effort | Action |
|---|---|---|---|---|---|
| None this cycle | — | — | — | — | Setup remains mature; remaining gaps require new scripts or skills |
No Quick Wins this cycle. Frontmatter additions and one-liner config changes were picked up in earlier runs. Everything new requires new scripts, new skill directories, or external tool installs — those are Build Queue items.
Build Queue
- Semi-formal Reasoning Prompt Pattern (claude-md) — arxiv 2603.01896 — Structured prompting: construct premises, trace execution paths, derive formal conclusions. Roughly 10-point accuracy improvement on code reasoning tasks. Could be added as an instruction pattern for the reviewer agents. Impact 3, Effort 1, Priority 3.0. Not a Quick Win despite the low effort, because it changes reviewer agent bodies rather than frontmatter only. Estimated scope: update reviewer.md and python-reviewer.md, ~20 LOC.
- cc-safe-setup (hook) — yurukusa/cc-safe-setup — One-command install of 6 essential safety hooks (destructive command blocking, secret detection, large file prevention). Our hooks cover Stop/PreCompact/UserPromptSubmit but lack deterministic PreToolUse safety guards. Impact 2, Effort 2, Priority 1.0.
- claude-code-hooks (battle-tested set) (hook) — yurukusa/claude-code-hooks — 15 production-tested hooks from 160+ hours of autonomous operation. Includes PostToolUse linting, notification hooks, and status hooks. Cherry-pick 2–3 that fill lifecycle gaps. Impact 2, Effort 2, Priority 1.0.
- skills-janitor (skill) — khendzel/skills-janitor — Audits and deduplicates skills with 9 slash commands. We have 34 skill directories, some of which may be stale or overlapping. Impact 2, Effort 1, Priority 2.0. Borderline: we can do this with `rg` and manual review.
- review-squad (agent-pattern) — 2389-research/review-squad — Multi-perspective code review via subagent panels. Our reviewers run solo, not as a panel; the panel pattern could improve coverage. Impact 2, Effort 2, Priority 1.0.
- test-kitchen (agent-pattern) — 2389-research/test-kitchen — Parallel competing subagents with structured winner selection. Could enhance planning-gate by generating competing approaches before committing. Impact 2, Effort 2, Priority 1.0.
- preflight prompt validator (mcp) — preflight-dev/preflight — 24-tool MCP server that catches vague prompts before they waste cycles. Our UserPromptSubmit hook classifies the route but doesn't validate prompt quality. Impact 2, Effort 2, Priority 1.0.
- Auto-Dream Memory Consolidation (agent-pattern) — howborisusesclaudecode.com — Carried forward from last cycle. A subagent periodically reviews past sessions and merges insights. Our memory workflow is manual. Impact 2, Effort 2, Priority 1.0.
- reporecall (mcp) — proofofwork-agency/reporecall — Tree-sitter AST indexing (22 languages) with ~5ms context injection. Could supplement omni-mem for code-level retrieval. Impact 2, Effort 3, Priority 0.7.
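For the Semi-formal Reasoning item, the reviewer change could take roughly the shape below: an instruction block in the premises → trace → conclusion form, plus a cheap check that a reviewer's output actually follows it. This is a sketch under assumptions; the section labels, the wording, and the `missing_or_misordered` helper are hypothetical, not the paper's template or the current contents of reviewer.md.

```python
"""Hypothetical semi-formal review scaffold: premises -> trace -> conclusion."""
from typing import List

# Section labels the reviewer must emit, in this order (labels are illustrative).
SECTIONS = ("PREMISES", "TRACE", "CONCLUSION")

# Instruction text that could be appended to a reviewer agent body.
INSTRUCTIONS = """\
Before giving a verdict, reason in three labeled sections:
PREMISES: list the facts you rely on (signatures, invariants, inputs), citing file:line.
TRACE: step through the relevant execution paths using only those premises.
CONCLUSION: state the verdict and which premises and trace steps support it.
Reason from the source only; do not execute the code."""


def missing_or_misordered(review: str) -> List[str]:
    """List structural problems in a review; an empty list means it follows the scaffold."""
    problems: List[str] = []
    last_pos = -1
    for name in SECTIONS:
        pos = review.find(f"{name}:")
        if pos == -1:
            problems.append(f"missing {name}")
        elif pos < last_pos:
            # Found, but earlier than a section that should precede it.
            problems.append(f"{name} out of order")
        else:
            last_pos = pos
    return problems
```

The check verifies structure only, not whether the trace is sound.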
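For the cc-safe-setup item, the missing piece is a deterministic PreToolUse guard. A minimal sketch, assuming Claude Code's command-hook contract: the event payload arrives as JSON on stdin with `tool_name` and `tool_input`, and exit code 2 blocks the call with stderr fed back to Claude. The rule list and the `check_command` helper are hypothetical and far narrower than cc-safe-setup's hooks; this is not that project's code.

```python
#!/usr/bin/env python3
"""Hypothetical PreToolUse safety guard for Bash calls (a sketch, not cc-safe-setup)."""
import json
import re
import sys
from typing import Optional

# (pattern, reason) pairs. Illustrative only: tune and extend for your environment.
RULES = [
    (r"\brm\s+-\w*[rR]\w*\s+(/|~)(\s|$)", "recursive rm on / or ~"),
    (r"\bgit\s+push\b.*\s(--force|-f)(\s|$)", "force push (use --force-with-lease)"),
    (r"\bgit\s+reset\s+--hard\b", "git reset --hard discards work"),
    (r"AKIA[0-9A-Z]{16}", "command appears to embed an AWS access key ID"),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason for blocking `command`, or None if no rule matches."""
    for pattern, reason in RULES:
        if re.search(pattern, command):
            return reason
    return None


def main(raw: str) -> int:
    """Map a hook payload to an exit code: 0 allows the tool call, 2 blocks it."""
    if not raw.strip():
        return 0  # no payload to inspect
    payload = json.loads(raw)
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    reason = check_command(command)
    if reason is None:
        return 0
    # On exit code 2, stderr is shown to Claude as the reason the call was blocked.
    print(f"Blocked by safety hook: {reason}", file=sys.stderr)
    return 2


if __name__ == "__main__" and not sys.stdin.isatty():
    status = main(sys.stdin.read())
    if status:
        sys.exit(status)
```

To try it, register the script as a `command` hook under `PreToolUse` with a `Bash` matcher in settings. Regex deny-lists are easy to bypass (aliases, `eval`, encoded strings), so this works as a tripwire rather than a sandbox.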
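For skills-janitor, the `rg`-and-manual-review alternative can be partly scripted. The sketch assumes each skill lives at `<root>/<skill-name>/SKILL.md` with a single-line `description:` in its YAML frontmatter; the function names and the 0.8 similarity threshold are hypothetical. It only flags description pairs that look near-duplicate by `difflib` ratio and leaves the keep/merge decision to a human.

```python
"""Hypothetical skill-overlap audit: flag near-duplicate SKILL.md descriptions."""
import difflib
import itertools
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Single-line `description:` only; folded or multi-line YAML values are not handled.
DESCRIPTION_RE = re.compile(r"^description:[ \t]*(.+)$", re.MULTILINE)


def description_of(text: str) -> str:
    """Return the frontmatter description of a SKILL.md body, or '' if absent."""
    if not text.startswith("---"):
        return ""
    end = text.find("\n---", 3)
    if end == -1:
        return ""
    match = DESCRIPTION_RE.search(text[3:end])  # search the frontmatter block only
    return match.group(1).strip() if match else ""


def load_descriptions(root: Path) -> Dict[str, str]:
    """Map each <root>/<skill>/SKILL.md directory name to its description."""
    return {
        skill_md.parent.name: description_of(skill_md.read_text(encoding="utf-8"))
        for skill_md in sorted(root.glob("*/SKILL.md"))
    }


def overlapping_pairs(
    descriptions: Dict[str, str], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (skill_a, skill_b, ratio) for pairs at or above `threshold`, most similar first."""
    # Skip empty descriptions: two empty strings would otherwise score a perfect 1.0.
    named = sorted((name, d.lower()) for name, d in descriptions.items() if d)
    pairs = []
    for (a, da), (b, db) in itertools.combinations(named, 2):
        ratio = difflib.SequenceMatcher(None, da, db).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

A run would look like `overlapping_pairs(load_descriptions(Path(".claude/skills")))`, where the skills path is an assumption about where our 34 skill directories live. Skills with an empty description are also worth listing separately, since they cannot be compared this way.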
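For review-squad, the step our solo reviewers lack is merging a panel's output. A hypothetical sketch of that step (not review-squad's implementation): findings from several perspective-specific reviewers are grouped by file and line, then split into consensus items raised by at least `quorum` reviewers and single-reviewer items. Single-reviewer findings are kept rather than dropped, since the stated goal of the panel is coverage.

```python
"""Hypothetical panel merge for multi-perspective review (not review-squad's code)."""
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    reviewer: str  # panel member that raised it, e.g. "security"
    file: str
    line: int
    message: str


def merge_panel(findings: List[Finding], quorum: int = 2) -> Tuple[List[dict], List[dict]]:
    """Group findings by (file, line) and split them into consensus and single-reviewer lists."""
    grouped: Dict[Tuple[str, int], List[Finding]] = defaultdict(list)
    for f in findings:
        grouped[(f.file, f.line)].append(f)

    consensus: List[dict] = []
    solo: List[dict] = []
    for (file, line), group in sorted(grouped.items()):
        reviewers = sorted({f.reviewer for f in group})
        entry = {"file": file, "line": line, "reviewers": reviewers,
                 "messages": [f.message for f in group]}
        # Consensus needs `quorum` distinct reviewers; everything else is kept, not dropped.
        (consensus if len(reviewers) >= quorum else solo).append(entry)
    return consensus, solo
```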
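The Priority figures in the Build Queue are consistent with Priority = Impact / Effort rounded to one decimal; that rule is inferred from the numbers, not stated in this report. A small sketch that recomputes and ranks the queue under that assumption. Note that the list as written is not in priority order (skills-janitor, at 2.0, appears fourth).

```python
"""Recompute Build Queue priorities, assuming Priority = Impact / Effort."""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    impact: int
    effort: int

    @property
    def priority(self) -> float:
        # Assumed rule, inferred from the listed scores; rounded to one decimal.
        return round(self.impact / self.effort, 1)


BUILD_QUEUE = [
    Candidate("Semi-formal Reasoning Prompt Pattern", 3, 1),
    Candidate("cc-safe-setup", 2, 2),
    Candidate("claude-code-hooks", 2, 2),
    Candidate("skills-janitor", 2, 1),
    Candidate("review-squad", 2, 2),
    Candidate("test-kitchen", 2, 2),
    Candidate("preflight prompt validator", 2, 2),
    Candidate("Auto-Dream Memory Consolidation", 2, 2),
    Candidate("reporecall", 2, 3),
]


def ranked(queue: List[Candidate]) -> List[Candidate]:
    """Highest priority first; the sort is stable, so ties keep their listed order."""
    return sorted(queue, key=lambda c: c.priority, reverse=True)


for c in ranked(BUILD_QUEUE):
    print(f"{c.priority:.1f}  {c.name}")
```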
Research
- From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review — Taxonomy of 60 benchmarks, surveys agent frameworks, examines collaboration protocols (ACP, MCP, A2A). Reference material for governed agent architecture.
- Memory for Autonomous LLM Agents — Five memory mechanism families including reflective self-improvement and policy-learned management. Key gap for us: we lack "learned forgetting" — our memory grows but never prunes automatically.
- Agentic Code Reasoning (Semi-formal) — Semi-formal reasoning improves patch equivalence (78→88%), code QA (87% on RubberDuckBench), and fault localization. Basis for the Build Queue item above.
- Agent Contracts: Resource-Bounded Autonomous AI — Formal framework for resource and temporal constraints. Demonstrates 90% token reduction in iterative workflows. Our dispatch budgets are a crude version; this paper's formal approach could refine our budget model.
- Deep Researcher Agent: 24/7 Experimentation — Framework for autonomous around-the-clock experiments. Relevant to our Ralph pattern and autonomous execution loop.
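As a reference point for the "learned forgetting" gap, the simplest automatic pruning rule is a fixed heuristic, not the policy-learned management the memory survey describes: score each memory by exponential recency decay, boost it by how often it was retrieved, and drop entries below a threshold. The `Memory` fields, half-life, and threshold below are hypothetical; a learned approach could tune or replace exactly those knobs.

```python
"""Hypothetical decay-based pruning baseline (a fixed heuristic, not a learned policy)."""
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    key: str
    age_days: float  # days since the entry was last retrieved
    hits: int        # number of times it has been retrieved


def retention_score(m: Memory, half_life_days: float = 30.0) -> float:
    """Recency weight that halves every `half_life_days`, boosted by 1 + log(1 + hits)."""
    decay = 0.5 ** (m.age_days / half_life_days)
    return decay * (1.0 + math.log1p(m.hits))


def prune(memories: List[Memory], threshold: float = 0.25) -> List[Memory]:
    """Keep entries scoring at or above `threshold`; the rest are candidates for deletion."""
    return [m for m in memories if retention_score(m) >= threshold]
```

With these defaults, an entry untouched for 90 days and never retrieved scores 0.125 and is pruned, while one of the same age retrieved 10 times scores about 0.42 and survives.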
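To make the "crude version" comparison with Agent Contracts concrete, a dispatch budget that bounds both tokens and wall-clock time might look like the sketch below. It is a simplification for illustration, not the paper's framework and not our current dispatch-budget code; `Contract` and its fields are hypothetical, and the clock is injectable so the time bound can be tested.

```python
"""Hypothetical token + wall-clock budget for one dispatch (not the Agent Contracts framework)."""
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Contract:
    max_tokens: int
    max_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    used_tokens: int = 0
    started: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.started = self.clock()

    @property
    def remaining_tokens(self) -> int:
        return max(0, self.max_tokens - self.used_tokens)

    def charge(self, tokens: int) -> Optional[str]:
        """Record one step's token usage; return the violated bound, or None if within budget."""
        self.used_tokens += tokens
        if self.used_tokens > self.max_tokens:
            return "token budget exceeded"
        if self.clock() - self.started > self.max_seconds:
            return "time budget exceeded"
        return None
```

A caller would stop the dispatch loop as soon as `charge` returns a reason.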
Already Have
isolation: worktree, context: fork, PostCompact hook, PreCompact hook, Stop hook, UserPromptSubmit hook, once: true modifier, PermissionRequest routing (via classify_prompt), type: prompt hooks, matcher/statusMessage fields, per-agent model overrides, allowed-tools restrictions, auto mode, batch command, agent teams awareness, session teleportation awareness, /btw side queries awareness, remote control awareness, memory workflow (native + omni-mem), planning-gate, Ralph loop, skill-creator, skill-installer, explorer read-only, isolation on all reviewer/planner/worker agents, cc-devops-skills awareness, fullstack-dev-skills awareness, Trail of Bits security skills awareness, context engineering kit awareness, compound engineering plugin awareness, container use awareness, ccmanager awareness, bouncer quality gate awareness, codetape awareness, harness meta-skill awareness, preflight MCP awareness, git worktree infrastructure, /govern orchestration, auto_runtime.py event-sourced tracking, dispatch budgets, postflight acceptance checking, what-would-chad-do reflection, route canary, enterprise maturity rubric
Rejected
- oh-my-claudecode (19 agents, 28 skills) — Fails overengineering gate: our curated 10-agent roster + /govern already covers this
- production-grade (14-agent autonomous workflow) — Same class: role-based decomposition vs our skill-based decomposition
- ORCH (CLI orchestrating Code, Codex, Cursor) — Multi-tool orchestrator. We're single-tool. Scope creep.
- vibe-kanban (Kanban-based agent coordination) — /govern handles coordination. Kanban UI is a new surface.
- cozempic (13 pruning strategies) — PreCompact + Stop hooks cover this. Overengineered.
- knowledge-graph (git-native context persistence) — omni-mem with fact graph covers this. Another persistence layer fails the one-sentence proof.
- claude-supermemory (cross-session via Supermemory platform) — External platform dependency. omni-mem is local.
- fractal (recursive decomposition) — planning-gate + solution ladder covers decomposition
- harness-evolver (LangSmith-native prompt evolution) — External service dependency
- brooks-lint (code reviews from 6 classic books) — Too opinionated. Would conflict with our standards.
- jarvis (76 tasks, 12 AI teams) — Not our domain
- discoclaw (Discord bot) — We use Zoom
Sources checked: awesome-claude-code, howborisusesclaudecode.com, claude-code-best-practice, awesome-claude-code-toolkit, claude-code-new-features-early-2026, claude-code-hooks-mastery, WebSearch: github.com 2026 hooks/agents/skills, WebSearch: arxiv LLM agent coding 2026
Tier 2 fetched: yes (arxiv — last was 2026-04-09, 4 days ago)
Tier 3 fetched: yes (awesome-claude-code-toolkit — last was 2026-04-05, 8 days ago)
Run at: 2026-04-13T15:00:00Z
Mode: --dry-run (no implementations)