Ecosystem Update — 2026-04-13
Highlights
- Semi-formal reasoning paper shows structured prompting (premises → trace → conclude) improves code-reasoning accuracy by 10% without execution; directly applicable to
- Toolkit explosion: awesome-claude-code-toolkit is now at 135+ agents and 400k+ skills via SkillKit, but signal-to-noise is dropping; most items are domain plugins or duplicate orchestrators
- New hook patterns worth stealing: cc-safe-setup bundles 6 safety hooks in a one-command install; the claude-code-hooks repo has 15 battle-tested hooks from 160+ hours of autonomous operation; obey has 17 lifecycle hooks for rule enforcement
- Tier 3 fetched this cycle (8 days since last); Tier 2 fetched (4 days since last)
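The premises → trace → conclude structure from the first highlight can be prototyped as a prompt builder plus a cheap structural check on the reply. This is a minimal sketch assuming plain-text prompts; the section labels and both function names are hypothetical, not the paper's actual template. In our setup the same wording would live in the reviewer agent bodies rather than in Python.

```python
# Hypothetical section labels; the paper's exact wording may differ.
SECTIONS = (
    ("PREMISES", "List the facts you rely on (definitions, types, control flow), one per line."),
    ("TRACE", "Step through the relevant execution path, citing a premise for each step."),
    ("CONCLUSION", "State the answer and name the trace steps that justify it."),
)


def build_semiformal_prompt(question: str, code: str) -> str:
    """Wrap a code question in a premises -> trace -> conclusion scaffold."""
    lines = [
        "Answer the question about the code below without executing it.",
        "",
        "Code:",
        code.strip(),
        "",
        f"Question: {question.strip()}",
        "",
        "Respond in exactly three labeled sections, in this order:",
    ]
    lines += [f"{name}: {instruction}" for name, instruction in SECTIONS]
    return "\n".join(lines)


def follows_structure(response: str) -> bool:
    """Cheap check that a reply contains every section label, in order."""
    positions = [response.find(f"{name}:") for name, _ in SECTIONS]
    return min(positions) >= 0 and positions == sorted(positions)


prompt = build_semiformal_prompt("Does f(0) return 1?", "def f(x):\n    return x + 1")
print(prompt)
```

The structural check is deliberately shallow: it only verifies that the model produced the scaffold, not that the trace is correct.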
Quick Wins (implemented today)
- _None this cycle_: setup remains mature; the remaining gaps require new scripts or skills
New Tools, Skills & Patterns
- Semi-formal Reasoning Prompt Pattern (claude-md): Structured prompting that constructs premises, traces execution paths, and derives formal conclusions; 10% accuracy improvement on code-reasoning tasks. Could be added as an instruction pattern for the reviewer agents. Impact 3, Effort 1, Priority 3.0, but this is a prompt change to the reviewer agent bodies (not frontmatter-only). Estimate: update reviewer.md and python-reviewer.md, ~20 LOC
- cc-safe-setup (hook): One-command install of 6 essential safety hooks (destructive-command blocking, secret detection, large-file prevention). Our hooks cover Stop/PreCompact/UserPromptSubmit but lack deterministic PreToolUse safety guards. Impact 2, Effort 2, Priority 1.0
- claude-code-hooks, battle-tested set (hook): 15 production-tested hooks from 160+ hours of autonomous operation, including PostToolUse linting, notification hooks, and status hooks. Cherry-pick 2–3 that fill lifecycle gaps. Impact 2, Effort 2, Priority 1.0
- skills-janitor (skill): Audits and deduplicates skills with 9 slash commands. We have 34 skill directories; some may be stale or overlapping. Impact 2, Effort 1, Priority 2.0. Borderline: we can do this with rg and manual review
- review-squad (agent-pattern): Multi-perspective code review via subagent panels. Our reviewers run solo, not as a panel; the panel pattern could improve coverage. Impact 2, Effort 2, Priority 1.0
- test-kitchen (agent-pattern): Parallel competing subagents with structured winner selection. Could enhance planning-gate by generating competing approaches before committing to one. Impact 2, Effort 2, Priority 1.0
- preflight prompt validator (mcp): 24-tool MCP server that catches vague prompts before they waste cycles. Our UserPromptSubmit hook classifies the route but doesn't validate prompt quality. Impact 2, Effort 2, Priority 1.0
- Auto-Dream Memory Consolidation (agent-pattern): A subagent periodically reviews past sessions and merges insights. Our memory workflow is manual. Impact 2, Effort 2, Priority 1.0
- reporecall (mcp): Tree-sitter AST indexing (22 languages) with ~5 ms context injection. Impact 2, Effort 3, Priority 0.7
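The Priority figures in the list above are consistent with Priority = Impact / Effort, rounded to one decimal (3/1 = 3.0, 2/1 = 2.0, 2/2 = 1.0, 2/3 ≈ 0.7). That rule is inferred from the numbers rather than stated, so treat this ranking sketch as an assumption; the `Candidate` class and `rank` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    impact: int  # scale as used in this report (assumed 1-3)
    effort: int  # scale as used in this report (assumed 1-3)

    @property
    def priority(self) -> float:
        # Inferred rule: Priority = Impact / Effort, rounded to one decimal.
        return round(self.impact / self.effort, 1)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # sorted() is stable even with reverse=True, so ties keep their listed order.
    return sorted(candidates, key=lambda c: c.priority, reverse=True)


CANDIDATES = [
    Candidate("Semi-formal Reasoning Prompt Pattern", 3, 1),
    Candidate("cc-safe-setup", 2, 2),
    Candidate("claude-code-hooks", 2, 2),
    Candidate("skills-janitor", 2, 1),
    Candidate("review-squad", 2, 2),
    Candidate("test-kitchen", 2, 2),
    Candidate("preflight prompt validator", 2, 2),
    Candidate("Auto-Dream Memory Consolidation", 2, 2),
    Candidate("reporecall", 2, 3),
]

for c in rank(CANDIDATES):
    print(f"{c.priority:.1f}  {c.name}")
```

Six items tie at 1.0 and simply keep their listed order, so a plain ratio cannot break those ties; that still takes a human call.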
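The gap named for cc-safe-setup is deterministic PreToolUse blocking. A minimal sketch of one such guard follows, assuming Claude Code's hook contract as documented at the time of writing: the hook receives the pending tool call as JSON on stdin (`tool_name`, plus `tool_input.command` for Bash), and exit code 2 blocks the call, with stderr fed back to the model. The deny-list patterns, the `--hook` flag, and the function names are hypothetical; they are not cc-safe-setup's actual rules.

```python
#!/usr/bin/env python3
"""Deterministic PreToolUse guard for Bash commands (illustrative sketch)."""
import json
import re
import sys
from typing import Optional

# Hypothetical deny-list of (pattern, reason); not cc-safe-setup's real rule set.
BLOCKED_PATTERNS = [
    (re.compile(r"\brm\s+-\w*(?:r\w*f|f\w*r)", re.IGNORECASE), "recursive force delete"),
    (re.compile(r"\bgit\s+push\b.*(?:--force(?!-with-lease)\b|\s-f\b)"), "force push"),
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "hard reset"),
    # Heuristic secret detection: AWS access key IDs and PEM private-key headers.
    (re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----"), "possible secret in command"),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason the command should be blocked, or None to allow it."""
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(command):
            return reason
    return None


def run_hook(raw_input: str) -> int:
    """Map one hook invocation (JSON text) to an exit code: 0 allows, 2 blocks."""
    payload = json.loads(raw_input or "{}")
    if payload.get("tool_name") != "Bash":
        return 0  # this guard only inspects shell commands
    reason = check_command(payload.get("tool_input", {}).get("command", ""))
    if reason is None:
        return 0
    # Exit code 2 blocks the tool call; stderr is shown to the model.
    print(f"Blocked by safety hook: {reason}", file=sys.stderr)
    return 2


# Hypothetical "--hook" flag so importing or testing the file never reads stdin.
if __name__ == "__main__" and sys.argv[1:] == ["--hook"]:
    sys.exit(run_hook(sys.stdin.read()))
```

To try it, register `python3 guard.py --hook` as a `command` hook under `PreToolUse` with a `Bash` matcher in settings. Regex deny-lists are easy to sidestep (`rm -r -f` slips past the first pattern), so this is a tripwire, not a sandbox.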
Research Worth Reading
- From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review. Taxonomy of 60 benchmarks; surveys agent frameworks and examines collaboration protocols (ACP, MCP, A2A). Reference material for governed agent architecture
- Memory for Autonomous LLM Agents. Five families of memory mechanisms, including reflective self-improvement and policy-learned management. Key gap for us: we lack "learned forgetting"; memory grows but is never pruned automatically
- Agentic Code Reasoning (Semi-formal). Semi-formal reasoning improves patch-equivalence accuracy (78→88%), code QA (87% on RubberDuckBench), and fault localization. Basis for the Semi-formal Reasoning Prompt Pattern item above
- Agent Contracts: Resource-Bounded Autonomous AI. Formal framework for resource and temporal constraints; demonstrates a 90% token reduction in iterative workflows. Our dispatch budgets are a crude version of this; the paper's formal approach could refine our budget model
- Deep Researcher Agent: 24/7 Experimentation. Framework for running autonomous experiments around the clock. Relevant to the Ralph pattern and the autonomous execution loop
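The memory survey's "learned forgetting" would learn what to drop. As a much cruder stand-in, the sketch below uses a fixed heuristic (not learned, and not taken from the paper): score each entry by usage discounted with exponential age decay, keep the top `keep` entries, and drop anything under a floor. `MemoryEntry`, `retention_score`, `prune`, the 30-day half-life, and the 0.25 floor are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    key: str
    age_days: float  # days since the entry was last used
    uses: int        # number of sessions that referenced it


def retention_score(entry: MemoryEntry, half_life_days: float = 30.0) -> float:
    """Usage count discounted by exponential decay on age (hypothetical heuristic)."""
    return (1 + entry.uses) * 0.5 ** (entry.age_days / half_life_days)


def prune(entries: list[MemoryEntry], keep: int, min_score: float = 0.25) -> list[MemoryEntry]:
    """Keep at most `keep` entries, highest score first, dropping any below `min_score`."""
    scored = sorted(entries, key=retention_score, reverse=True)
    return [e for e in scored[:keep] if retention_score(e) >= min_score]
```

With a 30-day half-life, an unused 120-day-old entry scores 0.0625 and is dropped, while a 60-day-old entry used in five sessions scores 1.5 and survives; a learned policy would replace these hand-set constants.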
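Agent Contracts treats budgets as explicit resource and time constraints on a run. A minimal sketch of that idea, not the paper's formalism: a contract object that refuses a step before it starts if the step would exceed the token or wall-clock limit. It assumes each step's token cost can be estimated up front; `ResourceContract`, `ContractViolation`, and `run_iterations` are hypothetical names.

```python
import time
from dataclasses import dataclass, field


class ContractViolation(RuntimeError):
    """Raised when a step would exceed the run's contract."""


@dataclass
class ResourceContract:
    max_tokens: int
    max_seconds: float
    tokens_used: int = 0
    started: float = field(default_factory=time.monotonic)

    def remaining_tokens(self) -> int:
        return self.max_tokens - self.tokens_used

    def charge(self, tokens: int) -> None:
        """Record a step's estimated token cost, refusing it if a limit would break."""
        if time.monotonic() - self.started > self.max_seconds:
            raise ContractViolation("time budget exhausted")
        if tokens > self.remaining_tokens():
            raise ContractViolation(
                f"step needs {tokens} tokens, only {self.remaining_tokens()} left"
            )
        self.tokens_used += tokens


def run_iterations(contract: ResourceContract, step_costs: list[int]) -> int:
    """Run steps until one would violate the contract; return the number completed."""
    completed = 0
    for cost in step_costs:
        try:
            contract.charge(cost)
        except ContractViolation:
            break
        completed += 1
    return completed
```

Compared with a plain spend counter, the refusal happens before the step runs, and elapsed time is checked alongside tokens.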
Considered, Not Adopting
Items reviewed and explicitly declined this cycle, with the reason. Curation discipline matters more than coverage.
- oh-my-claudecode
- production-grade (14-agent autonomous workflow): Same class (role-based vs skill-based decomposition)
- ORCH (CLI orchestrating Code, Codex, Cursor): Multi-tool orchestrator. We're single-tool; scope creep
- vibe-kanban: Kanban UI is a new surface
- cozempic (13 pruning strategies): PreCompact + Stop hooks cover this. Overengineered
- knowledge-graph: Another persistence layer; fails the one-sentence proof
- claude-supermemory (cross-session memory via the Supermemory platform): External platform dependency
- fractal (recursive decomposition): planning-gate + solution ladder already cover decomposition
- harness-evolver (LangSmith-native prompt evolution): External service dependency
- brooks-lint (code reviews based on 6 classic books): Too opinionated; would conflict with our standards
- jarvis (76 tasks, 12 AI teams): Not our domain
- discoclaw (Discord bot): We use Zoom