Documentation

Start here when browsing the repository documentation directly.

| Doc | Purpose |
| --- | --- |
| Installation | Install paths for Claude Code, Codex, OpenCode, and source builds |
| Guide | Command map, binary operations, artifacts, and guide links |
| Examples | Copy-paste configs for common autoresearch goals |
| System Architecture | Binary, agent package, runtime, and artifact architecture |
| Project Changelog | Release history entrypoint and current development track |
| Detailed Changelog | Versioned release notes |
| Development Roadmap | Current and planned runtime, search, MCP, workspace, and release work |

The documentation set also builds as an mdBook site from book.toml and docs/SUMMARY.md.

Installation

Autoresearch ships as a Rust binary plus agent-specific skill or command packages.

Agent-Driven Install

The primary path is to give this prompt to the agent that should use Autoresearch:

Install Autoresearch in this environment.

Use the installer from:
https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh

Pick the install flag for the current agent:
- Claude Code: --claude
- Codex: --codex
- OpenCode: --opencode
- If you cannot infer the agent, use --all.

Run the installer non-interactively with bash, verify `autoresearch --help`, then tell me the command I should use to start Autoresearch in this agent.
Start commands are `/autoresearch` for Claude Code, `$autoresearch` for Codex, and `/autoresearch` for OpenCode.
Use a global install unless I explicitly asked for a project-local install.

Raw GitHub Installer

Use the raw installer when you want the source build plus agent package without cloning first. Pick the exact command for your agent:

Claude Code:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --claude

Codex:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex

OpenCode:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode

All packages:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --all

The raw script downloads a source archive, builds the Rust binary, and runs the same install.sh used by a local clone. Set AUTORESEARCH_INSTALL_REF to use a different branch, AUTORESEARCH_INSTALL_REPO to use a fork, or AUTORESEARCH_INSTALL_ARCHIVE_URL to provide an explicit archive URL.

Pre-Built Binaries

Tagged releases publish .tar.gz archives for Linux x86_64, Linux aarch64, macOS x86_64, macOS aarch64, and Windows x86_64. Download the archive for your platform from the GitHub release, verify the adjacent .sha256 file, and place the autoresearch binary on your PATH.
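
The checksum step can be scripted. The sketch below is one way to do it, assuming the .sha256 file stores the hex digest as its first whitespace-separated token (the layout written by `sha256sum`). The function name is illustrative and not part of Autoresearch.

```python
import hashlib
from pathlib import Path


def sha256_matches(archive: Path, checksum_file: Path) -> bool:
    """Return True when the archive's SHA-256 equals the recorded digest.

    Assumption: the first whitespace-separated token in checksum_file is
    the hex digest, as written by `sha256sum`.
    """
    expected = checksum_file.read_text().split()[0].lower()
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        # Hash in 1 MiB chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Call it with the downloaded archive and its adjacent checksum file, and place the binary on your PATH only when it returns True.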

Cargo Binstall

Cargo.toml includes cargo-binstall metadata for the same target-named release archives:

cargo binstall autoresearch

Homebrew

Homebrew tap maintainers can render packaging/homebrew/autoresearch.rb.template with the release version and SHA-256 values from GitHub release assets, then publish it as Formula/autoresearch.rb in a tap.

Claude Code

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --claude

This builds autoresearch, installs it on your PATH, and installs the Claude Code plugin hooks.

If the binary is already installed:

claude plugin add coder-company/agent-autoresearch

Restart Claude Code after installing the plugin.

Manual local Claude package:

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch

# Run from the clone. Replace /path/to/project with the project that should get the local commands and skills:
mkdir -p /path/to/project/.claude/skills
cp -R .claude/commands /path/to/project/.claude/
cp -R .claude/skills/autoresearch /path/to/project/.claude/skills/

The .claude/ package is generated from the same canonical command and reference files as the plugin package.

Codex

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex

This builds autoresearch, installs it on your PATH, and installs the Codex skill package.

Then start Codex from your project and invoke:

$autoresearch

For the smoothest foreground and background runs, start Codex with full workspace access (this flag disables approval prompts and the sandbox):

codex --dangerously-bypass-approvals-and-sandbox

If you only want the Codex skill package and not the source-built binary:

$skill-installer install https://github.com/coder-company/agent-autoresearch

For a project-local Codex skill install, run the raw installer from the target project:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex --local

That installs to ./.codex/skills/autoresearch in the current project. Use --global for the default user-wide target, or --codex-dir for an explicit destination.

From a local clone:

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --codex

The installer copies .agents/skills/autoresearch/ and validates the target path before replacing the installed skill directory.

Project-local install from a local clone:

/path/to/agent-autoresearch/install.sh --yes --codex --local

To install the local Codex plugin package through the installer:

./install.sh --yes --codex-plugin

Local plugin package:

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
codex plugin marketplace add .agents/plugins/marketplace.json
codex plugin install autoresearch@autoresearch-local

The marketplace entry points at plugins/autoresearch/, which packages the same maintained Codex skill plus plugin metadata.

OpenCode

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode

This builds autoresearch, installs it on your PATH, and installs the OpenCode command and skill package.

Start OpenCode from your project and invoke:

/autoresearch

OpenCode mode commands use underscore names such as /autoresearch_debug, /autoresearch_fix, and /autoresearch_security. The package also installs the hidden docs-manager helper agent for focused documentation updates.

For a project-local OpenCode install, run the raw installer from the target project:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode --local

That installs to ./.opencode in the current project. Use --global for the default user-wide target, or --opencode-dir for an explicit OpenCode config root.

Project-local install from a local clone:

/path/to/agent-autoresearch/install.sh --yes --opencode --local

The installer refuses empty, home, and parent config paths before replacing skills/autoresearch.
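
A guard of this kind can be sketched as follows. This is not the installer's code (install.sh is a shell script). It is a Python illustration of one plausible reading of the rule, in which "parent" means any ancestor of the home directory. The function name is hypothetical.

```python
from pathlib import Path


def is_safe_target(raw: str, home: Path) -> bool:
    """Return False for targets that are too broad to replace safely."""
    if not raw.strip():
        return False  # empty path
    target = Path(raw).resolve()
    home = home.resolve()
    if target == home:
        return False  # the home directory itself
    if target in home.parents:
        return False  # an ancestor of home, such as / or /home
    return True
```

Checking before any delete or replace step means a mistyped --opencode-dir value cannot wipe a directory that holds unrelated files.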

VS Code

./install.sh --yes --vscode

That copies integrations/vscode into your VS Code extensions directory and keeps the installed extension delegated to the autoresearch binary on PATH. Use --vscode-dir for an explicit extensions directory.

From Source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --all

Use ./install.sh without flags for the guided installer.

Verify The Install

autoresearch --help
autoresearch screen --command "npm test"
autoresearch completions zsh >/tmp/_autoresearch
autoresearch manpages --output-dir /tmp/autoresearch-manpages
autoresearch config template >/tmp/autoresearch.toml
autoresearch config validate --path /tmp/autoresearch.toml

For repository contributors:

./scripts/validate_distribution.sh
./scripts/run_skill_e2e.sh binary-smoke --clean
./scripts/run_skill_e2e.sh runtime-smoke --clean
./scripts/run_skill_e2e.sh parallel-smoke --clean
./scripts/run_contributor_gate.sh

See Getting Started and Codex usage for first-run examples.

Guide

Autoresearch is a loop controller for agents: define a measurable goal, modify one thing, verify mechanically, keep or discard, and repeat.
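
The loop can be sketched in a few lines. The following illustrates the control flow only and is not the binary's implementation: `propose`, `measure`, and `revert` are hypothetical callables standing in for the agent's edit, the verify command, and the git rollback.

```python
from typing import Callable


def improved(direction: str, best: float, candidate: float) -> bool:
    """True when candidate beats best for the given direction."""
    return candidate < best if direction == "lower" else candidate > best


def run_loop(
    baseline: float,
    direction: str,
    iterations: int,
    propose: Callable[[int], None],  # apply one change (hypothetical)
    measure: Callable[[], float],    # run the verify command (hypothetical)
    revert: Callable[[], None],      # discard the change (hypothetical)
) -> tuple[float, list[str]]:
    best = baseline
    log: list[str] = []
    for i in range(iterations):
        propose(i)
        metric = measure()
        if improved(direction, best, metric):
            best = metric          # keep: the change becomes the new baseline
            log.append("keep")
        else:
            revert()               # discard: restore the previous state
            log.append("discard")
    return best, log
```

Because every iteration ends in either keep or discard, the working tree always matches the best metric seen so far.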

Core Commands

| Need | Use |
| --- | --- |
| Improve a metric | /autoresearch or $autoresearch |
| Pick a metric from a vague goal | /autoresearch:plan or $autoresearch plan |
| Find a root cause | /autoresearch:debug or $autoresearch debug |
| Reduce errors to zero | /autoresearch:fix or $autoresearch fix |
| Run a security audit | /autoresearch:security or $autoresearch security |
| Ship through gates | /autoresearch:ship or $autoresearch ship |
| Analyze prior runs | /autoresearch:evals or $autoresearch evals |

Binary Operations

The agent-facing protocols delegate stateful work to the autoresearch binary:

autoresearch init --verify "cat metric.txt" --direction lower
autoresearch verify --command "cat metric.txt"
autoresearch verify --command "cat metric.txt" --repeat 3 --aggregate median
autoresearch plan --goal "reduce any types" --format json
autoresearch debug --symptom "API returns 500" --scope "src/**/*.rs"
autoresearch fix --target "npx tsc --noEmit" --scope "src/**/*.ts" --category type
autoresearch improve --goal "Improve onboarding activation" --icp "Developer tools teams"
autoresearch prd --title "Improve onboarding" --problem "New users stall before first run"
autoresearch security --scope "src/**/*.rs" --focus auth
autoresearch ship --target "Release v1.2.0" --type code-release --dry-run
autoresearch scenario --target "Checkout flow" --domain web --format test-scenarios --scope "src/checkout/**"
autoresearch predict --proposal "Add cache warming to search results" --scope "src/search/**"
autoresearch predict --proposal "Find product improvements for onboarding" --scope "src/**" --improve
autoresearch reason --question "Should we replace the storage layer" --mode debate --domain software
autoresearch probe --subject "Payment retry workflow" --scope "src/payments/**"
autoresearch probe --subject "Onboarding activation workflow" --scope "src/**" --improve
autoresearch learn --mode summarize --scope "src/**/*.rs"
autoresearch decide --decision auto --metric 4 --commit abc1234 --description "improved"
autoresearch status --summary
autoresearch progress
autoresearch cost --per-iteration-usd 0.25 --format json
autoresearch dashboard --once
autoresearch health --strict
autoresearch env --format json
autoresearch init --environment-summary auto --verify "cat metric.txt" --direction lower
autoresearch checkpoint --format json
autoresearch reanchor --format json
autoresearch watch --lines 20 --format jsonl
autoresearch watch --websocket --websocket-addr 127.0.0.1:8765
autoresearch lessons --add "Prefer fixture-level assertions" --context "reduced flaky tests"
autoresearch search --from-state --provider-command 'exa "$AUTORESEARCH_SEARCH_QUERY"' --log
autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
autoresearch parallel template --workers 3 --output autoresearch-results/parallel-workers.json
autoresearch parallel compare --a "simplify parser" --b "cache scan results"
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json --merge-strategy cherry-pick
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json
autoresearch evals --file autoresearch-results/results.tsv --format json --recommend --plateau-window 5 --target 90 --fail-on goal-not-met --chain ship
autoresearch evals --file autoresearch-results/results.tsv --compare autoresearch-results/previous-results.tsv --format json
autoresearch api --format json
autoresearch mcp serve
autoresearch mcp call --server-command "autoresearch mcp serve" --tool autoresearch_status
autoresearch scope expand --format json
autoresearch workspace exec --command "cargo test" --rollback-on-failure
autoresearch guard-presets --format json
autoresearch lessons --workspace-context --last 5
autoresearch plugin list
autoresearch plugin validate --path .autoresearch/plugins/example.toml
autoresearch plugin marketplace
autoresearch completions zsh > ~/.zfunc/_autoresearch

  • Use autoresearch runtime run for supervised background Codex sessions and autoresearch runtime status / autoresearch runtime stop for control.
  • Use autoresearch env --format json to capture CPU, disk, container, toolchain, and recommended parallel-worker context before planning long or parallel runs; pass --environment-summary auto to init to persist that probe summary in results.tsv.
  • Use autoresearch status --summary for compact monitor-friendly counters.
  • Use autoresearch progress for the current metric, trend, counters, escalation state, and terminal metric history sparkline.
  • Use autoresearch verify --repeat <n> --aggregate <median|mean|min|max|last> for noisy scalar metrics; repeated verification returns the aggregate metric plus the raw samples.
  • Use autoresearch cost --per-iteration-usd <usd> or token/rate flags to estimate completed, remaining, and projected run spend.
  • Use autoresearch dashboard --once for a combined terminal view of status, trend, metric history, escalation, and recent rows; omit --once for live refresh.
  • Use autoresearch checkpoint --format json inside long loops to run evals only when the active iteration reaches the configured or adaptive checkpoint interval.
  • Use autoresearch reanchor --format json every 10 iterations or after context compaction to print the protocol fingerprint, reload references, and [RE-ANCHOR] logging tag.
  • Use autoresearch watch --format <tsv|jsonl> for human-readable tails or machine-readable JSON Lines.
  • Use autoresearch watch --websocket --websocket-addr <host:port> to serve snapshot and row update payloads to real-time dashboards. Add --once to print the initial WebSocket snapshot envelope without starting a server.
  • Use autoresearch lessons --add <strategy> --context <note> to append reusable lessons without editing lessons.md by hand.
  • Use autoresearch search --from-state with --provider-command or AUTORESEARCH_SEARCH_CMD to run cached, run-aware web searches. Add --log to append a search meta-iteration. When decide escalates to Web Search, it automatically runs the same cached helper with AUTORESEARCH_SEARCH_CMD and logs the result when timing/cooldown limits allow it.
  • Use autoresearch parallel closeout --merge-strategy <cherry-pick|fast-forward|squash|rebase> to select how the retained worker commit is merged.
  • Use autoresearch parallel compare --a <hypothesis> --b <hypothesis> to prepare a two-arm A/B batch that reuses parallel run and verified parallel closeout.
  • Use autoresearch evals --file <path> --format json --recommend --plateau-window 5 --target <metric-threshold> --fail-on goal-not-met --chain ship after parallel closeout to include worker improvement counts, a sign-test summary, anomaly detection, goal-achieved status, CI-friendly exit gating, next-step guidance, and downstream handoff metadata.
  • Use autoresearch evals --file <path> --compare <other-results.tsv> --format json to compare run improvement, efficiency, and plateau length before choosing the next strategy.
  • Use autoresearch completions <bash|zsh|fish|elvish|powershell> to generate shell completions.
  • Use autoresearch manpages --output-dir man/man1 to generate a local autoresearch.1 manual page.
  • Use autoresearch config template --output .autoresearch.toml to write a starter project defaults file.
  • Use autoresearch config validate to parse defaults, validate options, and screen configured commands without running them.
  • Use autoresearch plan --goal <goal> --format json to get a launch-ready suggested scope, metric, direction, verify, guard, and iteration count from detected repo tooling.
  • Use autoresearch plan --goal <goal> --debug to write the derived config into a downstream debug handoff.
  • Native artifact generators default to ignored autoresearch-results/<mode>/ paths; pass --output or --output-dir only when you intentionally want a different artifact location.
  • Use autoresearch debug --symptom <failure> --scope <glob> to write a hypothesis-driven investigation bundle with summary, findings, eliminated hypotheses, TSV, and handoff JSON. Add --fix or --chain <targets> to autoresearch debug to record downstream chain metadata in the debug handoff.
  • Use autoresearch debug --depth deep --iterations 12 --severity high to override the investigation budget and record severity filter metadata.
  • Use autoresearch fix --target <verify-command> --scope <glob> --iterations 7 to write a repair-plan bundle under autoresearch-results/fix with priority order, results TSV, iteration budget, and handoff JSON.
  • Use autoresearch fix --from-debug to import the latest debug handoff scope, symptom, and finding count into the repair plan.
  • Use autoresearch fix --learn --evals to record downstream learn handoff and checkpoint propagation metadata.
  • Use autoresearch improve --goal <product-area> --icp <persona> to write an improve-mode artifact bundle: research findings, ranked plan, summary, TSV, and handoff JSON.
  • Use autoresearch improve --goal <product-area> --icp <persona> --depth deep --iterations 24 --evals to override the research budget and record active category count plus checkpoint metadata.
  • Use autoresearch improve --goal <product-area> --seeds 5 --no-discover --learn to record seed volume, discovery posture, and downstream learn handoff metadata.
  • Use autoresearch prd --title <title> --problem <problem> to write a focused improve-mode PRD with DECISION NEEDED markers, acceptance criteria, risks, success metrics, and an autoresearch config block.
  • Use autoresearch security --scope <glob> --focus <area> to write a STRIDE + OWASP audit bundle with overview, threat model, attack surface, coverage, findings, recommendations, TSV, and handoff JSON. Add --fail-on <severity> and --fix to autoresearch security to record CI gate and repair-chain metadata for confirmed findings.
  • Use autoresearch security --scope <glob> --depth deep --iterations 18 --diff --fix --evals to override the audit budget and record delta mode, downstream fix handoff, and checkpoint metadata.
  • Use autoresearch ship --target <thing> --type <kind> --dry-run to write an 8-phase ship checklist, summary, ship log, and handoff JSON without external side effects.
  • Use autoresearch ship --target <thing> --auto --force --rollback --monitor 15 --learn to record approval, rollback, monitoring, and downstream learn handoff metadata.
  • Use autoresearch scenario --target <feature> --domain <general|web|mobile|api|cli|data-pipeline|infrastructure> --format <test-scenarios|threat-scenarios|use-cases|user-stories> to write a 12-dimension scenario matrix for tests, threat modeling, or debug follow-up.
  • Use autoresearch scenario --target <feature> --domain web --depth deep --iterations 16 --evals --debug to override the exploration budget and record domain, checkpoint metadata, and downstream debug handoff.
  • Use autoresearch predict --proposal <change> to write a five-persona review covering architecture, security, performance, UX, and adversarial risks.
  • Use autoresearch predict --proposal <change> --depth deep --adversarial --fail-on high to record review profile and CI gate metadata.
  • Use autoresearch predict --proposal <change> --debug to record the review as handoff context for downstream investigation.
  • Use autoresearch predict --proposal <product-area> --improve to pass expert findings into product improvement research.
  • Use autoresearch reason --question <decision> to write an adversarial debate artifact with candidate solutions, blind judge rubric, and convergence criteria.
  • Use autoresearch reason --question <decision> --predict to pass the selected debate context into downstream review.
  • Use autoresearch reason --question <decision> --iterations 11 --judges 7 --convergence 4 --temperature 0.2 to record debate budget, judge panel, convergence, synthesis, and generation hints.
  • Use autoresearch probe --subject <requirement> to write eight persona-driven questions, constraint slots, and a saturation rule before implementation.
  • Use autoresearch probe --subject <requirement> --mode autonomous --depth deep --iterations 9 --adversarial to override the interrogation round budget and record saturation metadata.
  • Use autoresearch probe --subject <requirement> --plan to pass discovered constraints into planning through handoff metadata.
  • Use autoresearch probe --subject <product-area> --improve to pass discovered constraints into product improvement research.
  • Use autoresearch learn --mode <init|update|check|summarize> --scope <glob> to write documentation summary, validation, TSV, and handoff artifacts.
  • Use autoresearch learn --mode check --file <path> --depth overview --iterations 14 --topics architecture,api --no-fix --evals to record learn profile, specific-file scope, validation behavior, chain, and checkpoint metadata.
  • Use autoresearch api --format json to inspect the stable command/flag manifest and semver policy used by wrappers and agents.
  • Use autoresearch mcp serve as a stdio MCP server exposing read-only autoresearch_status and autoresearch_watch_snapshot tools.
  • Use autoresearch mcp call --server-command <cmd> --tool <name> --arguments '{}' to call a tool on an external stdio MCP server from an iteration script.
  • Use autoresearch scope expand --format json to resolve active primary and companion repo scopes, with package roots inferred from Cargo.toml, package.json, pyproject.toml, and go.mod.
  • Use autoresearch workspace exec --command <cmd> --rollback-on-failure to run one screened command across primary and companion repo targets, restoring attempted repos if any target fails.
  • Use autoresearch guard-presets --format json to suggest per-repo guard commands for primary and companion repositories.
  • Use autoresearch lessons --workspace-context --last 5 from any managed repo to show the shared workspace lessons path and repo targets.
  • Use autoresearch plugin list and autoresearch plugin validate --path <file> to load local TOML mode plugin manifests with command safety screening.
  • Use autoresearch plugin marketplace to validate .autoresearch/plugins/marketplace.toml and every referenced community mode manifest before installing or sharing it.
  • Use ./install.sh --yes --vscode to install the lightweight VS Code package from integrations/vscode; it opens status --summary, dashboard --once, and watch --format jsonl from editor commands.
  • Codex packages keep .agents/skills/autoresearch/SKILL.md as a thin router and load references/binary-operations.md only when native command details are needed.

Use .github/actions/autoresearch in GitHub Actions to run exec mode with a checked-in goal, scope, metric, and verify command.

steps:
  - uses: actions/checkout@v4
  - uses: ./.github/actions/autoresearch
    with:
      goal: Reduce lint failures
      scope: '["src/**/*.rs", "tests/**/*.rs"]'
      metric: lint failure count
      verify: cargo clippy --all-targets --all-features -- -D warnings 2>&1 | tail -1
      direction: lower
      iterations: "3"
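
For the repeated verification described above (verify --repeat <n> --aggregate <median|mean|min|max|last>), the aggregation itself is simple to picture. Below is a minimal Python sketch of the five modes, assuming the samples arrive as a list of floats in run order. It illustrates the arithmetic and is not the binary's code.

```python
import statistics

# One callable per --aggregate mode named in the guide.
AGGREGATORS = {
    "median": statistics.median,
    "mean": statistics.fmean,
    "min": min,
    "max": max,
    "last": lambda samples: samples[-1],
}


def aggregate(samples: list[float], mode: str = "median") -> float:
    """Collapse repeated metric samples into one value."""
    if not samples:
        raise ValueError("at least one sample is required")
    return float(AGGREGATORS[mode](samples))
```

Median is the usual choice for noisy metrics because one outlier does not move it: for samples 10, 10, 50 the median stays at 10 while the mean rises to about 23.3.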

Project Defaults

autoresearch init reads .autoresearch.toml from the workspace root when present. CLI flags override file values.

goal = "Reduce failing tests"
scope = ["src/**/*.rs", "tests/**/*.rs"]
metric = "failing test count"
direction = "lower"
verify = "cargo test 2>&1 | tail -1"
guard = "cargo fmt -- --check"
iterations = 25
run_tag = "nightly"
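
The precedence rule can be pictured as a dictionary merge. The sketch below assumes the file has already been parsed into a dict and that unset CLI flags are represented as None. Both are assumptions of this example, not details taken from the binary.

```python
def effective_config(file_values: dict, cli_flags: dict) -> dict:
    """Merge project defaults with CLI flags; explicitly set flags win."""
    explicit = {k: v for k, v in cli_flags.items() if v is not None}
    return {**file_values, **explicit}


file_values = {"direction": "lower", "iterations": 25, "run_tag": "nightly"}
cli_flags = {"iterations": 5, "run_tag": None}  # only iterations was passed
config = effective_config(file_values, cli_flags)
```

Here `config` keeps direction and run_tag from the file and takes iterations = 5 from the command line.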

Run with defaults:

autoresearch init

Generate a starter file:

autoresearch config template --output .autoresearch.toml
autoresearch config validate

Run Artifacts

All run state lives under autoresearch-results/:

results.tsv
state.json
context.json
lessons.md
handoff.json
launch.json
runtime.json
runtime.log

Do not commit autoresearch-results/ or .codex-autoresearch/.
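
One way to enforce this is to list both directories in .gitignore. The helper below is a hypothetical convenience, not part of Autoresearch. It appends any missing entry and leaves existing lines untouched.

```python
from pathlib import Path

IGNORED = ("autoresearch-results/", ".codex-autoresearch/")


def ensure_ignored(gitignore: Path, entries=IGNORED) -> list[str]:
    """Append missing entries to gitignore and return the ones added."""
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    present = {line.strip() for line in existing}
    missing = [entry for entry in entries if entry not in present]
    if missing:
        # Rewrite the file with the original lines followed by the new entries.
        gitignore.write_text("\n".join(existing + missing) + "\n")
    return missing
```

Running it twice is safe: the second call finds both entries present and writes nothing.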

Detailed Guides

Autoresearch Examples

Copy one block into your agent prompt after installing Autoresearch. Adjust the scope and commands to match your project.

TypeScript: Remove any

/autoresearch
Goal: Remove all explicit any usage
Scope: src/**/*.ts src/**/*.tsx
Metric: explicit any count
Direction: lower
Verify: rg -n ": any| as any|<any>" src 2>/dev/null | wc -l
Guard: npm test && npm run typecheck
Iterations: 30

Python: Raise Coverage

/autoresearch
Goal: Raise test coverage to 90%
Scope: src/**/*.py tests/**/*.py
Metric: coverage percent
Direction: higher
Verify: pytest --cov=src --cov-report=term | awk '/TOTAL/ {gsub("%", "", $4); print $4}'
Guard: pytest
Iterations: 25

Rust: Reduce Clippy Warnings

/autoresearch
Goal: Reduce clippy warnings to zero
Scope: src/**/*.rs tests/**/*.rs
Metric: clippy warning count
Direction: lower
Verify: cargo clippy --message-format short 2>&1 | tee /tmp/autoresearch-clippy.txt >/dev/null; grep -c "warning:" /tmp/autoresearch-clippy.txt || true
Guard: cargo test
Iterations: 20

Web App: Shrink Bundle

/autoresearch
Goal: Reduce production JavaScript bundle size
Scope: src/**/* package.json vite.config.* webpack.config.*
Metric: bundle bytes
Direction: lower
Verify: npm run --silent build -- --json > /tmp/autoresearch-stats.json && node -e "const s=require('/tmp/autoresearch-stats.json'); console.log(s.assets.filter(a => a.name.endsWith('.js')).reduce((n, a) => n + a.size, 0))"
Guard: npm test
Iterations: 20

API: Lower Latency

/autoresearch
Goal: Lower p95 latency for the health endpoint
Scope: src/**/* routes/**/* handlers/**/*
Metric: p95 latency milliseconds
Direction: lower
Verify: hey -z 30s -c 10 http://localhost:3000/health | awk '/95%/ {print $3 * 1000}'
Guard: npm test
Iterations: 15
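
The Verify line above depends on hey's latency distribution, which prints lines of the form `95% in 0.0123 secs`. The same extraction in Python is sketched below. It assumes that line format, and the sample text is abbreviated and made up rather than taken from a real measurement.

```python
import re


def p95_ms(hey_output: str) -> float:
    """Extract the 95th-percentile latency from hey output, in milliseconds."""
    match = re.search(r"^\s*95%\s+in\s+([0-9.]+)\s+secs", hey_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 95% line found in hey output")
    return float(match.group(1)) * 1000  # hey reports seconds


# Illustrative sample only, not real measurements.
sample = """Latency distribution:
  90% in 0.0100 secs
  95% in 0.0123 secs
  99% in 0.0200 secs
"""
latency = p95_ms(sample)
```

Raising an error when the line is missing is deliberate: a verify command that silently reports 0 would look like a perfect result to the loop.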

Parallel Experiments

Use this when several hypotheses are plausible and the run has enough CPU, RAM, and disk for isolated worker worktrees:

autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
# Fill in each worker metric, guard status, commit, and description.
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json
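
Closeout keeps one worker result. The selection step might look like the sketch below. The record fields mirror the items listed in the comment above (metric, guard status, commit, description), but the field names, the dataclass, and the baseline parameter are assumptions of this example, not the batch-file schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerResult:
    metric: float
    guard_passed: bool
    commit: str
    description: str


def pick_retained(results: list[WorkerResult], direction: str,
                  baseline: float) -> Optional[WorkerResult]:
    """Return the best guard-passing worker that beats baseline, else None."""
    # Flip the sign for "lower" so that bigger is always better below.
    sign = -1.0 if direction == "lower" else 1.0
    candidates = [
        r for r in results
        if r.guard_passed and sign * r.metric > sign * baseline
    ]
    return max(candidates, key=lambda r: sign * r.metric, default=None)
```

A worker whose guard failed is never retained, even when its metric is the best in the batch.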

More domain-specific examples are in Examples by Domain.

Autoresearch — Product Design Review

Problem Statement

AI coding agents (Claude Code, Codex CLI, Cursor, etc.) need autonomous iteration to improve codebases against measurable metrics. Today, agents either:

  1. Ask after every change — breaking flow, requiring human attention for mechanical decisions
  2. Use heavyweight orchestration — Python/Node scripts with complex dependency chains, slow startup, runtime dependencies
  3. Have no memory across turns — repeat failed experiments, lose context on compaction

There is no lightweight, compiled infrastructure that gives agents a tight modify→verify→keep/discard loop with git as memory, automatic rollback on failure, and escalation when stuck.

Solution

A single compiled Rust binary (about 3MB) that provides:

  • Hook handler: sub-5ms responses for Claude Code’s plugin hook system (PreToolUse, PostToolUse, UserPromptSubmit, Stop, etc.)
  • CLI operations: init, verify, guard, decide, resume, health, progress, watch, lessons, handoff, exec, plus runtime run/start/status/supervise/stop and parallel prepare/run/closeout/cleanup
  • Agent packages: Claude plugin commands, Codex .agents skill/plugin package, OpenCode command/skill/helper-agent package, and shared markdown protocols for iteration loops, security audits, debugging, shipping, product improvement research, and more

The binary handles the mechanical infrastructure. The agent handles the intelligence. Clean separation.

Target Users

| User | Integration |
| --- | --- |
| Claude Code users | Installer builds the binary and installs the plugin hooks |
| Codex CLI users | $skill-installer skill plus local .agents/plugins/marketplace.json plugin package |
| OpenCode users | Generated .opencode/ commands, skill, and helper agent |
| Any LLM agent | CLI called directly, skill markdown parsed by agent |

Architecture

┌─────────────────┐     ┌──────────────┐     ┌───────────────┐
│ Agent (Claude/   │────▶│ autoresearch │────▶│ Git repo      │
│ Codex/other)     │     │ binary       │     │ (experiments) │
└─────────────────┘     └──────────────┘     └───────────────┘
        │                       │
        │ reads                 │ writes
        ▼                       ▼
┌─────────────────┐     ┌──────────────────────┐
│ SKILL.md /       │     │ autoresearch-results/ │
│ commands/*.md    │     │ ├── results.tsv       │
│ agent packages   │     │ ├── state.json        │
└─────────────────┘     │ ├── context.json      │
                        │ ├── lessons.md        │
                        │ ├── handoff.json      │
                        │ ├── launch.json       │
                        │ ├── runtime.json      │
                        │ └── runtime.log       │
                        └──────────────────────┘

Key Metrics

| Metric | Target | Rationale |
| --- | --- | --- |
| Hook response latency | <5ms p99 | Hooks fire on every tool use; must be invisible |
| Binary size | <5MB | Single-file distribution, no extraction needed |
| Runtime dependencies | Zero | No Node, Python, Docker. Just the binary. |
| Cold start | <10ms | First invocation must feel instant |
| Memory usage | <5MB RSS | Runs alongside the agent, not competing for resources |

Non-Goals

  • Not a replacement for the agent itself — the binary doesn’t make decisions about what to change. It handles verification, logging, rollback, and state management.
  • Not a CI/CD system — it runs locally alongside the agent. The exec mode supports CI but is not a pipeline orchestrator.
  • Not a test framework — it calls your existing test/lint/build commands and parses their output.
  • Not a package manager — it doesn’t manage dependencies, just detects dangerous ones during security audits.

Modes

| Mode | Purpose |
| --- | --- |
| Core loop | Iterate against any numeric metric |
| Debug | Scientific bug hunting with hypothesis testing |
| Fix | Crush errors one-by-one until zero |
| Security | STRIDE + OWASP audit with red-team personas |
| Scenario | Edge case generation across 12 dimensions |
| Predict | Multi-persona expert debate |
| Reason | Adversarial refinement with blind judges |
| Probe | Requirements interrogation until saturation |
| Learn | Auto-generate documentation |
| Ship | 8-phase ship workflow |
| Improve | Research ICP needs and generate product improvement PRDs |
| Evals | Analyze iteration results |
| Plan | Convert goal → validated config |

Success Criteria

  1. Agent can iterate 25+ times without human intervention
  2. Failed experiments are automatically reverted (zero pollution)
  3. Cross-session memory via lessons.md survives compaction
  4. Hook latency is imperceptible to the agent/user
  5. Background autoresearch runtime run can relaunch Codex turns without corrupting artifacts
  6. Parallel worker closeout produces one authoritative retained result after verification
  7. Installation is one command for Claude, Codex, and OpenCode paths

Architecture

Binary Dual-Use

The autoresearch binary serves two roles from the same executable:

  1. CLI tool — direct invocation via autoresearch init, autoresearch decide, etc.
  2. Hook handler — invoked by the Claude Code plugin system via autoresearch hook <name>

The entry point in main.rs dispatches through clap subcommands. Hook mode parses the hook name and delegates to the corresponding handler in src/hooks/.

╭──────────────╮     ╭──────────────────╮     ╭───────────────╮
│  Agent call   │────▶│  autoresearch    │────▶│  CLI dispatch │
│  (or hook)    │     │  binary          │     │  (clap)       │
╰──────────────╯     ╰──────────────────╯     ╰───────┬───────╯
                                                       │
                      ╭────────────────────────────────┼────────────────╮
                      │                                │                │
                      ▼                                ▼                ▼
              ╭──────────────╮                ╭──────────────╮  ╭─────────────╮
              │  CLI command │                │  Hook handler│  │  Exec mode  │
              │  (init/log/  │                │  (scout,     │  │  (CI/CD     │
              │   decide/..) │                │   stop,..)   │  │   JSON-line)│
              ╰──────┬───────╯                ╰──────┬───────╯  ╰──────┬──────╯
                     │                               │                 │
                     ▼                               ▼                 ▼
              ╭──────────────────────────────────────────────────────────────╮
              │                          src/core/                           │
              │  config.rs  state.rs  results.rs  git.rs  verify.rs          │
              ╰──────────────────────────────────────────────────────────────╯

Module Breakdown

src/core/ — Shared foundation

| File | Purpose |
| --- | --- |
| config.rs | RunConfig, Direction, Mode, VerifyFormat, RollbackStrategy |
| state.rs | RunState, RunPhase (state machine), IterationStatus, StopReason |
| results.rs | ResultRow, ResultsLog (TSV append/read), completion summary |
| git.rs | GitRepo wrapper around libgit2: HEAD, revert, reset, worktree status |
| verify.rs | Run verify/guard commands, parse scalar or JSON output, screen for danger |
| metrics.rs | Metric parsing utilities, decimal handling |

src/hooks/ — Claude Code hook handlers

Each hook is a function that reads minimal state, makes a decision, and prints output. Hooks must complete in <5ms. No network calls, no heavy I/O.

| Hook | Fires on | Purpose |
| --- | --- | --- |
| session_init | Session start | Detect interrupted runs, load state |
| session_end | Session end | Write final state, cleanup |
| iteration_context | UserPromptSubmit | Inject iteration number + last result |
| stop_check | Stop | Check if iteration cap reached |
| scout_block | PreToolUse: Write/Edit/MultiEdit/Bash/Glob/Grep/Read | Block generated/vendor/sensitive paths, Bash reads, and out-of-scope writes |
| dangerous_cmd | PreToolUse: Bash | Screen for rm -rf, DROP TABLE, etc. |
| simplify_gate | UserPromptSubmit | Enforce “equal metric + less code = keep” |
| compaction_reanchor | Context compaction | Re-inject critical state after compaction |
| privacy_block | PreToolUse: Write/Edit/MultiEdit/Bash/Glob/Grep/Read | Block credential paths and secret-looking inputs; warn on sensitive Bash paths |
| dev_rules_reminder | UserPromptSubmit | Remind agent of project conventions |
| subagent_context | Subagent spawn | Inject autoresearch state into subagent prompt |
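As an illustration of the kind of cheap, in-memory check a hook such as dangerous_cmd can perform within its latency budget, here is a minimal sketch. The pattern list and the function name screen_command are assumptions made for this example; the actual patterns and matching rules used by the binary are not shown in this document.

```rust
/// Hypothetical pattern list for illustration only; the real hook's
/// patterns are not documented here.
const DANGEROUS_PATTERNS: &[&str] = &["rm -rf", "drop table", ":(){ :|:& };:"];

/// Returns the first pattern found in `command`, using a case-insensitive
/// substring match. No I/O and no allocation beyond the lowercase copy.
pub fn screen_command(command: &str) -> Option<&'static str> {
    let normalized = command.to_lowercase();
    DANGEROUS_PATTERNS
        .iter()
        .copied()
        .find(|pattern| normalized.contains(*pattern))
}
```

A plain substring match like this misses variants such as extra whitespace or reordered flags, so it should be read as a shape for the check rather than a complete screen.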

src/escalation/ — Failure recovery

| File | Purpose |
| --- | --- |
| pivot.rs | EscalationState: tracks consecutive discards, triggers refine/pivot/search |
| lessons.rs | LessonsLog: append/search/read lessons.md |
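The escalation idea can be sketched as a counter of consecutive discards that maps to an action. The thresholds below (3, 5, 7, 9), the EscalationAction enum, and the method names are hypothetical values chosen for this example; the document does not state the real thresholds, and the real EscalationState also tracks fields omitted here.

```rust
/// Action suggested after a discard. Variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EscalationAction {
    Continue,
    Refine,
    Pivot,
    Search,
    Stop,
}

// Hypothetical thresholds; the real values are not given in this document.
pub const REFINE_AFTER: u32 = 3;
pub const PIVOT_AFTER: u32 = 5;
pub const SEARCH_AFTER: u32 = 7;
pub const STOP_AFTER: u32 = 9;

/// Simplified stand-in for EscalationState (only the discard counter).
#[derive(Debug, Default)]
pub struct EscalationState {
    pub consecutive_discards: u32,
}

impl EscalationState {
    /// A kept experiment resets the ladder.
    pub fn record_keep(&mut self) {
        self.consecutive_discards = 0;
    }

    /// A discarded experiment climbs the ladder and reports the tier reached.
    pub fn record_discard(&mut self) -> EscalationAction {
        self.consecutive_discards += 1;
        match self.consecutive_discards {
            n if n >= STOP_AFTER => EscalationAction::Stop,
            n if n >= SEARCH_AFTER => EscalationAction::Search,
            n if n >= PIVOT_AFTER => EscalationAction::Pivot,
            n if n >= REFINE_AFTER => EscalationAction::Refine,
            _ => EscalationAction::Continue,
        }
    }
}
```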

src/modes/ — Mode-specific logic

Each mode file contains the structured output types and validation logic for that subcommand. The actual iteration orchestration is done by the agent reading the corresponding command markdown file.

src/agents/ — Multi-agent support

Agent detection and context injection for the supported agent runtimes (Claude Code, Codex CLI).

State Machine

The RunPhase enum enforces valid transitions at the type level:

Setup → Baseline { metric }
Baseline → Iterating { iteration, current, best, best_iteration }
Iterating → Iterating (on keep/discard/crash/no-op)
Iterating → Complete { reason }
Iterating → Blocked { reason }
Blocked → Iterating (on resume)

RunState persists to autoresearch-results/state.json after every iteration. On resume, the binary reads state.json and reconstructs the full context.
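The transition list above can be sketched as an enum plus a validity check. This is a simplified illustration, not the project's definition: it validates transitions at runtime rather than at the type level, the Metric alias is an i64 stand-in for Decimal so the example needs no dependencies, and the method name can_transition_to is an assumption.

```rust
/// Stand-in for a decimal metric type so the sketch has no dependencies.
pub type Metric = i64;

#[derive(Debug, Clone, PartialEq)]
pub enum RunPhase {
    Setup,
    Baseline { metric: Metric },
    Iterating { iteration: u32, current: Metric, best: Metric, best_iteration: u32 },
    Complete { reason: String },
    Blocked { reason: String },
}

impl RunPhase {
    /// Returns true only for the transitions listed in the table above.
    pub fn can_transition_to(&self, next: &RunPhase) -> bool {
        use RunPhase::*;
        matches!(
            (self, next),
            (Setup, Baseline { .. })
                | (Baseline { .. }, Iterating { .. })
                | (Iterating { .. }, Iterating { .. })
                | (Iterating { .. }, Complete { .. })
                | (Iterating { .. }, Blocked { .. })
                | (Blocked { .. }, Iterating { .. })
        )
    }
}
```

Complete has no outgoing transitions in this sketch, which matches the list: a finished run is terminal, while a blocked run can resume iterating.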

Data Flow

Agent decides to modify code
    │
    ▼
autoresearch verify --command "..." → metric (Decimal)
    │
    ▼
autoresearch guard --command "..."  → pass/fail (optional)
    │
    ▼
autoresearch decide --decision auto --metric N --metrics-json '{...}'
    │
    ├── keep:    state.record_keep() → update state.json, append TSV
    └── discard: state.record_discard() → rollback, update state.json, append TSV
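The keep/discard branch can be sketched as a pure function over the metric direction. Everything here is an assumption made for illustration: the function names, the use of i64 in place of Decimal, the rule that a failed guard forces a discard, and the rule that an unchanged metric is discarded (the simplify gate's "equal metric + less code" case is not modelled).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Higher,
    Lower,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Keep,
    Discard,
}

/// Positive when the candidate moved in the wanted direction.
pub fn improvement(direction: Direction, best: i64, candidate: i64) -> i64 {
    match direction {
        Direction::Higher => candidate - best,
        Direction::Lower => best - candidate,
    }
}

/// Keep only when the guard passed and the metric strictly improved.
pub fn decide_auto(direction: Direction, best: i64, candidate: i64, guard_passed: bool) -> Decision {
    if guard_passed && improvement(direction, best, candidate) > 0 {
        Decision::Keep
    } else {
        Decision::Discard
    }
}
```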

System Architecture

Autoresearch is a Rust binary plus agent-facing instruction packs. The binary owns mechanical state transitions; agents own reasoning, code edits, and hypothesis selection.

Components

| Component | Role |
| --- | --- |
| autoresearch binary | CLI, hook dispatcher, verifier, rollback controller, runtime supervisor |
| .claude-plugin/marketplace.json | Claude marketplace manifest pointing at the repo-root plugin package |
| commands/ | Claude Code slash command instructions |
| skills/autoresearch/ | Claude/OpenCode skill package and shared references |
| .agents/skills/autoresearch/ | Codex/generic agent skill package |
| .agents/plugins/marketplace.json | Local Codex marketplace root for the packaged plugin |
| plugins/autoresearch/ | Codex plugin package generated from .agents/skills/autoresearch/ |
| .opencode/ | OpenCode command, skill, and helper-agent distribution |
| references/ | Protocol source docs copied into installable packages |
| autoresearch-results/ | Runtime artifacts created inside the user’s target repo |

Runtime Flow

agent chooses one hypothesis
    |
    v
edits scoped files and creates a trial commit
    |
    v
autoresearch verify runs the metric command
    |
    v
autoresearch guard runs the regression command when configured
    |
    v
autoresearch decide keeps, discards, logs, and updates state

The binary writes results.tsv and state.json after each decision. A discarded experiment is rolled back automatically, while a kept experiment remains in git history as the next baseline.

Parallel Flow

Parallel work is recorded as a batch:

autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json

Prepare creates branch-backed worker worktrees, prompt files, a manifest, and the editable batch file. Run executes the prepared worker prompts in those worktrees and records crashed or timed-out workers in the manifest. Closeout cherry-picks the best worker, re-runs verify and guard in the main worktree, falls back to the next worker on merge or verification failure, then writes worker audit rows and one authoritative retained batch row. Cleanup removes worker worktrees and branches.
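The closeout fallback described above (try the best worker first, fall back to the next on merge or verification failure) can be sketched as follows. The WorkerResult struct, the select_winner name, the i64 metric, and the merge_and_verify callback are all hypothetical; in the real binary that step involves a cherry-pick plus verify and guard runs.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerResult {
    pub name: String,
    pub metric: i64,
    pub crashed: bool,
}

/// Orders surviving workers from best to worst and returns the first one
/// for which `merge_and_verify` succeeds, or None if every candidate fails.
pub fn select_winner<F>(
    mut workers: Vec<WorkerResult>,
    higher_is_better: bool,
    mut merge_and_verify: F,
) -> Option<WorkerResult>
where
    F: FnMut(&WorkerResult) -> bool,
{
    // Crashed or timed-out workers are never candidates.
    workers.retain(|w| !w.crashed);
    workers.sort_by(|a, b| {
        if higher_is_better {
            b.metric.cmp(&a.metric)
        } else {
            a.metric.cmp(&b.metric)
        }
    });
    workers.into_iter().find(|w| merge_and_verify(w))
}
```

Returning at most one worker mirrors the single authoritative retained batch row: whatever happens to the other workers, only one result can become the new baseline.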

Background Runtime

autoresearch runtime run supervises Codex execution through persisted artifacts:

| Artifact | Purpose |
| --- | --- |
| launch.json | Command, cwd, repo targets, goal, iteration limit, and stop criteria |
| runtime.json | Current status and supervisor recommendation |
| runtime.log | Detached runtime output |

Manual controls remain available through runtime start, runtime status, runtime supervise, and runtime stop.
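One way to picture the supervisor recommendation stored in runtime.json is a small decision function over a status snapshot. The RuntimeSnapshot fields, the recommend function, and the priority order (blocked first, then stop conditions, then relaunch) are assumptions for this sketch; only the three recommendation names relaunch, stop, and needs_human come from the project's release notes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recommendation {
    Relaunch,
    Stop,
    NeedsHuman,
}

/// Hypothetical snapshot of the values a supervisor might inspect.
#[derive(Debug, Clone, Copy)]
pub struct RuntimeSnapshot {
    pub iteration: u32,
    pub iteration_limit: u32,
    pub stop_criteria_met: bool,
    pub blocked: bool,
}

/// Assumed priority: a blocker asks for a human, a reached cap or met
/// criteria stops the run, anything else relaunches the next turn.
pub fn recommend(snapshot: RuntimeSnapshot) -> Recommendation {
    if snapshot.blocked {
        Recommendation::NeedsHuman
    } else if snapshot.stop_criteria_met || snapshot.iteration >= snapshot.iteration_limit {
        Recommendation::Stop
    } else {
        Recommendation::Relaunch
    }
}
```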

More Detail

See Architecture for module-level internals and Guide for user-facing command flow.

Autoresearch — Codebase Summary

AI-friendly reference for agents working on this codebase.

Entry Points

| Path | Purpose |
| --- | --- |
| src/main.rs | CLI entry: Clap-based command dispatch for run setup, verify/guard/decide, health, resume/status/progress/watch, lessons, handoff, exec, runtime, parallel, screen, and hooks |
| .claude-plugin/marketplace.json | Claude marketplace manifest for the repo-root plugin package |
| hooks/hooks.json | Claude Code plugin hook definitions: maps lifecycle events to binary invocations |
| skills/autoresearch/SKILL.md | Agent skill file: iteration protocol, subcommand table, references |
| .agents/skills/autoresearch/ | Maintained Codex skill package used by direct Codex installs |
| plugins/autoresearch/ | Codex plugin package generated from the .agents skill package |
| .agents/plugins/marketplace.json | Local Codex marketplace entry pointing at plugins/autoresearch/ |
| .opencode/ | Generated OpenCode commands and skill package, plus the maintained docs-manager helper agent |
| commands/autoresearch.md | Root command protocol (core iteration loop) |
| commands/autoresearch/*.md | Subcommand protocols (debug, fix, security, scenario, etc.) |

Core Modules (src/core/)

| File | Responsibility |
| --- | --- |
| config.rs | RunConfig, Direction, VerifyFormat, RollbackStrategy types |
| git.rs | GitRepo wrapper: status, head, revert, worktree checks |
| verify.rs | Run verify commands, parse scalar/JSON output, safety screening |
| results.rs | ResultsLog: append TSV rows, read history |
| state.rs | RunState: iteration count, metrics, keeps/discards, phase tracking |
| metrics.rs | Metric parsing, delta calculation, direction comparison |
| context.rs | Canonical context.json and repo-local pointer writing |
| health.rs | Native preflight checks for runtime launch safety |
| runtime.rs | Background launch/runtime manifests, supervisor snapshots, and stop control |

Escalation (src/escalation/)

| File | Responsibility |
| --- | --- |
| pivot.rs | EscalationState: track consecutive discards, trigger REFINE/PIVOT/SEARCH/STOP |
| lessons.rs | LessonsLog: read/write/search cross-run learning entries |

Hooks (src/hooks/)

| File | Hook | Fires On |
| --- | --- | --- |
| scout_block.rs | scout-block | PreToolUse (Write/Edit/MultiEdit/Bash/Glob/Grep/Read): blocks generated paths, Bash reads, and out-of-scope writes |
| privacy_block.rs | privacy-block | PreToolUse: blocks access to sensitive paths |
| dangerous_cmd.rs | dangerous-cmd-block | PreToolUse (Bash): blocks rm -rf, fork bombs, etc. |
| iteration_context.rs | iteration-context | UserPromptSubmit: injects run state into agent context |
| dev_rules_reminder.rs | dev-rules-reminder | UserPromptSubmit: re-injects active protocol and code standards |
| simplify_gate.rs | simplify-gate | UserPromptSubmit: reminds agent of simplicity rule |
| stop_check.rs | stop-check | Stop: detects premature stop during active run |
| compaction_reanchor.rs | compaction-reanchor | PostCompact: re-injects critical state after context compaction |
| session_init.rs | session-init | SessionStart: detects interrupted runs |
| session_end.rs | session-end | SessionEnd: emits terminal notification and optional webhook summary |
| subagent_context.rs | subagent-context | SubagentStart: passes run context to subagents |

Modes (src/modes/)

Thin logic for mode-specific state (most protocol lives in markdown commands): loop_mode.rs, debug.rs, fix.rs, security.rs, scenario.rs, predict.rs, reason.rs, probe.rs, learn.rs, ship.rs, evals.rs, improve.rs, plan.rs

Agents (src/agents/)

| File | Purpose |
| --- | --- |
| claude.rs | Claude Code-specific integration helpers |
| codex.rs | Codex CLI-specific integration helpers |

Data Flow

User prompt → [hook: iteration-context injects state]
           → Agent reads state + TSV + git log
           → Agent makes ONE change
           → Agent calls: autoresearch verify --command "..."
           → Binary runs command, returns metric + metrics JSON
           → Agent calls: autoresearch decide --decision auto --metric N --metrics-json '{...}'
           → Binary: evaluates criteria, updates state.json, appends TSV, reverts if discard
           → [hook: stop-check ensures agent doesn't quit early]
           → Next iteration

Background runs route the same state machine through autoresearch runtime run, which writes launch.json, runtime.json, and runtime.log, runs the native health preflight at each relaunch boundary, and supervises detached Codex turns. Parallel batches use autoresearch parallel prepare/run/closeout/cleanup to run worker worktrees and retain only one verified batch winner.

Key Types

| Type | Location | Fields |
| --- | --- | --- |
| RunConfig | core/config.rs | verify, direction, format, scope, guard, primary_metric_key |
| RunState | core/state.rs | iteration, baseline_metric, current_metric, best_metric, keeps, discards, crashes, consecutive_discards, phase |
| ResultRow | core/results.rs | iteration, commit, metric, delta, guard, status, description |
| LaunchManifest | core/runtime.rs | workspace_root, execution_policy, codex_bin, repo_targets, config |
| EscalationState | escalation/pivot.rs | consecutive_discards, pivots, last_action |
| Direction | core/config.rs | Higher, Lower |
| IterationStatus | core/state.rs | Baseline, Keep, Discard, Crash, NoOp, Blocked, Pivot, Refine, Search |
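To make the ResultRow fields concrete, here is a minimal sketch of how such a row could be rendered as one TSV line. The field types (strings for metric, delta, guard, and status), the HEADER constant, and the rule of replacing tabs and newlines with spaces are assumptions for illustration; the real serialization in results.rs may differ.

```rust
/// Simplified stand-in for ResultRow with the seven documented columns.
pub struct ResultRow {
    pub iteration: u32,
    pub commit: String,
    pub metric: String,
    pub delta: String,
    pub guard: String,
    pub status: String,
    pub description: String,
}

impl ResultRow {
    /// Column order as listed in the Key Types table.
    pub const HEADER: &'static str =
        "iteration\tcommit\tmetric\tdelta\tguard\tstatus\tdescription";

    /// Renders one line without a trailing newline. Tabs and line breaks
    /// inside a field are replaced so the row keeps exactly seven columns.
    pub fn to_tsv_line(&self) -> String {
        let clean = |field: &str| {
            field.replace(|c: char| c == '\t' || c == '\n' || c == '\r', " ")
        };
        [
            self.iteration.to_string(),
            clean(&self.commit),
            clean(&self.metric),
            clean(&self.delta),
            clean(&self.guard),
            clean(&self.status),
            clean(&self.description),
        ]
        .join("\t")
    }
}
```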

How to Add…

A new CLI command

  1. Add variant to Commands enum in src/main.rs
  2. Add match arm in main() dispatching to cmd_<name>() function
  3. Implement function at bottom of main.rs (or extract to module if >100 lines)

A new hook

  1. Add handler in src/hooks/<name>.rs
  2. Register in src/hooks/mod.rs
  3. Add hook entry in hooks/hooks.json under the appropriate lifecycle event
  4. Hook receives JSON on stdin, returns JSON on stdout, must complete in <5ms
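The stdin-to-stdout contract in step 4 can be sketched with the standard library alone. The handle function, its `.env` rule, and the JSON field names decision and reason are hypothetical placeholders rather than the project's hook schema; a real handler would parse the payload (for example with serde_json) instead of searching the raw text.

```rust
use std::io::{self, Read, Write};
use std::time::{Duration, Instant};

/// Latency budget taken from the "<5ms" rule above.
pub const HOOK_BUDGET: Duration = Duration::from_millis(5);

/// Hypothetical handler: blocks payloads that mention a `.env` path.
pub fn handle(payload: &str) -> String {
    if payload.contains(".env") {
        r#"{"decision":"block","reason":"sensitive path"}"#.to_string()
    } else {
        r#"{"decision":"allow"}"#.to_string()
    }
}

/// Reads the whole payload, writes one JSON line, and reports how long the
/// hook took so a caller can compare the result against HOOK_BUDGET.
pub fn run_hook<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<Duration> {
    let started = Instant::now();
    let mut payload = String::new();
    input.read_to_string(&mut payload)?;
    writeln!(output, "{}", handle(&payload))?;
    Ok(started.elapsed())
}
```

Taking generic Read and Write parameters keeps the handler testable without spawning the binary: a test can pass a byte slice as input and a Vec<u8> as output.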

A new mode/subcommand

  1. Add command protocol in commands/autoresearch/<mode>.md
  2. Add mode-specific state logic in src/modes/<mode>.rs (if needed)
  3. Register in src/modes/mod.rs
  4. Update SKILL.md subcommand table

Code Standards

Rust conventions for the autoresearch codebase.

Error Handling

  • Use anyhow::Result for all fallible functions.
  • Use thiserror for custom error types in library code that callers need to match on.
  • Use .context("descriptive message") on every ? — errors should be traceable.
  • Never unwrap() in library code. unwrap() is acceptable only in tests.
  • expect() is acceptable for provably infallible operations (e.g., regex compilation).

Serialization

  • All persistent types derive Serialize, Deserialize.
  • Use #[serde(rename_all = "snake_case")] for enum variants.
  • Use #[serde(tag = "phase")] for internally tagged enums (like RunPhase).
  • Use #[serde(default)] for optional fields added in later versions (forward compat).
  • Use #[serde(skip_serializing_if = "Option::is_none")] to keep JSON clean.

Documentation

  • Every public type and function has a /// doc comment.
  • Module-level //! doc comments describe the module’s role.
  • Use # Examples in doc comments for non-obvious APIs.

Testing

  • Unit tests live in #[cfg(test)] mod tests at the bottom of each file.
  • Integration tests live in tests/.
  • E2E fixtures live in tests/e2e/fixtures/.
  • Every new CLI subcommand gets a test in tests/cli_test.rs.
  • Every state transition gets a test in tests/state_test.rs.
  • Target: 80%+ line coverage on src/core/.

Style

  • Run cargo clippy -- -D warnings before every commit. Zero warnings.
  • Run cargo fmt before every commit.
  • Run ./scripts/run_contributor_gate.sh before opening a PR.
  • Run ./scripts/validate_distribution.sh after changing skill, command, reference, or agent metadata files.
  • Run ./scripts/run_skill_e2e.sh binary-smoke --clean after changing core run closeout or result-monitoring behavior.
  • Run ./scripts/run_skill_e2e.sh runtime-smoke --clean after changing runtime launch, status, or stop behavior.
  • Run ./scripts/run_skill_e2e.sh parallel-smoke --clean after changing parallel worker prepare/run/cleanup behavior.
  • Max line length: 100 characters (soft), 120 characters (hard).
  • Prefer match over if let chains for exhaustive enum handling.
  • Prefer &str over String in function parameters when ownership isn’t needed.

Performance

  • Hooks must complete in <5ms. No network calls, no heavy I/O in hook handlers.
  • Use Decimal (not f64) for all metric values — financial-grade precision.
  • Release builds use opt-level = "z", LTO, strip, panic = "abort".
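The reason for preferring a decimal type over f64 can be shown with a two-term sum. This sketch uses integer thousandths as a dependency-free stand-in for Decimal; the function names are invented for the example.

```rust
/// True when binary floating point misses the exact decimal result:
/// 0.1 + 0.2 evaluates to 0.30000000000000004 in f64.
pub fn f64_sum_drifts() -> bool {
    let sum = 0.1_f64 + 0.2_f64;
    sum != 0.3_f64
}

/// The same sum expressed in integer thousandths is exact, which is the
/// property a decimal type provides for metric comparisons.
pub fn fixed_point_sum_is_exact() -> bool {
    let sum_milli: i64 = 100 + 200;
    sum_milli == 300
}
```

A keep/discard decision that compares a new metric against the best one is sensitive to exactly this kind of drift when two runs should be treated as equal.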

Naming

  • Types: PascalCase (e.g., RunState, ResultRow)
  • Functions: snake_case (e.g., record_keep, run_verify)
  • CLI subcommands: lowercase single words (e.g., init, decide, evals)
  • Constants: SCREAMING_SNAKE_CASE
  • Files: snake_case.rs

Dependencies

  • Minimize dependency count. Current deps are intentional:
    • clap — CLI parsing
    • serde + serde_json — serialization
    • tokio — async runtime (for exec mode)
    • rust_decimal — precise metric values
    • chrono — timestamps
    • git2 — libgit2 bindings
    • regex — pattern matching
    • anyhow + thiserror — error handling
    • glob — file pattern matching
  • Do not add dependencies without justification in the PR description.

Project Changelog

This page is the high-level release history entrypoint. The canonical Keep-a-Changelog file is changelog.md.

Current Development Track

Recent work has focused on bringing the binary and the installable agent packages up to par with the stronger autoresearch implementations:

  • Background runtime control through autoresearch runtime run and runtime start/status/supervise/stop
  • Live log monitoring through autoresearch watch
  • Native parallel worker support through autoresearch parallel prepare, run, verified closeout, and cleanup, including worker crash/timeout recording
  • Codex, Claude Code, and OpenCode installation paths
  • Distribution validation for generated command and skill packages
  • Binary smoke tests for installed skill instructions
  • Direct documentation entrypoints for installation, usage, examples, and system architecture

Release Notes

See changelog.md for versioned release notes and development-roadmap.md for planned work.

Changelog

All notable changes to this project will be documented in this file.

Format based on Keep a Changelog.

[0.1.0] — 2025-05-27

Initial release.

Added

  • Core engine: init, verify, guard, log, decide, status, resume, progress, watch CLI commands
  • State machine: RunPhase enum (Setup → Baseline → Iterating → Complete/Blocked) with typed transitions
  • Results logging: TSV format with iteration, commit, metric, delta, guard, status, description columns
  • State persistence: state.json with full run context, resume support for interrupted sessions
  • Git integration: libgit2-based revert and hard-reset rollback strategies, worktree status detection
  • Verify system: scalar and metrics_json output formats, command screening for dangerous patterns
  • Escalation protocol: three tiers (refine → pivot → web search) followed by stop, triggered by consecutive discards
  • Lessons log: Markdown-based learnings that persist across sessions, with search and tail queries
  • 12 subcommands: improve, debug, fix, security, scenario, predict, learn, reason, probe, evals, ship, plan
  • Exec mode: Non-interactive CI/CD mode — reads config from stdin, emits JSON lines
  • Background runtime: runtime run managed relaunch loop plus start/status/supervise/stop artifacts, detached launch control, and relaunch/stop/needs_human supervisor recommendations
  • Parallel runtime: parallel prepare/run/closeout/cleanup manages worker worktrees, records crashes/timeouts, cherry-picks verified winners, and logs one authoritative retained batch row
  • Handoff system: Structured JSON handoff between modes for chained workflows
  • 11 hook handlers: session_init, session_end, iteration_context, stop_check, scout_block, dangerous_cmd, simplify_gate, compaction_reanchor, privacy_block, dev_rules_reminder, subagent_context
  • Claude Code plugin: .claude-plugin/plugin.json manifest with hook definitions
  • Codex skill: .agents/skills/autoresearch/ for direct Codex installs, plus plugins/autoresearch/ for the local Codex plugin marketplace
  • OpenCode package: .opencode/ commands, skill package, and hidden docs-manager helper agent
  • Agent commands: commands/autoresearch.md root + 12 subcommand files
  • Reference docs: 27 protocol and workflow reference documents
  • Release profile: opt-level = "z", LTO, strip, panic = "abort" — about 3MB binary with a 5MB contributor-gate ceiling

Development Roadmap

v0.1.0 — Foundation (current)

  • Core iteration engine (init, verify, guard, decide, log)
  • State machine with typed transitions
  • TSV results + JSON state persistence
  • Git rollback (revert + hard-reset)
  • Noise-aware scalar verification repeats with aggregation
  • 12 subcommands with full reference docs
  • Exec mode for CI/CD
  • 11 hook handlers
  • Claude Code plugin + Codex skill
  • Codex plugin package + local marketplace entry
  • Thin Codex skill router with detailed binary operations in references
  • Escalation protocol (refine → pivot → search → stop)
  • Lessons log with search

v0.2.0 — Background Mode + Parallel Experiments

  • Background runtime artifacts + detached Codex launch control (autoresearch runtime start/status/supervise/stop)
  • Background supervisor recommendation (autoresearch runtime supervise) with iteration cap, criteria, stop-condition, soft-blocker, and stagnation decisions
  • Background supervisor relaunch loop that automatically executes recommended relaunches (autoresearch runtime run)
  • Parallel batch templates (autoresearch parallel template) for editable worker result JSON
  • Parallel worker preparation (autoresearch parallel prepare) with branch-backed git worktrees, prompts, manifest, and batch file
  • Parallel worker launch (autoresearch parallel run) for prepared codex exec workers, including timeout/crash recording
  • Parallel batch closeout (autoresearch parallel closeout) with cherry-pick, post-merge verify/guard, fallback, worker audit rows, and one authoritative retained-state update
  • Parallel cleanup (autoresearch parallel cleanup) for worker worktrees and branches
  • Experiment branching — each trial on its own git branch
  • Branch merge strategy selection (fast-forward, squash, rebase)
  • autoresearch watch — tail results in real-time
  • Progress websocket for real-time monitoring
  • Improved evals: statistical significance testing on parallel results

v0.3.0 — Web Search + MCP Integration

  • Built-in web search escalation (configurable provider command)
  • MCP tool server mode — expose autoresearch as an MCP tool
  • MCP client mode — call external MCP tools during iteration
  • Structured search queries from escalation context
  • Search result caching to avoid redundant queries
  • autoresearch search — standalone web search for the current problem

v0.4.0 — Multi-Repo + Workspace Support

  • Workspace-owned artifacts (autoresearch-results/) and repo-local pointers for managed repos
  • Companion repo registration through --companion-repo-scope PATH=SCOPE
  • Companion repo preflight, health, and runtime dirty-worktree safeguards
  • Cross-repo change execution and rollback across companion repos
  • Workspace-aware scope expansion (monorepo package boundaries)
  • Cross-repo guard command presets
  • Native environment probe command for CPU, disk, container, toolchain context, and init metadata
  • Shared lessons across repos in a workspace

v1.0.0 — Stable API + Ecosystem

  • Stable CLI API — semver guarantees on commands, flags, and output formats
  • Native plan command for repo-aware launch config suggestions
  • Native debug generator for hypothesis-driven investigation bundles
  • Native fix generator for one-error-at-a-time repair-plan bundles
  • Native improve artifact bundle for research findings, ranked plan, TSV, summary, and handoff
  • Native PRD generator for selected improve-mode ideas
  • Native security generator for STRIDE + OWASP audit bundles
  • Native ship generator for 8-phase checklist bundles
  • Native scenario generator for 12-dimension edge-case artifacts
  • Native predict generator for five-persona review artifacts
  • Native reason generator for adversarial candidate debate artifacts
  • Native probe generator for eight-persona constraint artifacts
  • Native learn generator for documentation summary artifacts
  • Adaptive eval checkpoint command for long-running loops
  • Native protocol re-anchor command for long-running Codex sessions
  • Plugin system — loadable mode definitions (TOML or YAML)
  • Plugin marketplace — community-contributed modes
  • Configuration file (.autoresearch.toml) for project-level defaults
  • Shell completions (bash, zsh, fish, elvish, PowerShell)
  • Man pages generation
  • Pre-built binaries for Linux (x86_64, aarch64), macOS (x86_64, aarch64), Windows
  • Homebrew formula and cargo-binstall support
  • Comprehensive documentation site
  • GitHub Action for autoresearch in CI
  • Metric history graphing (sparklines in terminal)
  • Cost tracking — estimate token/API spend per iteration
  • A/B experiment mode — compare two approaches head-to-head
  • Interactive TUI dashboard for monitoring runs
  • VS Code extension for run visualization with source installer support

Future Ideas (unscheduled)

  • Re-check upstream autoresearch projects before the next feature milestone

autoresearch

Autonomous goal-directed iteration engine for coding agents. Written in Rust.

“Set the goal → the agent runs the loop → you wake up to results”



How it works

You describe the goal  →  Agent confirms config  →  You say "go"
                                                        │
                                               ┌────────┴────────┐
                                               │   Loop active   │
                                               │                 │
                                               │ 1. Read context │
                                               │ 2. Hypothesis   │
                                               │ 3. ONE change   │
                                               │ 4. Git commit   │
                                               │ 5. Verify       │
                                               │ 6. Improved?    │
                                               │    → keep       │
                                               │    → revert     │
                                               │ 7. Log          │
                                               │ 8. Next round   │
                                               └─────────────────┘

Every improvement compounds. Every failure is automatically reverted. Progress is logged in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents endless retries.


Commands

| Command | Function | Default iterations |
| --- | --- | --- |
| /autoresearch | Core loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated config | one-time |
| /autoresearch:debug | Bug hunting through hypothesis iteration | 15 |
| /autoresearch:fix | Fix errors one at a time until none remain | 20 |
| /autoresearch:security | STRIDE + OWASP security audit | 15 |
| /autoresearch:ship | 8-phase release workflow | linear |
| /autoresearch:scenario | Generate edge cases across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 expert personas | one-time |
| /autoresearch:learn | Explore → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judges | 8 |
| /autoresearch:probe | 8 personas interrogate requirements | 15 |
| /autoresearch:improve | Research into product improvements | 20 |
| /autoresearch:evals | Results analysis: trends and plateaus | one-time |

Quick start

Claude Code (plugin installation)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available immediately.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

From source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.


Key rules

  1. One change per round: atomic experiments establish causality
  2. Read before you write: check git log and the TSV before making a change
  3. Mechanical verification only: run the command, evaluate the number
  4. Automatic rollback: git revert HEAD --no-edit on failure
  5. Simplicity wins: same metric + less code = keep

Full documentation (English)

autoresearch

Autonomous goal-driven iteration engine for coding agents. Written in Rust.

“Set the GOAL → The agent runs the LOOP → You wake up to results”



How it works

You describe the goal  →  Agent confirms config  →  You say "go"
                                                        │
                                               ┌────────┴────────┐
                                               │   Loop active   │
                                               │                 │
                                               │ 1. Read context │
                                               │ 2. Hypothesis   │
                                               │ 3. ONE change   │
                                               │ 4. Git commit   │
                                               │ 5. Verify       │
                                               │ 6. Improved?    │
                                               │    → keep       │
                                               │    → revert     │
                                               │ 7. Log          │
                                               │ 8. Next round   │
                                               └─────────────────┘

Every improvement accumulates. Every failure is reverted automatically. Progress is logged in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents infinite retries.


Commands

| Command | Function | Default iterations |
| --- | --- | --- |
| /autoresearch | Main loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated config | one-time |
| /autoresearch:debug | Bug hunting through hypothesis iteration | 15 |
| /autoresearch:fix | Fix errors one by one until none remain | 20 |
| /autoresearch:security | STRIDE + OWASP audit with red-team | 15 |
| /autoresearch:ship | 8-phase release workflow | linear |
| /autoresearch:scenario | Generate edge cases across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 experts | one-time |
| /autoresearch:learn | Explore → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judges | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Research into product improvements | 20 |
| /autoresearch:evals | Results analysis: trends and plateaus | one-time |

Quick start

Claude Code (plugin installation)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

From source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.


Core rules

  1. One change per turn: atomic experiments establish causality
  2. Read before writing: review git log and the TSV before modifying anything
  3. Mechanical verification only: run the command, extract the number
  4. Automatic rollback: git revert HEAD --no-edit on failure
  5. Simplicity wins: same metric + less code = keep

Full documentation (English)

autoresearch

Autonomous goal-driven iteration engine for coding agents. Written in Rust.

“Set the GOAL → The agent runs the LOOP → You wake up to results”



How it works

You describe the goal  →  Agent confirms the config  →  You say "go"
                                                             │
                                                    ┌────────┴─────────┐
                                                    │   Loop active    │
                                                    │                  │
                                                    │  1. Read context │
                                                    │  2. Hypothesize  │
                                                    │  3. Change ONE   │
                                                    │  4. Git commit   │
                                                    │  5. Verify       │
                                                    │  6. Improved?    │
                                                    │     → keep       │
                                                    │     → revert     │
                                                    │  7. Log result   │
                                                    │  8. Next turn    │
                                                    └──────────────────┘

Every improvement stacks. Every failure is automatically reverted. Progress is recorded in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents endless retries.
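
One way to picture the escalation ladder is as a mapping from the number of consecutive failed turns to a stage. The sketch below is an assumption-laden illustration, not the tool's actual logic: the `Stage` enum, the `stage_for` function, and the `failures_per_stage` threshold of 3 are all hypothetical, and the source does not state when each rung is reached.

```python
from enum import Enum


class Stage(Enum):
    REFINE = "refine"
    PIVOT = "pivot"
    WEB_SEARCH = "web search"
    STOP = "stop"


# Order of the rungs as listed in the text above.
LADDER = [Stage.REFINE, Stage.PIVOT, Stage.WEB_SEARCH, Stage.STOP]


def stage_for(consecutive_failures: int, failures_per_stage: int = 3) -> Stage:
    """Climb one rung every `failures_per_stage` failed turns (assumed threshold)."""
    index = min(consecutive_failures // failures_per_stage, len(LADDER) - 1)
    return LADDER[index]
```

Because the index is capped at the last rung, any long enough run of failures ends in `Stage.STOP`, which is what keeps the loop from retrying forever.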


Commands

| Command | Function | Default iterations |
| --- | --- | --- |
| /autoresearch | Main loop: modify → verify → keep/reject | 25 |
| /autoresearch:plan | Interactive wizard → validated configuration | one-shot |
| /autoresearch:debug | Bug hunting through hypothesis iteration | 15 |
| /autoresearch:fix | Fix errors one by one until none remain | 20 |
| /autoresearch:security | STRIDE + OWASP audit with red-team | 15 |
| /autoresearch:ship | 8-phase shipping workflow | linear |
| /autoresearch:scenario | Generate edge cases across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 experts | one-shot |
| /autoresearch:learn | Explore → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judges | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Results analysis: trends and plateaus | one-shot |

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

From source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.


Core rules

  1. One change per turn: atomic experiments establish causality
  2. Read before writing: check the git log and the TSV before modifying anything
  3. Mechanical verification only: run the command, extract the number
  4. Automatic rollback: git revert HEAD --no-edit on failure
  5. Simplicity wins: identical metric + less code = keep

Full documentation (English)

autoresearch

Autonomous goal-driven iteration engine for coding agents. Built in Rust.

"Set the goal → The agent runs the loop → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский


How it works

You describe the goal  →  Agent confirms the config  →  You say "start"
                                                             │
                                                    ┌────────┴─────────┐
                                                    │   Loop running   │
                                                    │                  │
                                                    │  1. Read context │
                                                    │  2. Hypothesize  │
                                                    │  3. Change ONE   │
                                                    │  4. Git commit   │
                                                    │  5. Verify       │
                                                    │  6. Improved?    │
                                                    │     → keep       │
                                                    │     → revert     │
                                                    │  7. Log result   │
                                                    │  8. Next turn    │
                                                    └──────────────────┘

Improvements accumulate, and failures are reverted automatically. Progress is recorded in TSV format. Escalation (Refine → Pivot → Web search → Stop) prevents endless retries.
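
To make the TSV progress log concrete, here is a minimal Python sketch of appending one row per turn and reading the log back to see which hypotheses were already tried. The column names in `FIELDS` and both function names are assumptions for illustration; the source does not document the actual column layout.

```python
import csv
import io

# Hypothetical column layout; the real log format is not documented here.
FIELDS = ["iteration", "hypothesis", "metric", "decision"]


def append_row(log, row: dict) -> None:
    """Write one tab-separated row to an open text stream."""
    writer = csv.DictWriter(log, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writerow(row)


def tried_hypotheses(tsv_text: str) -> set:
    """Return the hypotheses already recorded, so a turn can avoid repeating one."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS, delimiter="\t")
    return {row["hypothesis"] for row in reader}
```

Reading the log before proposing the next change is one plausible way to apply the "read before writing" rule.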


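Commands

| Command | Function | Default iterations |
| --- | --- | --- |
| /autoresearch | Core loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated config | one-shot |
| /autoresearch:debug | Hypothesis-based bug tracking | 15 |
| /autoresearch:fix | Fix errors one at a time until zero remain | 20 |
| /autoresearch:security | STRIDE + OWASP security audit | 15 |
| /autoresearch:ship | 8-phase release flow | linear |
| /autoresearch:scenario | Edge-case generation across 12 dimensions | 20 |
| /autoresearch:predict | Discussion among 5 expert personas | one-shot |
| /autoresearch:learn | Scout → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judging | 8 |
| /autoresearch:probe | 8 personas thoroughly question the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Analysis of iteration results: trends and plateaus | one-shot |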

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands become available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Usage: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

Build from source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with zero runtime dependencies.


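Core rules

  1. One change per turn: atomic experiments establish causality
  2. Read before writing: check the git log and the results TSV before changing anything
  3. Mechanical verification only: run the command, parse the number
  4. Automatic rollback: git revert HEAD --no-edit on failure
  5. Simplicity wins: same metric + less code = keep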

Full documentation (English)

autoresearch

Autonomous goal-directed iteration engine for coding agents. Written in Rust.

"Set the goal → The agent runs the loop → You check the results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский


How it works

You describe the goal  →  Agent confirms the config  →  You say "start"
                                                             │
                                                    ┌────────┴─────────┐
                                                    │   Loop running   │
                                                    │                  │
                                                    │  1. Read context │
                                                    │  2. Hypothesize  │
                                                    │  3. Change ONE   │
                                                    │  4. Git commit   │
                                                    │  5. Verify       │
                                                    │  6. Improved?    │
                                                    │     → keep       │
                                                    │     → roll back  │
                                                    │  7. Log result   │
                                                    │  8. Next turn    │
                                                    └──────────────────┘

Every improvement accumulates. Every failure is reverted automatically. Progress is recorded in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents endless retries.


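Commands

| Command | Function | Default iterations |
| --- | --- | --- |
| /autoresearch | Core iteration loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated config | one-shot |
| /autoresearch:debug | Bug tracking through hypothesis iteration | 15 |
| /autoresearch:fix | Fix errors one at a time down to zero | 20 |
| /autoresearch:security | STRIDE + OWASP security audit | 15 |
| /autoresearch:ship | 8-stage deployment workflow | linear |
| /autoresearch:scenario | Generate edge cases across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 expert personas | one-shot |
| /autoresearch:learn | Explore → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judging | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Iteration results analysis: trends and plateaus | one-shot |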

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Usage: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Usage: /autoresearch or /autoresearch_debug.

Build from source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.


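Core rules

  1. One change per turn: atomic experiments establish causality
  2. Read before writing: check the git log and the results TSV before modifying anything
  3. Mechanical verification only: run the command, parse the number
  4. Automatic rollback: git revert HEAD --no-edit on failure
  5. Simplicity wins: same metric + less code = keep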
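
Rule 3 ("run the command, parse the number") can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function names `extract_metric` and `verify` are hypothetical, and taking the last number printed by the command is just one possible convention, not the tool's documented behavior.

```python
import re
import subprocess


def extract_metric(output: str) -> float:
    """Return the last plain decimal number found in the command output."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", output)
    if not numbers:
        raise ValueError("verify command printed no number")
    return float(numbers[-1])


def verify(command: list) -> float:
    """Run the verify command and extract its metric; a non-zero exit raises."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return extract_metric(completed.stdout)
```

The point of the rule is that the decision rests on a number produced by a command, not on a judgment about whether the change "looks" better.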

Full documentation (English)

autoresearch

Autonomous goal-driven iteration engine for coding agents. Written in Rust.

"Set the GOAL → The agent runs the LOOP → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский


How it works

You describe the goal  →  Agent confirms the config  →  You say "go"
                                                             │
                                                    ┌────────┴─────────┐
                                                    │   Loop active    │
                                                    │                  │
                                                    │  1. Read context │
                                                    │  2. Hypothesize  │
                                                    │  3. Change ONE   │
                                                    │  4. Git commit   │
                                                    │  5. Verify       │
                                                    │  6. Improved?    │
                                                    │     → keep       │
                                                    │     → revert     │
                                                    │  7. Log result   │
                                                    │  8. Next turn    │
                                                    └──────────────────┘

Every improvement accumulates. Every failure is reverted automatically. Progress is recorded in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents endless retries.
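
The loop described above can be simulated in a few lines of Python to show how improvements accumulate while failures leave the best result untouched. This is a toy model under stated assumptions: `run_loop` is a hypothetical name, each experiment is given as an already-measured `(hypothesis, metric)` pair standing in for the modify, commit, and verify steps, and a higher metric is assumed to be better. The default cap of 25 turns mirrors the default listed for /autoresearch in the command table.

```python
from typing import Iterable, List, Tuple


def run_loop(baseline: float,
             experiments: Iterable[Tuple[str, float]],
             max_iterations: int = 25) -> Tuple[float, List[Tuple[int, str, float, str]]]:
    """Keep each experiment that beats the best metric so far; revert the rest."""
    best = baseline
    log = []
    for turn, (hypothesis, metric) in enumerate(experiments, start=1):
        if turn > max_iterations:
            break
        decision = "keep" if metric > best else "revert"
        if decision == "keep":
            best = metric  # improvements stack on top of each other
        log.append((turn, hypothesis, metric, decision))
    return best, log
```

A reverted turn still gets a log row, which is what lets later turns learn from failed attempts.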


Commands

| Command | Function | Default iterations |
| --- | --- | --- |
| /autoresearch | Main loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated configuration | one-shot |
| /autoresearch:debug | Bug hunting through hypothesis iteration | 15 |
| /autoresearch:fix | Fix errors one by one down to zero | 20 |
| /autoresearch:security | STRIDE + OWASP audit with red-team | 15 |
| /autoresearch:ship | 8-phase release workflow | linear |
| /autoresearch:scenario | Generate edge cases across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 experts | one-shot |
| /autoresearch:learn | Explore → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judges | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Results analysis: trends and plateaus | one-shot |

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands become available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

From source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.


Core rules

  1. One change per turn: atomic experiments establish causality
  2. Read before writing: check the git log and the TSV before modifying anything
  3. Mechanical verification only: run the command, extract the number
  4. Automatic rollback: git revert HEAD --no-edit on failure
  5. Simplicity wins: same metric + less code = keep

Full documentation (English)

autoresearch

Autonomous goal-driven iteration engine for coding agents. Written in Rust.

"Set the GOAL → The agent spins the LOOP → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский


How it works

You describe the goal  →  Agent confirms the config  →  You say "let's go"
                                                             │
                                                    ┌────────┴─────────┐
                                                    │   Loop active    │
                                                    │                  │
                                                    │  1. Read context │
                                                    │  2. Hypothesize  │
                                                    │  3. Change ONE   │
                                                    │  4. Git commit   │
                                                    │  5. Verify       │
                                                    │  6. Improved?    │
                                                    │     → keep       │
                                                    │     → roll back  │
                                                    │  7. Log result   │
                                                    │  8. Next turn    │
                                                    └──────────────────┘

Every improvement accumulates. Every failure is rolled back automatically. Progress is recorded in TSV format. The escalation ladder (Refine → Change approach → Web search → Stop) prevents endless retries.


Commands

| Command | Function | Default iterations |
| --- | --- | --- |
| /autoresearch | Main loop: change → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated configuration | one-shot |
| /autoresearch:debug | Bug hunting through hypothesis iteration | 15 |
| /autoresearch:fix | Fix errors one at a time down to zero | 20 |
| /autoresearch:security | STRIDE + OWASP audit with red-team | 15 |
| /autoresearch:ship | 8-phase release process | linear |
| /autoresearch:scenario | Edge-case generation across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 expert personas | one-shot |
| /autoresearch:learn | Recon → documentation generation → validation → fixes | 10 |
| /autoresearch:reason | Adversarial debate with blind judges | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Results analysis: trends and plateaus | one-shot |

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

Build from source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). The output is a binary of about 3 MB with no runtime dependencies.


Key rules

  1. One change per turn: atomic experiments establish causality
  2. Read before writing: check the git log and the TSV before changing anything
  3. Mechanical verification only: run the command, extract the number
  4. Automatic rollback: git revert HEAD --no-edit on failure
  5. Simplicity wins: same metric + less code = keep
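
The rollback rule can be sketched as a small Python helper that either keeps the experiment commit or runs the revert command quoted in rule 4. The helper name `settle_turn` and its injectable `run` parameter are hypothetical and exist only to make the sketch easy to exercise without a repository; how the real binary invokes git is not described in the source.

```python
import subprocess
from typing import Callable, List, Optional

# The command quoted in rule 4. It adds a new commit that undoes HEAD,
# so the failed experiment stays visible in the git log.
REVERT = ["git", "revert", "HEAD", "--no-edit"]


def settle_turn(improved: bool,
                run: Optional[Callable[[List[str]], object]] = None) -> str:
    """Keep the experiment commit if it improved the metric, otherwise revert it."""
    if improved:
        return "keep"
    if run is None:
        # Default runner: execute git in the current repository and fail loudly.
        run = lambda cmd: subprocess.run(cmd, check=True)
    run(REVERT)
    return "revert"
```

Because `git revert` records the undo as its own commit rather than rewriting history, failed attempts remain readable for later turns.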

Full documentation (English)

autoresearch

Autonomous goal-driven iteration engine for coding agents. Written in Rust.

"Set the goal → The agent runs the loop → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский


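How it works

You describe the goal  →  Agent confirms the config  →  You say "start"
                                                             │
                                                    ┌────────┴─────────┐
                                                    │   Loop running   │
                                                    │                  │
                                                    │  1. Read context │
                                                    │  2. Hypothesize  │
                                                    │  3. Change ONE   │
                                                    │  4. Git commit   │
                                                    │  5. Verify       │
                                                    │  6. Improved?    │
                                                    │     → keep       │
                                                    │     → roll back  │
                                                    │  7. Log result   │
                                                    │  8. Next round   │
                                                    └──────────────────┘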

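Every improvement accumulates. Every failure is rolled back automatically. Progress is recorded in TSV format. The escalation strategy (Refine → Pivot → Web search → Stop) prevents endless brute-force retries.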


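Commands

| Command | Function | Default iterations |
| --- | --- | --- |
| /autoresearch | Core iteration loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated config | one-shot |
| /autoresearch:debug | Track down defects through hypothesis iteration | 15 |
| /autoresearch:fix | Fix errors one by one until none remain | 20 |
| /autoresearch:security | STRIDE + OWASP security audit | 15 |
| /autoresearch:ship | 8-stage release workflow | linear |
| /autoresearch:scenario | Generate edge cases across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 expert personas | one-shot |
| /autoresearch:learn | Scout → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind review | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Analyze iteration results: trends and bottlenecks | one-shot |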

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available immediately.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then use: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

Build from source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a zero-dependency binary of about 3 MB.


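Core rules

  1. One change per round: only atomic experiments can establish causality
  2. Read before writing: check the git log and the results TSV before modifying anything
  3. Mechanical verification: run the command, parse the number
  4. Automatic rollback: run git revert HEAD --no-edit on failure
  5. Simplicity wins: same metric + less code = keep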

Full documentation (English)