Documentation

Start here when browsing the repository documentation directly.

Doc	Purpose
Installation	Install paths for Claude Code, Codex, OpenCode, and source builds
Guide	Command map, binary operations, artifacts, and guide links
Examples	Copy-paste configs for common autoresearch goals
System Architecture	Binary, agent package, runtime, and artifact architecture
Project Changelog	Release history entrypoint and current development track
Detailed Changelog	Versioned release notes
Development Roadmap	Current and planned runtime, search, MCP, workspace, and release work

The documentation set also builds as an mdBook site from book.toml and docs/SUMMARY.md.

Installation

Autoresearch ships as a Rust binary plus agent-specific skill or command packages.

Agent-Driven Install

The primary path is to give this prompt to the agent that should use Autoresearch:

Install Autoresearch in this environment.

Use the installer from:
https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh

Pick the install flag for the current agent:
- Claude Code: --claude
- Codex: --codex
- OpenCode: --opencode
- If you cannot infer the agent, use --all.

Run the installer non-interactively with bash, verify `autoresearch --help`, then tell me the command I should use to start Autoresearch in this agent.
Start commands are `/autoresearch` for Claude Code, `$autoresearch` for Codex, and `/autoresearch` for OpenCode.
Use a global install unless I explicitly asked for a project-local install.

Raw GitHub Installer

Use the raw installer when you want the source build plus agent package without cloning first. Pick the exact command for your agent:

Claude Code:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --claude

Codex:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex

OpenCode:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode

All packages:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --all

The raw script downloads a source archive, builds the Rust binary, and runs the same install.sh used by a local clone. Set AUTORESEARCH_INSTALL_REF to use a different branch, AUTORESEARCH_INSTALL_REPO to use a fork, or AUTORESEARCH_INSTALL_ARCHIVE_URL to provide an explicit archive URL.

Pre-Built Binaries

Tagged releases publish .tar.gz archives for Linux x86_64, Linux aarch64, macOS x86_64, macOS aarch64, and Windows x86_64. Download the archive for your platform from the GitHub release, verify the adjacent .sha256 file, and place the autoresearch binary on your PATH.
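
The checksum step can be scripted. The following is a minimal Python sketch, assuming the .sha256 file begins with the hex digest (the layout `sha256sum` writes); the `sha256_of` and `verify_sha256` helpers are illustrative and not part of Autoresearch.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(archive: Path, checksum_file: Path) -> bool:
    """Compare an archive against the digest stored in the adjacent file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by a file name.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(archive) == expected
```

Only place the binary on your PATH when `verify_sha256` returns True for the downloaded archive.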

Cargo Binstall

Cargo.toml includes cargo-binstall metadata for the same target-named release archives:

cargo binstall autoresearch

Homebrew

Homebrew tap maintainers can render packaging/homebrew/autoresearch.rb.template with the release version and SHA-256 values from GitHub release assets, then publish it as Formula/autoresearch.rb in a tap.

Claude Code

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --claude

This builds autoresearch, installs it on your PATH, and installs the Claude Code plugin hooks.

If the binary is already installed:

claude plugin add coder-company/agent-autoresearch

Restart Claude Code after installing the plugin.

Manual local Claude package:

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch

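# Run from the clone; copy the package into the project that should get local commands/skills.
# Create both target directories first so cp does not fail on a missing parent.
mkdir -p /path/to/project/.claude/commands /path/to/project/.claude/skills
cp -R .claude/commands/. /path/to/project/.claude/commands/
cp -R .claude/skills/autoresearch /path/to/project/.claude/skills/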

The .claude/ package is generated from the same canonical command and reference files as the plugin package.

Codex

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex

This builds autoresearch, installs it on your PATH, and installs the Codex skill package.

Then start Codex from your project and invoke:

$autoresearch

For the smoothest foreground and background runs, start Codex with approval prompts and the sandbox bypassed, which gives it full workspace access:

codex --dangerously-bypass-approvals-and-sandbox

If you only want the Codex skill package and not the source-built binary:

$skill-installer install https://github.com/coder-company/agent-autoresearch

For a project-local Codex skill install, run the raw installer from the target project:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex --local

That installs to ./.codex/skills/autoresearch in the current project. Use --global for the default user-wide target, or --codex-dir for an explicit destination.

From a local clone:

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --codex

The installer copies .agents/skills/autoresearch/ and validates the target path before replacing the installed skill directory.

Project-local install from a local clone:

/path/to/agent-autoresearch/install.sh --yes --codex --local

To install the local Codex plugin package through the installer:

./install.sh --yes --codex-plugin

Local plugin package:

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
codex plugin marketplace add .agents/plugins/marketplace.json
codex plugin install autoresearch@autoresearch-local

The marketplace entry points at plugins/autoresearch/, which packages the same maintained Codex skill plus plugin metadata.

OpenCode

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode

This builds autoresearch, installs it on your PATH, and installs the OpenCode command and skill package.

Start OpenCode from your project and invoke:

/autoresearch

OpenCode mode commands use underscore names such as /autoresearch_debug, /autoresearch_fix, and /autoresearch_security. The package also installs the hidden docs-manager helper agent for focused documentation updates.

For a project-local OpenCode install, run the raw installer from the target project:

curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode --local

That installs to ./.opencode in the current project. Use --global for the default user-wide target, or --opencode-dir for an explicit OpenCode config root.

Project-local install from a local clone:

/path/to/agent-autoresearch/install.sh --yes --opencode --local

The installer refuses empty, home, and parent config paths before replacing skills/autoresearch.

VS Code

./install.sh --yes --vscode

That copies integrations/vscode into your VS Code extensions directory; the installed extension delegates to the autoresearch binary on PATH. Use --vscode-dir for an explicit extensions directory.

From Source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --all

Use ./install.sh without flags for the guided installer.

Verify The Install

autoresearch --help
autoresearch screen --command "npm test"
autoresearch completions zsh >/tmp/_autoresearch
autoresearch manpages --output-dir /tmp/autoresearch-manpages
autoresearch config template >/tmp/autoresearch.toml
autoresearch config validate --path /tmp/autoresearch.toml

For repository contributors:

./scripts/validate_distribution.sh
./scripts/run_skill_e2e.sh binary-smoke --clean
./scripts/run_skill_e2e.sh runtime-smoke --clean
./scripts/run_skill_e2e.sh parallel-smoke --clean
./scripts/run_contributor_gate.sh

See Getting Started and Codex usage for first-run examples.

Guide

Autoresearch is a loop controller for agents: define a measurable goal, modify one thing, verify mechanically, keep or discard, and repeat.
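
To make the loop concrete, here is a minimal Python sketch of the keep-or-discard step. It is an illustration only: `keep_change` and `run_loop` are hypothetical names, the sketch keeps strict improvements only, and the guards, git rollback, and logging that the binary handles are not modelled.

```python
def keep_change(best: float, candidate: float, direction: str) -> bool:
    """Return True when the candidate metric strictly improves on the best."""
    if direction == "lower":
        return candidate < best
    if direction == "higher":
        return candidate > best
    raise ValueError(f"unknown direction: {direction!r}")


def run_loop(baseline: float, candidates: list[float], direction: str) -> tuple[float, list[str]]:
    """Walk measured candidates, keeping improvements and discarding the rest."""
    best = baseline
    decisions = []
    for metric in candidates:
        if keep_change(best, metric, direction):
            best = metric  # the change stays; later candidates must beat it
            decisions.append("keep")
        else:
            decisions.append("discard")  # the change would be reverted
    return best, decisions
```

With a baseline of 10 failing tests and measured candidates 9, 11, and 7 under direction `lower`, the sketch keeps 9, discards 11, and keeps 7.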

Core Commands

Need	Use
Improve a metric	`/autoresearch` or `$autoresearch`
Pick a metric from a vague goal	`/autoresearch:plan` or `$autoresearch plan`
Find a root cause	`/autoresearch:debug` or `$autoresearch debug`
Reduce errors to zero	`/autoresearch:fix` or `$autoresearch fix`
Run a security audit	`/autoresearch:security` or `$autoresearch security`
Ship through gates	`/autoresearch:ship` or `$autoresearch ship`
Analyze prior runs	`/autoresearch:evals` or `$autoresearch evals`

Binary Operations

The agent-facing protocols delegate stateful work to the autoresearch binary:

autoresearch init --verify "cat metric.txt" --direction lower
autoresearch verify --command "cat metric.txt"
autoresearch verify --command "cat metric.txt" --repeat 3 --aggregate median
autoresearch plan --goal "reduce any types" --format json
autoresearch debug --symptom "API returns 500" --scope "src/**/*.rs"
autoresearch fix --target "npx tsc --noEmit" --scope "src/**/*.ts" --category type
autoresearch improve --goal "Improve onboarding activation" --icp "Developer tools teams"
autoresearch prd --title "Improve onboarding" --problem "New users stall before first run"
autoresearch security --scope "src/**/*.rs" --focus auth
autoresearch ship --target "Release v1.2.0" --type code-release --dry-run
autoresearch scenario --target "Checkout flow" --domain web --format test-scenarios --scope "src/checkout/**"
autoresearch predict --proposal "Add cache warming to search results" --scope "src/search/**"
autoresearch predict --proposal "Find product improvements for onboarding" --scope "src/**" --improve
autoresearch reason --question "Should we replace the storage layer" --mode debate --domain software
autoresearch probe --subject "Payment retry workflow" --scope "src/payments/**"
autoresearch probe --subject "Onboarding activation workflow" --scope "src/**" --improve
autoresearch learn --mode summarize --scope "src/**/*.rs"
autoresearch decide --decision auto --metric 4 --commit abc1234 --description "improved"
autoresearch status --summary
autoresearch progress
autoresearch cost --per-iteration-usd 0.25 --format json
autoresearch dashboard --once
autoresearch health --strict
autoresearch env --format json
autoresearch init --environment-summary auto --verify "cat metric.txt" --direction lower
autoresearch checkpoint --format json
autoresearch reanchor --format json
autoresearch watch --lines 20 --format jsonl
autoresearch watch --websocket --websocket-addr 127.0.0.1:8765
autoresearch lessons --add "Prefer fixture-level assertions" --context "reduced flaky tests"
autoresearch search --from-state --provider-command 'exa "$AUTORESEARCH_SEARCH_QUERY"' --log
autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
autoresearch parallel template --workers 3 --output autoresearch-results/parallel-workers.json
autoresearch parallel compare --a "simplify parser" --b "cache scan results"
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json --merge-strategy cherry-pick
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json
autoresearch evals --file autoresearch-results/results.tsv --format json --recommend --plateau-window 5 --target 90 --fail-on goal-not-met --chain ship
autoresearch evals --file autoresearch-results/results.tsv --compare autoresearch-results/previous-results.tsv --format json
autoresearch api --format json
autoresearch mcp serve
autoresearch mcp call --server-command "autoresearch mcp serve" --tool autoresearch_status
autoresearch scope expand --format json
autoresearch workspace exec --command "cargo test" --rollback-on-failure
autoresearch guard-presets --format json
autoresearch lessons --workspace-context --last 5
autoresearch plugin list
autoresearch plugin validate --path .autoresearch/plugins/example.toml
autoresearch plugin marketplace
autoresearch completions zsh > ~/.zfunc/_autoresearch

Use autoresearch runtime run for supervised background Codex sessions and autoresearch runtime status / autoresearch runtime stop for control.
Use autoresearch env --format json to capture CPU, disk, container, toolchain, and recommended parallel-worker context before planning long or parallel runs; pass --environment-summary auto to init to persist that probe summary in results.tsv.
Use autoresearch status --summary for compact monitor-friendly counters.
Use autoresearch progress for the current metric, trend, counters, escalation state, and terminal metric history sparkline.
Use autoresearch verify --repeat <n> --aggregate <median|mean|min|max|last> for noisy scalar metrics; repeated verification returns the aggregate metric plus the raw samples.
Use autoresearch cost --per-iteration-usd <usd> or token/rate flags to estimate completed, remaining, and projected run spend.
Use autoresearch dashboard --once for a combined terminal view of status, trend, metric history, escalation, and recent rows; omit --once for live refresh.
Use autoresearch checkpoint --format json inside long loops to run evals only when the active iteration reaches the configured or adaptive checkpoint interval.
Use autoresearch reanchor --format json every 10 iterations or after context compaction to print the protocol fingerprint, reload references, and [RE-ANCHOR] logging tag.
Use autoresearch watch --format <tsv|jsonl> for human-readable tails or machine-readable JSON Lines.
Use autoresearch watch --websocket --websocket-addr <host:port> to serve snapshot and row update payloads to real-time dashboards. Add --once to print the initial WebSocket snapshot envelope without starting a server.
Use autoresearch lessons --add <strategy> --context <note> to append reusable lessons without editing lessons.md by hand.
Use autoresearch search --from-state with --provider-command or AUTORESEARCH_SEARCH_CMD to run cached, run-aware web searches. Add --log to append a search meta-iteration.
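
The --aggregate choices for repeated verification correspond to ordinary summary statistics. The sketch below shows one way to collapse raw samples into a single metric in Python; the `aggregate` helper is hypothetical, and how the binary itself computes these values is not documented here.

```python
import statistics
from typing import Callable, Sequence

# Names taken from the documented --aggregate choices.
AGGREGATES: dict[str, Callable[[Sequence[float]], float]] = {
    "median": statistics.median,
    "mean": statistics.fmean,
    "min": min,
    "max": max,
    "last": lambda samples: samples[-1],
}


def aggregate(samples: Sequence[float], how: str) -> float:
    """Collapse repeated metric samples into one value."""
    if not samples:
        raise ValueError("at least one sample is required")
    try:
        reducer = AGGREGATES[how]
    except KeyError:
        raise ValueError(f"unknown aggregate: {how!r}") from None
    return float(reducer(samples))
```

The median is often a reasonable choice for noisy metrics because a single outlier sample does not move it.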
When decide escalates to Web Search, it automatically runs the same cached helper with AUTORESEARCH_SEARCH_CMD and logs the result when timing/cooldown limits allow it.
Use autoresearch parallel closeout --merge-strategy <cherry-pick|fast-forward|squash|rebase> to select how the retained worker commit is merged.
Use autoresearch parallel compare --a <hypothesis> --b <hypothesis> to prepare a two-arm A/B batch that reuses parallel run and verified parallel closeout.
Use autoresearch evals --file <path> --format json --recommend --plateau-window 5 --target <metric-threshold> --fail-on goal-not-met --chain ship after parallel closeout to include worker improvement counts, a sign-test summary, anomaly detection, goal-achieved status, CI-friendly exit gating, next-step guidance, and downstream handoff metadata.
Use autoresearch evals --file <path> --compare <other-results.tsv> --format json to compare run improvement, efficiency, and plateau length before choosing the next strategy.
Use autoresearch completions <bash|zsh|fish|elvish|powershell> to generate shell completions.
Use autoresearch manpages --output-dir man/man1 to generate a local autoresearch.1 manual page.
Use autoresearch config template --output .autoresearch.toml to write a starter project defaults file.
Use autoresearch config validate to parse defaults, validate options, and screen configured commands without running them.
Use autoresearch plan --goal <goal> --format json to get a launch-ready suggested scope, metric, direction, verify, guard, and iteration count from detected repo tooling.
Use autoresearch plan --goal <goal> --debug to write the derived config into a downstream debug handoff.
Native artifact generators default to ignored autoresearch-results/<mode>/ paths; pass --output or --output-dir only when you intentionally want a different artifact location.
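
For readers unfamiliar with the sign test mentioned for evals, the following Python sketch shows the textbook two-sided version applied to worker metrics against a baseline. It is an assumption that evals uses this exact formulation; the `sign_test` helper and its return keys are hypothetical.

```python
from math import comb


def sign_test(baseline: float, worker_metrics: list[float], direction: str) -> dict:
    """Two-sided sign test on whether workers beat the baseline.

    Ties with the baseline are dropped, as is conventional for this test.
    """
    if direction not in ("lower", "higher"):
        raise ValueError(f"unknown direction: {direction!r}")
    sign = -1 if direction == "lower" else 1
    improved = sum(1 for m in worker_metrics if sign * (m - baseline) > 0)
    regressed = sum(1 for m in worker_metrics if sign * (m - baseline) < 0)
    n = improved + regressed
    if n == 0:
        p_value = 1.0  # nothing but ties: no evidence either way
    else:
        # Probability of a split at least this lopsided under a fair coin.
        tail = sum(comb(n, i) for i in range(min(improved, regressed) + 1))
        p_value = min(1.0, 2 * tail / 2**n)
    return {"improved": improved, "regressed": regressed, "p_value": p_value}
```

Five improvements out of five workers give a p-value of 0.0625, which illustrates why small parallel batches rarely reach conventional significance on their own.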
Use autoresearch debug --symptom <failure> --scope <glob> to write a hypothesis-driven investigation bundle with summary, findings, eliminated hypotheses, TSV, and handoff JSON. Add --fix or --chain <targets> to autoresearch debug to record downstream chain metadata in the debug handoff.
Use autoresearch debug --depth deep --iterations 12 --severity high to override the investigation budget and record severity filter metadata.
Use autoresearch fix --target <verify-command> --scope <glob> --iterations 7 to write a repair-plan bundle under autoresearch-results/fix with priority order, results TSV, iteration budget, and handoff JSON.
Use autoresearch fix --from-debug to import the latest debug handoff scope, symptom, and finding count into the repair plan.
Use autoresearch fix --learn --evals to record downstream learn handoff and checkpoint propagation metadata.
Use autoresearch improve --goal <product-area> --icp <persona> to write an improve-mode artifact bundle: research findings, ranked plan, summary, TSV, and handoff JSON.
Use autoresearch improve --goal <product-area> --icp <persona> --depth deep --iterations 24 --evals to override the research budget and record active category count plus checkpoint metadata.
Use autoresearch improve --goal <product-area> --seeds 5 --no-discover --learn to record seed volume, discovery posture, and downstream learn handoff metadata.
Use autoresearch prd --title <title> --problem <problem> to write a focused improve-mode PRD with DECISION NEEDED markers, acceptance criteria, risks, success metrics, and an autoresearch config block.
Use autoresearch security --scope <glob> --focus <area> to write a STRIDE + OWASP audit bundle with overview, threat model, attack surface, coverage, findings, recommendations, TSV, and handoff JSON. Add --fail-on <severity> and --fix to autoresearch security to record CI gate and repair-chain metadata for confirmed findings.
Use autoresearch security --scope <glob> --depth deep --iterations 18 --diff --fix --evals to override the audit budget and record delta mode, downstream fix handoff, and checkpoint metadata.
Use autoresearch ship --target <thing> --type <kind> --dry-run to write an 8-phase ship checklist, summary, ship log, and handoff JSON without external side effects.
Use autoresearch ship --target <thing> --auto --force --rollback --monitor 15 --learn to record approval, rollback, monitoring, and downstream learn handoff metadata.
Use autoresearch scenario --target <feature> --domain <general|web|mobile|api|cli|data-pipeline|infrastructure> --format <test-scenarios|threat-scenarios|use-cases|user-stories> to write a 12-dimension scenario matrix for tests, threat modeling, or debug follow-up.
Use autoresearch scenario --target <feature> --domain web --depth deep --iterations 16 --evals --debug to override the exploration budget and record domain, checkpoint metadata, and downstream debug handoff.
Use autoresearch predict --proposal <change> to write a five-persona review covering architecture, security, performance, UX, and adversarial risks.
Use autoresearch predict --proposal <change> --depth deep --adversarial --fail-on high to record review profile and CI gate metadata.
Use autoresearch predict --proposal <change> --debug to record the review as handoff context for downstream investigation.
Use autoresearch predict --proposal <product-area> --improve to pass expert findings into product improvement research.
Use autoresearch reason --question <decision> to write an adversarial debate artifact with candidate solutions, blind judge rubric, and convergence criteria.
Use autoresearch reason --question <decision> --predict to pass the selected debate context into downstream review.
Use autoresearch reason --question <decision> --iterations 11 --judges 7 --convergence 4 --temperature 0.2 to record debate budget, judge panel, convergence, synthesis, and generation hints.
Use autoresearch probe --subject <requirement> to write eight persona-driven questions, constraint slots, and a saturation rule before implementation.
Use autoresearch probe --subject <requirement> --mode autonomous --depth deep --iterations 9 --adversarial to override the interrogation round budget and record saturation metadata.
Use autoresearch probe --subject <requirement> --plan to pass discovered constraints into planning through handoff metadata.
Use autoresearch probe --subject <product-area> --improve to pass discovered constraints into product improvement research.
Use autoresearch learn --mode <init|update|check|summarize> --scope <glob> to write documentation summary, validation, TSV, and handoff artifacts.
Use autoresearch learn --mode check --file <path> --depth overview --iterations 14 --topics architecture,api --no-fix --evals to record learn profile, specific-file scope, validation behavior, chain, and checkpoint metadata.
Use autoresearch api --format json to inspect the stable command/flag manifest and semver policy used by wrappers and agents.
Use autoresearch mcp serve as a stdio MCP server exposing read-only autoresearch_status and autoresearch_watch_snapshot tools.
Use autoresearch mcp call --server-command <cmd> --tool <name> --arguments '{}' to call a tool on an external stdio MCP server from an iteration script.
Use autoresearch scope expand --format json to resolve active primary and companion repo scopes, with package roots inferred from Cargo.toml, package.json, pyproject.toml, and go.mod.
Use autoresearch workspace exec --command <cmd> --rollback-on-failure to run one screened command across primary and companion repo targets, restoring attempted repos if any target fails.
Use autoresearch guard-presets --format json to suggest per-repo guard commands for primary and companion repositories.
Use autoresearch lessons --workspace-context --last 5 from any managed repo to show the shared workspace lessons path and repo targets.
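
Inferring package roots from manifest files, as scope expand is described as doing, can be pictured with a short directory walk. This Python sketch is a guess at the general idea rather than the binary's algorithm; `find_package_roots` and the skipped directory names are assumptions.

```python
from pathlib import Path

# Manifest names listed in the text above.
MANIFESTS = ("Cargo.toml", "package.json", "pyproject.toml", "go.mod")
# Assumption: directories that a scope scan would normally skip.
SKIP_DIRS = {".git", "node_modules", "target"}


def find_package_roots(repo: Path) -> list[Path]:
    """Return directories under `repo` that contain a known manifest."""
    roots = set()
    for manifest in MANIFESTS:
        for path in repo.rglob(manifest):
            relative = path.relative_to(repo)
            # Ignore manifests that live inside vendored or build directories.
            if SKIP_DIRS.isdisjoint(relative.parts):
                roots.add(path.parent)
    return sorted(roots)
```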
Use autoresearch plugin list and autoresearch plugin validate --path <file> to load local TOML mode plugin manifests with command safety screening.
Use autoresearch plugin marketplace to validate .autoresearch/plugins/marketplace.toml and every referenced community mode manifest before installing or sharing it.
Use ./install.sh --yes --vscode to install the lightweight VS Code package from integrations/vscode; it opens status --summary, dashboard --once, and watch --format jsonl from editor commands.
Codex packages keep .agents/skills/autoresearch/SKILL.md as a thin router and load references/binary-operations.md only when native command details are needed.
Use .github/actions/autoresearch in GitHub Actions to run exec mode with a checked-in goal, scope, metric, and verify command.

steps:
  - uses: actions/checkout@v4
  - uses: ./.github/actions/autoresearch
    with:
      goal: Reduce lint failures
      scope: '["src/**/*.rs", "tests/**/*.rs"]'
      metric: lint failure count
      verify: cargo clippy --all-targets --all-features -- -D warnings 2>&1 | tail -1
      direction: lower
      iterations: "3"

Project Defaults

autoresearch init reads .autoresearch.toml from the workspace root when present. CLI flags override file values.

goal = "Reduce failing tests"
scope = ["src/**/*.rs", "tests/**/*.rs"]
metric = "failing test count"
direction = "lower"
verify = "cargo test 2>&1 | tail -1"
guard = "cargo fmt -- --check"
iterations = 25
run_tag = "nightly"
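
The precedence rule (CLI flags override file values) amounts to a dictionary overlay. A minimal Python sketch, assuming that a CLI value of None means the flag was not passed; `merge_config` is a hypothetical helper, not the binary's code.

```python
def merge_config(file_defaults: dict, cli_overrides: dict) -> dict:
    """Overlay CLI values on top of defaults read from .autoresearch.toml.

    Assumption: a CLI value of None means "flag not passed", so the
    file value is kept for that key.
    """
    passed = {key: value for key, value in cli_overrides.items() if value is not None}
    return {**file_defaults, **passed}


# Values copied from the sample file above.
defaults = {"direction": "lower", "guard": "cargo fmt -- --check", "iterations": 25}
effective = merge_config(defaults, {"iterations": 5, "guard": None})
```

Here `effective` keeps the file's direction and guard but uses 5 iterations. On Python 3.11 or newer, `tomllib.loads` can produce the defaults dictionary from the file text.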

Run with defaults:

autoresearch init

Generate a starter file:

autoresearch config template --output .autoresearch.toml
autoresearch config validate

Run Artifacts

All run state lives under autoresearch-results/:

results.tsv
state.json
context.json
lessons.md
handoff.json
launch.json
runtime.json
runtime.log

Do not commit autoresearch-results/ or .codex-autoresearch/.

Detailed Guides

Autoresearch Examples

Copy one block into your agent prompt after installing Autoresearch. Adjust the scope and commands to match your project.

TypeScript: Remove `any`

/autoresearch
Goal: Remove all explicit any usage
Scope: src/**/*.ts src/**/*.tsx
Metric: explicit any count
Direction: lower
Verify: rg -n ": any| as any|<any>" src 2>/dev/null | wc -l
Guard: npm test && npm run typecheck
Iterations: 30

Python: Raise Coverage

/autoresearch
Goal: Raise test coverage to 90%
Scope: src/**/*.py tests/**/*.py
Metric: coverage percent
Direction: higher
Verify: pytest --cov=src --cov-report=term | awk '/TOTAL/ {gsub("%", "", $4); print $4}'
Guard: pytest
Iterations: 25

Rust: Reduce Clippy Warnings

/autoresearch
Goal: Reduce clippy warnings to zero
Scope: src/**/*.rs tests/**/*.rs
Metric: clippy warning count
Direction: lower
Verify: cargo clippy --message-format short 2>&1 | tee /tmp/autoresearch-clippy.txt >/dev/null; rg -c "warning:" /tmp/autoresearch-clippy.txt || echo 0
Guard: cargo test
Iterations: 20

Web App: Shrink Bundle

/autoresearch
Goal: Reduce production JavaScript bundle size
Scope: src/**/* package.json vite.config.* webpack.config.*
Metric: bundle bytes
Direction: lower
Verify: npm run --silent build -- --json > /tmp/autoresearch-stats.json && node -e "const s=require('/tmp/autoresearch-stats.json'); console.log(s.assets.filter(a => a.name.endsWith('.js')).reduce((n, a) => n + a.size, 0))"
Guard: npm test
Iterations: 20

API: Lower Latency

/autoresearch
Goal: Lower p95 latency for the health endpoint
Scope: src/**/* routes/**/* handlers/**/*
Metric: p95 latency milliseconds
Direction: lower
Verify: hey -z 30s -c 10 http://localhost:3000/health | awk '/95%/ {print $3 * 1000}'
Guard: npm test
Iterations: 15

Parallel Experiments

Use this when several hypotheses are plausible and the run has enough CPU, RAM, and disk for isolated worker worktrees:

autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
# Fill in each worker metric, guard status, commit, and description.
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json
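
The selection that closeout has to make, one retained result out of several workers, can be sketched as follows. The `WorkerResult` record and `pick_retained` function are hypothetical; the field names follow the comment in the commands above rather than the actual batch-file schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerResult:
    # Hypothetical record, not the real parallel-workers.json schema.
    description: str
    metric: float
    guard_passed: bool
    commit: str


def pick_retained(workers: list[WorkerResult], baseline: float, direction: str) -> Optional[WorkerResult]:
    """Pick the best guard-passing worker that beats the baseline, if any."""
    if direction not in ("lower", "higher"):
        raise ValueError(f"unknown direction: {direction!r}")
    better = (lambda a, b: a < b) if direction == "lower" else (lambda a, b: a > b)
    best = None
    for worker in workers:
        # A failed guard or a non-improving metric disqualifies the worker.
        if not worker.guard_passed or not better(worker.metric, baseline):
            continue
        if best is None or better(worker.metric, best.metric):
            best = worker
    return best
```

Returning None when no worker qualifies mirrors the idea that a batch may produce nothing worth keeping.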

More domain-specific examples are in Examples by Domain.

Autoresearch — Product Design Review

Problem Statement

AI coding agents (Claude Code, Codex CLI, Cursor, etc.) need autonomous iteration to improve codebases against measurable metrics. Today, agents either:

Ask after every change — breaking flow, requiring human attention for mechanical decisions
Use heavyweight orchestration — Python/Node scripts with complex dependency chains, slow startup, runtime dependencies
Have no memory across turns — repeat failed experiments, lose context on compaction

There is no lightweight, compiled infrastructure that gives agents a tight modify→verify→keep/discard loop with git as memory, automatic rollback on failure, and escalation when stuck.

Solution

A single compiled Rust binary (about 3MB) that provides:

Hook handler — sub-5ms responses for Claude Code’s plugin hook system (PreToolUse, PostToolUse, UserPromptSubmit, Stop, etc.)
CLI operations — init, verify, guard, decide, resume, health, progress, watch, lessons, handoff, exec, plus runtime run/start/status/supervise/stop and parallel prepare/run/closeout/cleanup
Agent packages — Claude plugin commands, Codex .agents skill/plugin package, OpenCode command/skill/helper-agent package, and shared markdown protocols for iteration loops, security audits, debugging, shipping, product improvement research, and more

The binary handles the mechanical infrastructure. The agent handles the intelligence. Clean separation.

Target Users

User	Integration
Claude Code users	Installer builds the binary and installs the plugin hooks
Codex CLI users	`$skill-installer` skill plus local `.agents/plugins/marketplace.json` plugin package
OpenCode users	Generated `.opencode/` commands, skill, and helper agent
Any LLM agent	CLI called directly, skill markdown parsed by agent

Architecture

┌─────────────────┐     ┌──────────────┐     ┌───────────────┐
│ Agent (Claude/   │────▶│ autoresearch │────▶│ Git repo      │
│ Codex/other)     │     │ binary       │     │ (experiments) │
└─────────────────┘     └──────────────┘     └───────────────┘
        │                       │
        │ reads                 │ writes
        ▼                       ▼
┌─────────────────┐     ┌──────────────────────┐
│ SKILL.md /       │     │ autoresearch-results/ │
│ commands/*.md    │     │ ├── results.tsv       │
│ agent packages   │     │ ├── state.json        │
└─────────────────┘     │ ├── context.json      │
                        │ ├── lessons.md        │
                        │ ├── handoff.json      │
                        │ ├── launch.json       │
                        │ ├── runtime.json      │
                        │ └── runtime.log       │
                        └──────────────────────┘

Key Metrics

Metric	Target	Rationale
Hook response latency	<5ms p99	Hooks fire on every tool use; must be invisible
Binary size	<5MB	Single-file distribution, no extraction needed
Runtime dependencies	Zero	No Node, Python, Docker. Just the binary.
Cold start	<10ms	First invocation must feel instant
Memory usage	<5MB RSS	Runs alongside the agent, not competing for resources

Non-Goals

Not a replacement for the agent itself — the binary doesn’t make decisions about what to change. It handles verification, logging, rollback, and state management.
Not a CI/CD system — it runs locally alongside the agent. The exec mode supports CI but is not a pipeline orchestrator.
Not a test framework — it calls your existing test/lint/build commands and parses their output.
Not a package manager — it doesn’t manage dependencies, just detects dangerous ones during security audits.

Modes

Mode	Purpose
Core loop	Iterate against any numeric metric
Debug	Scientific bug hunting with hypothesis testing
Fix	Crush errors one-by-one until zero
Security	STRIDE + OWASP audit with red-team personas
Scenario	Edge case generation across 12 dimensions
Predict	Multi-persona expert debate
Reason	Adversarial refinement with blind judges
Probe	Requirements interrogation until saturation
Learn	Auto-generate documentation
Ship	8-phase ship workflow
Improve	Research ICP needs and generate product improvement PRDs
Evals	Analyze iteration results
Plan	Convert goal → validated config

Success Criteria

Agent can iterate 25+ times without human intervention
Failed experiments are automatically reverted (zero pollution)
Cross-session memory via lessons.md survives compaction
Hook latency is imperceptible to the agent/user
Background autoresearch runtime run can relaunch Codex turns without corrupting artifacts
Parallel worker closeout produces one authoritative retained result after verification
Installation is one command for Claude, Codex, and OpenCode paths

Architecture

Binary Dual-Use

The autoresearch binary serves two roles from the same executable:

CLI tool — direct invocation via autoresearch init, autoresearch decide, etc.
Hook handler — invoked by the Claude Code plugin system via autoresearch hook <name>

The entry point in main.rs dispatches through clap subcommands. Hook mode parses the hook name and delegates to the corresponding handler in src/hooks/.

╭──────────────╮     ╭──────────────────╮     ╭───────────────╮
│  Agent call   │────▶│  autoresearch    │────▶│  CLI dispatch │
│  (or hook)    │     │  binary          │     │  (clap)       │
╰──────────────╯     ╰──────────────────╯     ╰───────┬───────╯
                                                       │
                      ╭────────────────────────────────┼────────────────╮
                      │                                │                │
                      ▼                                ▼                ▼
              ╭──────────────╮                ╭──────────────╮  ╭─────────────╮
              │  CLI command │                │  Hook handler│  │  Exec mode  │
              │  (init/log/  │                │  (scout,     │  │  (CI/CD     │
              │   decide/..) │                │   stop,..)   │  │   JSON-line)│
              ╰──────┬───────╯                ╰──────┬───────╯  ╰──────┬──────╯
                     │                               │                 │
                     ▼                               ▼                 ▼
              ╭───────────────────────────────────────────────────────────────╮
              │                           src/core/                           │
              │  config.rs  state.rs  results.rs  git.rs  verify.rs           │
              ╰───────────────────────────────────────────────────────────────╯

Module Breakdown

`src/core/` — Shared foundation

File	Purpose
`config.rs`	`RunConfig`, `Direction`, `Mode`, `VerifyFormat`, `RollbackStrategy`
`state.rs`	`RunState`, `RunPhase` (state machine), `IterationStatus`, `StopReason`
`results.rs`	`ResultRow`, `ResultsLog` (TSV append/read), completion summary
`git.rs`	`GitRepo` wrapper around libgit2 — HEAD, revert, reset, worktree status
`verify.rs`	Run verify/guard commands, parse scalar or JSON output, screen for danger
`metrics.rs`	Metric parsing utilities, decimal handling

`src/hooks/` — Claude Code hook handlers

Each hook is a function that reads minimal state, makes a decision, and prints output. Hooks must complete in <5ms. No network calls, no heavy I/O.

Hook	Fires on	Purpose
`session_init`	Session start	Detect interrupted runs, load state
`session_end`	Session end	Write final state, cleanup
`iteration_context`	UserPromptSubmit	Inject iteration number + last result
`stop_check`	Stop	Check if iteration cap reached
`scout_block`	PreToolUse: Write/Edit/MultiEdit/Bash/Glob/Grep/Read	Block generated/vendor/sensitive paths, Bash reads, and out-of-scope writes
`dangerous_cmd`	PreToolUse: Bash	Screen for `rm -rf`, `DROP TABLE`, etc.
`simplify_gate`	UserPromptSubmit	Enforce “equal metric + less code = keep”
`compaction_reanchor`	Context compaction	Re-inject critical state after compaction
`privacy_block`	PreToolUse: Write/Edit/MultiEdit/Bash/Glob/Grep/Read	Block credential paths and secret-looking inputs; warn on sensitive Bash paths
`dev_rules_reminder`	UserPromptSubmit	Remind agent of project conventions
`subagent_context`	Subagent spawn	Inject autoresearch state into subagent prompt
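
As an illustration of the "read minimal state, decide, print" shape described above, here is a minimal sketch of a command screen in the spirit of `dangerous_cmd`. The `Verdict` type, the `screen_command` function, and the two-entry pattern list are assumptions made for this example; the real handler's patterns and output schema are not shown in this document.

```rust
/// Hypothetical verdict type; the real hook's output format is not documented here.
#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Block(&'static str),
}

/// The two patterns named in the hook table. Illustrative, not exhaustive.
const DANGEROUS_PATTERNS: [&str; 2] = ["rm -rf", "drop table"];

/// Screens a Bash command with plain substring checks: no network, no file I/O.
fn screen_command(command: &str) -> Verdict {
    // Collapse whitespace runs and lowercase so `DROP   TABLE` still matches.
    let normalized = command
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase();
    for &pattern in DANGEROUS_PATTERNS.iter() {
        if normalized.contains(pattern) {
            return Verdict::Block(pattern);
        }
    }
    Verdict::Allow
}
```

Substring matching keeps the check cheap, which fits the latency budget, but it can over-block (for example, a commit message that mentions `rm -rf`). A production screen would likely need more precise rules.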

`src/escalation/` — Failure recovery

File	Purpose
`pivot.rs`	`EscalationState` — tracks consecutive discards, triggers refine/pivot/search
`lessons.rs`	`LessonsLog` — append/search/read lessons.md
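
The escalation idea (count consecutive discards, then refine, pivot, search, or stop) can be sketched as below. This is a simplified stand-in for `EscalationState`, not its actual definition: the `Thresholds` struct and every threshold value used with it are hypothetical, since this document does not state when each tier triggers.

```rust
/// Escalation actions named in this document, plus `Continue` for "no escalation yet".
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Continue,
    Refine,
    Pivot,
    Search,
    Stop,
}

/// Hypothetical thresholds, expressed as consecutive-discard counts.
struct Thresholds {
    refine: u32,
    pivot: u32,
    search: u32,
    stop: u32,
}

/// Minimal stand-in for `EscalationState`.
struct EscalationState {
    consecutive_discards: u32,
    thresholds: Thresholds,
}

impl EscalationState {
    /// A kept experiment resets the streak.
    fn record_keep(&mut self) {
        self.consecutive_discards = 0;
    }

    /// A discard extends the streak and returns the action for the new streak length.
    fn record_discard(&mut self) -> Action {
        self.consecutive_discards += 1;
        let n = self.consecutive_discards;
        let t = &self.thresholds;
        if n >= t.stop {
            Action::Stop
        } else if n >= t.search {
            Action::Search
        } else if n >= t.pivot {
            Action::Pivot
        } else if n >= t.refine {
            Action::Refine
        } else {
            Action::Continue
        }
    }
}
```

Checking the highest tier first means a long streak always maps to the strongest action, regardless of how the thresholds are spaced.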

`src/modes/` — Mode-specific logic

Each mode file contains the structured output types and validation logic for that subcommand. The agent performs the actual iteration orchestration by following the corresponding command markdown file.

`src/agents/` — Multi-agent support

Agent detection and context injection for the supported agent runtimes (Claude Code, Codex CLI).

State Machine

The RunPhase enum enforces valid transitions at the type level:

Setup → Baseline { metric }
Baseline → Iterating { iteration, current, best, best_iteration }
Iterating → Iterating (on keep/discard/crash/no-op)
Iterating → Complete { reason }
Iterating → Blocked { reason }
Blocked → Iterating (on resume)

RunState persists to autoresearch-results/state.json after every iteration. On resume, the binary reads state.json and reconstructs the full context.
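
The transition list above can be sketched as an enum plus a single transition function. This is an illustrative model, not the project's definition: the `Event` names are invented for the example, `Metric` is an integer stand-in for the decimal type, and invalid transitions are rejected at runtime here rather than at the type level.

```rust
/// Stand-in for the project's decimal metric type (assumption): integer thousandths.
type Metric = i64;

#[derive(Debug, Clone, PartialEq)]
enum RunPhase {
    Setup,
    Baseline { metric: Metric },
    Iterating { iteration: u32, current: Metric, best: Metric, best_iteration: u32 },
    Complete { reason: String },
    Blocked { reason: String },
}

/// Hypothetical event names; only the transitions themselves come from the list above.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    BaselineMeasured(Metric),
    StartIterating,
    Keep(Metric),
    /// Discard, crash, or no-op: the counter advances, the metrics do not.
    NoImprovement,
    Finish(String),
    Block(String),
    /// Carries the values reloaded from state.json on resume.
    Resume { iteration: u32, current: Metric, best: Metric, best_iteration: u32 },
}

fn transition(phase: RunPhase, event: Event) -> Result<RunPhase, String> {
    match (phase, event) {
        (RunPhase::Setup, Event::BaselineMeasured(metric)) => Ok(RunPhase::Baseline { metric }),
        (RunPhase::Baseline { metric }, Event::StartIterating) => Ok(RunPhase::Iterating {
            iteration: 0,
            current: metric,
            best: metric,
            best_iteration: 0,
        }),
        // Assumption: a kept trial becomes both the current and the best metric.
        (RunPhase::Iterating { iteration, .. }, Event::Keep(metric)) => Ok(RunPhase::Iterating {
            iteration: iteration + 1,
            current: metric,
            best: metric,
            best_iteration: iteration + 1,
        }),
        (RunPhase::Iterating { iteration, current, best, best_iteration }, Event::NoImprovement) => {
            Ok(RunPhase::Iterating { iteration: iteration + 1, current, best, best_iteration })
        }
        (RunPhase::Iterating { .. }, Event::Finish(reason)) => Ok(RunPhase::Complete { reason }),
        (RunPhase::Iterating { .. }, Event::Block(reason)) => Ok(RunPhase::Blocked { reason }),
        (RunPhase::Blocked { .. }, Event::Resume { iteration, current, best, best_iteration }) => {
            Ok(RunPhase::Iterating { iteration, current, best, best_iteration })
        }
        // Everything else is not in the transition list and is rejected.
        (phase, event) => Err(format!("invalid transition: {:?} on {:?}", phase, event)),
    }
}
```

Because `Blocked` carries only a reason, the resume event in this sketch has to supply the iteration fields itself, which mirrors the note that context is reconstructed from state.json.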

Data Flow

Agent decides to modify code
    │
    ▼
autoresearch verify --command "..." → metric (Decimal)
    │
    ▼
autoresearch guard --command "..."  → pass/fail (optional)
    │
    ▼
autoresearch decide --decision auto --metric N --metrics-json '{...}'
    │
    ├── keep:    state.record_keep() → update state.json, append TSV
    └── discard: state.record_discard() → rollback, update state.json, append TSV
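
The keep/discard branch at the end of this flow could look roughly like the sketch below. It is a guess at the shape of an automatic decision, under stated assumptions: `decide_auto` and its parameters are hypothetical, metrics are integer thousandths rather than decimals, a failed guard is assumed to force a discard, and the tie-break applies the "equal metric + less code = keep" rule from the hook table.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction {
    Higher,
    Lower,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Keep,
    Discard,
}

/// `best` and `candidate` are metrics in integer thousandths (stand-in for a decimal type).
/// `lines_delta` is the net line change of the trial commit; negative means less code.
fn decide_auto(
    direction: Direction,
    best: i64,
    candidate: i64,
    guard_passed: bool,
    lines_delta: i64,
) -> Decision {
    // Assumption: a failed guard (regression check) always discards the trial.
    if !guard_passed {
        return Decision::Discard;
    }
    let improved = match direction {
        Direction::Higher => candidate > best,
        Direction::Lower => candidate < best,
    };
    // Simplicity rule: an equal metric with less code is still worth keeping.
    let simpler_tie = candidate == best && lines_delta < 0;
    if improved || simpler_tie {
        Decision::Keep
    } else {
        Decision::Discard
    }
}
```

For example, with `Direction::Lower` a candidate of 1200 against a best of 1500 is kept, while the same numbers under `Direction::Higher` are discarded.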

System Architecture

Autoresearch is a Rust binary plus agent-facing instruction packs. The binary owns mechanical state transitions; agents own reasoning, code edits, and hypothesis selection.

Components

Component	Role
`autoresearch` binary	CLI, hook dispatcher, verifier, rollback controller, runtime supervisor
`.claude-plugin/marketplace.json`	Claude marketplace manifest pointing at the repo-root plugin package
`commands/`	Claude Code slash command instructions
`skills/autoresearch/`	Claude/OpenCode skill package and shared references
`.agents/skills/autoresearch/`	Codex/generic agent skill package
`.agents/plugins/marketplace.json`	Local Codex marketplace root for the packaged plugin
`plugins/autoresearch/`	Codex plugin package generated from `.agents/skills/autoresearch/`
`.opencode/`	OpenCode command, skill, and helper-agent distribution
`references/`	Protocol source docs copied into installable packages
`autoresearch-results/`	Runtime artifacts created inside the user’s target repo

Runtime Flow

agent chooses one hypothesis
    |
    v
edits scoped files and creates a trial commit
    |
    v
autoresearch verify runs the metric command
    |
    v
autoresearch guard runs the regression command when configured
    |
    v
autoresearch decide keeps, discards, logs, and updates state

The binary writes results.tsv and state.json after each decision. A discarded experiment is rolled back automatically, while a kept experiment remains in git history as the next baseline.

Parallel Flow

Parallel work is recorded as a batch:

autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json

Prepare creates branch-backed worker worktrees, prompt files, a manifest, and the editable batch file. Run executes the prepared worker prompts in those worktrees and records crashed or timed-out workers in the manifest. Closeout cherry-picks the best worker, re-runs verify and guard in the main worktree, falls back to the next worker on merge or verification failure, then writes worker audit rows and one authoritative retained batch row. Cleanup removes worker worktrees and branches.
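
The closeout fallback described above (try the best worker, fall back to the next on merge or verification failure) can be sketched as follows. The `Worker` struct, the `closeout` function, and the `merge_and_verify` callback are hypothetical names for this example; the callback stands in for the cherry-pick plus the verify and guard re-run in the main worktree.

```rust
/// Hypothetical worker record; field names are illustrative.
#[derive(Debug, Clone)]
struct Worker {
    id: u32,
    metric: i64,
    crashed: bool,
}

/// Returns the id of the first worker, best metric first, that merges and re-verifies.
/// Crashed or timed-out workers are never candidates.
fn closeout<F>(workers: &[Worker], lower_is_better: bool, mut merge_and_verify: F) -> Option<u32>
where
    F: FnMut(&Worker) -> bool,
{
    let mut candidates: Vec<&Worker> = workers.iter().filter(|w| !w.crashed).collect();
    // Sort so the best metric comes first for the configured direction.
    candidates.sort_by_key(|w| if lower_is_better { w.metric } else { -w.metric });
    for w in candidates {
        if merge_and_verify(w) {
            // Exactly one worker is retained; the rest remain audit-only.
            return Some(w.id);
        }
    }
    None
}
```

Returning `Option<u32>` makes the "one authoritative retained result" property explicit: a batch yields at most one winner, or none if every candidate fails verification.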

Background Runtime

autoresearch runtime run supervises Codex execution through persisted artifacts:

Artifact	Purpose
`launch.json`	Command, cwd, repo targets, goal, iteration limit, and stop criteria
`runtime.json`	Current status and supervisor recommendation
`runtime.log`	Detached runtime output

Manual controls remain available through runtime start, runtime status, runtime supervise, and runtime stop.
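
A supervisor recommendation of the kind stored in `runtime.json` might be derived as in this sketch. The three outcomes (relaunch, stop, needs_human) are the ones named in the changelog; the `Snapshot` fields and the order of the checks are assumptions made for illustration only.

```rust
#[derive(Debug, PartialEq)]
enum Recommendation {
    Relaunch,
    Stop,
    NeedsHuman,
}

/// Hypothetical snapshot of what a supervisor might read from the runtime artifacts.
struct Snapshot {
    iteration: u32,
    iteration_limit: u32,
    criteria_met: bool,
    blocked: bool,
}

fn recommend(s: &Snapshot) -> Recommendation {
    if s.criteria_met || s.iteration >= s.iteration_limit {
        // The goal is reached or the iteration budget is spent.
        Recommendation::Stop
    } else if s.blocked {
        // Assumption: a blocked run is handed to a person rather than relaunched.
        Recommendation::NeedsHuman
    } else {
        Recommendation::Relaunch
    }
}
```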

More Detail

See Architecture for module-level internals and Guide for user-facing command flow.

Autoresearch — Codebase Summary

AI-friendly reference for agents working on this codebase.

Entry Points

Path	Purpose
`src/main.rs`	CLI entry — Clap-based command dispatch for run setup, verify/guard/decide, health, resume/status/progress/watch, lessons, handoff, exec, runtime, parallel, screen, and hooks
`.claude-plugin/marketplace.json`	Claude marketplace manifest for the repo-root plugin package
`hooks/hooks.json`	Claude Code plugin hook definitions — maps lifecycle events to binary invocations
`skills/autoresearch/SKILL.md`	Agent skill file — iteration protocol, subcommand table, references
`.agents/skills/autoresearch/`	Maintained Codex skill package used by direct Codex installs
`plugins/autoresearch/`	Codex plugin package generated from the `.agents` skill package
`.agents/plugins/marketplace.json`	Local Codex marketplace entry pointing at `plugins/autoresearch/`
`.opencode/`	Generated OpenCode commands and skill package, plus the maintained `docs-manager` helper agent
`commands/autoresearch.md`	Root command protocol (core iteration loop)
`commands/autoresearch/*.md`	Subcommand protocols (debug, fix, security, scenario, etc.)

Core Modules (`src/core/`)

File	Responsibility
`config.rs`	`RunConfig`, `Direction`, `VerifyFormat`, `RollbackStrategy` types
`git.rs`	`GitRepo` wrapper — status, head, revert, worktree checks
`verify.rs`	Run verify commands, parse scalar/JSON output, safety screening
`results.rs`	`ResultsLog` — append TSV rows, read history
`state.rs`	`RunState` — iteration count, metrics, keeps/discards, phase tracking
`metrics.rs`	Metric parsing, delta calculation, direction comparison
`context.rs`	Canonical `context.json` and repo-local pointer writing
`health.rs`	Native preflight checks for runtime launch safety
`runtime.rs`	Background launch/runtime manifests, supervisor snapshots, and stop control

Escalation (`src/escalation/`)

File	Responsibility
`pivot.rs`	`EscalationState` — track consecutive discards, trigger REFINE/PIVOT/SEARCH/STOP
`lessons.rs`	`LessonsLog` — read/write/search cross-run learning entries

Hooks (`src/hooks/`)

File	Hook	Fires On
`scout_block.rs`	scout-block	PreToolUse (Write/Edit/MultiEdit/Bash/Glob/Grep/Read) — blocks generated paths, Bash reads, and out-of-scope writes
`privacy_block.rs`	privacy-block	PreToolUse — blocks access to sensitive paths
`dangerous_cmd.rs`	dangerous-cmd-block	PreToolUse (Bash) — blocks rm -rf, fork bombs, etc.
`iteration_context.rs`	iteration-context	UserPromptSubmit — injects run state into agent context
`dev_rules_reminder.rs`	dev-rules-reminder	UserPromptSubmit — re-injects active protocol and code standards
`simplify_gate.rs`	simplify-gate	UserPromptSubmit — reminds agent of simplicity rule
`stop_check.rs`	stop-check	Stop — detects premature stop during active run
`compaction_reanchor.rs`	compaction-reanchor	PostCompact — re-injects critical state after context compaction
`session_init.rs`	session-init	SessionStart — detects interrupted runs
`session_end.rs`	session-end	SessionEnd — emits terminal notification and optional webhook summary
`subagent_context.rs`	subagent-context	SubagentStart — passes run context to subagents

Modes (`src/modes/`)

Thin logic for mode-specific state (most protocol lives in markdown commands): loop_mode.rs, debug.rs, fix.rs, security.rs, scenario.rs, predict.rs, reason.rs, probe.rs, learn.rs, ship.rs, evals.rs, improve.rs, plan.rs

Agents (`src/agents/`)

File	Purpose
`claude.rs`	Claude Code-specific integration helpers
`codex.rs`	Codex CLI-specific integration helpers

Data Flow

User prompt → [hook: iteration-context injects state]
           → Agent reads state + TSV + git log
           → Agent makes ONE change
           → Agent calls: autoresearch verify --command "..."
           → Binary runs command, returns metric + metrics JSON
           → Agent calls: autoresearch decide --decision auto --metric N --metrics-json '{...}'
           → Binary: evaluates criteria, updates state.json, appends TSV, reverts if discard
           → [hook: stop-check ensures agent doesn't quit early]
           → Next iteration

Background runs route the same state machine through autoresearch runtime run, which writes launch.json, runtime.json, and runtime.log, runs the native health preflight at each relaunch boundary, and supervises detached Codex turns. Parallel batches use autoresearch parallel prepare/run/closeout/cleanup to run worker worktrees and retain only one verified batch winner.

Key Types

Type	Location	Fields
`RunConfig`	core/config.rs	verify, direction, format, scope, guard, primary_metric_key
`RunState`	core/state.rs	iteration, baseline_metric, current_metric, best_metric, keeps, discards, crashes, consecutive_discards, phase
`ResultRow`	core/results.rs	iteration, commit, metric, delta, guard, status, description
`LaunchManifest`	core/runtime.rs	workspace_root, execution_policy, codex_bin, repo_targets, config
`EscalationState`	escalation/pivot.rs	consecutive_discards, pivots, last_action
`Direction`	core/config.rs	Higher, Lower
`IterationStatus`	core/state.rs	Baseline, Keep, Discard, Crash, NoOp, Blocked, Pivot, Refine, Search
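
To make the `ResultRow` entry concrete, here is a minimal sketch of how such a row could be rendered as one TSV line. The field names follow the table above; the field types, the `to_tsv_line` method, and the choice to replace embedded tabs and newlines with spaces are assumptions for this example.

```rust
/// Field names follow the `ResultRow` entry above; the concrete types are assumptions.
struct ResultRow {
    iteration: u32,
    commit: String,
    metric: String, // kept as text here; the project uses a decimal type
    delta: String,
    guard: String,
    status: String,
    description: String,
}

impl ResultRow {
    /// Renders one TSV line. Tabs and newlines inside a field are replaced with
    /// spaces so that a single row can never spill into extra columns or lines.
    fn to_tsv_line(&self) -> String {
        let clean = |s: &str| s.replace('\t', " ").replace('\n', " ");
        vec![
            self.iteration.to_string(),
            clean(&self.commit),
            clean(&self.metric),
            clean(&self.delta),
            clean(&self.guard),
            clean(&self.status),
            clean(&self.description),
        ]
        .join("\t")
    }
}
```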

How to Add…

A new CLI command

Add variant to Commands enum in src/main.rs
Add match arm in main() dispatching to cmd_<name>() function
Implement function at bottom of main.rs (or extract to module if >100 lines)

A new hook

Add handler in src/hooks/<name>.rs
Register in src/hooks/mod.rs
Add hook entry in hooks/hooks.json under the appropriate lifecycle event
Hook receives JSON on stdin, returns JSON on stdout, must complete in <5ms

A new mode/subcommand

Add command protocol in commands/autoresearch/<mode>.md
Add mode-specific state logic in src/modes/<mode>.rs (if needed)
Register in src/modes/mod.rs
Update SKILL.md subcommand table

Code Standards

Rust conventions for the autoresearch codebase.

Error Handling

Use anyhow::Result for all fallible functions.
Use thiserror for custom error types in library code that callers need to match on.
Use .context("descriptive message") on every ? — errors should be traceable.
Never unwrap() in library code. unwrap() is acceptable only in tests.
expect() is acceptable for provably infallible operations (e.g., regex compilation).

Serialization

All persistent types derive Serialize, Deserialize.
Use #[serde(rename_all = "snake_case")] for enum variants.
Use #[serde(tag = "phase")] for internally tagged enums (like RunPhase).
Use #[serde(default)] for optional fields added in later versions, so files written by older versions still deserialize.
Use #[serde(skip_serializing_if = "Option::is_none")] to keep JSON clean.

Documentation

Every public type and function has a /// doc comment.
Module-level //! doc comments describe the module’s role.
Use # Examples in doc comments for non-obvious APIs.

Testing

Unit tests live in #[cfg(test)] mod tests at the bottom of each file.
Integration tests live in tests/.
E2E fixtures live in tests/e2e/fixtures/.
Every new CLI subcommand gets a test in tests/cli_test.rs.
Every state transition gets a test in tests/state_test.rs.
Target: 80%+ line coverage on src/core/.

Style

Run cargo clippy -- -D warnings before every commit. Zero warnings.
Run cargo fmt before every commit.
Run ./scripts/run_contributor_gate.sh before opening a PR.
Run ./scripts/validate_distribution.sh after changing skill, command, reference, or agent metadata files.
Run ./scripts/run_skill_e2e.sh binary-smoke --clean after changing core run closeout or result-monitoring behavior.
Run ./scripts/run_skill_e2e.sh runtime-smoke --clean after changing runtime launch, status, or stop behavior.
Run ./scripts/run_skill_e2e.sh parallel-smoke --clean after changing parallel worker prepare/run/cleanup behavior.
Max line length: 100 characters (soft), 120 characters (hard).
Prefer match over if let chains for exhaustive enum handling.
Prefer &str over String in function parameters when ownership isn’t needed.

Performance

Hooks must complete in <5ms. No network calls, no heavy I/O in hook handlers.
Use Decimal (not f64) for all metric values — financial-grade precision.
Release builds use opt-level = "z", LTO, strip, panic = "abort".
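
The reason for avoiding f64 can be shown with a dependency-free sketch. The `Milli` type below is a hypothetical fixed-point stand-in (integer thousandths) used only to illustrate exact decimal arithmetic; the project itself lists rust_decimal for this purpose.

```rust
/// Fixed-point metric in thousandths; a dependency-free stand-in for a decimal type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Milli(i64);

impl Milli {
    /// Parses strings such as "0.1", "12", or "-3.25" (at most three fractional digits).
    fn parse(text: &str) -> Option<Milli> {
        let text = text.trim();
        let (negative, digits) = match text.strip_prefix('-') {
            Some(rest) => (true, rest),
            None => (false, text),
        };
        let (whole, frac) = match digits.split_once('.') {
            Some((w, f)) => (w, f),
            None => (digits, ""),
        };
        let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
        if whole.is_empty() || frac.len() > 3 || !all_digits(whole) || !all_digits(frac) {
            return None;
        }
        let whole: i64 = whole.parse().ok()?;
        // Right-pad the fraction to exactly three digits: "1" becomes "100".
        let frac: i64 = format!("{:0<3}", frac).parse().ok()?;
        let value = whole.checked_mul(1000)?.checked_add(frac)?;
        Some(Milli(if negative { -value } else { value }))
    }
}
```

With binary floats, `0.1 + 0.2` does not compare equal to `0.3`, so an "equal metric" check can misfire. With the fixed-point values, 100 + 200 is exactly 300.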

Naming

Types: PascalCase (e.g., RunState, ResultRow)
Functions: snake_case (e.g., record_keep, run_verify)
CLI subcommands: lowercase single words (e.g., init, decide, evals)
Constants: SCREAMING_SNAKE_CASE
Files: snake_case.rs

Dependencies

Minimize dependency count. Current deps are intentional:
- clap — CLI parsing
- serde + serde_json — serialization
- tokio — async runtime (for exec mode)
- rust_decimal — precise metric values
- chrono — timestamps
- git2 — libgit2 bindings
- regex — pattern matching
- anyhow + thiserror — error handling
- glob — file pattern matching
Do not add dependencies without justification in the PR description.

Project Changelog

This page is the high-level release history entrypoint. The canonical Keep-a-Changelog file is changelog.md.

Current Development Track

Recent work has focused on catching the binary and installable agent packages up to the stronger autoresearch implementations:

Background runtime control through autoresearch runtime run and runtime start/status/supervise/stop
Live log monitoring through autoresearch watch
Native parallel worker support through autoresearch parallel prepare, run, verified closeout, and cleanup, including worker crash/timeout recording
Codex, Claude Code, and OpenCode installation paths
Distribution validation for generated command and skill packages
Binary smoke tests for installed skill instructions
Direct documentation entrypoints for installation, usage, examples, and system architecture

Release Notes

See changelog.md for versioned release notes and development-roadmap.md for planned work.

Changelog

All notable changes to this project will be documented in this file.

Format based on Keep a Changelog.

[0.1.0] — 2025-05-27

Initial release.

Added

Core engine: init, verify, guard, log, decide, status, resume, progress, watch CLI commands
State machine: RunPhase enum (Setup → Baseline → Iterating → Complete/Blocked) with typed transitions
Results logging: TSV format with iteration, commit, metric, delta, guard, status, description columns
State persistence: state.json with full run context, resume support for interrupted sessions
Git integration: libgit2-based revert and hard-reset rollback strategies, worktree status detection
Verify system: scalar and metrics_json output formats, command screening for dangerous patterns
Escalation protocol: three tiers (refine → pivot → web search) followed by stop, triggered by consecutive discards
Lessons log: Markdown-based learnings that persist across sessions, with search and tail queries
12 subcommands: improve, debug, fix, security, scenario, predict, learn, reason, probe, evals, ship, plan
Exec mode: Non-interactive CI/CD mode — reads config from stdin, emits JSON lines
Background runtime: runtime run managed relaunch loop plus start/status/supervise/stop artifacts, detached launch control, and relaunch/stop/needs_human supervisor recommendations
Parallel runtime: parallel prepare/run/closeout/cleanup manages worker worktrees, records crashes/timeouts, cherry-picks verified winners, and logs one authoritative retained batch row
Handoff system: Structured JSON handoff between modes for chained workflows
11 hook handlers: session_init, session_end, iteration_context, stop_check, scout_block, dangerous_cmd, simplify_gate, compaction_reanchor, privacy_block, dev_rules_reminder, subagent_context
Claude Code plugin: .claude-plugin/plugin.json manifest with hook definitions
Codex skill: .agents/skills/autoresearch/ for direct Codex installs, plus plugins/autoresearch/ for the local Codex plugin marketplace
OpenCode package: .opencode/ commands, skill package, and hidden docs-manager helper agent
Agent commands: commands/autoresearch.md root + 12 subcommand files
Reference docs: 27 protocol and workflow reference documents
Release profile: opt-level = "z", LTO, strip, panic = "abort", producing a binary of about 3 MB under a 5 MB contributor-gate ceiling

Development Roadmap

v0.1.0 — Foundation (current)

Core iteration engine (init, verify, guard, decide, log)
State machine with typed transitions
TSV results + JSON state persistence
Git rollback (revert + hard-reset)
Noise-aware scalar verification repeats with aggregation
12 subcommands with full reference docs
Exec mode for CI/CD
11 hook handlers
Claude Code plugin + Codex skill
Codex plugin package + local marketplace entry
Thin Codex skill router with detailed binary operations in references
Escalation protocol (refine → pivot → search → stop)
Lessons log with search

v0.2.0 — Background Mode + Parallel Experiments

Background runtime artifacts + detached Codex launch control (autoresearch runtime start/status/supervise/stop)
Background supervisor recommendation (autoresearch runtime supervise) with iteration cap, criteria, stop-condition, soft-blocker, and stagnation decisions
Background supervisor relaunch loop that automatically executes recommended relaunches (autoresearch runtime run)
Parallel batch templates (autoresearch parallel template) for editable worker result JSON
Parallel worker preparation (autoresearch parallel prepare) with branch-backed git worktrees, prompts, manifest, and batch file
Parallel worker launch (autoresearch parallel run) for prepared codex exec workers, including timeout/crash recording
Parallel batch closeout (autoresearch parallel closeout) with cherry-pick, post-merge verify/guard, fallback, worker audit rows, and one authoritative retained-state update
Parallel cleanup (autoresearch parallel cleanup) for worker worktrees and branches
Experiment branching — each trial on its own git branch
Branch merge strategy selection (fast-forward, squash, rebase)
autoresearch watch — tail results in real-time
Progress websocket for real-time monitoring
Improved evals: statistical significance testing on parallel results

v0.3.0 — Web Search + MCP Integration

Built-in web search escalation (configurable provider command)
MCP tool server mode — expose autoresearch as an MCP tool
MCP client mode — call external MCP tools during iteration
Structured search queries from escalation context
Search result caching to avoid redundant queries
autoresearch search — standalone web search for the current problem

v0.4.0 — Multi-Repo + Workspace Support

Workspace-owned artifacts (autoresearch-results/) and repo-local pointers for managed repos
Companion repo registration through --companion-repo-scope PATH=SCOPE
Companion repo preflight, health, and runtime dirty-worktree safeguards
Cross-repo change execution and rollback across companion repos
Workspace-aware scope expansion (monorepo package boundaries)
Cross-repo guard command presets
Native environment probe command for CPU, disk, container, toolchain context, and init metadata
Shared lessons across repos in a workspace

v1.0.0 — Stable API + Ecosystem

Stable CLI API — semver guarantees on commands, flags, and output formats
Native plan command for repo-aware launch config suggestions
Native debug generator for hypothesis-driven investigation bundles
Native fix generator for one-error-at-a-time repair-plan bundles
Native improve artifact bundle for research findings, ranked plan, TSV, summary, and handoff
Native PRD generator for selected improve-mode ideas
Native security generator for STRIDE + OWASP audit bundles
Native ship generator for 8-phase checklist bundles
Native scenario generator for 12-dimension edge-case artifacts
Native predict generator for five-persona review artifacts
Native reason generator for adversarial candidate debate artifacts
Native probe generator for eight-persona constraint artifacts
Native learn generator for documentation summary artifacts
Adaptive eval checkpoint command for long-running loops
Native protocol re-anchor command for long-running Codex sessions
Plugin system — loadable mode definitions (TOML or YAML)
Plugin marketplace — community-contributed modes
Configuration file (.autoresearch.toml) for project-level defaults
Shell completions (bash, zsh, fish, elvish, PowerShell)
Man pages generation
Pre-built binaries for Linux (x86_64, aarch64), macOS (x86_64, aarch64), Windows
Homebrew formula and cargo-binstall support
Comprehensive documentation site
GitHub Action for autoresearch in CI
Metric history graphing (sparklines in terminal)
Cost tracking — estimate token/API spend per iteration
A/B experiment mode — compare two approaches head-to-head
Interactive TUI dashboard for monitoring runs
VS Code extension for run visualization with source installer support

Future Ideas (unscheduled)

Re-check upstream autoresearch projects before the next feature milestone

autoresearch

Autonomous goal-directed iteration engine for coding agents. Written in Rust.

"Set the goal → Agent runs the loop → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский

How it works

You describe the goal  →  Agent confirms the config  →  You say "go"
                                                              │
                                                     ┌────────┴────────┐
                                                     │   Loop active   │
                                                     │                 │
                                                     │ 1. Read context │
                                                     │ 2. Hypothesis   │
                                                     │ 3. ONE change   │
                                                     │ 4. Git commit   │
                                                     │ 5. Verify       │
                                                     │ 6. Improved?    │
                                                     │    → keep       │
                                                     │    → revert     │
                                                     │ 7. Log          │
                                                     │ 8. Next round   │
                                                     └─────────────────┘

Every improvement compounds. Every failure is reverted automatically. Progress is logged in TSV format. The escalation ladder (refine → pivot → web search → stop) prevents endless retries.

Commands

Command	Function	Default iterations
`/autoresearch`	Core loop: modify → verify → keep/discard	25
`/autoresearch:plan`	Interactive wizard → validated configuration	one-shot
`/autoresearch:debug`	Bug hunting through hypothesis iteration	15
`/autoresearch:fix`	Fix errors one at a time until none remain	20
`/autoresearch:security`	STRIDE + OWASP security audit	15
`/autoresearch:ship`	8-phase release workflow	linear
`/autoresearch:scenario`	Generate edge cases across 12 dimensions	20
`/autoresearch:predict`	Debate among 5 expert personas	one-shot
`/autoresearch:learn`	Explore → generate docs → validate → fix	10
`/autoresearch:reason`	Adversarial debate with blind judges	8
`/autoresearch:probe`	8 personas challenge the requirements	15
`/autoresearch:improve`	Research into product improvements	20
`/autoresearch:evals`	Results analysis: trends and plateaus	one-shot

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available immediately.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

From source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.

Key rules

One change per round: atomic experiments establish causality
Read first, then write: check git log and the TSV before making a change
Mechanical verification only: run the command, evaluate the number
Automatic rollback: git revert HEAD --no-edit on failure
Simplicity wins: same metric + less code = keep

Full documentation (English)

autoresearch

Autonomous goal-directed iteration engine for coding agents. Written in Rust.

"Set the GOAL → The agent runs the LOOP → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский

How it works

You describe the goal  →  Agent confirms the config  →  You say "go ahead"
                                                              │
                                                     ┌────────┴────────┐
                                                     │   Loop active   │
                                                     │                 │
                                                     │ 1. Read context │
                                                     │ 2. Hypothesis   │
                                                     │ 3. ONE change   │
                                                     │ 4. Git commit   │
                                                     │ 5. Verify       │
                                                     │ 6. Improved?    │
                                                     │    → keep       │
                                                     │    → revert     │
                                                     │ 7. Log          │
                                                     │ 8. Next         │
                                                     └─────────────────┘

Every improvement accumulates. Every failure is reverted automatically. Progress is recorded in TSV format. The escalation ladder (refine → pivot → web search → stop) prevents infinite retries.

Commands

Command	Function	Default iterations
`/autoresearch`	Main loop: modify → verify → keep/discard	25
`/autoresearch:plan`	Interactive wizard → validated configuration	one-shot
`/autoresearch:debug`	Bug hunting through hypothesis iteration	15
`/autoresearch:fix`	Fix errors one by one until none remain	20
`/autoresearch:security`	STRIDE + OWASP audit with red-team	15
`/autoresearch:ship`	8-phase release workflow	linear
`/autoresearch:scenario`	Generate edge cases across 12 dimensions	20
`/autoresearch:predict`	Debate among 5 experts	one-shot
`/autoresearch:learn`	Explore → generate docs → validate → fix	10
`/autoresearch:reason`	Adversarial debate with blind judges	8
`/autoresearch:probe`	8 personas interrogate the requirements	15
`/autoresearch:improve`	Research into product improvements	20
`/autoresearch:evals`	Results analysis: trends and plateaus	one-shot

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

From source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.

Core rules

A single change per turn: atomic experiments establish causality
Read before writing: review git log and the TSV before modifying anything
Mechanical verification only: run the command, extract the number
Automatic rollback: git revert HEAD --no-edit on failure
Simplicity wins: same metric + less code = keep

Full documentation (English)

autoresearch

Moteur d’itération autonome dirigé par objectifs pour agents de programmation. Écrit en Rust.

« Définir l’OBJECTIF → L’agent exécute la BOUCLE → Vous vous réveillez avec des résultats »

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский

How it works

You describe the goal  →  Agent confirms the config  →  You say "go"
                                                             │
                                                   ┌─────────┴─────────┐
                                                   │  Loop active      │
                                                   │                   │
                                                   │  1. Read context  │
                                                   │  2. Hypothesize   │
                                                   │  3. Change ONE    │
                                                   │  4. Git commit    │
                                                   │  5. Verify        │
                                                   │  6. Improved?     │
                                                   │     → keep        │
                                                   │     → revert      │
                                                   │  7. Log result    │
                                                   │  8. Next turn     │
                                                   └───────────────────┘

Every improvement stacks. Every failure is reverted automatically. Progress is logged in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents endless retries.
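
One way to picture the escalation ladder is as a function of how many attempts have failed in a row. The sketch below is Python for illustration only. The rung names come from the text above, but the `next_rung` function and the threshold of three failures per rung are assumptions, since the source does not state when the engine escalates.

```python
LADDER = ["refine", "pivot", "web_search", "stop"]


def next_rung(consecutive_failures: int, failures_per_rung: int = 3) -> str:
    """Map a failure streak to a rung; the threshold is a hypothetical default."""
    index = min(consecutive_failures // failures_per_rung, len(LADDER) - 1)
    return LADDER[index]


if __name__ == "__main__":
    for streak in (0, 3, 6, 9):
        print(streak, next_rung(streak))
```

Because the index is clamped to the last rung, any sufficiently long failure streak ends at "stop", which is what bounds the number of retries.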

Commands

Command	Purpose	Default iterations
`/autoresearch`	Main loop: modify → verify → keep/reject	25
`/autoresearch:plan`	Interactive wizard → validated configuration	one-shot
`/autoresearch:debug`	Bug hunting through hypothesis iteration	15
`/autoresearch:fix`	Fix errors one at a time until none remain	20
`/autoresearch:security`	STRIDE + OWASP audit with red-team	15
`/autoresearch:ship`	8-phase delivery workflow	linear
`/autoresearch:scenario`	Generate edge cases across 12 dimensions	20
`/autoresearch:predict`	Debate among 5 experts	one-shot
`/autoresearch:learn`	Explore → generate docs → validate → fix	10
`/autoresearch:reason`	Adversarial debate with blind judges	8
`/autoresearch:probe`	8 personas interrogate the requirements	15
`/autoresearch:improve`	Product improvement research	20
`/autoresearch:evals`	Results analysis: trends and plateaus	one-shot

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands are available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

From source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.

Essential rules

One change per turn: atomic experiments establish causality
Read before writing: check git log and the TSV before modifying
Mechanical verification only: run the command, extract the number
Automatic rollback: git revert HEAD --no-edit on failure
Simplicity wins: same metric + less code = keep
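
"Run the command, extract the number" can be illustrated with a small Python helper. This is a sketch under assumptions, not the engine's actual parser: it assumes the metric is the last number printed on stdout, and the names `extract_metric` and `run_verify` are hypothetical.

```python
import re
import subprocess
import sys


def extract_metric(output: str) -> float:
    """Return the last number in the output (assumed to be the metric)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", output)
    if not matches:
        raise ValueError("no number found in verify output")
    return float(matches[-1])


def run_verify(command: list[str]) -> float:
    """Run the verify command and parse its stdout; raises if the command fails."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return extract_metric(result.stdout)


if __name__ == "__main__":
    # Stand-in verify command that just prints a score.
    print(run_verify([sys.executable, "-c", "print('coverage 87.5')"]))
```

Keeping the check to a command plus a number means the keep-or-revert decision never depends on a subjective reading of the output.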

Full documentation (English)

autoresearch

Autonomous, goal-driven iteration engine for coding agents. Built with Rust.

"Set the goal → The agent runs the loop → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский

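How it works

You describe the goal  →  Agent confirms the config  →  You say "start"
                                                             │
                                                   ┌─────────┴─────────┐
                                                   │  Loop running     │
                                                   │                   │
                                                   │  1. Read context  │
                                                   │  2. Hypothesize   │
                                                   │  3. Change ONE    │
                                                   │  4. Git commit    │
                                                   │  5. Verify        │
                                                   │  6. Improved?     │
                                                   │     → keep        │
                                                   │     → revert      │
                                                   │  7. Log result    │
                                                   │  8. Next turn     │
                                                   └───────────────────┘

Improvements accumulate, and failures are reverted automatically. Progress is recorded in TSV format. Escalation (Refine → Change direction → Web search → Stop) prevents infinite retries.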
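
As a rough picture of TSV progress logging, the Python sketch below appends one row per turn and reads the rows back before the next change. The column names in `FIELDS` and both function names are hypothetical, because the source does not specify the engine's TSV schema.

```python
import csv
import io

# Hypothetical columns; the real results file may use a different schema.
FIELDS = ["iteration", "metric", "status", "description"]


def append_row(stream, iteration, metric, status, description):
    """Write one tab-separated result row to an open text stream."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow([iteration, metric, status, description])


def read_rows(text):
    """Parse previously logged rows into dictionaries keyed by FIELDS."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="\t")
    return list(reader)


if __name__ == "__main__":
    log = io.StringIO()
    append_row(log, 1, 0.80, "keep", "baseline")
    append_row(log, 2, 0.78, "revert", "inline cache")
    for row in read_rows(log.getvalue()):
        print(row["iteration"], row["status"], row["description"])
```

Reading the log back before each turn is what lets the agent avoid repeating a change that was already tried and reverted.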

Commands

Command	Purpose	Default iterations
`/autoresearch`	Core loop: modify → verify → keep/discard	25
`/autoresearch:plan`	Interactive wizard → validated configuration	once
`/autoresearch:debug`	Hypothesis-based bug tracking	15
`/autoresearch:fix`	Fix errors one at a time until none remain	20
`/autoresearch:security`	STRIDE + OWASP security audit	15
`/autoresearch:ship`	8-phase release flow	linear
`/autoresearch:scenario`	Edge-case generation across 12 dimensions	20
`/autoresearch:predict`	Debate among 5 expert personas	once
`/autoresearch:learn`	Scout → generate docs → validate → fix	10
`/autoresearch:reason`	Adversarial debate with blind judging	8
`/autoresearch:probe`	8 personas question the requirements in depth	15
`/autoresearch:improve`	Product improvement research	20
`/autoresearch:evals`	Analysis of iteration results: trends and plateaus	once

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart the session. All 13 commands become available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Usage: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug

Build from source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with zero runtime dependencies.

Key rules

One change per turn: atomic experiments establish causality
Read before writing: check git log and the results TSV before making changes
Mechanical verification only: run the command, parse the number
Automatic rollback: git revert HEAD --no-edit on failure
Simplicity wins: same metric + less code = keep

Full documentation (English)

autoresearch

Autonomous, goal-oriented iteration engine for coding agents. Written in Rust.

"Set the goal → The agent runs the loop → You check the results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский

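How it works

You describe the goal  →  Agent confirms the config  →  You say "start"
                                                             │
                                                   ┌─────────┴─────────┐
                                                   │  Loop running     │
                                                   │                   │
                                                   │  1. Read context  │
                                                   │  2. Hypothesize   │
                                                   │  3. Change ONE    │
                                                   │  4. Git commit    │
                                                   │  5. Verify        │
                                                   │  6. Improved?     │
                                                   │     → keep        │
                                                   │     → revert      │
                                                   │  7. Log result    │
                                                   │  8. Next turn     │
                                                   └───────────────────┘

Every improvement accumulates. Every failure is reverted automatically. Progress is recorded in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents infinite retries.

Commands

Command	Purpose	Default iterations
`/autoresearch`	Core iteration loop: modify → verify → keep/discard	25
`/autoresearch:plan`	Interactive wizard → validated configuration	once
`/autoresearch:debug`	Bug tracking through hypothesis iteration	15
`/autoresearch:fix`	Fix errors one at a time down to zero	20
`/autoresearch:security`	STRIDE + OWASP security audit	15
`/autoresearch:ship`	8-stage deployment workflow	linear
`/autoresearch:scenario`	Generate edge cases across 12 dimensions	20
`/autoresearch:predict`	Debate among 5 expert personas	once
`/autoresearch:learn`	Explore → generate docs → validate → fix	10
`/autoresearch:reason`	Adversarial debate with blind judging	8
`/autoresearch:probe`	8 personas interrogate the requirements	15
`/autoresearch:improve`	Product improvement research	20
`/autoresearch:evals`	Analysis of iteration results: trends and plateaus	once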
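
The default iteration counts in the table above can be read as a simple lookup. The Python sketch below mirrors the table; the `iteration_budget` helper is hypothetical, and returning `None` for the one-shot and linear commands is a modelling choice made for this example, not documented behavior.

```python
from typing import Optional

# Values copied from the table; None marks commands without a loop budget.
DEFAULT_ITERATIONS = {
    "/autoresearch": 25,
    "/autoresearch:plan": None,      # runs once
    "/autoresearch:debug": 15,
    "/autoresearch:fix": 20,
    "/autoresearch:security": 15,
    "/autoresearch:ship": None,      # linear workflow
    "/autoresearch:scenario": 20,
    "/autoresearch:predict": None,   # runs once
    "/autoresearch:learn": 10,
    "/autoresearch:reason": 8,
    "/autoresearch:probe": 15,
    "/autoresearch:improve": 20,
    "/autoresearch:evals": None,     # runs once
}


def iteration_budget(command: str) -> Optional[int]:
    """Return the default iteration count, or None for non-looping commands."""
    if command not in DEFAULT_ITERATIONS:
        raise KeyError(f"unknown command: {command}")
    return DEFAULT_ITERATIONS[command]
```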

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart the session. All 13 commands are available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Usage: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Usage: /autoresearch or /autoresearch_debug

Build from source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.

Core rules

One change per turn: atomic experiments establish causality
Read before writing: check git log and the results TSV before modifying
Mechanical verification only: run the command, parse the number
Automatic rollback: git revert HEAD --no-edit on failure
Simplicity wins: same metric + less code = keep

Full documentation (English)

autoresearch

Autonomous, goal-oriented iteration engine for coding agents. Written in Rust.

"Set the GOAL → The agent runs the LOOP → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский

How it works

You describe the goal  →  Agent confirms the config  →  You say "go"
                                                             │
                                                   ┌─────────┴─────────┐
                                                   │  Loop active      │
                                                   │                   │
                                                   │  1. Read context  │
                                                   │  2. Hypothesize   │
                                                   │  3. Change ONE    │
                                                   │  4. Git commit    │
                                                   │  5. Verify        │
                                                   │  6. Improved?     │
                                                   │     → keep        │
                                                   │     → revert      │
                                                   │  7. Log result    │
                                                   │  8. Next turn     │
                                                   └───────────────────┘

Every improvement accumulates. Every failure is reverted automatically. Progress is logged in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents endless attempts.
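
The eight-step loop can be simulated in memory to show how kept and reverted turns interact. The Python sketch below is illustrative only: each "experiment" is a plain function standing in for one code change plus its verify run, the `run_loop` name is hypothetical, and a higher metric is assumed to be better. The real loop works on Git commits rather than on a number.

```python
from typing import Callable, List, Tuple


def run_loop(
    baseline: float,
    experiments: List[Callable[[float], float]],
    max_iterations: int = 25,  # default for /autoresearch in the commands table
) -> Tuple[float, List[Tuple[int, float, str]]]:
    """Apply experiments one at a time, keeping only those that improve the metric."""
    best = baseline
    log = []
    for turn, experiment in enumerate(experiments[:max_iterations], start=1):
        candidate = experiment(best)  # steps 3-5: change one thing, commit, verify
        if candidate > best:          # step 6: improved?
            best = candidate
            status = "keep"
        else:
            status = "revert"         # the real loop would undo the commit here
        log.append((turn, candidate, status))  # step 7: log the result
    return best, log


if __name__ == "__main__":
    best, log = run_loop(0.5, [lambda m: m + 0.1, lambda m: m - 0.2, lambda m: m + 0.05])
    for entry in log:
        print(entry)
```

Because a reverted turn leaves `best` untouched, the next experiment always starts from the last kept state, which is how improvements stack while failures leave no trace beyond the log.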

Commands

Command	Purpose	Default iterations
`/autoresearch`	Main loop: modify → verify → keep/discard	25
`/autoresearch:plan`	Interactive wizard → validated configuration	one-shot
`/autoresearch:debug`	Bug hunting through hypothesis iteration	15
`/autoresearch:fix`	Fix errors one by one until none remain	20
`/autoresearch:security`	STRIDE + OWASP audit with red-team	15
`/autoresearch:ship`	8-phase release flow	linear
`/autoresearch:scenario`	Generate edge cases across 12 dimensions	20
`/autoresearch:predict`	Debate among 5 experts	one-shot
`/autoresearch:learn`	Explore → generate docs → validate → fix	10
`/autoresearch:reason`	Adversarial debate with blind judges	8
`/autoresearch:probe`	8 personas interrogate the requirements	15
`/autoresearch:improve`	Product improvement research	20
`/autoresearch:evals`	Results analysis: trends and plateaus	one-shot

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart your session. All 13 commands become available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

From source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a ~3 MB binary with no runtime dependencies.

Core rules

One change per turn: atomic experiments establish causality
Read before writing: check git log and the TSV before modifying
Mechanical verification only: run the command, extract the number
Automatic rollback: git revert HEAD --no-edit on failure
Simplicity wins: same metric + less code = keep

Full documentation (English)

autoresearch

Autonomous, goal-directed iteration engine for coding agents. Written in Rust.

"Set the GOAL → The agent runs the LOOP → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский

How it works

You describe the goal  →  Agent confirms the config  →  You say "let's go"
                                                             │
                                                   ┌─────────┴─────────┐
                                                   │  Loop active      │
                                                   │                   │
                                                   │  1. Read context  │
                                                   │  2. Hypothesize   │
                                                   │  3. Change ONE    │
                                                   │  4. Git commit    │
                                                   │  5. Verify        │
                                                   │  6. Improved?     │
                                                   │     → keep        │
                                                   │     → revert      │
                                                   │  7. Log result    │
                                                   │  8. Next turn     │
                                                   └───────────────────┘

Every improvement accumulates. Every failure is rolled back automatically. Progress is recorded in TSV format. The escalation ladder (Refine → Change approach → Web search → Stop) prevents endless retries.

Commands

Command	Purpose	Default iterations
`/autoresearch`	Main loop: modify → verify → keep/discard	25
`/autoresearch:plan`	Interactive wizard → validated configuration	one-shot
`/autoresearch:debug`	Bug hunting through hypothesis iteration	15
`/autoresearch:fix`	Fix errors one at a time down to zero	20
`/autoresearch:security`	STRIDE + OWASP audit with red-team	15
`/autoresearch:ship`	8-phase release process	linear
`/autoresearch:scenario`	Generate edge cases across 12 dimensions	20
`/autoresearch:predict`	Debate among 5 expert personas	one-shot
`/autoresearch:learn`	Scout → generate documentation → validate → fix	10
`/autoresearch:reason`	Adversarial debate with blind judges	8
`/autoresearch:probe`	8 personas interrogate the requirements	15
`/autoresearch:improve`	Product improvement research	20
`/autoresearch:evals`	Results analysis: trends and plateaus	one-shot

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart the session. All 13 commands are available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug.

Build from source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a ~3 MB binary with no runtime dependencies.

Key rules

One change per turn: atomic experiments establish causality
Read before writing: check git log and the TSV before changing anything
Mechanical verification only: run the command, extract the number
Automatic rollback: git revert HEAD --no-edit on failure
Simplicity wins: same metric + less code = keep
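
The automatic rollback rule names a concrete Git command, so it can be wrapped in a small helper. The Python sketch below is an illustration, not the engine's code: the `rollback` name and the injectable `run` parameter are assumptions, added so the helper can be exercised without touching a real repository.

```python
import subprocess

# The exact command named in the rules above.
REVERT_COMMAND = ["git", "revert", "HEAD", "--no-edit"]


def rollback(repo_dir: str, run=subprocess.run) -> None:
    """Revert the most recent commit in repo_dir; raises if git exits non-zero."""
    run(REVERT_COMMAND, cwd=repo_dir, check=True)
```

Using `git revert` rather than resetting the branch keeps the failed experiment in history as a commit plus its inverse, so the record of what was tried survives the rollback.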

Full documentation (English)

autoresearch

Autonomous, goal-driven iteration engine for coding agents. Written in Rust.

"Set the goal → The agent runs the loop → You wake up to results"

English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский

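How it works

You describe the goal  →  Agent confirms the config  →  You say "start"
                                                             │
                                                   ┌─────────┴─────────┐
                                                   │  Loop running     │
                                                   │                   │
                                                   │  1. Read context  │
                                                   │  2. Hypothesize   │
                                                   │  3. Change ONE    │
                                                   │  4. Git commit    │
                                                   │  5. Verify        │
                                                   │  6. Improved?     │
                                                   │     → keep        │
                                                   │     → revert      │
                                                   │  7. Log result    │
                                                   │  8. Next turn     │
                                                   └───────────────────┘

Every improvement accumulates. Every failure is rolled back automatically. Progress is recorded in TSV format. The escalation strategy (Refine → Pivot → Web search → Stop) prevents endless brute-force retries.

Commands

Command	Purpose	Default iterations
`/autoresearch`	Core iteration loop: modify → verify → keep/discard	25
`/autoresearch:plan`	Interactive wizard → validated configuration	one-shot
`/autoresearch:debug`	Track down defects through hypothesis iteration	15
`/autoresearch:fix`	Fix errors one by one until none remain	20
`/autoresearch:security`	STRIDE + OWASP security audit	15
`/autoresearch:ship`	8-phase release flow	linear
`/autoresearch:scenario`	Generate edge cases across 12 dimensions	20
`/autoresearch:predict`	Debate among 5 expert personas	one-shot
`/autoresearch:learn`	Scout → generate docs → validate → fix	10
`/autoresearch:reason`	Adversarial debate with blind-review judging	8
`/autoresearch:probe`	8 personas interrogate the requirements	15
`/autoresearch:improve`	Product improvement research	20
`/autoresearch:evals`	Analysis of iteration results: trends and bottlenecks	one-shot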

Quick start

Claude Code (plugin install)

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude

Restart the session. All 13 commands are immediately available.

Codex CLI

$skill-installer install https://github.com/coder-company/agent-autoresearch

Then use: $autoresearch

OpenCode

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode

Use: /autoresearch or /autoresearch_debug

Build from source

git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh

Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with zero dependencies.

Core rules

One change per round: only atomic experiments can establish causality
Read before writing: check git log and the results TSV before modifying
Mechanical verification: run the command, parse the number
Automatic rollback: run git revert HEAD --no-edit on failure
Simplicity is king: same metric + less code = keep

Full documentation (English)


Autoresearch