Targets Configuration

Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml to decouple eval files from provider details.

targets:
  - name: azure_base
    provider: azure
    endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
    api_key: ${{ AZURE_OPENAI_API_KEY }}
    model: ${{ AZURE_DEPLOYMENT_NAME }}
  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base
  - name: local_agent
    provider: cli
    command_template: 'python agent.py --prompt {PROMPT}'
    judge_target: azure_base

Use ${{ VARIABLE_NAME }} syntax to reference values from your .env file:

targets:
  - name: my_target
    provider: anthropic
    api_key: ${{ ANTHROPIC_API_KEY }}
    model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files.
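To make the substitution concrete, here is a minimal Python sketch of how a `${{ VARIABLE_NAME }}` placeholder could be expanded. The function name `interpolate`, the regular expression, and the choice to raise on a missing variable are assumptions for illustration; AgentV's actual resolver may behave differently.

```python
import re

# Matches ${{ NAME }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value, env):
    """Replace every ${{ NAME }} in value with env[NAME].

    Raising on a missing variable is an assumption of this sketch.
    """
    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


# Hypothetical values standing in for the contents of a .env file.
env = {"ANTHROPIC_MODEL": "example-model"}
print(interpolate("${{ ANTHROPIC_MODEL }}", env))
```

In practice the `env` mapping would come from the process environment after the .env file has been loaded.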

| Provider | Type | Description |
|----------|------|-------------|
| azure | LLM | Azure OpenAI |
| anthropic | LLM | Anthropic Claude API |
| gemini | LLM | Google Gemini |
| claude | Agent | Claude Agent SDK |
| codex | Agent | Codex CLI |
| pi-coding-agent | Agent | Pi Coding Agent |
| vscode | Agent | VS Code with Copilot |
| vscode-insiders | Agent | VS Code Insiders |
| cli | Agent | Any CLI command |
| mock | Testing | Mock provider for dry runs |

Set the default target at the top level of the eval file, or override it for an individual test case:

# Top-level default
execution:
  target: azure_base
tests:
  - id: test-1
    # Uses azure_base
  - id: test-2
    execution:
      target: vscode_dev # Override for this case
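The precedence rule can be sketched in a few lines of Python. This is only an illustration: `resolve_target` is a hypothetical helper, and the dictionaries stand for the parsed eval file and one of its test entries.

```python
def resolve_target(suite, test):
    """Return the test-level execution.target if set, else the suite default."""
    test_target = (test.get("execution") or {}).get("target")
    suite_target = (suite.get("execution") or {}).get("target")
    return test_target or suite_target


suite = {"execution": {"target": "azure_base"}}
print(resolve_target(suite, {"id": "test-1"}))
print(resolve_target(suite, {"id": "test-2", "execution": {"target": "vscode_dev"}}))
```

The first call falls back to `azure_base`; the second picks up the per-case override `vscode_dev`.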

Agent targets that need LLM-based evaluation specify a judge_target, which names the LLM target used to run LLM judge evaluators:

targets:
  - name: codex_target
    provider: codex
    judge_target: azure_base # LLM used for judging

For agent targets, workspace_template specifies a directory that gets copied to a temporary location before each test runs. This provides isolated, reproducible workspaces.

targets:
  - name: claude_agent
    provider: claude
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base

When workspace_template is set:

  • The template directory is copied to ~/.agentv/workspaces/<eval-run-id>/<test-id>/
  • The .git directory is skipped during copy
  • Each test gets its own isolated copy
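The copy behaviour described above can be approximated with the standard library. The sketch below is not AgentV's implementation; `copy_template` and its parameters are hypothetical names, and `shutil.ignore_patterns(".git")` skips any entry named `.git` at every level, which may be broader than what AgentV does.

```python
import shutil
from pathlib import Path


def copy_template(template, workspaces_root, eval_run_id, test_id):
    """Copy a template directory into <root>/<eval-run-id>/<test-id>/.

    The destination must not exist yet, so each test gets a fresh copy.
    """
    dest = Path(workspaces_root) / eval_run_id / test_id
    # Entries named .git are left out of the copy.
    shutil.copytree(template, dest, ignore=shutil.ignore_patterns(".git"))
    return dest
```

For example, `copy_template("./workspace-templates/my-project", Path.home() / ".agentv" / "workspaces", "run-123", "case-01")` would produce the layout listed above.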

Run scripts before and after each test using the workspace block. This can be defined at the suite level (applies to all tests) or per test (overrides suite-level).

workspace:
  template: ./workspace-templates/my-project
  setup:
    script: ["bun", "run", "setup.ts"]
    timeout_ms: 120000
    cwd: ./scripts
  teardown:
    script: ["bun", "run", "teardown.ts"]
    timeout_ms: 30000

| Field | Description |
|-------|-------------|
| template | Directory to copy as workspace (alternative to target-level workspace_template) |
| setup | Script to run after workspace creation, before the agent runs |
| teardown | Script to run after evaluation, before cleanup |

Each script config accepts:

| Field | Description |
|-------|-------------|
| script | Command array (e.g., ["bun", "run", "setup.ts"]) |
| timeout_ms | Timeout in milliseconds (default: 60000 for setup, 30000 for teardown) |
| cwd | Working directory (relative paths resolved against eval file directory) |

Lifecycle order: template copy → setup script → git baseline → agent runs → file changes captured → teardown script → cleanup

Error handling:

  • Setup failure aborts the test with an error result
  • Teardown failure is non-fatal (warning only)
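The setup and teardown semantics can be sketched as follows. This is an illustrative outline rather than AgentV's code: `run_script` and `run_with_lifecycle` are hypothetical names, the result dictionary shape is invented, and treating a non-zero exit code or a timeout as "failure" is an assumption. The default timeouts are the documented ones.

```python
import json
import subprocess
import warnings


def run_script(config, context, default_timeout_ms):
    """Run one script config, passing the case context as JSON on stdin."""
    timeout_ms = config.get("timeout_ms", default_timeout_ms)
    return subprocess.run(
        config["script"],
        input=json.dumps(context),
        text=True,
        capture_output=True,
        cwd=config.get("cwd"),
        timeout=timeout_ms / 1000,
        check=True,  # assumption: a non-zero exit code counts as failure
    )


def run_with_lifecycle(workspace_config, context, run_agent):
    """Setup failure aborts the test; teardown failure only warns."""
    setup = workspace_config.get("setup")
    teardown = workspace_config.get("teardown")
    if setup:
        try:
            run_script(setup, context, 60000)
        except (subprocess.SubprocessError, OSError) as exc:
            return {"status": "error", "error": f"setup failed: {exc}"}
    try:
        return run_agent(context)
    finally:
        if teardown:
            try:
                run_script(teardown, context, 30000)
            except (subprocess.SubprocessError, OSError) as exc:
                warnings.warn(f"teardown failed: {exc}")
```

The sketch leaves out the template copy, git baseline, file-change capture, and cleanup steps, and covers only the two script hooks around the agent run.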

Script context: Both scripts receive a JSON object on stdin with case context:

{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": { "repo": "sympy/sympy", "base_commit": "abc123" }
}
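A setup script can read this object with any JSON parser. The Python sketch below shows one possible shape; it assumes the script is configured with a command array such as ["python", "setup.py"], and the `case-info.json` marker file is a made-up example of setup work, not something AgentV expects.

```python
import json
import sys
from pathlib import Path


def main(stream=None):
    """Read the case context from stdin and prepare the workspace."""
    stream = stream if stream is not None else sys.stdin
    context = json.load(stream)
    workspace = Path(context["workspace_path"])
    metadata = context.get("case_metadata") or {}

    # Hypothetical setup step: record which case this workspace belongs to.
    info = {
        "test_id": context["test_id"],
        "eval_run_id": context["eval_run_id"],
        "base_commit": metadata.get("base_commit"),
    }
    (workspace / "case-info.json").write_text(json.dumps(info, indent=2))
    return 0
```

A real script would finish by calling `sys.exit(main())` so that the return value becomes the process exit code.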

Suite vs per-test: When both are defined, test-level fields replace suite-level fields. See Per-Test Workspace Config for examples.
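One way to read "test-level fields replace suite-level fields" is as a shallow merge over the top-level keys of the workspace block (template, setup, teardown). The sketch below assumes that reading; `merge_workspace` is a hypothetical helper, and the Per-Test Workspace Config page remains the authority on the real behaviour.

```python
def merge_workspace(suite, test):
    """Shallow merge: each field set at test level replaces the suite field."""
    merged = dict(suite or {})
    merged.update(test or {})
    return merged


suite = {
    "template": "./workspace-templates/my-project",
    "setup": {"script": ["bun", "run", "setup.ts"], "timeout_ms": 120000},
    "teardown": {"script": ["bun", "run", "teardown.ts"]},
}
# Hypothetical per-test override that swaps only the setup script.
test = {"setup": {"script": ["bun", "run", "case-setup.ts"]}}
merged = merge_workspace(suite, test)
```

Under this assumption, `merged["setup"]` is the test-level block as a whole (the suite's `timeout_ms` is not inherited), while `template` and `teardown` still come from the suite.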

After setup and git baseline initialization, AgentV computes a SHA-256 fingerprint of the workspace file tree. This fingerprint is included in the evaluation result as workspaceFingerprint and can be used to verify that workspaces are reproducible across runs.
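The exact fingerprint algorithm is not specified here. As a rough illustration of the idea, the following Python sketch hashes every file's relative path and content in a stable order. Skipping `.git`, the ordering, and the way paths and contents are combined are all assumptions, so the resulting value will not match AgentV's workspaceFingerprint.

```python
import hashlib
from pathlib import Path


def fingerprint_tree(root):
    """Return a SHA-256 hex digest over relative paths and file contents."""
    root = Path(root)
    digest = hashlib.sha256()
    relative_files = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and ".git" not in path.relative_to(root).parts
    )
    for rel in relative_files:
        # Hash the path, then a fixed-length digest of the content, so the
        # combined stream stays unambiguous.
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256((root / rel).read_bytes()).digest())
    return digest.hexdigest()
```

Two workspaces with identical files yield the same digest, and any change to a path or to file content yields a different one, which is the property that makes a fingerprint useful for checking reproducibility.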

By default:

  • Success: Workspace is cleaned up automatically
  • Failure: Workspace is preserved for debugging

Override with CLI flags:

  • --keep-workspaces: Always preserve workspaces
  • --cleanup-workspaces: Always clean up, even on failure
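The defaults and the two flags amount to a small decision function, sketched below in Python. `should_cleanup` is a hypothetical name, and the behaviour when both flags are passed is not documented here; the sketch assumes that --keep-workspaces wins.

```python
def should_cleanup(succeeded, keep_workspaces=False, cleanup_workspaces=False):
    """Decide whether a test's workspace is removed after the run."""
    if keep_workspaces:
        # Assumption: keeping takes priority if both flags are given.
        return False
    if cleanup_workspaces:
        return True
    # Default: clean up on success, preserve on failure for debugging.
    return succeeded
```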

| Option | Use Case |
|--------|----------|
| cwd | Run in an existing directory (shared across tests) |
| workspace_template | Copy template to temp location (isolated per case) |

These options are mutually exclusive. If neither is set, the eval file’s directory is used as the working directory.
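The selection rule can be expressed as a small Python sketch. `resolve_working_dir`, the mode labels, and raising `ValueError` when both options are set are assumptions for illustration; AgentV may report the conflict differently.

```python
from pathlib import Path


def resolve_working_dir(config, eval_file_dir):
    """Return (mode, path) describing where the agent will run."""
    cwd = config.get("cwd")
    template = config.get("workspace_template")
    if cwd and template:
        raise ValueError("cwd and workspace_template are mutually exclusive")
    if cwd:
        # Existing directory, shared across tests.
        return ("shared", Path(cwd))
    if template:
        # Template that is copied to an isolated location per case.
        return ("isolated", Path(template))
    # Neither option set: fall back to the eval file's directory.
    return ("eval_dir", Path(eval_file_dir))
```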