Add more realistic environment configuration to benchmark #33

@ScuttleBot

Description

Problem

Currently the benchmark runs agents in a near-bare environment, which may not reflect real-world setups; results measured there may translate poorly to actual usage.

Suggestions

Consider adding the following to make benchmark results more representative of real-world performance:

1. Git configuration

git config --global user.name "Benchmark Agent"
git config --global user.email "benchmark@pinchbench.com"

Many tasks involve git operations, and missing config can cause unexpected failures or prompts that wouldn't happen in a real setup.
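A minimal sketch of how the harness could do this, assuming each benchmark run gets a disposable HOME (the identity values are the ones suggested above; the isolated-HOME approach is an assumption, not how the benchmark currently works):

```shell
# Run the benchmark with an isolated HOME so its git identity never
# touches the host user's real ~/.gitconfig.
export HOME="$(mktemp -d)"

git config --global user.name  "Benchmark Agent"
git config --global user.email "benchmark@pinchbench.com"

# Sanity check: git operations inside tasks will no longer prompt
# for a missing identity.
git config --global user.name
```

Scoping the identity to a throwaway HOME also keeps repeated benchmark runs reproducible.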

2. Web search API keys

  • Brave Search API — Common tool for agents doing research
  • Perplexity API — Another popular research/search option

Without these, agents that would normally use web search fall back to less effective methods or fail tasks they'd otherwise complete.
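One way the harness could make the keys optional rather than required: gate web-search tasks on whether a key is present. `BRAVE_API_KEY` and `PERPLEXITY_API_KEY` are assumed variable names here, not ones the benchmark currently defines:

```shell
# Hypothetical key-detection helper; the variable names are assumptions.
# Returning a status (instead of exiting) lets the harness skip
# web-search tasks rather than score them as failures.
have_search_keys() {
  [ -n "${BRAVE_API_KEY:-}" ] || [ -n "${PERPLEXITY_API_KEY:-}" ]
}

if BRAVE_API_KEY="example" have_search_keys; then
  echo "search enabled"
fi
```

Keys could then be sourced from a local, git-ignored file so they never land in the repository.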

3. Default skills/tools

Consider including commonly-used skills by default:

  • humanizer — Text cleanup/rewriting (common in content tasks)
  • Other high-utility skills that real users typically have configured

Rationale

The goal is to measure how well agents perform in realistic conditions, not how well they handle a bare environment. Users running these agents in production have these things configured — the benchmark should too.

Open questions

  • Should API keys be optional (skip web search tasks if not configured)?
  • Which skills are "common enough" to include by default?
  • Any privacy/cost concerns with including real API access in benchmarks?
