## Problem
Agent discovery in `getRegisteredAgentsByEvents()` (and other functions in `erc8004.ts`) uses viem's default RPC endpoint for Base, which resolves to the public RPC at `mainnet.base.org`. This public RPC has reliability issues that block all agent discovery with no fallback:

Observed Feb 26: `mainnet.base.org` returned HTTP 503 ("no backend is currently healthy to serve traffic") for an extended period. `eth_blockNumber` queries succeeded, but `eth_getLogs` was dead. Every agent running on the Automaton framework lost the ability to discover other agents, and there was no way for operators to configure an alternative RPC endpoint.
## Impact

- Complete discovery outage when the Base public RPC has degraded service
- No operator workaround: the RPC URL is determined by viem's chain defaults inside `erc8004.ts`
- Affects all operators equally, since everyone depends on the same hardcoded default
- $10+ in wasted agent credits observed from a single outage (agents loop on failed discovery)
## Proposed Solution
Allow operators to configure a custom RPC URL for agent discovery. This would let operators use their own RPC provider (Alchemy, QuickNode, BlastAPI, etc.) instead of relying on the public endpoint.
Design considerations for the maintainers:
| Approach | Pros | Cons |
|---|---|---|
| Environment variable (`AUTOMATON_RPC_URL`) | Simple, zero schema changes, operators set and forget | May not match Conway's config patterns |
| `automaton.json` config field | Consistent with existing config pattern (if applicable) | Requires schema changes |
| Function parameter (`rpcUrl?: string`) | Explicit, no global state, caller controls | Changes exported function signatures |
| Fallback RPC list with automatic retry | Full resilience, automatic failover | Complex retry logic, over-engineered for initial implementation |
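To make the first row concrete: the environment-variable approach could be as small as one helper that returns the operator-supplied URL, or `undefined` so viem keeps using the chain default. This is a sketch under assumptions, not actual Automaton code; `resolveRpcUrl` is a hypothetical name, and the wiring comment assumes how `erc8004.ts` builds its clients.

```typescript
// Hypothetical helper: read the operator's RPC override from the environment.
// Returns undefined when unset or blank, so unconfigured deployments behave
// exactly as today (viem falls back to the chain's default endpoint).
export function resolveRpcUrl(
  env: Record<string, string | undefined> = process.env
): string | undefined {
  const url = env.AUTOMATON_RPC_URL?.trim();
  return url ? url : undefined;
}

// The three createPublicClient calls in erc8004.ts could then become (sketch):
//   createPublicClient({ chain: base, transport: http(resolveRpcUrl()) })
// viem's http() treats an omitted/undefined URL as "use the chain default".
```

The main appeal of this variant is that it needs no schema changes and is fully backward compatible: operators who set nothing see no behavior change.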
We intentionally kept this as an issue rather than a PR because the design decision depends on Conway's configuration patterns and architectural preferences. Happy to implement whichever approach the team prefers.
## Our Production Workaround (Reference)
We locally patched all three `createPublicClient` calls in `erc8004.ts` to use BlastAPI's free tier:

```ts
transport: http("https://base-mainnet.public.blastapi.io")
```

This immediately resolved the outage. BlastAPI has been reliable, but hardcoding a specific provider isn't the right upstream solution.
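If the maintainers ever want the fallback-list approach instead of a single hardcoded endpoint, viem's built-in `fallback` transport already handles the failover, so only the endpoint ordering needs to live in Automaton. The helper below is hypothetical (`buildRpcUrlList` is our name, not existing code), and the endpoint list is an assumption based on our workaround:

```typescript
// Hypothetical helper: order RPC endpoints so an operator-supplied URL is
// tried first, with public endpoints as fallbacks.
export function buildRpcUrlList(custom?: string): string[] {
  const defaults = [
    "https://base-mainnet.public.blastapi.io", // the workaround endpoint above
    "https://mainnet.base.org",                // viem's default for Base, last resort
  ];
  const url = custom?.trim();
  return url ? [url, ...defaults] : defaults;
}

// With viem's fallback transport, which tries transports in order and fails
// over on error, the clients in erc8004.ts could be built as (sketch):
//
//   import { createPublicClient, fallback, http } from "viem";
//   import { base } from "viem/chains";
//   createPublicClient({
//     chain: base,
//     transport: fallback(buildRpcUrlList(customUrl).map((url) => http(url))),
//   });
```

This keeps the "complex retry logic" out of Automaton itself, since `fallback` owns the failover behavior.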
## Related
- PR fix(discovery): paginate `eth_getLogs` to handle RPC block range limits #228: paginated block scanning (introduced the scanning infrastructure)
- PR fix(discovery): tune RPC timeout and failure threshold from production data #239: timeout tuning (adjusts `PER_CHUNK_TIMEOUT_MS` and `MAX_CONSECUTIVE_FAILURES`; addresses RPC latency, not outages)