
Three new debug/ops commands: debug decode, chains validate, debug proof-verify #26

Open
valera-grinenko-ai wants to merge 7 commits into mm-zk-codex:main from valera-grinenko-ai:features/decode-and-chains-validate


Three new debug/ops commands + test suite + docs

This PR adds three new commands that cover gaps in the current interop debugging workflow, a unit test suite (25 tests), and a documentation pass that explains why each feature matters — not just how to use it.


New commands

debug decode <hex>

Decodes raw calldata or revert data offline against all known interop function selectors (7) and error selectors (18). No RPC required.

Why it matters: When bundle execute fails, the RPC returns a raw hex revert reason. Without tooling, recovering the error name requires computing keccak256 of each candidate signature, checking if it matches the first 4 bytes, then ABI-decoding the rest manually. debug decode does all of this in one command and covers every interop error type, including ones that are tricky to decode by hand (e.g. UnauthorizedMessageSender with two address params). Works on calldata too — useful for inspecting bundle files or bundle explain call inputs offline.
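The selector-matching step described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `hex_to_bytes`, `identify`, and `KNOWN_SELECTORS` are hypothetical names, and the table holds only the two Solidity built-in error selectors rather than the 7 function and 18 error selectors the command knows about.

```rust
/// Decode a hex string (with or without a `0x` prefix) into bytes.
/// Returns None for odd-length or non-hex input.
fn hex_to_bytes(input: &str) -> Option<Vec<u8>> {
    let s = input.trim();
    let s = s.strip_prefix("0x").unwrap_or(s);
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Illustrative table only: the two Solidity built-in error selectors.
/// The interop selectors from this PR are not reproduced here.
const KNOWN_SELECTORS: &[([u8; 4], &str)] = &[
    ([0x08, 0xc3, 0x79, 0xa0], "Error(string)"),
    ([0x4e, 0x48, 0x7b, 0x71], "Panic(uint256)"),
];

/// Split a payload into (signature, ABI-encoded parameters) if the
/// first four bytes match a known selector.
fn identify(payload: &[u8]) -> Option<(&'static str, &[u8])> {
    if payload.len() < 4 {
        return None;
    }
    let (selector, params) = payload.split_at(4);
    KNOWN_SELECTORS
        .iter()
        .find(|(sel, _)| sel.as_slice() == selector)
        .map(|(_, name)| (*name, params))
}
```

Decoding the parameters themselves would still need an ABI decoder; the sketch stops at identifying the signature.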

chains validate [alias]

For each configured chain alias, checks that the RPC is reachable, that the stored chainId matches the one the live RPC reports, and that zks_getL2ToL1LogProof and zks_getL1BatchNumber are supported.

Why it matters: A stale chainId in the config (e.g. after a testnet reset) causes every relay attempt to fail with WrongDestinationChainId — a revert that looks like a protocol or bundle problem, not a config problem. The auto-relayer is especially sensitive: a single misconfigured chain means every bundle it touches fails silently. chains validate catches this before anything runs.
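The stored-versus-live comparison can be sketched as follows, assuming the live value arrives as the hex quantity that `eth_chainId` returns. `parse_chain_id`, `check_chain_id`, and `ChainIdCheck` are hypothetical names for illustration; the sketch performs no network call and is not the command's actual implementation.

```rust
/// Parse the hex quantity returned by `eth_chainId` (e.g. "0x144").
fn parse_chain_id(hex_quantity: &str) -> Option<u64> {
    let digits = hex_quantity.trim().strip_prefix("0x")?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    u64::from_str_radix(digits, 16).ok()
}

/// Outcome of comparing the configured chain ID with the live one.
#[derive(Debug, PartialEq)]
enum ChainIdCheck {
    Match,
    Mismatch { stored: u64, live: u64 },
    Unparseable,
}

/// Compare the chain ID stored in config with the RPC's raw response.
fn check_chain_id(stored: u64, live_response: &str) -> ChainIdCheck {
    match parse_chain_id(live_response) {
        Some(live) if live == stored => ChainIdCheck::Match,
        Some(live) => ChainIdCheck::Mismatch { stored, live },
        None => ChainIdCheck::Unparseable,
    }
}
```

For example, a stored ID of 324 checked against the response "0x12c" yields a mismatch reporting live ID 300, which is the stale-config case described above.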

debug proof-verify <proof.json> [--bundle <hex>] [--dest-chain <alias>]

Verifies a MessageInclusionProof cryptographically without spending gas. Reconstructs the L2 log leaf hash from the proof's message fields (as in MessageHashing.sol), walks the Merkle tree in Rust, and compares the computed root to the value in the proof JSON. With --dest-chain, also queries interopRoots(chainId, batchNumber) live to confirm the root is present on the destination chain.
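The leaf-to-root walk can be sketched like this. It is a std-only illustration under stated assumptions: `merkle_root_from_path` is a hypothetical name, the pair-hash function is passed in as a parameter, and `toy_hash_pair` is a non-cryptographic stand-in so the sketch runs without a keccak256 crate. The real verification would hash each pair with keccak256.

```rust
type Hash = [u8; 32];

/// Walk a binary Merkle path from leaf to root.
/// An even index means the current node is a left child, so it is
/// hashed before its sibling; an odd index reverses the order.
fn merkle_root_from_path<F>(leaf: Hash, mut index: u64, path: &[Hash], hash_pair: F) -> Hash
where
    F: Fn(&Hash, &Hash) -> Hash,
{
    let mut current = leaf;
    for sibling in path {
        current = if index % 2 == 0 {
            hash_pair(&current, sibling)
        } else {
            hash_pair(sibling, &current)
        };
        index /= 2;
    }
    current
}

/// NOT cryptographic. An order-sensitive placeholder for
/// keccak256(left || right), used only to make the sketch runnable.
fn toy_hash_pair(left: &Hash, right: &Hash) -> Hash {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = left[i].rotate_left(1) ^ right[i].wrapping_add(i as u8);
    }
    out
}
```

With an empty path the function returns the leaf unchanged, which matches the zero-depth case named in the test list.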

Produces a three-way verdict:

  • VALID — proof is mathematically correct and the root is on-chain; safe to call bundle execute
  • NOT YET READY — proof is mathematically correct but the root hasn't propagated yet; wait and retry
  • INVALID — the Merkle math fails; re-fetch the proof
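The three-way verdict can be expressed as a small decision function. This sketch rests on two assumptions that the description does not confirm: a root that has not propagated reads as all zeros from interopRoots, and a non-zero on-chain root that differs from the computed one counts as INVALID. `Verdict` and `classify` are hypothetical names.

```rust
type Hash = [u8; 32];

#[derive(Debug, PartialEq)]
enum Verdict {
    Valid,
    NotYetReady,
    Invalid,
}

/// Combine the offline Merkle check with the on-chain root lookup.
fn classify(computed_root: Hash, proof_root: Hash, onchain_root: Hash) -> Verdict {
    // The Merkle math must reproduce the root stated in the proof JSON.
    if computed_root != proof_root {
        return Verdict::Invalid;
    }
    // Assumption: an unset interopRoots entry reads as all zeros.
    if onchain_root == [0u8; 32] {
        return Verdict::NotYetReady;
    }
    // Assumption: a different non-zero root is treated as a failure.
    if onchain_root == computed_root {
        Verdict::Valid
    } else {
        Verdict::Invalid
    }
}
```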

Why it matters: bundle explain (via eth_call) cannot tell INVALID from NOT YET READY: it reverts with MessageNotIncluded whether the proof has bad data or just needs more time. proof-verify resolves that ambiguity offline, before spending gas, and even before the root has reached the destination chain. Handles both old (plain Merkle path) and new (metadata-header) proof formats. Supports --verbose to trace every Merkle step.
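The commit notes for this PR describe format detection as checking whether bytes [4..32] of the first proof element are zero. A minimal sketch of that check follows. `ProofFormat` and `detect_proof_format` are hypothetical names, treating an empty proof as legacy is an assumption, and the meaning of the first four header bytes is deliberately left uninterpreted.

```rust
#[derive(Debug, PartialEq)]
enum ProofFormat {
    MetadataHeader,
    LegacyPath,
}

/// Classify a proof by inspecting its first 32-byte element.
/// A header element carries data only in its first four bytes;
/// an ordinary sibling hash almost never has 28 trailing zero bytes.
fn detect_proof_format(proof: &[[u8; 32]]) -> ProofFormat {
    match proof.first() {
        Some(first) if first[4..32].iter().all(|&b| b == 0) => ProofFormat::MetadataHeader,
        // Assumption: an empty proof is handled as the legacy format.
        _ => ProofFormat::LegacyPath,
    }
}
```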


Test suite

Added 25 unit tests in src/abi.rs:

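  • 14 tests for decode_calldata_bytes covering edge cases (empty, short, and unknown-selector input), function-call round-trips (bundleStatus, callStatus, interopRoots, sendMessage), and error decoding (six error types plus a selector-map collision check)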
  • 3 round-trip tests for verifyBundle, executeBundle, sendBundle encode→decode
  • 8 tests for compute_leaf_hash, compute_merkle_root, and verify_proof_offline

Also fixed a bug uncovered by tests: SolCall::abi_decode expects the full calldata including the 4-byte selector; the decode function was incorrectly stripping it first.
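The shape of that bug can be reproduced without alloy. In this std-only sketch, `decode_full` stands in for a decoder that validates the selector itself, the way the description says SolCall::abi_decode does; `decode_buggy` and `decode_fixed` are hypothetical wrappers, and the Solidity `Panic(uint256)` selector is used only as a convenient well-known value.

```rust
const PANIC_SELECTOR: [u8; 4] = [0x4e, 0x48, 0x7b, 0x71];

/// Stand-in for a decoder that expects the full payload, selector
/// included, and returns the 32-byte parameter word on success.
fn decode_full(payload: &[u8]) -> Result<&[u8], &'static str> {
    let params = payload
        .strip_prefix(&PANIC_SELECTOR[..])
        .ok_or("selector mismatch")?;
    if params.len() != 32 {
        return Err("bad length");
    }
    Ok(params)
}

/// The bug: the selector is stripped first, so the decoder reads the
/// first four parameter bytes as if they were the selector.
fn decode_buggy(payload: &[u8]) -> Result<&[u8], &'static str> {
    let params = payload.get(4..).ok_or("too short")?;
    decode_full(params)
}

/// The fix: hand the decoder the untouched payload.
fn decode_fixed(payload: &[u8]) -> Result<&[u8], &'static str> {
    decode_full(payload)
}
```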


Documentation

  • Troubleshooting section (README): rewritten as a symptom-driven guide — "got a raw revert selector", "don't know if proof failed or just needs more time", "relay fails after reconfiguration"
  • chains validate: added "when to use this" block; auto-relay README has a Prerequisites section explaining why validation matters before starting a continuous relay loop
  • debug decode (retry example): shown at the expected-to-fail step with commentary on why manual decoding is tedious
  • debug proof-verify (debug basics example): inserted as Step 5b with a "Why not bundle explain?" comparison

Misc

  • Consistent ZKsync capitalization throughout (previously mixed zkSync / zksync)
  • CLAUDE.md added with build commands, module layout, and architecture patterns

valera-grinenko-ai and others added 7 commits March 4, 2026 14:26
debug decode <hex>: decode raw calldata/revert data offline against all
known interop function and error selectors. Useful for inspecting failed
tx calldata, bundle files, or eth_call revert reasons without an RPC.

chains validate [alias]: validate stored chain aliases by checking RPC
reachability, stored vs live chainId match, and zkSync-specific method
support (zks_getL2ToL1LogProof, zks_getL1BatchNumber). Catches
misconfigured chains before relay operations are attempted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
README.md:
- chains validate: usage, example output, and troubleshooting entry
  for misconfigured aliases causing relay failures
- debug decode: in debug checklist, troubleshooting (revert decoding),
  and output format examples (human + JSON)

examples/10_debug_basics/basics.md:
- Show debug decode on an extracted bundle hex file after Step 4

examples/10_debug_basics/retry.md:
- Show debug decode on a revert hex at the expected-to-fail step
- Add debug decode as second diagnosis option alongside bundle explain
  for the wrong-chain failure mode
- Add chains validate as a cross-check for alias misconfiguration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests (src/abi.rs #[cfg(test)]):
- 14 unit tests for decode_calldata_bytes covering edge cases (empty,
  short, unknown selector), function call round-trips (bundleStatus,
  callStatus, interopRoots, sendMessage), and error decoding
  (WrongDestinationChainId, WrongSourceChainId, BundleAlreadyProcessed,
  ExecutingNotAllowed, IndirectCallValueMismatch, InvalidInteropBundleVersion,
  selector map collision check)

Bug fix:
- SolCall::abi_decode expects full calldata including the 4-byte selector
  prefix (same as SolError::abi_decode). The decode function was incorrectly
  passing params_data (selector stripped). Tests caught this.

Style:
- Remove unnecessary `as u64` cast in chains validate (clippy)
- cargo fmt applied across all changed files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds three new unit tests covering the most critical relay code paths:
- decode_verify_bundle_call_round_trip: encode via encode_verify_bundle_call,
  decode via decode_calldata_bytes, assert bundle fields and proof fields round-trip
- decode_execute_bundle_call_round_trip: same for executeBundle selector
- decode_send_bundle_call_round_trip: encode via encode_send_bundle_call with a
  real InteropCallStarter, assert destinationChainId, to/data/attributes fields

Also adds two helper functions (minimal_proof, minimal_bundle) shared by the
three tests, and updates CLAUDE.md to reflect the now-17-test suite.

Additionally adds a prerequisite section to examples/13_auto_relayer/README.md
explaining why chains validate should be run before starting the auto-relayer
and how to fix chain_id_match failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements `cast-interop debug proof-verify`, which verifies a
MessageInclusionProof cryptographically without spending any gas.

Protocol implementation (src/abi.rs):
- `L2_TO_L1_MESSENGER`: the 0x8008 system contract constant used as the
  L2 log sender in every leaf hash, as specified in MessageHashing.sol
- `compute_leaf_hash(tx_num, sender, data) -> B256`: reconstructs the
  88-byte packed L2Log and hashes it exactly as MessageHashing.getLeafHashFromMessage
- `compute_merkle_root_steps / compute_merkle_root`: walks the binary Merkle
  tree leaf→root, implementing Merkle.calculateRoot from the protocol source
- `parse_proof_metadata`: detects new (metadata-header) vs legacy (plain-path)
  proof format by checking whether bytes [4..32] of the first element are zero
- `verify_proof_offline(proof, verbose) -> ProofVerifyResult`: combines the
  above into a single call; populates per-step trace when verbose=true
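
The 88-byte packing behind compute_leaf_hash can be sketched as below. The field order (shard ID, service flag, tx number, log sender, key, value) is an assumption based on the L2Log layout, not something this commit message spells out, and `pack_l2_log` is a hypothetical name. The final keccak256 over the packed bytes, and the keccak256 of the message data that produces `data_hash`, are left out because they need an external crate.

```rust
/// The 0x8008 system contract address as 20 bytes.
const L2_TO_L1_MESSENGER: [u8; 20] = {
    let mut addr = [0u8; 20];
    addr[18] = 0x80;
    addr[19] = 0x08;
    addr
};

/// Pack the L2 log fields into 88 bytes: 1 + 1 + 2 + 20 + 32 + 32.
/// `data_hash` is assumed to be keccak256(message.data), computed elsewhere.
fn pack_l2_log(tx_number_in_batch: u16, sender: [u8; 20], data_hash: [u8; 32]) -> [u8; 88] {
    let mut out = [0u8; 88];
    out[0] = 0; // l2ShardId (assumed to be 0)
    out[1] = 1; // isService (assumed to be true)
    out[2..4].copy_from_slice(&tx_number_in_batch.to_be_bytes());
    out[4..24].copy_from_slice(&L2_TO_L1_MESSENGER);
    // key: the message sender left-padded to 32 bytes (bytes 24..56),
    // so the address itself occupies bytes 36..56.
    out[36..56].copy_from_slice(&sender);
    // value: hash of the message data.
    out[56..88].copy_from_slice(&data_hash);
    out
}
```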

New command (src/commands/proof_verify.rs):
- `debug proof-verify <proof.json> [--bundle <hex>] [--dest-chain <alias>] [--verbose] [--json]`
- Loads proof from file or inline JSON; patches message.data from --bundle
  exactly as the relay flow does (BUNDLE_IDENTIFIER prefix + abi_encode)
- Optional --dest-chain check: calls interopRoots(chainId, batchNumber) on
  the destination chain and compares to the computed root, distinguishing
  "proof is mathematically wrong" from "root not propagated yet"
- Human and JSON output modes; JSON output includes all intermediate fields

Unit tests (src/abi.rs, 8 new tests, 25 total):
- leaf_hash_changes_with_each_input_field
- leaf_hash_packed_size_is_88_bytes
- merkle_root_zero_depth_equals_leaf
- merkle_root_left_child_hashes_leaf_then_sibling
- merkle_root_right_child_hashes_sibling_then_leaf
- verify_proof_offline_valid_proof_passes
- verify_proof_offline_tampered_root_fails
- verify_proof_offline_verbose_populates_steps

Documentation:
- README.md: step 2b in Manual Steps workflow; debug checklist entry
- examples/10_debug_basics/basics.md: step 5b between proof fetch and root wait
- CLAUDE.md: updated test count and abi.rs module description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous docs showed HOW to use the three new commands but didn't answer
the question every developer asks: "do I actually need this, or could I just
use X instead?" This made features easy to overlook. Updated sections:

README.md — Troubleshooting (full rewrite into symptom-driven guide):
- "Got a raw revert selector": explains why debug decode exists (18 interop
  selectors, no need to manually compute keccak and decode ABI params) and
  covers both error and calldata/bundle decoding
- "Don't know if proof failed or just needs more time": explains the three-way
  verdict of proof-verify (VALID / NOT YET READY / INVALID) and why bundle
  explain / eth_call can't distinguish these cases
- "Relay fails after chains were reconfigured": explains that a stale chainId
  is completely silent — reverts look like protocol bugs, not config bugs

README.md — chains validate (configuration section):
- Added "When to use this" block explaining the silent misconfiguration story
  and calling out auto-relay as especially sensitive

README.md — proof-verify (manual steps section):
- Added explicit "Why not just use bundle explain?" comparison table
- Added per-verdict action guide (VALID / NOT YET READY / INVALID)

examples/10_debug_basics/basics.md — Step 5b:
- Added numbered explanation of what proof-verify actually computes
- Clarified when to wait vs when to re-fetch the proof

examples/10_debug_basics/retry.md — Step 6:
- Added explanation of what the raw revert hex means and why manual decoding
  is tedious, making the value of debug decode concrete
- Cross-checking section: added the stale-chainId root cause explanation
  and the fix workflow

examples/13_auto_relayer/README.md — Prerequisites:
- Added "Why this matters for auto-relay specifically" paragraph explaining
  that a misconfigured chain will silently fail every bundle it touches

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace all instances of 'zkSync' and 'zksync' (as a word) with 'ZKsync'
across .md, .rs, and .toml files. Domain names (zksync.io, zksync.dev)
and RPC method prefixes (zks_) are left unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>