feat: SDK reflection for faster schema extraction#1410
feat: SDK reflection for faster schema extraction#1410MuncleUscles wants to merge 6 commits intomainfrom
Conversation
Use genlayer-py-std SDK reflection as primary method for schema extraction, with GenVM fallback. Performance improvement: ~50-100ms vs ~200-300ms. - Add backend/services/sdk_schema.py with SDK-based extraction - Add in-memory cache with 5-minute TTL - Update schema endpoints to try SDK first, fallback to GenVM - Add unit tests for SDK schema extraction
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces SDK-first contract schema extraction as an alternative to GenVM-based extraction. Two RPC endpoints are refactored to attempt SDK extraction first with automatic fallback to GenVM on failure. A new service module implements the SDK extraction logic, and tests validate the extraction path and fallback behavior. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Endpoint as RPC Endpoint
participant SDK as SDK Extractor
participant GenVM as GenVM Extractor
Client->>Endpoint: get_contract_schema()
Endpoint->>Endpoint: Decode contract_code
Endpoint->>SDK: extract_schema_via_sdk(contract_code)
alt SDK extraction succeeds
SDK-->>Endpoint: schema dict
Endpoint-->>Client: Return schema
else SDK returns None or fails
Endpoint->>GenVM: Extract schema via GenVM
GenVM-->>Endpoint: schema dict
Endpoint-->>Client: Return schema
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@backend/services/sdk_schema.py`:
- Around line 146-220: The extraction modifies global process-wide state
(sys.path, sys.modules, os.environ) and must be serialized: add a module-level
threading.Lock named _extraction_lock and acquire it around the whole extraction
sequence that begins with calling _setup_wasi_mocks() and ends after restoring
sys.path and removing "__sdk_contract__" from sys.modules (i.e., wrap the
try/.../finally that contains get_schema, temp file handling, and cleanup) so
concurrent calls can't interleave; ensure the lock is always released in a
finally block and does not change existing error handling or the calls to
get_schema, _cache_schema, or other referenced symbols.
- Around line 146-149: The GENERATING_DOCS env var is set to "true" in the try
block but never restored; preserve the previous value before setting it, and
restore or delete it in a finally block after the extraction completes.
Specifically, around the code that calls _setup_wasi_mocks() and sets
os.environ["GENERATING_DOCS"] = "true", capture prev =
os.environ.get("GENERATING_DOCS"), set the var for the extraction work, and in a
finally ensure you either del os.environ["GENERATING_DOCS"] if prev is None or
reset os.environ["GENERATING_DOCS"] = prev to avoid leaking the flag to
subsequent operations.
In `@tests/unit/test_sdk_schema.py`:
- Around line 55-76: The test test_extract_schema_caches_results is incomplete
(it sets up mock_schema, patches _get_sdk_paths and wraps extract_schema_via_sdk
into mock_extract) but does nothing; replace the trailing pass by invoking the
module function that triggers SDK extraction twice (the public/cache-aware
entrypoint in backend.services.sdk_schema that calls extract_schema_via_sdk),
assert the first call returns mock_schema and causes mock_extract to be called
once, then call it a second time and assert mock_extract was not called again
and the returned schema equals mock_schema; keep the existing environment and
_get_sdk_paths patch, and use mock_extract.assert_called_once() and equality
assertions to validate caching.
🧹 Nitpick comments (3)
backend/services/sdk_schema.py (2)
70-81: WASI mock persists for process lifetime.The mock is installed once and never removed, which is fine for the intended use case. However, be aware that this affects the entire process and cannot be undone. If real WASI bindings become available later in the process, they won't be used.
Consider adding a comment documenting this intentional behavior.
84-88: Consider movinghashlibimport to module level.The inline import of
hashlibinside_get_cache_keyworks but is unconventional. Moving it to the top-level imports improves readability and avoids repeated import overhead (though Python caches imports).Suggested change
Add to imports at line 17:
import hashlibThen simplify the function:
def _get_cache_key(code: bytes) -> str: """Generate cache key from contract code hash.""" - import hashlib - return hashlib.sha256(code).hexdigest()backend/protocol_rpc/endpoints.py (1)
706-748: Consider extracting duplicated SDK extraction logic.The SDK extraction try/except block (lines 716-725) is nearly identical to the one in
get_contract_schema(lines 668-678). This duplication could be extracted to a helper function.♻️ Suggested helper function
def _try_sdk_schema_extraction(contract_code: bytes) -> dict | None: """Attempt SDK-based schema extraction, returning None on any failure.""" try: from backend.services.sdk_schema import extract_schema_via_sdk sdk_schema = extract_schema_via_sdk(contract_code) if sdk_schema is not None: return sdk_schema logger.debug("SDK schema extraction returned None, falling back to GenVM") except Exception as e: logger.debug(f"SDK schema extraction failed: {e}, falling back to GenVM") return NoneThen both endpoints can use:
sdk_schema = _try_sdk_schema_extraction(contract_code) if sdk_schema is not None: return sdk_schema # Continue with GenVM fallback...
| try: | ||
| # Setup environment | ||
| _setup_wasi_mocks() | ||
| os.environ["GENERATING_DOCS"] = "true" | ||
|
|
||
| # Add SDK paths to sys.path temporarily | ||
| original_path = sys.path.copy() | ||
| sys.path.insert(0, str(sdk_src)) | ||
| if sdk_emb.exists(): | ||
| sys.path.insert(0, str(sdk_emb)) | ||
|
|
||
| try: | ||
| # Import the schema extraction function | ||
| from genlayer.py.get_schema import get_schema | ||
|
|
||
| # Write contract to temp file and load it as a module | ||
| with tempfile.NamedTemporaryFile( | ||
| mode="wb", suffix=".py", delete=False | ||
| ) as f: | ||
| f.write(contract_code) | ||
| temp_path = f.name | ||
|
|
||
| try: | ||
| # Load contract module | ||
| spec = importlib.util.spec_from_file_location( | ||
| "__sdk_contract__", temp_path | ||
| ) | ||
| if spec is None or spec.loader is None: | ||
| logger.debug("Failed to create module spec for contract") | ||
| return None | ||
|
|
||
| module = importlib.util.module_from_spec(spec) | ||
| sys.modules["__sdk_contract__"] = module | ||
| spec.loader.exec_module(module) | ||
|
|
||
| # Find the Contract class | ||
| contract_class = None | ||
| for name, obj in vars(module).items(): | ||
| if isinstance(obj, type) and name != "Contract": | ||
| # Check if it inherits from Contract | ||
| bases = [b.__name__ for b in obj.__mro__] | ||
| if "Contract" in bases: | ||
| contract_class = obj | ||
| break | ||
|
|
||
| if contract_class is None: | ||
| logger.debug("No Contract class found in module") | ||
| return None | ||
|
|
||
| # Extract schema using SDK reflection | ||
| schema = get_schema(contract_class) | ||
|
|
||
| elapsed_ms = int((time.time() - start_time) * 1000) | ||
| logger.info( | ||
| f"SDK schema extraction succeeded in {elapsed_ms}ms " | ||
| f"for {contract_class.__name__}" | ||
| ) | ||
|
|
||
| # Cache the result | ||
| _cache_schema(contract_code, schema) | ||
|
|
||
| return schema | ||
|
|
||
| finally: | ||
| # Cleanup temp file | ||
| try: | ||
| os.unlink(temp_path) | ||
| except Exception: | ||
| pass | ||
| # Remove from sys.modules | ||
| sys.modules.pop("__sdk_contract__", None) | ||
|
|
||
| finally: | ||
| # Restore sys.path | ||
| sys.path = original_path |
There was a problem hiding this comment.
Thread safety concern with global state modifications.
This function modifies global state (sys.path, sys.modules, os.environ) which could cause race conditions if called concurrently from multiple threads. While there's a cache lock for the cache itself, the extraction logic is not protected.
Consider adding a lock around the entire extraction process or documenting that concurrent extractions of different contracts may interfere with each other.
🔒️ Proposed fix - add extraction lock
Add a module-level lock:
_extraction_lock = threading.Lock()Then wrap the extraction in the lock:
sdk_src, sdk_emb = sdk_paths
start_time = time.time()
- try:
+ with _extraction_lock:
+ try:
# Setup environment
_setup_wasi_mocks()
# ... rest of the try block ...
+ except Exception as e:
+ # ... exception handling ...🧰 Tools
🪛 Ruff (0.14.14)
213-214: try-except-pass detected, consider logging the exception
(S110)
213-213: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@backend/services/sdk_schema.py` around lines 146 - 220, The extraction
modifies global process-wide state (sys.path, sys.modules, os.environ) and must
be serialized: add a module-level threading.Lock named _extraction_lock and
acquire it around the whole extraction sequence that begins with calling
_setup_wasi_mocks() and ends after restoring sys.path and removing
"__sdk_contract__" from sys.modules (i.e., wrap the try/.../finally that
contains get_schema, temp file handling, and cleanup) so concurrent calls can't
interleave; ensure the lock is always released in a finally block and does not
change existing error handling or the calls to get_schema, _cache_schema, or
other referenced symbols.
| try: | ||
| # Setup environment | ||
| _setup_wasi_mocks() | ||
| os.environ["GENERATING_DOCS"] = "true" |
There was a problem hiding this comment.
GENERATING_DOCS environment variable is never restored.
The environment variable is set but not cleaned up after extraction. This could affect subsequent operations in the same process that depend on this variable being unset.
🐛 Proposed fix
try:
# Setup environment
_setup_wasi_mocks()
+ original_generating_docs = os.environ.get("GENERATING_DOCS")
os.environ["GENERATING_DOCS"] = "true"
# Add SDK paths to sys.path temporarily
original_path = sys.path.copy()
sys.path.insert(0, str(sdk_src))
if sdk_emb.exists():
sys.path.insert(0, str(sdk_emb))
try:
# ... existing code ...
finally:
# Restore sys.path
sys.path = original_path
+ # Restore GENERATING_DOCS
+ if original_generating_docs is None:
+ os.environ.pop("GENERATING_DOCS", None)
+ else:
+ os.environ["GENERATING_DOCS"] = original_generating_docs🤖 Prompt for AI Agents
In `@backend/services/sdk_schema.py` around lines 146 - 149, The GENERATING_DOCS
env var is set to "true" in the try block but never restored; preserve the
previous value before setting it, and restore or delete it in a finally block
after the extraction completes. Specifically, around the code that calls
_setup_wasi_mocks() and sets os.environ["GENERATING_DOCS"] = "true", capture
prev = os.environ.get("GENERATING_DOCS"), set the var for the extraction work,
and in a finally ensure you either del os.environ["GENERATING_DOCS"] if prev is
None or reset os.environ["GENERATING_DOCS"] = prev to avoid leaking the flag to
subsequent operations.
Contracts change frequently during development, caching would return stale schemas. Removed: cache dictionary, TTL/size limits, cache-related tests. Kept: SDK reflection extraction logic.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/services/sdk_schema.py`:
- Around line 76-175: The module no longer exports clear_cache which breaks
callers importing it; add a compatibility shim by defining a module-level
function named clear_cache() that performs a no-op (or delegates to any existing
cache-clear helper like _get_sdk_paths/_setup_wasi_mocks if appropriate) and
ensure it is exported from the module so imports succeed without changing
callers; place the clear_cache definition near extract_schema_via_sdk and any
other helpers so it’s discoverable and update __all__ if present.
🧹 Nitpick comments (2)
tests/unit/test_sdk_schema.py (2)
12-66: Add type hints to test methods and fixture.Guidelines require type hints in all Python code; test methods/fixture are missing them.
Based on coding guidelines, include type hints throughout.♻️ Example update
-import pytest -import os +import os +from pathlib import Path +import pytest @@ -class TestSdkSchemaExtraction: +class TestSdkSchemaExtraction: @@ - def test_get_sdk_paths_without_genvmroot(self): + def test_get_sdk_paths_without_genvmroot(self) -> None: @@ - def test_get_sdk_paths_with_invalid_genvmroot(self, tmp_path): + def test_get_sdk_paths_with_invalid_genvmroot(self, tmp_path: Path) -> None: @@ - def test_extract_schema_returns_none_without_sdk(self): + def test_extract_schema_returns_none_without_sdk(self) -> None: @@ -class TestSdkSchemaIntegration: +class TestSdkSchemaIntegration: @@ - def has_sdk(self): + def has_sdk(self) -> tuple[Path, Path]: @@ - def test_extract_simple_contract(self, has_sdk): + def test_extract_simple_contract(self, has_sdk: tuple[Path, Path]) -> None:
48-66: Silence unused fixture-argument warning.
has_sdkis only used to trigger the fixture; ruff flags it as unused. Consider@pytest.mark.usefixtures("has_sdk")and remove the parameter, or rename it to_has_sdk.
| def extract_schema_via_sdk(contract_code: bytes) -> dict | None: | ||
| """ | ||
| Extract contract schema using SDK reflection. | ||
|
|
||
| Args: | ||
| contract_code: Contract source code as bytes (UTF-8 encoded Python) | ||
|
|
||
| Returns: | ||
| Schema dict if successful, None if SDK extraction failed. | ||
| Caller should fall back to GenVM-based extraction on None. | ||
|
|
||
| Performance: ~50-100ms vs ~200-300ms with GenVM | ||
| """ | ||
| sdk_paths = _get_sdk_paths() | ||
| if sdk_paths is None: | ||
| return None | ||
|
|
||
| sdk_src, sdk_emb = sdk_paths | ||
| start_time = time.time() | ||
|
|
||
| try: | ||
| # Setup environment | ||
| _setup_wasi_mocks() | ||
| os.environ["GENERATING_DOCS"] = "true" | ||
|
|
||
| # Add SDK paths to sys.path temporarily | ||
| original_path = sys.path.copy() | ||
| sys.path.insert(0, str(sdk_src)) | ||
| if sdk_emb.exists(): | ||
| sys.path.insert(0, str(sdk_emb)) | ||
|
|
||
| try: | ||
| # Import the schema extraction function | ||
| from genlayer.py.get_schema import get_schema | ||
|
|
||
| # Write contract to temp file and load it as a module | ||
| with tempfile.NamedTemporaryFile( | ||
| mode="wb", suffix=".py", delete=False | ||
| ) as f: | ||
| f.write(contract_code) | ||
| temp_path = f.name | ||
|
|
||
| try: | ||
| # Load contract module | ||
| spec = importlib.util.spec_from_file_location( | ||
| "__sdk_contract__", temp_path | ||
| ) | ||
| if spec is None or spec.loader is None: | ||
| logger.debug("Failed to create module spec for contract") | ||
| return None | ||
|
|
||
| module = importlib.util.module_from_spec(spec) | ||
| sys.modules["__sdk_contract__"] = module | ||
| spec.loader.exec_module(module) | ||
|
|
||
| # Find the Contract class | ||
| contract_class = None | ||
| for name, obj in vars(module).items(): | ||
| if isinstance(obj, type) and name != "Contract": | ||
| # Check if it inherits from Contract | ||
| bases = [b.__name__ for b in obj.__mro__] | ||
| if "Contract" in bases: | ||
| contract_class = obj | ||
| break | ||
|
|
||
| if contract_class is None: | ||
| logger.debug("No Contract class found in module") | ||
| return None | ||
|
|
||
| # Extract schema using SDK reflection | ||
| schema = get_schema(contract_class) | ||
|
|
||
| elapsed_ms = int((time.time() - start_time) * 1000) | ||
| logger.info( | ||
| f"SDK schema extraction succeeded in {elapsed_ms}ms " | ||
| f"for {contract_class.__name__}" | ||
| ) | ||
|
|
||
| return schema | ||
|
|
||
| finally: | ||
| # Cleanup temp file | ||
| try: | ||
| os.unlink(temp_path) | ||
| except Exception: | ||
| pass | ||
| # Remove from sys.modules | ||
| sys.modules.pop("__sdk_contract__", None) | ||
|
|
||
| finally: | ||
| # Restore sys.path | ||
| sys.path = original_path | ||
|
|
||
| except Exception as e: | ||
| elapsed_ms = int((time.time() - start_time) * 1000) | ||
| logger.debug( | ||
| f"SDK schema extraction failed after {elapsed_ms}ms: {e}", | ||
| exc_info=True, | ||
| ) | ||
| return None |
There was a problem hiding this comment.
Missing clear_cache export breaks imports (CI failure).
The pipeline error indicates clear_cache is imported from this module but no longer exists. Either remove those imports or provide a compatibility no‑op to keep the API stable.
🩹 Suggested compatibility shim
+def clear_cache() -> None:
+ """No-op: cache removed, retained for backward compatibility."""
+ return None🧰 Tools
🪛 Ruff (0.14.14)
160-161: try-except-pass detected, consider logging the exception
(S110)
160-160: Do not catch blind exception: Exception
(BLE001)
169-169: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@backend/services/sdk_schema.py` around lines 76 - 175, The module no longer
exports clear_cache which breaks callers importing it; add a compatibility shim
by defining a module-level function named clear_cache() that performs a no-op
(or delegates to any existing cache-clear helper like
_get_sdk_paths/_setup_wasi_mocks if appropriate) and ensure it is exported from
the module so imports succeed without changing callers; place the clear_cache
definition near extract_schema_via_sdk and any other helpers so it’s
discoverable and update __all__ if present.
- Remove clear_cache from __init__.py as it doesn't exist in sdk_schema.py - Change ''' to """ in test file to pass black formatting
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/protocol_rpc/endpoints.py (1)
827-859: Extract the SDK→GenVM fallback into a shared helper to avoid drift.The same extraction/fallback block is duplicated in two endpoints. Centralizing it will keep behavior and logging consistent.
♻️ Refactor sketch
+async def _extract_schema_with_fallback( + contract_code: bytes, + genvm_manager: GenVMManager, + msg_handler: IMessageHandler, +) -> dict: + try: + from backend.services.sdk_schema import extract_schema_via_sdk + + sdk_schema = extract_schema_via_sdk(contract_code) + if sdk_schema is not None: + return sdk_schema + logger.debug("SDK schema extraction returned None, falling back to GenVM") + except Exception as e: + logger.debug(f"SDK schema extraction failed: {e}, falling back to GenVM") + + node = Node( + contract_snapshot=None, + validator_mode=ExecutionMode.LEADER, + validator=Validator( + address="", + stake=0, + llmprovider=LLMProvider( + provider="", + model="", + config={}, + plugin="", + plugin_config={}, + ), + ), + leader_receipt=None, + msg_handler=msg_handler.with_client_session(get_client_session_id()), + contract_snapshot_factory=None, + manager=genvm_manager, + ) + schema = await node.get_contract_schema(contract_code) + return json.loads(schema)Also applies to: 875-907
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/protocol_rpc/endpoints.py` around lines 827 - 859, Create a shared helper (e.g., extract_schema_with_sdk_fallback) that encapsulates the SDK first then GenVM fallback logic: call extract_schema_via_sdk(contract_code) and if it returns a schema return it (with the same debug logs and exception handling), otherwise instantiate the same Node using the passed genvm_manager and msg_handler.with_client_session(get_client_session_id()), call node.get_contract_schema(contract_code), parse the JSON and return the schema; then replace the duplicated blocks in the endpoints with calls to this helper (passing contract_code, logger, msg_handler, genvm_manager) so logging, exception handling and behavior are centralized and identical across both locations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/protocol_rpc/endpoints.py`:
- Line 825: The NameError comes from using contract_account which isn't defined
in this scope; locate the line that sets contract_code =
base64.b64decode(contract_account["data"]["code"]) and replace contract_account
with the actual variable that contains the account dict in this function (e.g.,
account, account_info or contract_state as used elsewhere in this function),
then add a defensive check that ["data"]["code"] exists before decoding and wrap
the decode in a try/except to log/raise a clear error if decoding fails;
reference the exact occurrence of contract_code = base64.b64decode(...) to make
this change.
---
Nitpick comments:
In `@backend/protocol_rpc/endpoints.py`:
- Around line 827-859: Create a shared helper (e.g.,
extract_schema_with_sdk_fallback) that encapsulates the SDK first then GenVM
fallback logic: call extract_schema_via_sdk(contract_code) and if it returns a
schema return it (with the same debug logs and exception handling), otherwise
instantiate the same Node using the passed genvm_manager and
msg_handler.with_client_session(get_client_session_id()), call
node.get_contract_schema(contract_code), parse the JSON and return the schema;
then replace the duplicated blocks in the endpoints with calls to this helper
(passing contract_code, logger, msg_handler, genvm_manager) so logging,
exception handling and behavior are centralized and identical across both
locations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e2b7e772-3210-4d3e-9491-5b7aeaabd6a0
📒 Files selected for processing (1)
backend/protocol_rpc/endpoints.py
| "Contract not deployed.", | ||
| ) | ||
|
|
||
| contract_code = base64.b64decode(contract_account["data"]["code"]) |
There was a problem hiding this comment.
Fix NameError in schema decode path (blocker).
Line 825 references contract_account, which is undefined in this scope. This raises immediately and prevents both SDK extraction and GenVM fallback.
🐛 Proposed fix
- contract_code = base64.b64decode(contract_account["data"]["code"])
+ contract_code = base64.b64decode(code_b64)🧰 Tools
🪛 Ruff (0.15.6)
[error] 825-825: Undefined name contract_account
(F821)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/protocol_rpc/endpoints.py` at line 825, The NameError comes from
using contract_account which isn't defined in this scope; locate the line that
sets contract_code = base64.b64decode(contract_account["data"]["code"]) and
replace contract_account with the actual variable that contains the account dict
in this function (e.g., account, account_info or contract_state as used
elsewhere in this function), then add a defensive check that ["data"]["code"]
exists before decoding and wrap the decode in a try/except to log/raise a clear
error if decoding fails; reference the exact occurrence of contract_code =
base64.b64decode(...) to make this change.


Summary
Add SDK-based contract schema extraction as the primary method, with GenVM fallback for reliability.
Performance improvement: ~50-100ms vs ~200-300ms with GenVM (~3-4x faster)
Changes
backend/services/sdk_schema.py- SDK-based schema extraction using Python reflectionHow it works
_genlayer_wasi(WASI bindings for storage/balance operations)GENERATING_DOCS=trueto enable doc-generation mode in SDKgenlayer.py.get_schema()for reflection-based schema extractionSDK paths
The SDK is located at
$GENVMROOT/runners/genlayer-py-std/srcin the Docker environment. Falls back to GenVM-based extraction if SDK paths are not available.Test plan
Summary by CodeRabbit
New Features
Tests