Skip to content

Comments

Feat/research#83

Open
minkyun12 wants to merge 3 commits intomainfrom
feat/research
Open

Feat/research#83
minkyun12 wants to merge 3 commits intomainfrom
feat/research

Conversation

@minkyun12
Copy link
Collaborator

No description provided.

@netlify
Copy link

netlify bot commented Feb 14, 2026

Deploy Preview for dr-manhattan-oss canceled.

Name Link
🔨 Latest commit 10ecdf5
🔍 Latest deploy log https://app.netlify.com/projects/dr-manhattan-oss/deploys/6990cdb51a3a750008ba9fd4

@claude
Copy link

claude bot commented Feb 14, 2026

Code Review - PR #83: Feat/research

Summary

This PR adds bot detection research capabilities for Polymarket wallets using machine learning techniques (K-Means clustering and Isolation Forest). It introduces a Jupyter notebook and utility functions for analyzing trading patterns.


✅ Strengths

  1. Well-structured code: The utils.py module is cleanly organized with clear separation of concerns (feature engineering, clustering, bot scoring).

  2. Comprehensive feature engineering: The ~20 behavioral features in compute_wallet_features() cover multiple dimensions:

    • Frequency patterns (trades per hour, intervals)
    • Size patterns (mean, variance, round number detection)
    • Timing signals (hour entropy, zero-second detection)
    • Market behavior (concentration, buy ratio)
    • Profile completeness
  3. Good documentation: Functions have clear docstrings explaining their purpose and return values.

  4. Defensive programming: The code handles edge cases like division by zero, missing data, and small datasets appropriately.

  5. Proper dependency management: New research dependencies are correctly added as optional dependencies in pyproject.toml.

  6. Data directory ignored: The .gitignore correctly excludes research/data/ to prevent committing large datasets.


⚠️ Issues & Recommendations

1. Notebook Contains Executed Outputs (CRITICAL)

The notebook has been executed (12 code cells with execution counts) and likely contains outputs. Per CLAUDE.md instruction #2: "DO NOT create a new document".

Recommendation: Clear all outputs before committing:

jupyter nbconvert --clear-output --inplace research/detection.ipynb

2. Missing Test Coverage

No tests exist for the new research utilities. While research code may not need production-level testing, basic validation would help:

  • Unit tests for compute_wallet_features() with edge cases
  • Tests for empty DataFrames, single-wallet groups
  • Validation that feature calculations are numerically stable

Recommendation: Add tests/test_research_utils.py with basic tests.

3. Potential NaN Handling Issues

Lines 151 and 330 use .values to convert DataFrames to numpy arrays, but null handling may not be complete:

# Line 151 in run_clustering()
x = features_df[cols].values
x_scaled = scaler.fit_transform(x)
x_scaled = np.nan_to_num(x_scaled, nan=0.0, posinf=0.0, neginf=0.0)  # Good!

# Line 330 in run_isolation_forest()
x = StandardScaler().fit_transform(features_df[cols].values)
x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0)  # Good!

Actually, both cases handle this correctly with np.nan_to_num(). ✅

4. Magic Numbers Without Constants

Several thresholds are hardcoded:

  • min_trades: int = 5 (line 112)
  • contamination: float = 0.1 (line 327)
  • score >= 3.0 for BOT label (line 256)
  • score >= 1.0 for MIXED label (line 258)
  • z > 0.5 threshold (line 225)

Recommendation: Define these as module-level constants with comments explaining their rationale:

MIN_TRADES_THRESHOLD = 5  # Minimum trades needed for statistical significance
ISOLATION_FOREST_CONTAMINATION = 0.1  # Expected proportion of anomalies
BOT_SCORE_THRESHOLD = 3.0  # Z-score threshold for bot classification

5. Performance Considerations

  • build_group_features() uses a Python loop over wallet groups (line 119). For large datasets, this could be slow.
  • Consider using pandas.DataFrame.groupby().apply() or vectorized operations if performance becomes an issue.

6. Notebook Dependencies

The notebook imports from utils import ... (line 39 in the notebook), which requires:

  • Running from the research/ directory, OR
  • Having research/ in Python path

Recommendation: Add a cell at the top to handle path setup:

import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd()))

7. No Documentation

Per CLAUDE.md instruction #6, when integrating new exchanges/features, documentation should be created. While this isn't a new exchange, research capabilities would benefit from:

  • wiki/research/bot-detection.md explaining the methodology
  • Usage instructions for running the notebook

Recommendation: Add a simple README or wiki entry explaining:

  • What bot detection features are available
  • How to install research dependencies (uv pip install -e ".[research]")
  • How to run the notebook

8. Code Style Compliance

Unable to verify code style compliance without ruff and black installed in the CI environment. The code visually appears to follow conventions, but should be validated.

Recommendation: Ensure CI runs:

ruff check research/
black --check research/

🔒 Security Assessment

No security concerns identified:

  • No API keys, tokens, or secrets in code
  • No use of eval() or exec()
  • No SQL injection vectors (uses pandas, not raw SQL)
  • No file system traversal vulnerabilities
  • Proper use of standard ML libraries

📊 Performance Analysis

Memory efficiency:

  • ✅ Data is processed in memory using pandas (reasonable for most use cases)
  • ⚠️ The notebook might struggle with very large datasets (millions of trades)
  • Consider adding batch processing if needed

Computational complexity:

  • compute_wallet_features(): O(n log n) due to sorting
  • run_clustering(): O(k * n * i) where k is clusters, n is wallets, i is iterations
  • run_isolation_forest(): O(n * trees * log(samples))

All acceptable for research workloads.


📝 Code Quality Score: 7.5/10

Breakdown:

  • Code structure: 9/10 (well-organized, clear separation)
  • Documentation: 7/10 (good docstrings, but missing usage docs)
  • Error handling: 8/10 (handles edge cases well)
  • Testing: 3/10 (no tests)
  • Style compliance: 8/10 (appears clean, but not verified)
  • Security: 10/10 (no issues)

✅ Required Changes Before Merge

  1. MUST: Clear notebook outputs
  2. SHOULD: Add basic unit tests
  3. SHOULD: Extract magic numbers to constants
  4. NICE-TO-HAVE: Add documentation in wiki/

🎯 Recommendation

Conditional approval - the code is well-written and secure, but the notebook should have outputs cleared before merging. Consider adding tests and documentation as follow-up improvements.

Great work on implementing a comprehensive bot detection system! The feature engineering is thoughtful and the code is production-quality. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant