Feat/research by minkyun12 · Pull Request #83 · guzus/dr-manhattan

minkyun12 · 2026-02-14T19:32:02Z

No description provided.

netlify · 2026-02-14T19:32:07Z

✅ Deploy Preview for dr-manhattan-oss canceled.

Name	Link
🔨 Latest commit	`10ecdf5`
🔍 Latest deploy log	https://app.netlify.com/projects/dr-manhattan-oss/deploys/6990cdb51a3a750008ba9fd4

claude · 2026-02-14T19:34:42Z

Code Review - PR #83: Feat/research

Summary

This PR adds bot detection research capabilities for Polymarket wallets using machine learning techniques (K-Means clustering and Isolation Forest). It introduces a Jupyter notebook and utility functions for analyzing trading patterns.

✅ Strengths

Well-structured code: The utils.py module is cleanly organized with clear separation of concerns (feature engineering, clustering, bot scoring).
Comprehensive feature engineering: The ~20 behavioral features in compute_wallet_features() cover multiple dimensions:
- Frequency patterns (trades per hour, intervals)
- Size patterns (mean, variance, round number detection)
- Timing signals (hour entropy, zero-second detection)
- Market behavior (concentration, buy ratio)
- Profile completeness
Good documentation: Functions have clear docstrings explaining their purpose and return values.
Defensive programming: The code handles edge cases like division by zero, missing data, and small datasets appropriately.
Proper dependency management: New research dependencies are correctly added as optional dependencies in pyproject.toml.
Data directory ignored: The .gitignore correctly excludes research/data/ to prevent committing large datasets.

⚠️ Issues & Recommendations

1. Notebook Contains Executed Outputs (CRITICAL)

The notebook has been executed (12 code cells with execution counts) and likely contains outputs. Per CLAUDE.md instruction #2: "DO NOT create a new document".

Recommendation: Clear all outputs before committing:

jupyter nbconvert --clear-output --inplace research/detection.ipynb

2. Missing Test Coverage

No tests exist for the new research utilities. While research code may not need production-level testing, basic validation would help:

Unit tests for compute_wallet_features() with edge cases
Tests for empty DataFrames, single-wallet groups
Validation that feature calculations are numerically stable

Recommendation: Add tests/test_research_utils.py with basic tests.

3. Potential NaN Handling Issues

Lines 151 and 330 use .values to convert DataFrames to numpy arrays, but null handling may not be complete:

# Line 151 in run_clustering()
x = features_df[cols].values
x_scaled = scaler.fit_transform(x)
x_scaled = np.nan_to_num(x_scaled, nan=0.0, posinf=0.0, neginf=0.0)  # Good!

# Line 330 in run_isolation_forest()
x = StandardScaler().fit_transform(features_df[cols].values)
x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0)  # Good!

Actually, both cases handle this correctly with np.nan_to_num(). ✅

4. Magic Numbers Without Constants

Several thresholds are hardcoded:

min_trades: int = 5 (line 112)
contamination: float = 0.1 (line 327)
score >= 3.0 for BOT label (line 256)
score >= 1.0 for MIXED label (line 258)
z > 0.5 threshold (line 225)

Recommendation: Define these as module-level constants with comments explaining their rationale:

MIN_TRADES_THRESHOLD = 5  # Minimum trades needed for statistical significance
ISOLATION_FOREST_CONTAMINATION = 0.1  # Expected proportion of anomalies
BOT_SCORE_THRESHOLD = 3.0  # Z-score threshold for bot classification

5. Performance Considerations

build_group_features() uses a Python loop over wallet groups (line 119). For large datasets, this could be slow.
Consider using pandas.DataFrame.groupby().apply() or vectorized operations if performance becomes an issue.

6. Notebook Dependencies

The notebook imports from utils import ... (line 39 in the notebook), which requires:

Running from the research/ directory, OR
Having research/ in Python path

Recommendation: Add a cell at the top to handle path setup:

import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd()))

7. No Documentation

Per CLAUDE.md instruction #6, when integrating new exchanges/features, documentation should be created. While this isn't a new exchange, research capabilities would benefit from:

wiki/research/bot-detection.md explaining the methodology
Usage instructions for running the notebook

Recommendation: Add a simple README or wiki entry explaining:

What bot detection features are available
How to install research dependencies (uv pip install -e ".[research]")
How to run the notebook

8. Code Style Compliance

Unable to verify code style compliance without ruff and black installed in the CI environment. The code visually appears to follow conventions, but should be validated.

Recommendation: Ensure CI runs:

ruff check research/
black --check research/

🔒 Security Assessment

✅ No security concerns identified:

No API keys, tokens, or secrets in code
No use of eval() or exec()
No SQL injection vectors (uses pandas, not raw SQL)
No file system traversal vulnerabilities
Proper use of standard ML libraries

📊 Performance Analysis

Memory efficiency:

✅ Data is processed in memory using pandas (reasonable for most use cases)
⚠️ The notebook might struggle with very large datasets (millions of trades)
Consider adding batch processing if needed

Computational complexity:

compute_wallet_features(): O(n log n) due to sorting
run_clustering(): O(k * n * i) where k is clusters, n is wallets, i is iterations
run_isolation_forest(): O(n * trees * log(samples))

All acceptable for research workloads.

📝 Code Quality Score: 7.5/10

Breakdown:

Code structure: 9/10 (well-organized, clear separation)
Documentation: 7/10 (good docstrings, but missing usage docs)
Error handling: 8/10 (handles edge cases well)
Testing: 3/10 (no tests)
Style compliance: 8/10 (appears clean, but not verified)
Security: 10/10 (no issues)

✅ Required Changes Before Merge

MUST: Clear notebook outputs
SHOULD: Add basic unit tests
SHOULD: Extract magic numbers to constants
NICE-TO-HAVE: Add documentation in wiki/

🎯 Recommendation

Conditional approval - the code is well-written and secure, but the notebook should have outputs cleared before merging. Consider adding tests and documentation as follow-up improvements.

Great work on implementing a comprehensive bot detection system! The feature engineering is thoughtful and the code is production-quality. 🚀

minkyun12 added 3 commits February 15, 2026 04:21

bot detection

b47f2c1

bug fix

22f8b22

update visualization

10ecdf5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Feat/research#83

Feat/research#83
minkyun12 wants to merge 3 commits intomainfrom
feat/research

minkyun12 commented Feb 14, 2026

Uh oh!

netlify bot commented Feb 14, 2026 •

edited

Loading

Uh oh!

claude bot commented Feb 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

minkyun12 commented Feb 14, 2026

Uh oh!

netlify bot commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for dr-manhattan-oss canceled.

Uh oh!

claude bot commented Feb 14, 2026

Code Review - PR #83: Feat/research

Summary

✅ Strengths

⚠️ Issues & Recommendations

1. Notebook Contains Executed Outputs (CRITICAL)

2. Missing Test Coverage

3. Potential NaN Handling Issues

4. Magic Numbers Without Constants

5. Performance Considerations

6. Notebook Dependencies

7. No Documentation

8. Code Style Compliance

🔒 Security Assessment

📊 Performance Analysis

📝 Code Quality Score: 7.5/10

✅ Required Changes Before Merge

🎯 Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

netlify bot commented Feb 14, 2026 •

edited

Loading