
Commit f5e8b93

docs: Update version to 0.2.7rc1 and add LightGBM documentation

- Update version to 0.2.7rc1 in `__init__.py`
- Add comprehensive CHANGELOG entry for 0.2.7rc1
- Add LightGBM section to README with full example
- Mark as Release Candidate (testing phase)
- Fix typo: famework -> framework

1 parent 9bd287b, commit f5e8b93

File tree

3 files changed: +118 -2 lines changed

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -1,5 +1,38 @@
# Changelog

## [0.2.7rc1] - 2025-11-23 (Release Candidate)

### Added
- **LightGBM Scorecard Support**: Complete implementation of `LGBScorecardConstructor`
  - Implemented `create_points()` with proper base_score normalization
  - Implemented `predict_score()` and `predict_scores()` for scorecard predictions
  - Added `use_base_score` parameter for flexible base score handling
  - Full parity with XGBoost scorecard functionality

### Fixed
- **Critical Bug Fix**: Corrected leaf ID mapping in `extract_leaf_weights()`
  - Changed from `cumcount()` to extracting the actual leaf ID from the node_index string
  - Fixes a 55% Gini loss (0.40 → 0.90) in scorecard predictions
  - Ensures correct mapping between LightGBM's absolute leaf IDs and relative indices
- **Base Score Normalization**: Proper handling of LightGBM's base score
  - Subtract base_score from Tree 0 leaves to balance tree contributions
  - Add logit(base_score) during scaling to distribute it across all trees
  - Prevents the first tree from getting disproportionate weight
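To illustrate the `extract_leaf_weights()` fix above: LightGBM's `Booster.trees_to_dataframe()` labels nodes like `"0-S0"` (split) and `"0-L3"` (tree 0, leaf 3), so the absolute leaf ID can be read straight from the `node_index` string. A minimal sketch of that extraction (hypothetical helper name, not xbooster's actual code):

```python
import re

def leaf_id_from_node_index(node_index: str) -> int:
    """Extract the absolute leaf ID from a node_index like '0-L3' (tree 0, leaf 3).

    Using the ID embedded in the string keeps the mapping aligned with
    model.predict(..., pred_leaf=True); a positional cumcount() over leaf
    rows can drift from these IDs and scramble the leaf-to-points mapping.
    """
    match = re.fullmatch(r"\d+-L(\d+)", node_index)
    if match is None:
        raise ValueError(f"not a leaf node: {node_index!r}")
    return int(match.group(1))

print(leaf_id_from_node_index("0-L3"))  # 3
```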
22+
### Changed
- **Simplified Score Types**: Only `XAddEvidence` is supported for LightGBM
  - Removed WOE support (ill-defined for LightGBM's sklearn API)
  - Cleaner, more maintainable implementation
- **Enhanced Documentation**: Updated docstrings and examples
  - Added a comprehensive LightGBM getting-started notebook
  - Explained base_score handling differences from XGBoost

### Technical Details
- All 106 tests passing (9 of them LightGBM-specific)
- Scorecard Gini: 0.9020 vs. model Gini: 0.9021 (near-perfect preservation)
- Proper handling of LightGBM's sklearn API vs. the internal booster API
- Related to PR #8

## [0.2.7a2] - 2025-11-08 (Alpha)

### Added

README.md

Lines changed: 84 additions & 1 deletion
@@ -1,6 +1,6 @@
# xbooster 🚀

- A scorecard-format famework for logistic regression tasks with gradient-boosted decision trees (XGBoost and CatBoost).
+ A scorecard-format framework for logistic regression tasks with gradient-boosted decision trees (XGBoost, LightGBM, and CatBoost).

xbooster allows you to convert a classification model into a logarithmic (point) scoring system.

In addition, it provides a suite of interpretability tools to understand the model's behavior.

@@ -218,6 +218,89 @@ The `DataPreprocessor` provides:

3. Generation of interaction constraints for XGBoost
4. Consistent feature naming for scorecard generation

### LightGBM Support 💡 (Release Candidate)

xbooster provides support for LightGBM models with scorecard functionality. Here's how to use it:

```python
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

from xbooster.constructor import LGBScorecardConstructor

# Load data
url = "https://github.com/xRiskLab/xBooster/raw/main/examples/data/credit_data.parquet"
dataset = pd.read_parquet(url)

features = [
    "external_risk_estimate",
    "revolving_utilization_of_unsecured_lines",
    "account_never_delinq_percent",
    "net_fraction_revolving_burden",
    "num_total_cc_accounts",
    "average_months_in_file",
]
target = "is_bad"
X, y = dataset[features], dataset[target]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=62, stratify=y
)

# Train LightGBM model
model = lgb.LGBMClassifier(
    n_estimators=50,
    learning_rate=0.55,
    max_depth=1,
    num_leaves=2,
    min_child_samples=10,
    random_state=62,
    verbose=-1,
)
model.fit(X_train, y_train)

# Initialize LGBScorecardConstructor
constructor = LGBScorecardConstructor(model, X_train, y_train)

# Construct scorecard
scorecard = constructor.construct_scorecard()
print(scorecard.head())

# Create points with base score normalization (default)
scorecard_with_points = constructor.create_points(
    pdo=50,
    target_points=600,
    target_odds=19,
    precision_points=0,
    use_base_score=True,  # Ensures proper tree contribution balancing
)

# Make predictions
credit_scores = constructor.predict_score(X_test)

# Calculate Gini (scores are negated: higher score means lower risk)
gini = roc_auc_score(y_test, -credit_scores) * 2 - 1
print(f"Scorecard Gini: {gini:.4f}")

# Compare with model predictions
model_gini = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]) * 2 - 1
print(f"Model Gini: {model_gini:.4f}")
```
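The `pdo`, `target_points`, and `target_odds` arguments above follow the standard points-to-double-odds calibration. A minimal sketch of that arithmetic (illustrative only; xbooster's internal implementation may differ in details):

```python
import math

def log_odds_to_points(log_odds: float, pdo: float = 50,
                       target_points: float = 600, target_odds: float = 19) -> float:
    """Map ln(good:bad odds) to points so that target_odds scores exactly
    target_points and every doubling of the odds adds pdo points."""
    factor = pdo / math.log(2)                        # points per unit of ln(odds)
    offset = target_points - factor * math.log(target_odds)
    return offset + factor * log_odds

print(round(log_odds_to_points(math.log(19))))   # 600
print(round(log_odds_to_points(math.log(38))))   # 650 (odds doubled -> +50 points)
```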
291+
**Key Features:**
- **Scorecard Construction**: Implementation of `create_points()` and `predict_score()`
- **Base Score Normalization**: Proper handling of LightGBM's base score for balanced tree contributions
- **High Discrimination**: Scorecard Gini closely matches model Gini
- **Flexible**: `use_base_score` parameter for optional base score normalization

**Important Notes:**
- **Release Candidate**: This feature is in the testing phase; feedback welcome!
- LightGBM's sklearn API handles base_score differently than XGBoost's
- The `use_base_score=True` parameter (default) ensures proper normalization
- Only the `XAddEvidence` score type is supported (WOE is not applicable)
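The base-score normalization mentioned in the notes can be sketched numerically. The sketch below assumes, per the CHANGELOG description, that the intercept `logit(base_score)` is baked into Tree 0's raw leaf values; stripping it from Tree 0 and re-adding it once during scaling leaves the total margin unchanged while keeping per-tree contributions comparable (illustrative values, not xbooster's code):

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

base_score = 0.2                         # hypothetical training event rate
init = logit(base_score)                 # intercept in log-odds space

# Raw leaf contributions for one sample; Tree 0 carries the intercept.
tree_raw = [init + 0.30, -0.15, 0.05]

margin_before = sum(tree_raw)

# Normalize: subtract the intercept from Tree 0, re-add it once during scaling.
normalized = [tree_raw[0] - init] + tree_raw[1:]
margin_after = sum(normalized) + init

assert abs(margin_before - margin_after) < 1e-9  # total margin is unchanged
```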
### CatBoost Support 🐱 (Beta)
xbooster provides experimental support for CatBoost models with reduced functionality compared to XGBoost. Here's how to use it:

xbooster/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
from gradient boosted tree models (XGBoost and CatBoost).
"""

- __version__ = "0.2.7"
+ __version__ = "0.2.7rc1"
__author__ = "xRiskLab"
__email__ = "[email protected]"
