
Commit f5e8b93

docs: Update version to 0.2.7rc1 and add LightGBM documentation

- Update version to 0.2.7rc1 in `__init__.py`
- Add comprehensive CHANGELOG entry for 0.2.7rc1
- Add LightGBM section to README with full example
- Mark as Release Candidate (testing phase)
- Fix typo: famework -> framework

1 parent 9bd287b, commit f5e8b93

File tree

3 files changed: +118 -2 lines changed

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -1,5 +1,38 @@
# Changelog

## [0.2.7rc1] - 2025-11-23 (Release Candidate)

### Added
- **LightGBM Scorecard Support**: Complete implementation of `LGBScorecardConstructor`
  - Implemented `create_points()` with proper base_score normalization
  - Implemented `predict_score()` and `predict_scores()` for scorecard predictions
  - Added `use_base_score` parameter for flexible base score handling
  - Full parity with XGBoost scorecard functionality

### Fixed
- **Critical Bug Fix**: Corrected leaf ID mapping in `extract_leaf_weights()`
  - Changed from `cumcount()` to extracting the actual leaf ID from the node_index string
  - Fixes a 55% Gini loss (0.40 → 0.90) in scorecard predictions
  - Ensures correct mapping between LightGBM's absolute leaf IDs and relative indices
- **Base Score Normalization**: Proper handling of LightGBM's base score
  - Subtract base_score from Tree 0 leaves to balance tree contributions
  - Add logit(base_score) during scaling to distribute it across all trees
  - Prevents the first tree from getting disproportionate weight
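To illustrate the `extract_leaf_weights()` fix above: LightGBM's `Booster.trees_to_dataframe()` labels nodes like `"0-S0"` (split) and `"0-L3"` (tree 0, leaf 3), so the absolute leaf ID can be read straight from the `node_index` string. A minimal sketch of that extraction (hypothetical helper name, not xbooster's actual code):

```python
import re

def leaf_id_from_node_index(node_index: str) -> int:
    """Extract the absolute leaf ID from a node_index like '0-L3' (tree 0, leaf 3).

    Using the ID embedded in the string keeps the mapping aligned with
    model.predict(..., pred_leaf=True); a positional cumcount() over leaf
    rows can drift from these IDs and scramble the leaf-to-points mapping.
    """
    match = re.fullmatch(r"\d+-L(\d+)", node_index)
    if match is None:
        raise ValueError(f"not a leaf node: {node_index!r}")
    return int(match.group(1))

print(leaf_id_from_node_index("0-L3"))  # 3
```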
22+
### Changed
- **Simplified Score Types**: Only `XAddEvidence` is supported for LightGBM
  - Removed WOE support (ill-defined for LightGBM's sklearn API)
  - Cleaner, more maintainable implementation
- **Enhanced Documentation**: Updated docstrings and examples
  - Added a comprehensive LightGBM getting-started notebook
  - Explained base_score handling differences from XGBoost

### Technical Details
- All 106 tests passing (9 of them LightGBM-specific)
- Scorecard Gini: 0.9020 vs. model Gini: 0.9021 (near-perfect preservation)
- Proper handling of LightGBM's sklearn API vs. the internal booster API
- Related to PR #8

## [0.2.7a2] - 2025-11-08 (Alpha)

### Added

README.md

Lines changed: 84 additions & 1 deletion
@@ -1,6 +1,6 @@
# xbooster 🚀

- A scorecard-format famework for logistic regression tasks with gradient-boosted decision trees (XGBoost and CatBoost).
+ A scorecard-format framework for logistic regression tasks with gradient-boosted decision trees (XGBoost, LightGBM, and CatBoost).

xbooster allows you to convert a classification model into a logarithmic (point) scoring system.

In addition, it provides a suite of interpretability tools to understand the model's behavior.

@@ -218,6 +218,89 @@ The `DataPreprocessor` provides:

3. Generation of interaction constraints for XGBoost
4. Consistent feature naming for scorecard generation

### LightGBM Support 💡 (Release Candidate)

xbooster provides support for LightGBM models with scorecard functionality. Here's how to use it:

```python
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

from xbooster.constructor import LGBScorecardConstructor

# Load data
url = "https://github.com/xRiskLab/xBooster/raw/main/examples/data/credit_data.parquet"
dataset = pd.read_parquet(url)

features = [
    "external_risk_estimate",
    "revolving_utilization_of_unsecured_lines",
    "account_never_delinq_percent",
    "net_fraction_revolving_burden",
    "num_total_cc_accounts",
    "average_months_in_file",
]
target = "is_bad"
X, y = dataset[features], dataset[target]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=62, stratify=y
)

# Train LightGBM model
model = lgb.LGBMClassifier(
    n_estimators=50,
    learning_rate=0.55,
    max_depth=1,
    num_leaves=2,
    min_child_samples=10,
    random_state=62,
    verbose=-1,
)
model.fit(X_train, y_train)

# Initialize LGBScorecardConstructor
constructor = LGBScorecardConstructor(model, X_train, y_train)

# Construct scorecard
scorecard = constructor.construct_scorecard()
print(scorecard.head())

# Create points with base score normalization (default)
scorecard_with_points = constructor.create_points(
    pdo=50,
    target_points=600,
    target_odds=19,
    precision_points=0,
    use_base_score=True,  # Ensures proper tree contribution balancing
)

# Make predictions
credit_scores = constructor.predict_score(X_test)

# Calculate Gini (scores are negated: higher score means lower risk)
gini = roc_auc_score(y_test, -credit_scores) * 2 - 1
print(f"Scorecard Gini: {gini:.4f}")

# Compare with model predictions
model_gini = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]) * 2 - 1
print(f"Model Gini: {model_gini:.4f}")
```
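The `pdo`, `target_points`, and `target_odds` arguments above follow the standard points-to-double-odds calibration. A minimal sketch of that arithmetic (illustrative only; xbooster's internal implementation may differ in details):

```python
import math

def log_odds_to_points(log_odds: float, pdo: float = 50,
                       target_points: float = 600, target_odds: float = 19) -> float:
    """Map ln(good:bad odds) to points so that target_odds scores exactly
    target_points and every doubling of the odds adds pdo points."""
    factor = pdo / math.log(2)                        # points per unit of ln(odds)
    offset = target_points - factor * math.log(target_odds)
    return offset + factor * log_odds

print(round(log_odds_to_points(math.log(19))))   # 600
print(round(log_odds_to_points(math.log(38))))   # 650 (odds doubled -> +50 points)
```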
291+
**Key Features:**
- **Scorecard Construction**: Implementation of `create_points()` and `predict_score()`
- **Base Score Normalization**: Proper handling of LightGBM's base score for balanced tree contributions
- **High Discrimination**: Scorecard Gini closely matches model Gini
- **Flexible**: `use_base_score` parameter for optional base score normalization

**Important Notes:**
- **Release Candidate**: This feature is in the testing phase; feedback welcome!
- LightGBM's sklearn API handles base_score differently than XGBoost's
- The `use_base_score=True` parameter (default) ensures proper normalization
- Only the `XAddEvidence` score type is supported (WOE is not applicable)
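The base-score normalization mentioned in the notes can be sketched numerically. The sketch below assumes, per the CHANGELOG description, that the intercept `logit(base_score)` is baked into Tree 0's raw leaf values; stripping it from Tree 0 and re-adding it once during scaling leaves the total margin unchanged while keeping per-tree contributions comparable (illustrative values, not xbooster's code):

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

base_score = 0.2                         # hypothetical training event rate
init = logit(base_score)                 # intercept in log-odds space

# Raw leaf contributions for one sample; Tree 0 carries the intercept.
tree_raw = [init + 0.30, -0.15, 0.05]

margin_before = sum(tree_raw)

# Normalize: subtract the intercept from Tree 0, re-add it once during scaling.
normalized = [tree_raw[0] - init] + tree_raw[1:]
margin_after = sum(normalized) + init

assert abs(margin_before - margin_after) < 1e-9  # total margin is unchanged
```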
### CatBoost Support 🐱 (Beta)
xbooster provides experimental support for CatBoost models with reduced functionality compared to XGBoost. Here's how to use it:

xbooster/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
from gradient boosted tree models (XGBoost and CatBoost).
"""

- __version__ = "0.2.7"
+ __version__ = "0.2.7rc1"
__author__ = "xRiskLab"
__email__ = "[email protected]"
