Sample Weights in XGBoost: Fixing Class Imbalance and Regime Drift
You've trained an XGBoost classifier. Accuracy looks good. AUC is 0.87. You deploy, and the model is wildly miscalibrated on the rare-event bucket that actually matters. You try hyperparameters; nothing fixes it.
What you're missing is sample_weight — XGBoost's single most underused parameter. It tells the model which rows should count more during gradient computation. Used correctly, it fixes class imbalance, emphasizes recent data, handles regime-specific drift, and boosts calibration on rare-but-critical examples.
This post walks through the four cases where sample_weight is the right tool, with copy-pasteable Python code for each.
The Four Cases for sample_weight
| Case | What you weight heavier | Typical weight range |
|---|---|---|
| Class imbalance | Minority class rows | 2-10x |
| Recency bias | Recent rows | 1-5x |
| Regime emphasis | Rows matching target regime (playoffs, etc.) | 2-5x |
| Cost-weighted learning | High-stakes rows | 2-20x |
Case 1: Fixing Class Imbalance
Training a model on MLB moneyline outcomes? ~53% of home teams win. NBA? ~58%. Not too bad. But fraud detection, conversion prediction, rare-disease classification? You might have 1-5% positives. Without rebalancing, the model learns "always predict 0" and hits 98% accuracy.
The standard fix: inverse-frequency weights
```python
import numpy as np
from xgboost import XGBClassifier

def balanced_sample_weights(y):
    """Weight each class inversely proportional to its frequency."""
    class_counts = np.bincount(y)
    n_classes = len(class_counts)
    weights = len(y) / (n_classes * class_counts)
    return weights[y]

# Training
y_train = np.array([0, 0, 0, 0, 0, 1, 1])  # 5:2 imbalance
weights = balanced_sample_weights(y_train)
# weights -> [0.7, 0.7, 0.7, 0.7, 0.7, 1.75, 1.75]

model = XGBClassifier(n_estimators=300, max_depth=4)
model.fit(X_train, y_train, sample_weight=weights)
```
XGBoost also has scale_pos_weight, which does the same thing but only for binary classification: set scale_pos_weight = n_negative / n_positive. For multi-class problems you must use sample_weight. Rule of thumb: scale_pos_weight for binary imbalance, sample_weight for everything else.
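For the binary case the two approaches are equivalent up to a constant factor, and the scalar is easy to compute. A standalone sketch (the commented XGBClassifier call assumes the X_train/y_train arrays from the example above):

```python
import numpy as np

# Same hypothetical labels as above: 5:2 imbalance
y_train = np.array([0, 0, 0, 0, 0, 1, 1])

# scale_pos_weight is simply the negative/positive count ratio
n_pos = int((y_train == 1).sum())
n_neg = int((y_train == 0).sum())
spw = n_neg / n_pos  # 2.5 -- same ratio as 1.75 / 0.7 above

# Equivalent binary-only usage:
# model = XGBClassifier(n_estimators=300, max_depth=4, scale_pos_weight=spw)
# model.fit(X_train, y_train)
print(spw)
```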
Case 2: Emphasizing Recent Data
Sports, finance, ads — any time-series domain has distribution drift. Rules change. Team rosters change. Market structure evolves. A 2021 game and a 2026 game are not from the same distribution, even though they're both labeled "NBA regular season."
Solution: decay weights exponentially by row age.
```python
import numpy as np
import pandas as pd
from datetime import datetime

def recency_weights(dates, half_life_days: float = 365):
    """Exponential decay: rows half_life_days old get half the weight."""
    today = datetime.now()
    ages_days = np.array([(today - d).days for d in dates])
    decay_rate = np.log(2) / half_life_days
    return np.exp(-decay_rate * ages_days)

# Example
df['w'] = recency_weights(df['date'], half_life_days=730)  # 2-year half-life
model.fit(df[feature_cols], df['y'], sample_weight=df['w'].values)
```
A 2-year half-life means 2024 data counts at full weight, 2022 data at 50%, 2020 at 25%. Adjust based on how fast your domain drifts. Trading markets: 6-12 months. Sports: 1-2 years. Medical imaging: longer.
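To sanity-check a chosen half-life, the decay curve can be evaluated directly. A standalone version of the same math as recency_weights:

```python
import numpy as np

def decay_weight(age_days, half_life_days):
    """Weight of a row age_days old under exponential half-life decay."""
    return np.exp(-np.log(2) / half_life_days * age_days)

# 730-day (2-year) half-life: verify the weights the prose claims
print(round(decay_weight(0, 730), 2))     # fresh row: 1.0
print(round(decay_weight(730, 730), 2))   # 2 years old: 0.5
print(round(decay_weight(1460, 730), 2))  # 4 years old: 0.25
```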
Case 3: Regime Emphasis (Playoffs, Early Season, Weather)
This is where sample_weight becomes a precision tool. You have data across multiple regimes, but you trade primarily in one of them.
Example: emphasizing NBA playoffs
```python
# Assume df has an is_playoff column (1 = playoff game, 0 = regular season).
# Playoffs are ~6% of games historically; we weight them 3x during training.
df['sample_weight'] = np.where(df['is_playoff'] == 1, 3.0, 1.0)

# Effective mass: 94 regular + 6 * 3 = 112 total, of which 18 is playoff.
# So playoffs go from 6% of the sample to 18/112 ≈ 16% of the effective mass,
# and the model pays 3x as much attention to playoff patterns.

model.fit(
    df[feature_cols],
    df['home_wins'],
    sample_weight=df['sample_weight'].values,
)
```
After one real-world case where our NBA live model was badly miscalibrated on playoff games (84% predicted → 44% actual in the 80-90% confidence bucket), weighting playoff rows 3x during training dropped playoff Expected Calibration Error from 21% to 5.5%. Regular-season ECE barely moved.
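Expected Calibration Error is straightforward to compute yourself. A minimal equal-width-bin sketch (the y_val / p_val / pmask names in the comments are hypothetical):

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Mean |predicted - actual| across equal-width probability bins,
    weighted by the fraction of samples landing in each bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(y_prob[mask].mean() - y_true[mask].mean())
    return ece

# Hypothetical: score each regime separately on a held-out set
# ece_playoff = expected_calibration_error(y_val[pmask], p_val[pmask])
# ece_regular = expected_calibration_error(y_val[~pmask], p_val[~pmask])
```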
Computing regime indicators
For NBA/NHL, the regime indicator is "was this game a playoff game?" Derived from game dates against the league's playoff window:
```python
def is_playoff_nba(date) -> int:
    """NBA playoffs: roughly April 18 - June 22 each year."""
    if not (4 <= date.month <= 6):
        return 0
    if date.month == 4 and date.day < 18:
        return 0
    if date.month == 6 and date.day > 22:
        return 0
    return 1

df['is_playoff'] = df['date'].apply(is_playoff_nba)
```
For ESPN game data, a more reliable method is to hit the ESPN scoreboard endpoint with dates=YYYYMMDD and check each event's season.type field (2 = regular, 3 = postseason).
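A sketch of parsing that field, assuming the scoreboard JSON nests a season.type value under each event as described above (verify the exact field names against a live response before relying on this):

```python
def playoff_flags(scoreboard_json):
    """Map event id -> 1 if postseason, else 0, from an ESPN-style
    scoreboard response (season.type: 2 = regular, 3 = postseason)."""
    flags = {}
    for event in scoreboard_json.get("events", []):
        season_type = event.get("season", {}).get("type")
        flags[event["id"]] = 1 if season_type == 3 else 0
    return flags

# Hypothetical response fragment (ids made up for illustration):
sample = {"events": [
    {"id": "401585601", "season": {"type": 2}},
    {"id": "401585999", "season": {"type": 3}},
]}
print(playoff_flags(sample))  # -> {'401585601': 0, '401585999': 1}
```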
Case 4: Cost-Weighted Learning
Some errors cost more than others. Misclassifying a fraud transaction costs 100x more than misclassifying a legit one. In trading, misclassifying a high-confidence trade costs 10x more than missing a small edge.
Use sample_weight proportional to each row's cost-weight:
```python
# Fraud detection: weight positive class by transaction amount
df.loc[df['is_fraud'] == 1, 'w'] = df['amount']  # cost = $ at risk
df.loc[df['is_fraud'] == 0, 'w'] = 1.0

model.fit(X, y, sample_weight=df['w'].values)
```
This turns the classifier into a cost-sensitive learner: it minimizes expected dollar-weighted loss, not raw classification error.
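The same cost weights should show up in evaluation, or you are optimizing one objective and measuring another. A minimal dollar-weighted log loss, as a sketch:

```python
import numpy as np

def weighted_log_loss(y_true, p, w):
    """Log loss where each row's contribution is scaled by its cost weight."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    ll = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.average(ll, weights=w))

y = np.array([1, 0])
p = np.array([0.9, 0.9])  # confident, but wrong on the second row
print(weighted_log_loss(y, p, np.array([1, 1])))   # equal weights
print(weighted_log_loss(y, p, np.array([10, 1])))  # first row is high-stakes
```

With equal weights the bad call on row two dominates; once row one carries 10x the cost, the score rewards getting the high-stakes row right.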
Combining Multiple Weights
You often want multiple corrections simultaneously. Multiply the weights:
```python
df['w_imbalance'] = balanced_sample_weights(df['y'].values)
df['w_recency'] = recency_weights(df['date'].values, half_life_days=730)
df['w_regime'] = np.where(df['is_playoff'] == 1, 3.0, 1.0)

df['w'] = df['w_imbalance'] * df['w_recency'] * df['w_regime']
model.fit(df[feature_cols], df['y'], sample_weight=df['w'].values)
```
Keep the final weights roughly between ~0.01x and ~100x. Extreme weights cause XGBoost to effectively ignore the low-weighted rows, which is rarely what you want.
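A simple guard is to clip the combined weights and then renormalize them to mean 1, so the overall loss scale stays comparable to unweighted training (a sketch):

```python
import numpy as np

def clip_and_normalize(w, lo=0.01, hi=100.0):
    """Clip runaway weights, then rescale so the mean weight is 1.0."""
    w = np.clip(np.asarray(w, dtype=float), lo, hi)
    return w / w.mean()

# One crushed weight and one runaway weight get pulled back into range
w = clip_and_normalize([0.001, 1.0, 500.0, 2.0])
print(w.round(4))
```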
Evaluating Weighted Models
A model trained with sample_weight will score differently than one without. Some gotchas:
- Evaluate on unweighted data. Your test set should reflect the real-world distribution, not the rebalanced training distribution.
- Measure calibration separately per regime. If you weighted playoffs 3x, check ECE on playoffs AND regular season. A model calibrated on both is better than a model calibrated on one and miscalibrated on the other.
- Always apply post-hoc calibration after using sample_weight for imbalance or regime emphasis. Fit isotonic regression on the unweighted validation set.
```python
from sklearn.isotonic import IsotonicRegression

# Train with weights
base_model.fit(X_train, y_train, sample_weight=weights_train)

# Calibrate on unweighted validation set
cal_raw = base_model.predict_proba(X_val)[:, 1]
iso = IsotonicRegression(y_min=0.01, y_max=0.99, out_of_bounds="clip")
iso.fit(cal_raw, y_val)  # <- no weights here

# At inference
def calibrated_predict(X):
    raw = base_model.predict_proba(X)[:, 1]
    return iso.predict(raw)
```
When sample_weight Is the Wrong Answer
Three situations where sample_weight is the wrong move:
- Tiny minority class (< 50 examples total). No amount of weighting fixes "not enough data." Collect more or accept the limitation.
- Data quality issue on rare class. If your rare-class labels are noisy, weighting amplifies the noise. Fix the labels first.
- Testing a structural hypothesis. If you want to know whether playoff games behave differently, fit two separate models — one on playoffs, one on regular season — and compare. Single-model weighting blends regimes.
Want to skip the training pipeline? ZenHodl's API ships playoff-aware, calibrated sports predictions across 11 sports — sample-weighted, isotonic-calibrated, production-ready.
Further reading: Calibrating XGBoost Probabilities with Isotonic Regression · 15 Features That Matter for Sports Win Probability
Related Reading
- NCAAMB 2025-26 Season Report — sample-weighted training applied to 5,345 games.
- Build a March Madness prediction model — where sample weights emphasize tournament games.
- Build an MLB prediction model — sample weighting for playoff emphasis.
- Calibrating XGBoost probabilities with isotonic regression — post-training calibration.
- From Jupyter to production ML API — how to ship a weighted model.