
Feature Engineering for Sports Betting Models: What Actually Moves Win Probability

May 11, 2026 · 13 min read · Python, Feature Engineering, XGBoost, SHAP

You can throw 100 features at an XGBoost model and it will use most of them. The question is which ones actually carry signal versus which ones are noise the model pretends to use. This post walks through the features that consistently move win probability across NBA, NHL, MLB, and football — based on SHAP attribution from production models trained on thousands of games each.

The conclusion up front: the right feature set is small, mostly the same across sports, and dominated by a handful of features that the model relies on disproportionately.

The features that always matter

Five features carry the bulk of the signal in every sport win-probability model we have built:

Score differential (home - away): the single largest signal in any in-play model. Late in the game, score diff dominates everything else.

Time remaining (normalized): how much of the game is left for the score to change.

Pre-game win probability: from your team-strength model (Elo, KenPom, FiveThirtyEight). Carries everything the model knows before the game starts.

Score diff × time remaining: an interaction term. A 5-point lead with 2 minutes left is very different from a 5-point lead with 30 minutes left.

Possession indicator (where applicable): in basketball and football, who has the ball is critical near the end of close games.

These five features alone get you a model that is 90% as good as a 30-feature one. Everything beyond that is incremental.
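As a concrete sketch, here is how the universal five might be assembled from a play-by-play frame. The column names (home_score, seconds_remaining, game_length_sec, pregame_wp, home_has_possession) are illustrative assumptions, not a fixed schema:

```python
import pandas as pd

# Hypothetical play-by-play rows: early, mid, and late game states.
pbp = pd.DataFrame({
    "home_score":          [10, 55, 98],
    "away_score":          [12, 50, 95],
    "seconds_remaining":   [2400, 1200, 120],
    "game_length_sec":     [2880, 2880, 2880],   # 48-minute NBA game
    "pregame_wp":          [0.62, 0.62, 0.62],   # constant within a game
    "home_has_possession": [1, 0, 1],
})

# The universal five, from the home team's perspective.
pbp["score_diff"] = pbp["home_score"] - pbp["away_score"]
pbp["time_remaining"] = pbp["seconds_remaining"] / pbp["game_length_sec"]  # in [0, 1]
pbp["score_diff_x_time_remaining"] = pbp["score_diff"] * pbp["time_remaining"]

features = ["score_diff", "time_remaining", "pregame_wp",
            "score_diff_x_time_remaining", "home_has_possession"]
X = pbp[features]
```

Normalizing time by game length keeps the same feature meaningful across sports with different game durations.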

Sport-specific features that meaningfully add

For each sport, two or three sport-specific features add real value beyond the universal five.

Basketball (NBA, NCAAMB)

Hockey (NHL)

Baseball (MLB)

Football (NFL, CFB)

The interaction features that punch above their weight

A handful of engineered interaction features show up in the top-10 SHAP importance for almost every sport:

df["score_diff_x_time_remaining"] = df["score_diff"] * df["time_remaining"]
df["score_diff_sq"] = df["score_diff"] ** 2
df["score_diff_x_elo_diff"] = df["score_diff"] * df["elo_diff"]
df["pregame_wp_x_time_elapsed"] = df["pregame_wp"] * (1 - df["time_remaining"])

The first three add 1-3 percentage points of AUC over the raw features in our backtests. The fourth captures the way the model should weight pregame information differently early versus late in a game.

The features that look important but are not

Several features that intuitively seem like they should matter contribute almost nothing in practice:

A model that includes these features will use them, but the SHAP attribution shows they contribute fractions of a percentage point to AUC. Cut them and the model performs as well or better.

How to find your own important features

Use SHAP on a trained model:

import numpy as np
import pandas as pd
import shap
import xgboost as xgb

model = xgb.XGBClassifier(n_estimators=400, max_depth=5, learning_rate=0.05)
model.fit(X_train, y_train)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)

# Mean absolute SHAP per feature
mean_abs_shap = np.abs(shap_values).mean(axis=0)
shap_df = pd.DataFrame({
    "feature": X_train.columns,
    "mean_abs_shap": mean_abs_shap,
}).sort_values("mean_abs_shap", ascending=False)
print(shap_df.head(15))

Read the top 10. Drop everything below position 20. Retrain. Compare AUC. The right feature set is usually 12-15 features, not 50.

Feature stability across seasons

The features that matter in February usually matter in October. The features that matter in 2025 usually matter in 2024. Win-probability features are remarkably stable across seasons because the underlying physics of the games does not change.

The exceptions: rule changes (NBA's 2-minute review rules, NCAA's 30-second shot clock changes) can shift feature importance over the course of one season. Pace changes (the NBA's pace increased significantly between 2010 and 2020) shift the relative importance of pace-related features over multiple seasons.

The practical implication: retraining the model annually is enough to capture most drift. Quarterly is overkill. Monthly is unnecessary unless you have a specific reason.

The bottom line

Feature engineering for sports models is mostly subtraction, not addition. Start with the universal five (score diff, time, pregame WP, score-time interaction, possession), add two or three sport-specific features, engineer a handful of interaction terms, and stop. Anything beyond is usually noise the model pretends to learn from.

Production win-probability models for 11 sports

ZenHodl publishes calibrated probabilities using the feature sets described in this post. Free seven-day trial.
