How to Build an NBA Finals Prediction Model in Python (ELO + Pace/ORTG/DRTG)
NBA playoff prediction is different from every other sport in three specific ways: best-of-seven series reduce variance (a better team really does win more often), home-court advantage is huge (~80 ELO points), and advanced efficiency metrics — pace, offensive rating, defensive rating — are available cheaply and matter more than they do in football or baseball.
A production-grade NBA Finals prediction model has to do three things the shortcut versions won't:
- Produce calibrated pre-series win probabilities for each playoff matchup — when it says 70%, that team should win the series ~70% of the time
- Weight pace, ORTG, and DRTG differentials alongside pure ELO, since NBA style matchups matter
- Handle best-of-seven series math correctly by simulating game-by-game with home/away flips
This guide walks through all three in Python using ESPN's free NBA data feed, pandas, xgboost, and scikit-learn. At the end you'll have a pipeline that predicts individual games and the full playoff bracket — including a reproducible 2024-25 NBA Finals backtest we ran and published ourselves.
What You'll Build
- ESPN NBA data loader for any season (regular + playoffs)
- NBA-tuned ELO (K=20, HFA=80) with basketball-style margin-of-victory scaling
- Team efficiency priors: rolling-to-date pace, ORTG, DRTG
- Pre-game win probability function that combines ELO + efficiency differentials
- Best-of-seven series simulator (Monte Carlo, 10k+ trials)
- Playoff backtest harness with no look-ahead bias
- ECE calibration measurement
Python 3.11+. Deps: pandas, numpy, xgboost, scikit-learn, requests.
Step 1: Pull ESPN NBA Data
```python
import requests
import pandas as pd
from datetime import date, timedelta

def fetch_nba_games(date_yyyymmdd: str) -> list[dict]:
    """Pull every NBA game for a date. season_type: 2=regular, 3=postseason."""
    url = "https://site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard"
    r = requests.get(url, params={"dates": date_yyyymmdd, "limit": 30}, timeout=20)
    r.raise_for_status()
    out = []
    for ev in r.json().get("events", []):
        comp = ev["competitions"][0]
        home = next(t for t in comp["competitors"] if t["homeAway"] == "home")
        away = next(t for t in comp["competitors"] if t["homeAway"] == "away")
        if not (home.get("winner") or away.get("winner")):
            continue  # skip games that haven't finished
        out.append({
            "game_id": int(ev["id"]),
            "date": date_yyyymmdd,
            "home_team": home["team"]["abbreviation"],
            "away_team": away["team"]["abbreviation"],
            "home_score": int(home.get("score", 0) or 0),
            "away_score": int(away.get("score", 0) or 0),
            "home_won": int(home.get("winner", False)),
            "season_type": ev.get("season", {}).get("type"),
            "notes": [n.get("headline", "") for n in comp.get("notes", [])],
        })
    return out

def fetch_nba_season(start: str, end: str) -> pd.DataFrame:
    d0, d1 = date.fromisoformat(start), date.fromisoformat(end)
    rows = []
    cur = d0
    while cur <= d1:
        try:
            rows.extend(fetch_nba_games(cur.strftime("%Y%m%d")))
        except Exception as e:
            print(f"skip {cur}: {e}")
        cur += timedelta(days=1)
    return pd.DataFrame(rows)
```
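Before pointing this at the live API, it's worth checking the extraction logic offline. A minimal sketch against a hypothetical payload shaped like ESPN's scoreboard response (every field value here is invented for illustration):

```python
# Hypothetical event payload, same shape as ESPN's "events" entries
sample_event = {
    "id": "401585601",
    "season": {"type": 3},
    "competitions": [{
        "competitors": [
            {"homeAway": "home", "winner": True,
             "team": {"abbreviation": "OKC"}, "score": "112"},
            {"homeAway": "away", "winner": False,
             "team": {"abbreviation": "IND"}, "score": "107"},
        ],
        "notes": [{"headline": "NBA Finals Game 1"}],
    }],
}

# Same extraction steps as the loader above
comp = sample_event["competitions"][0]
home = next(t for t in comp["competitors"] if t["homeAway"] == "home")
away = next(t for t in comp["competitors"] if t["homeAway"] == "away")
row = {
    "game_id": int(sample_event["id"]),
    "home_team": home["team"]["abbreviation"],
    "away_team": away["team"]["abbreviation"],
    "home_score": int(home.get("score", 0) or 0),
    "away_score": int(away.get("score", 0) or 0),
    "home_won": int(home.get("winner", False)),
    "season_type": sample_event.get("season", {}).get("type"),
}
print(row["home_team"], row["home_won"])  # OKC 1
```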
Step 2: NBA-Tuned ELO (K=20, HFA=80)
The NBA has an 82-game regular season: short enough that ELO needs to move meaningfully after each game (K=20), but long enough that priors stabilize by mid-season. Home-court advantage is strong, historically a ~60% home win rate, which corresponds to roughly an 80-point ELO bump.
```python
K = 20.0    # NBA learning rate
HFA = 80.0  # NBA home-court advantage in ELO points

def compute_nba_elo(games: pd.DataFrame) -> tuple[dict, dict]:
    """Returns (current_ratings, pre_game_elo_diff_by_game_id); the diff is neutral (no HFA)."""
    elo = {}
    game_elo_diff = {}
    games = games.sort_values("game_id").reset_index(drop=True)
    for _, r in games.iterrows():
        h, a = r["home_team"], r["away_team"]
        hs, as_ = r["home_score"], r["away_score"]
        if not h or not a or pd.isna(hs) or pd.isna(as_):
            continue
        he = elo.get(h, 1500.0)
        ae = elo.get(a, 1500.0)
        game_elo_diff[int(r["game_id"])] = he - ae
        expected_h = 1.0 / (1.0 + 10 ** ((ae - he - HFA) / 400.0))
        actual_h = 1.0 if hs > as_ else (0.5 if hs == as_ else 0.0)
        margin = abs(hs - as_)
        # Basketball margin-of-victory multiplier (FiveThirtyEight-style)
        mov = ((margin + 3) ** 0.8) / (7.5 + 0.006 * abs(he - ae))
        mov = max(1.0, min(mov, 2.5))
        delta = K * mov * (actual_h - expected_h)
        elo[h] = he + delta
        elo[a] = ae - delta
    return elo, game_elo_diff
```
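To see the update in action, here's one hand-worked pass with illustrative ratings (1780 vs. 1600 and the 12-point margin are invented numbers, not real teams):

```python
# One ELO update by hand, same formulas as compute_nba_elo above
K, HFA = 20.0, 80.0
he, ae = 1780.0, 1600.0  # home favorite vs. road underdog (illustrative)

# 260-point effective gap once home court is added
expected_h = 1.0 / (1.0 + 10 ** ((ae - he - HFA) / 400.0))

margin = 12  # home wins by 12
mov = ((margin + 3) ** 0.8) / (7.5 + 0.006 * abs(he - ae))
mov = max(1.0, min(mov, 2.5))

delta = K * mov * (1.0 - expected_h)  # home won, so actual_h = 1.0
print(round(expected_h, 3), round(delta, 2))  # 0.817 3.72
```

The favorite was expected to win ~82% of the time, so winning by a modest margin moves its rating by under 4 points; an upset loss would have cost it roughly four times as much.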
After a full regular season, top teams hit 1750-1800 ELO (OKC was at 1782 entering the 2024-25 playoffs), playoff-bubble teams sit around 1500-1550, and bottom teams drop to 1300-1400.
Step 3: Team Efficiency Priors (Pace, ORTG, DRTG)
ELO is a single-number rating. For NBA matchups you want more: does a fast team beat a slow team? Is a great offense strong enough to overcome a great defense? These are captured by pace, offensive rating (ORTG), and defensive rating (DRTG).
```python
def compute_nba_team_priors(boxscores: pd.DataFrame) -> pd.DataFrame:
    """
    boxscores: one row per team per game with possessions, points_for, points_against.
    Returns rolling priors: pace (poss/48 min), ORTG (pts/100 poss), DRTG (opp pts/100 poss).
    """
    bs = boxscores.sort_values(["team", "game_id"]).copy()
    bs["poss_per_48"] = bs["possessions"] * (48.0 / bs["minutes_played"])
    bs["ortg"] = 100.0 * bs["points_for"] / bs["possessions"]
    bs["drtg"] = 100.0 * bs["points_against"] / bs["possessions"]
    # Expanding mean, shifted by 1 so we never see the current game
    for col in ["poss_per_48", "ortg", "drtg"]:
        bs[f"prior_{col}"] = bs.groupby("team")[col].apply(
            lambda s: s.shift(1).expanding().mean()
        ).reset_index(level=0, drop=True)
    return bs[["game_id", "team", "prior_poss_per_48", "prior_ortg", "prior_drtg"]]
```
The shift is critical: `.shift(1).expanding().mean()` means that on game n, the prior is computed from games 1 through n-1, never including game n itself. If you skip the shift, you leak the future into your features, and your training ECE looks amazing while your production ECE collapses.
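A four-game toy series (hypothetical per-game points for one team) makes the no-leakage property easy to verify by hand:

```python
import pandas as pd

# Points scored in games 1-4 (invented numbers)
pts = pd.Series([100, 110, 120, 130])
prior = pts.shift(1).expanding().mean()

# Game 1 has no prior (NaN); game 4's prior is mean(100, 110, 120) = 110,
# and game 4's own score never enters its own prior.
print(prior.tolist())  # [nan, 100.0, 105.0, 110.0]
```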
Step 4: Pre-Game Win Probability
Combine ELO and efficiency priors:
```python
def pregame_nba_wp(home_team, away_team, elo, priors_by_team):
    """Pre-game home win probability combining ELO + efficiency diffs."""
    he = elo.get(home_team, 1500.0)
    ae = elo.get(away_team, 1500.0)
    elo_diff = he - ae + HFA
    hp = priors_by_team.get(home_team, {"pace": 100, "ortg": 115, "drtg": 115})
    ap = priors_by_team.get(away_team, {"pace": 100, "ortg": 115, "drtg": 115})
    # Efficiency differentials roughly translate to ELO points at these
    # coefficients (tune via logistic regression on your training set)
    ortg_elo = 3.0 * (hp["ortg"] - ap["ortg"])   # +1 ORTG ≈ +3 ELO
    drtg_elo = -3.0 * (hp["drtg"] - ap["drtg"])  # lower DRTG is better
    pace_elo = 0.5 * (hp["pace"] - ap["pace"])   # pace fit is minor
    total_elo_diff = elo_diff + ortg_elo + drtg_elo + pace_elo
    return 1.0 / (1.0 + 10 ** (-total_elo_diff / 400.0))
```
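Plugging in illustrative numbers shows how the pieces stack (a 50-point ELO edge plus home court plus a small efficiency edge; every input below is invented):

```python
# Same combination as pregame_nba_wp, worked by hand
elo_diff = 1700.0 - 1650.0 + 80.0  # +130 once home court is added
ortg_elo = 3.0 * 2.0               # home ORTG 2 points better -> +6 ELO
drtg_elo = -3.0 * (-1.0)           # home DRTG 1 point lower (better) -> +3 ELO
pace_elo = 0.5 * 0.0               # equal pace

total = elo_diff + ortg_elo + drtg_elo + pace_elo  # +139 effective ELO
p_home = 1.0 / (1.0 + 10 ** (-total / 400.0))
print(round(p_home, 3))  # 0.69
```

The efficiency terms moved the probability only a couple of points here; they matter most when ELO and efficiency disagree, e.g. a team whose record lags its point differential.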
Step 5: Upgrade to XGBoost With Calibration
```python
from xgboost import XGBClassifier
from sklearn.isotonic import IsotonicRegression

FEATURES = ["elo_diff", "pace_diff", "ortg_diff", "drtg_diff"]

# training_df: one row per game with the four feature diffs and home_won,
# sorted chronologically so the calibration slice is strictly later in time
X = training_df[FEATURES].values
y = training_df["home_won"].values
cutoff = int(0.7 * len(X))
X_train, X_cal = X[:cutoff], X[cutoff:]
y_train, y_cal = y[:cutoff], y[cutoff:]

model = XGBClassifier(
    max_depth=4, learning_rate=0.05, n_estimators=400,
    objective="binary:logistic", eval_metric="logloss",
)
model.fit(X_train, y_train, eval_set=[(X_cal, y_cal)], verbose=False)

# Calibrate on the held-out slice the base model never trained on
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(model.predict_proba(X_cal)[:, 1], y_cal)

def nba_calibrated_predict(X):
    return iso.transform(model.predict_proba(X)[:, 1])
```
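`IsotonicRegression` learns a nondecreasing map from raw scores to outcome frequencies. If that feels like a black box, this is the pool-adjacent-violators idea underneath, as a pure-Python sketch (unweighted, inputs assumed already sorted by model score; sklearn's version handles weights and ties properly):

```python
def pava(values):
    """Least-squares nondecreasing fit via pool-adjacent-violators (unweighted)."""
    out = []  # merged blocks as [mean, count]
    for v in values:
        out.append([float(v), 1])
        # merge backwards while monotonicity is violated
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, c2 = out.pop()
            m1, c1 = out.pop()
            out.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    fitted = []
    for mean, count in out:
        fitted.extend([mean] * count)
    return fitted

# The dip at position 3 gets pooled with its neighbor
print(pava([0.1, 0.3, 0.2, 0.4]))  # [0.1, 0.25, 0.25, 0.4]
```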
Step 6: Simulate a Best-of-Seven Series
A series isn't just multiplying game probabilities — home-court shifts between games in a 2-2-1-1-1 format. Monte Carlo it:
```python
import random

def simulate_best_of_seven(high_seed, low_seed, elo, priors, n_sims=20000):
    """
    Standard 2-2-1-1-1: games 1, 2, 5, 7 at the high seed; games 3, 4, 6 at the low seed.
    Returns the high seed's series-win probability.
    """
    home_sequence = [high_seed, high_seed, low_seed, low_seed,
                     high_seed, low_seed, high_seed]
    wins = 0
    for _ in range(n_sims):
        high_wins, low_wins = 0, 0
        for game_i in range(7):
            home = home_sequence[game_i]
            away = low_seed if home == high_seed else high_seed
            p_home = pregame_nba_wp(home, away, elo, priors)
            winner = home if random.random() < p_home else away
            if winner == high_seed:
                high_wins += 1
            else:
                low_wins += 1
            if high_wins == 4 or low_wins == 4:
                break
        if high_wins == 4:
            wins += 1
    return wins / n_sims
```
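As a cross-check on the Monte Carlo, the series probability has an exact recursive form when you hold the per-game probability constant (i.e., ignore the home/away flips the simulator handles):

```python
from functools import lru_cache

def series_win_prob(p_game: float, wins_needed: int = 4) -> float:
    """Exact best-of-N series probability at a constant per-game probability."""
    @lru_cache(maxsize=None)
    def rec(w, l):
        if w == wins_needed:
            return 1.0
        if l == wins_needed:
            return 0.0
        # win the next game with p_game, lose it with 1 - p_game
        return p_game * rec(w + 1, l) + (1 - p_game) * rec(w, l + 1)
    return rec(0, 0)

print(round(series_win_prob(0.60), 4))  # 0.7102
```

A 60% per-game favorite wins the series about 71% of the time; best-of-seven amplifies a per-game edge, which is the variance reduction mentioned at the top. Your Monte Carlo estimate should land within sampling noise of this when you feed it a flat per-game probability.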
Step 7: Playoff Backtest Harness
```python
import numpy as np

def backtest_playoffs(playoff_games, elo_frozen, priors_frozen, model, iso):
    preds = []
    for _, g in playoff_games.iterrows():
        he = elo_frozen.get(g["home_team"], 1500.0)
        ae = elo_frozen.get(g["away_team"], 1500.0)
        hp = priors_frozen.get(g["home_team"], {"pace": 100, "ortg": 115, "drtg": 115})
        ap = priors_frozen.get(g["away_team"], {"pace": 100, "ortg": 115, "drtg": 115})
        features = np.array([[
            he - ae + HFA,
            hp["pace"] - ap["pace"],
            hp["ortg"] - ap["ortg"],
            hp["drtg"] - ap["drtg"],
        ]])
        p = nba_calibrated_predict(features)[0]
        preds.append({**g.to_dict(), "model_wp": p,
                      "correct": int((p >= 0.5) == g["home_won"])})
    return pd.DataFrame(preds)
```
What we got: Our NBA model on the 2024-25 playoffs (84 games including a 7-game Finals) hit 50 of 84 (59.5%). It called every OKC home Finals game correctly at 86.8% confidence. Full breakdown and all 7 Finals predictions are in the public retrospective.
Step 8: Measure Expected Calibration Error
```python
import numpy as np

def ece(y_pred, y_true, n_bins=10):
    """Expected calibration error: bin-size-weighted |confidence - frequency| gap."""
    bins = np.linspace(0, 1, n_bins + 1)
    total = 0.0
    for i in range(n_bins):
        mask = (y_pred >= bins[i]) & (y_pred < bins[i + 1])
        if i == n_bins - 1:
            mask = (y_pred >= bins[i]) & (y_pred <= bins[i + 1])
        if mask.sum() == 0:
            continue
        gap = abs(y_pred[mask].mean() - y_true[mask].mean())
        total += gap * (mask.sum() / len(y_pred))
    return total
```
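To make the definition concrete, here's a hand-worked two-bin case (toy numbers):

```python
import numpy as np

# Perfectly calibrated toy predictions: the 0.25-confidence picks win 1 of 4,
# and the 0.75-confidence picks win 3 of 4.
y_pred = np.array([0.25] * 4 + [0.75] * 4)
y_true = np.array([1, 0, 0, 0, 1, 1, 1, 0])

gap_low = abs(y_pred[:4].mean() - y_true[:4].mean())   # |0.25 - 0.25| = 0
gap_high = abs(y_pred[4:].mean() - y_true[4:].mean())  # |0.75 - 0.75| = 0
ece_value = gap_low * 0.5 + gap_high * 0.5             # each bin holds half the games
print(ece_value)  # 0.0

# If the model had claimed 0.90 on the second bucket instead, the high bin
# alone would contribute |0.90 - 0.75| * 0.5 = 0.075, i.e. 7.5% ECE.
```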
Target for NBA: ECE below 5% on a full regular season. Playoff-only ECE is always noisier because of the small sample (around 15 series per year).
Common Mistakes That Kill an NBA Model
Using uniform home-court advantage
Standard NBA ELO uses 80 points for every team. But some teams have much stronger home-court effects (Denver's altitude, Golden State's noise) and some are neutral. If you want top-tier calibration, fit a team-specific HFA from regular-season home/away win differentials.
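One back-of-envelope way to get started (this is a sketch, not the fitted logistic regression suggested above; the win-rate clamping and the halving of the home/away gap are modeling assumptions you should validate against your own data):

```python
import math

def hfa_elo_points(home_win_rate: float, away_win_rate: float) -> float:
    """Rough team-specific HFA: half the ELO-space gap between home and away win rates."""
    def to_elo(p):
        # inverse of the ELO logistic, clamped away from 0/1 to stay finite
        p = min(max(p, 0.01), 0.99)
        return 400.0 * math.log10(p / (1.0 - p))
    # Halved so the bump is applied once per game rather than double-counted
    return (to_elo(home_win_rate) - to_elo(away_win_rate)) / 2.0

# Illustrative splits: 65% at home, 45% on the road
print(round(hfa_elo_points(0.65, 0.45), 1))  # 71.2
```

Teams with identical home and away records come out at zero, and a Denver-style split would land well above the league-wide 80; you'd then replace the constant HFA with this per-team value in `pregame_nba_wp`.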
Ignoring back-to-backs
NBA teams play back-to-back games frequently in the regular season. A team on a B2B has a measurable disadvantage (~20 ELO points). Not modeling this leaves regular-season accuracy on the table.
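Flagging back-to-backs is a one-liner once the schedule is in long format (one row per team per game; the column names and dates below are illustrative assumptions):

```python
import pandas as pd

sched = pd.DataFrame({
    "team": ["OKC", "OKC", "OKC", "IND", "IND"],
    "date": pd.to_datetime([
        "2025-01-10", "2025-01-11", "2025-01-13",
        "2025-01-10", "2025-01-12",
    ]),
})
sched = sched.sort_values(["team", "date"])

# Days of rest since the previous game; exactly 1 day means a back-to-back
rest = sched.groupby("team")["date"].diff().dt.days
sched["is_b2b"] = rest == 1
print(sched["is_b2b"].tolist())  # [False, False, False, True, False]
```

You'd then subtract the B2B penalty (~20 ELO points) from the affected side's effective rating before computing the win probability.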
Not excluding playoff games from ELO training
Freeze ELO at the regular-season cutoff. If your ELO updates include playoff games you're trying to predict, you've leaked the future.
Calibrating on training data
The universal mistake. Always calibrate on a held-out slice that the base model never saw during training.
Measuring accuracy but not ECE
A 65% accurate NBA model with 15% ECE is less useful than a 60% accurate model with 3% ECE. ECE determines whether your 70% calls actually win 70% of the time — which is what you need for Kelly sizing.
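The Kelly connection in one formula: for a binary bet at decimal odds d with win probability p, the optimal stake fraction is f* = (p(d-1) - (1-p)) / (d-1), floored at zero. A sketch of why miscalibration is expensive:

```python
def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Optimal bankroll fraction for a binary bet; 0 when there's no edge."""
    b = decimal_odds - 1.0  # net odds per unit staked
    return max(0.0, (p * b - (1.0 - p)) / b)

# A true 70% call at even money stakes 40% of bankroll...
print(round(kelly_fraction(0.70, 2.00), 2))  # 0.4
# ...but if the real probability is only 60%, the correct stake is 20%,
# so a 10-point calibration error doubles every bet.
print(round(kelly_fraction(0.60, 2.00), 2))  # 0.2
```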
Want to skip building this yourself?
ZenHodl's API gives you pre-built, pre-calibrated NBA win probabilities, plus historical snapshots for backtesting your own playoff strategies. 7-day free trial, no credit card.
Get API access →
Summary
A production-grade NBA Finals prediction model is NBA-tuned ELO + team efficiency priors + XGBoost + isotonic calibration + a Monte Carlo series simulator. The hard parts are: excluding playoff games from ELO training, using the .shift(1).expanding() pattern so priors don't leak the future, and honestly measuring ECE instead of just accuracy.
Next tutorial: how to build a Super Bowl prediction model — same discipline, different sport.
Related Reading
- Our NBA 2024-25 Playoff Retrospective — 50/84 (59.5%) including OKC's championship run correctly called.
- Build a March Madness prediction model — the college basketball equivalent tutorial.
- March Madness 2026 Backtest — the college basketball playoff companion (71.6% on 67 games).
- Build an NHL Stanley Cup prediction model — sibling playoff-bracket sport with similar Monte Carlo discipline.
- Feature engineering for sports win probability — deep dive on the 15 features.