How to Build an NBA Finals Prediction Model in Python (ELO + Pace/ORTG/DRTG)

April 22, 2026 · 14 min read · Python, ELO, NBA, Efficiency Priors, Machine Learning

NBA playoff prediction is different from every other sport in three specific ways: best-of-seven series reduce variance (a better team really does win more often), home-court advantage is huge (~80 ELO points), and advanced efficiency metrics — pace, offensive rating, defensive rating — are available cheaply and matter more than they do in football or baseball.

A production-grade NBA Finals prediction model has to do three things the shortcut versions won't:

  1. Produce calibrated pre-series win probabilities for each playoff matchup — when it says 70%, that team should win the series ~70% of the time
  2. Weight pace, ORTG, and DRTG differentials alongside pure ELO, since NBA style matchups matter
  3. Handle best-of-seven series math correctly by simulating game-by-game with home/away flips

This guide walks through all three in Python using ESPN's free NBA data feed, pandas, xgboost, and scikit-learn. At the end you'll have a pipeline that predicts individual games and the full playoff bracket — including a reproducible 2024-25 NBA Finals backtest we ran and published ourselves.

What You'll Build

Python 3.11+. Deps: pandas, numpy, xgboost, scikit-learn, requests.

Step 1: Pull ESPN NBA Data

import requests, pandas as pd
from datetime import date, timedelta

def fetch_nba_games(date_yyyymmdd: str) -> list[dict]:
    """Pull every NBA game for a date. season_type: 2=regular, 3=postseason."""
    url = "https://site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard"
    r = requests.get(url, params={"dates": date_yyyymmdd, "limit": 30}, timeout=20)
    r.raise_for_status()
    out = []
    for ev in r.json().get("events", []):
        comp = ev["competitions"][0]
        home = next(t for t in comp["competitors"] if t["homeAway"] == "home")
        away = next(t for t in comp["competitors"] if t["homeAway"] == "away")
        if not (home.get("winner") or away.get("winner")):
            continue
        out.append({
            "game_id": int(ev["id"]),
            "date": date_yyyymmdd,
            "home_team": home["team"]["abbreviation"],
            "away_team": away["team"]["abbreviation"],
            "home_score": int(home.get("score", 0) or 0),
            "away_score": int(away.get("score", 0) or 0),
            "home_won": int(home.get("winner", False)),
            "season_type": ev.get("season", {}).get("type"),
            "notes": [n.get("headline","") for n in comp.get("notes", [])],
        })
    return out

def fetch_nba_season(start: str, end: str) -> pd.DataFrame:
    d0 = date.fromisoformat(start); d1 = date.fromisoformat(end)
    rows = []
    cur = d0
    while cur <= d1:
        try:
            rows.extend(fetch_nba_games(cur.strftime("%Y%m%d")))
        except Exception as e:
            print(f"skip {cur}: {e}")
        cur += timedelta(days=1)
    return pd.DataFrame(rows)
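Before looping over a whole season, it helps to sanity-check the parsing on a single event. A minimal offline sketch, using a hand-built payload that mimics the scoreboard shape the code above expects (the field names here mirror that code, not a guarantee of ESPN's actual schema, and the id/teams are made up for illustration):

```python
# Hypothetical sample event shaped like one entry of the "events" array above.
sample_event = {
    "id": "401585601",
    "season": {"type": 2},
    "competitions": [{
        "competitors": [
            {"homeAway": "home", "winner": True,
             "team": {"abbreviation": "OKC"}, "score": "118"},
            {"homeAway": "away", "winner": False,
             "team": {"abbreviation": "DEN"}, "score": "104"},
        ],
        "notes": [],
    }],
}

def parse_event(ev: dict, date_yyyymmdd: str) -> dict:
    """The core extraction logic from fetch_nba_games, on a single event."""
    comp = ev["competitions"][0]
    home = next(t for t in comp["competitors"] if t["homeAway"] == "home")
    away = next(t for t in comp["competitors"] if t["homeAway"] == "away")
    return {
        "game_id": int(ev["id"]),
        "date": date_yyyymmdd,
        "home_team": home["team"]["abbreviation"],
        "away_team": away["team"]["abbreviation"],
        "home_score": int(home.get("score", 0) or 0),
        "away_score": int(away.get("score", 0) or 0),
        "home_won": int(home.get("winner", False)),
        "season_type": ev.get("season", {}).get("type"),
    }

row = parse_event(sample_event, "20250410")
```

If this round-trips cleanly on a real response too, the season loop is just repetition.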

Step 2: NBA-Tuned ELO (K=20, HFA=80)

The NBA has an 82-game regular season: short enough that ELO needs to move meaningfully after each game (K=20), but long enough that ratings stabilize by mid-season. Home-court advantage is strong, historically a ~60% home win rate, which translates to an ~80-point ELO bump.

K = 20.0      # NBA learning rate
HFA = 80.0    # NBA home-court advantage in ELO points

def compute_nba_elo(games: pd.DataFrame) -> tuple[dict, dict]:
    """Returns (current_ratings, pre_game_elo_diff_by_game_id — neutral, no HFA)."""
    elo = {}
    game_elo_diff = {}
    games = games.sort_values("game_id").reset_index(drop=True)
    for _, r in games.iterrows():
        h, a = r["home_team"], r["away_team"]
        hs, as_ = r["home_score"], r["away_score"]
        if not h or not a or pd.isna(hs) or pd.isna(as_):
            continue
        he = elo.get(h, 1500.0)
        ae = elo.get(a, 1500.0)
        game_elo_diff[int(r["game_id"])] = he - ae

        expected_h = 1.0 / (1.0 + 10 ** ((ae - he - HFA) / 400.0))
        actual_h = 1.0 if hs > as_ else (0.5 if hs == as_ else 0.0)
        margin = abs(hs - as_)
        # Basketball MoV (FiveThirtyEight formula)
        mov = ((margin + 3) ** 0.8) / (7.5 + 0.006 * abs(he - ae))
        mov = max(1.0, min(mov, 2.5))
        delta = K * mov * (actual_h - expected_h)
        elo[h] = he + delta
        elo[a] = ae - delta
    return elo, game_elo_diff
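To see the update in action, here is a worked single-game example with made-up ratings (a 1700 home favorite hosting a 1500 team and winning by 12):

```python
K, HFA = 20.0, 80.0

he, ae = 1700.0, 1500.0   # home and away pre-game ratings (illustrative)
hs, as_ = 110, 98         # home wins by 12

# Home win expectancy includes the 80-point home-court bump
expected_h = 1.0 / (1.0 + 10 ** ((ae - he - HFA) / 400.0))   # ~0.83

margin = abs(hs - as_)
mov = ((margin + 3) ** 0.8) / (7.5 + 0.006 * abs(he - ae))
mov = max(1.0, min(mov, 2.5))   # ~1.0 for a 12-point win at this rating gap

delta = K * mov * (1.0 - expected_h)   # home won, so actual = 1.0
new_he, new_ae = he + delta, ae - delta
```

The favorite gains only ~3.3 points for a comfortable expected win; an upset by the 1500 team would have moved both ratings roughly five times as far, which is exactly the zero-sum, expectation-weighted behavior you want.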

After a full regular season, top teams hit 1750-1800 ELO (OKC was at 1782 entering the 2024-25 playoffs), playoff-bubble teams sit around 1500-1550, and bottom teams drop to 1300-1400.

Step 3: Team Efficiency Priors (Pace, ORTG, DRTG)

ELO is a single-number rating. For NBA matchups you want more: does a fast team beat a slow team? Is a great offense strong enough to overcome a great defense? These are captured by pace, offensive rating (ORTG), and defensive rating (DRTG).

def compute_nba_team_priors(boxscores: pd.DataFrame) -> pd.DataFrame:
    """
    boxscores: one row per team per game with possessions, points_for, points_against.
    Returns rolling priors: pace (poss/48 min), ORTG (pts/100 poss), DRTG (opp pts/100 poss).
    """
    bs = boxscores.sort_values(["team", "game_id"]).copy()
    bs["poss_per_48"] = bs["possessions"] * (48.0 / bs["minutes_played"])
    bs["ortg"] = 100.0 * bs["points_for"] / bs["possessions"]
    bs["drtg"] = 100.0 * bs["points_against"] / bs["possessions"]

    # Rolling mean, expanding window, shifted by 1 so we never see the current game
    for col in ["poss_per_48", "ortg", "drtg"]:
        bs[f"prior_{col}"] = bs.groupby("team")[col].transform(
            lambda s: s.shift(1).expanding().mean()
        )

    return bs[["game_id", "team", "prior_poss_per_48", "prior_ortg", "prior_drtg"]]

The shift is critical. .shift(1).expanding().mean() means that on game n, the prior is computed from games 1 through n-1, never from game n itself. If you skip the shift, you leak the future into your features and your training ECE looks amazing while your production ECE collapses.
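A tiny demo of the leak. With three games scoring 10, 20, 30, the shifted prior entering game 3 is the mean of games 1-2 only; the unshifted version already knows game 3's value:

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0])

leaky = s.expanding().mean()           # includes the current game: [10, 15, 20]
safe = s.shift(1).expanding().mean()   # game n sees only games 1..n-1: [NaN, 10, 15]
```

The NaN for game 1 is correct behavior: with no history there is no prior, and you should fall back to a league-average default rather than peeking.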

Step 4: Pre-Game Win Probability

Combine ELO and efficiency priors:

def pregame_nba_wp(home_team, away_team, elo, priors_by_team):
    """Pre-game home win probability combining ELO + efficiency diffs."""
    he = elo.get(home_team, 1500.0)
    ae = elo.get(away_team, 1500.0)
    elo_diff = he - ae + HFA

    hp = priors_by_team.get(home_team, {"pace": 100, "ortg": 115, "drtg": 115})
    ap = priors_by_team.get(away_team, {"pace": 100, "ortg": 115, "drtg": 115})

    # Efficiency differentials roughly translate to ELO points at these coefficients
    # (tune via logistic regression on your training set)
    ortg_elo = 3.0 * (hp["ortg"] - ap["ortg"])       # +1 ORTG ≈ +3 ELO
    drtg_elo = -3.0 * (hp["drtg"] - ap["drtg"])      # lower DRTG is better
    pace_elo = 0.5 * (hp["pace"] - ap["pace"])       # pace fit is minor

    total_elo_diff = elo_diff + ortg_elo + drtg_elo + pace_elo
    return 1.0 / (1.0 + 10 ** (-total_elo_diff / 400.0))
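Plugging in illustrative numbers (these ratings and priors are made up for the example, not actual 2024-25 values) shows how the pieces combine into a single ELO-scale differential:

```python
HFA = 80.0

# Illustrative ratings and efficiency priors
elo = {"OKC": 1780.0, "IND": 1640.0}
priors = {
    "OKC": {"pace": 100.5, "ortg": 119.0, "drtg": 107.0},
    "IND": {"pace": 102.0, "ortg": 117.5, "drtg": 113.5},
}

h, a = priors["OKC"], priors["IND"]
elo_diff = elo["OKC"] - elo["IND"] + HFA            # 220
ortg_elo = 3.0 * (h["ortg"] - a["ortg"])            # +4.5
drtg_elo = -3.0 * (h["drtg"] - a["drtg"])           # +19.5 (home defense is better)
pace_elo = 0.5 * (h["pace"] - a["pace"])            # -0.75
total = elo_diff + ortg_elo + drtg_elo + pace_elo   # 243.25

p_home = 1.0 / (1.0 + 10 ** (-total / 400.0))       # ~0.80
```

Note the defense term flips sign: a lower DRTG is better, so the home team's 6.5-point defensive edge adds ELO rather than subtracting it.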

Step 5: Upgrade to XGBoost With Calibration

from xgboost import XGBClassifier
from sklearn.isotonic import IsotonicRegression

FEATURES = ["elo_diff", "pace_diff", "ortg_diff", "drtg_diff"]

X = training_df[FEATURES].values
y = training_df["home_won"].values

cutoff = int(0.7 * len(X))
X_train, X_cal = X[:cutoff], X[cutoff:]
y_train, y_cal = y[:cutoff], y[cutoff:]

model = XGBClassifier(
    max_depth=4, learning_rate=0.05, n_estimators=400,
    objective="binary:logistic", eval_metric="logloss"
)
model.fit(X_train, y_train, eval_set=[(X_cal, y_cal)], verbose=False)

iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(model.predict_proba(X_cal)[:, 1], y_cal)

def nba_calibrated_predict(X):
    return iso.transform(model.predict_proba(X)[:, 1])
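You don't need xgboost to see why the isotonic step earns its keep. A self-contained sketch on synthetic data (sklearn only, made-up probabilities): a deliberately miscalibrated "model" gets its Brier score repaired by fitting isotonic regression on one held-out half and scoring on the other.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n = 5000

true_p = rng.uniform(0.05, 0.95, n)             # latent home-win probabilities
y = (rng.uniform(size=n) < true_p).astype(int)  # simulated outcomes
raw = true_p ** 2                               # deliberately miscalibrated output

cut = n // 2                                    # calibrate on one half, score the other
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw[:cut], y[:cut])

brier_raw = np.mean((raw[cut:] - y[cut:]) ** 2)
brier_cal = np.mean((iso.transform(raw[cut:]) - y[cut:]) ** 2)
```

The same split discipline applies with the real model: isotonic fit on the calibration slice, evaluated on games neither the model nor the calibrator has seen.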

Step 6: Simulate a Best-of-Seven Series

A series isn't just multiplying game probabilities — home-court shifts between games in a 2-2-1-1-1 format. Monte Carlo it:

import random

def simulate_best_of_seven(high_seed, low_seed, elo, priors, n_sims=20000):
    """
    Standard 2-2-1-1-1: games 1, 2, 5, 7 at high seed; games 3, 4, 6 at low seed.
    Returns high_seed series-win probability.
    """
    home_sequence = [high_seed, high_seed, low_seed, low_seed, high_seed, low_seed, high_seed]
    wins = 0
    for _ in range(n_sims):
        high_wins, low_wins = 0, 0
        for game_i in range(7):
            home = home_sequence[game_i]
            away = low_seed if home == high_seed else high_seed
            p_home = pregame_nba_wp(home, away, elo, priors)
            if random.random() < p_home:
                if home == high_seed: high_wins += 1
                else: low_wins += 1
            else:
                if away == high_seed: high_wins += 1
                else: low_wins += 1
            if high_wins == 4 or low_wins == 4:
                break
        if high_wins == 4:
            wins += 1
    return wins / n_sims
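For fixed per-game probabilities you don't actually need Monte Carlo: the series probability can be computed exactly, which makes a good sanity check on the simulator. A sketch using dynamic programming over the seven scheduled games (p_by_game holds the high seed's win probability in each game, already reflecting the 2-2-1-1-1 venues):

```python
from functools import lru_cache

def exact_series_wp(p_by_game) -> float:
    """High seed's exact probability of winning a best-of-seven,
    given its win probability in each of the 7 scheduled games."""
    @lru_cache(maxsize=None)
    def dp(game_i: int, high_wins: int, low_wins: int) -> float:
        if high_wins == 4:
            return 1.0
        if low_wins == 4:
            return 0.0
        p = p_by_game[game_i]
        # Branch on the next game: high seed wins with prob p, loses with 1-p
        return (p * dp(game_i + 1, high_wins + 1, low_wins)
                + (1 - p) * dp(game_i + 1, high_wins, low_wins + 1))
    return dp(0, 0, 0)
```

With p = 0.5 in every game this returns exactly 0.5; with a uniform 60% edge it is about 0.710. Your 20,000-sim Monte Carlo estimate should land within about a point of the exact number.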

Step 7: Playoff Backtest Harness

import numpy as np

def backtest_playoffs(playoff_games, elo_frozen, priors_frozen, model, iso):
    preds = []
    for _, g in playoff_games.iterrows():
        he = elo_frozen.get(g["home_team"], 1500.0)
        ae = elo_frozen.get(g["away_team"], 1500.0)
        hp = priors_frozen.get(g["home_team"], {"pace": 100, "ortg": 115, "drtg": 115})
        ap = priors_frozen.get(g["away_team"], {"pace": 100, "ortg": 115, "drtg": 115})
        features = np.array([[
            he - ae + HFA,
            hp["pace"] - ap["pace"],
            hp["ortg"] - ap["ortg"],
            hp["drtg"] - ap["drtg"],
        ]])
        p = iso.transform(model.predict_proba(features)[:, 1])[0]
        preds.append({**g.to_dict(), "model_wp": p,
                     "correct": int((p >= 0.5) == g["home_won"])})
    return pd.DataFrame(preds)
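Summarizing the predictions frame is a few lines. A sketch on a toy frame with the same columns backtest_playoffs emits (the numbers here are invented, not real playoff output):

```python
import pandas as pd

preds = pd.DataFrame({
    "model_wp": [0.72, 0.55, 0.48, 0.81, 0.60],
    "home_won": [1, 0, 0, 1, 1],
})
# Same pick logic as the harness: predicted home win iff model_wp >= 0.5
preds["correct"] = ((preds["model_wp"] >= 0.5) == preds["home_won"]).astype(int)

accuracy = preds["correct"].mean()
brier = ((preds["model_wp"] - preds["home_won"]) ** 2).mean()
```

Track both: accuracy tells you about picks, Brier (and ECE below) tells you whether the probabilities themselves are honest.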

What we got: Our NBA model on the 2024-25 playoffs (84 games including a 7-game Finals) hit 50 of 84 (59.5%). It called every OKC home Finals game correctly at 86.8% confidence. Full breakdown and all 7 Finals predictions are in the public retrospective.

Step 8: Measure Expected Calibration Error

def ece(y_pred, y_true, n_bins=10):
    bins = np.linspace(0, 1, n_bins + 1)
    total = 0.0
    for i in range(n_bins):
        mask = (y_pred >= bins[i]) & (y_pred < bins[i+1])
        if i == n_bins - 1:
            mask = (y_pred >= bins[i]) & (y_pred <= bins[i+1])
        if mask.sum() == 0:
            continue
        gap = abs(y_pred[mask].mean() - y_true[mask].mean())
        total += gap * (mask.sum() / len(y_pred))
    return total
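A quick check on synthetic data (the ece function is repeated so the snippet runs standalone): predictions that match the true outcome rates score near zero, systematically overconfident ones score far worse.

```python
import numpy as np

def ece(y_pred, y_true, n_bins=10):
    # Binned weighted |mean confidence - mean outcome| gap, as above
    bins = np.linspace(0, 1, n_bins + 1)
    total = 0.0
    for i in range(n_bins):
        mask = (y_pred >= bins[i]) & (y_pred < bins[i + 1])
        if i == n_bins - 1:
            mask = (y_pred >= bins[i]) & (y_pred <= bins[i + 1])
        if mask.sum() == 0:
            continue
        total += abs(y_pred[mask].mean() - y_true[mask].mean()) * (mask.sum() / len(y_pred))
    return total

rng = np.random.default_rng(42)
p = rng.uniform(0.1, 0.9, 20000)
y = (rng.uniform(size=len(p)) < p).astype(int)

ece_good = ece(p, y)                        # predictions match outcome rates
ece_bad = ece(np.clip(p * 1.4, 0, 1), y)    # systematically overconfident
```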

Target for NBA: ECE below 5% on a full regular season. Playoff-only ECE is always noisier because of the small sample (15 series per year).

Common Mistakes That Kill an NBA Model

Using uniform home-court advantage

Standard NBA ELO uses 80 points for every team. But some teams have much stronger home-court effects (Denver's altitude, Golden State's noise) and some are neutral. If you want top-tier calibration, fit a team-specific HFA from regular-season home/away win differentials.
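A minimal sketch of the team-specific version, assuming a games frame with home_team, away_team, and home_won columns like the one from Step 1. Each team's home edge is estimated from how much better it does at home than away, centered on the league HFA; the 400-points-per-win-rate-unit conversion is a rough assumed linearization of the ELO curve, not a fitted constant.

```python
import pandas as pd

def team_specific_hfa(games: pd.DataFrame, league_hfa: float = 80.0,
                      elo_per_winrate: float = 400.0) -> dict:
    """Crude per-team HFA: shift the league HFA by how much a team's
    home-vs-away win-rate gap deviates from the league-average gap."""
    home_wr = games.groupby("home_team")["home_won"].mean()
    away_wr = 1.0 - games.groupby("away_team")["home_won"].mean()
    gap = (home_wr - away_wr).dropna()   # > 0 means better at home
    return {t: league_hfa + elo_per_winrate * (g - gap.mean())
            for t, g in gap.items()}

# Toy example: PHX sweeps at home and on the road wins nothing, etc.
toy = pd.DataFrame(
    [("DEN", "LAL", 1), ("LAL", "DEN", 0), ("DEN", "PHX", 1),
     ("PHX", "DEN", 1), ("LAL", "PHX", 1), ("PHX", "LAL", 1)],
    columns=["home_team", "away_team", "home_won"],
)
hfa = team_specific_hfa(toy)
```

In practice you'd also shrink small-sample gaps toward the league mean; a raw 41-game split is noisy.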

Ignoring back-to-backs

NBA teams play back-to-back games frequently in the regular season. A team on a B2B has a measurable disadvantage (~20 ELO points). Not modeling this leaves regular-season accuracy on the table.
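Flagging back-to-backs is one groupby away, assuming a long-format frame with one row per team per game and a parseable date column (the column names here are illustrative):

```python
import pandas as pd

def flag_back_to_backs(team_games: pd.DataFrame) -> pd.DataFrame:
    """Mark games where the team also played the previous calendar day."""
    tg = team_games.copy()
    tg["date"] = pd.to_datetime(tg["date"])
    tg = tg.sort_values(["team", "date"]).reset_index(drop=True)
    prev = tg.groupby("team")["date"].shift(1)   # each team's previous game date
    tg["is_b2b"] = (tg["date"] - prev).dt.days.eq(1)
    return tg

games = pd.DataFrame({
    "team": ["DEN", "DEN", "DEN", "LAL", "LAL"],
    "date": ["2025-01-01", "2025-01-02", "2025-01-05", "2025-01-02", "2025-01-04"],
})
flagged = flag_back_to_backs(games)   # only DEN's Jan 2 game is a B2B
```

The resulting boolean becomes one more feature diff (home_b2b minus away_b2b), or a flat ELO penalty of roughly 20 points on the tired side.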

Not excluding playoff games from ELO training

Freeze ELO at the regular-season cutoff. If your ELO updates include playoff games you're trying to predict, you've leaked the future.

Calibrating on training data

The universal mistake. Always calibrate on a held-out slice that the base model never saw during training.

Measuring accuracy but not ECE

A 65% accurate NBA model with 15% ECE is less useful than a 60% accurate model with 3% ECE. ECE determines whether your 70% calls actually win 70% of the time — which is what you need for Kelly sizing.
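Why calibration matters for sizing: the Kelly fraction depends directly on your estimated probability, so a miscalibrated 70% call produces a wrong stake, not just a wrong pick. A sketch of the standard formula (p is your calibrated win probability, decimal_odds the bookmaker payout per unit staked; the numbers are illustrative):

```python
def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Fraction of bankroll to stake: f = (b*p - q) / b, where
    b is net profit per unit staked and q = 1 - p. Floored at 0 (no bet)."""
    b = decimal_odds - 1.0
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)

f_calibrated = kelly_fraction(0.70, 1.60)   # true 70% edge: stake 20% of bankroll
f_inflated = kelly_fraction(0.60, 1.60)     # same call, true prob 60%: no edge, stake 0
```

If your "70%" calls really win only 60% of the time, Kelly tells you to stake 20% of bankroll on bets with zero edge, which is how high-ECE models go broke while looking accurate.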

Want to skip building this yourself?

ZenHodl's API gives you pre-built, pre-calibrated NBA win probabilities, plus historical snapshots for backtesting your own playoff strategies. 7-day free trial, no credit card.

Get API access →

Summary

A production-grade NBA Finals prediction model is NBA-tuned ELO + team efficiency priors + XGBoost + isotonic calibration + a Monte Carlo series simulator. The hard parts are: excluding playoff games from ELO training, using the .shift(1).expanding() pattern so priors don't leak the future, and honestly measuring ECE instead of just accuracy.

Next tutorial: how to build a Super Bowl prediction model — same discipline, different sport.

Related Reading