NFL SuperBowl Squares Historic Results

If you’ve bought SuperBowl squares this year, you may be wondering… Are my numbers any good? 7 is good, right? 3? What about the rest?

I’ve looked at the last 3 years of football scores and calculated the distribution with Python. Here are your chances:

NFLSquaresProbabilities

Data

The data behind this is the NFL scores from the 2017, 2018, and 2019 seasons, 800 games in total, including playoffs but not preseason games. The data is easy enough to find on the internet.

Note: In the SuperBowl there’s arguably no true home field advantage; however, this analysis uses regular season scores, where there is a home field advantage.

Note 2: This is a distribution of actual game scores, so the home and visitor scores stay paired game by game and any correlation between them is preserved. It is not a distribution of all the home team scores vs all the visitor scores. For example, in the last 3 seasons there was never a home-visitor combination of 2-2, so the probability for that square is 0%.
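To make the distinction in Note 2 concrete, here is a minimal sketch on made-up scores (the four games below are hypothetical, not from the NFL data). It compares the paired, per-game distribution with what you would get by multiplying the visitor and home digit distributions as if they were independent:

```python
import pandas as pd

# Hypothetical toy scores (NOT real NFL results), used only to show the idea.
games = pd.DataFrame({
    "Visitor_Score": [17, 27, 20, 10],
    "Home_Score":    [20, 24, 23, 13],
})

visitor_digit = games["Visitor_Score"] % 10
home_digit = games["Home_Score"] % 10

# Paired (joint) distribution: share of games with each (visitor, home) digit pair.
joint = pd.crosstab(visitor_digit, home_digit, normalize=True)

# Independent version: multiply the two separate digit distributions.
visitor_marginal = visitor_digit.value_counts(normalize=True)
home_marginal = home_digit.value_counts(normalize=True)

# The pair (visitor 7, home 3) never occurs in the toy games...
paired_7_3 = joint.loc[7, 3]
# ...but treating the digits as independent would still give it some weight.
independent_7_3 = visitor_marginal[7] * home_marginal[3]

print(paired_7_3, independent_7_3)
```

In this toy data the paired probability for 7-3 is zero while the independent estimate is not, which is the same effect as the 2-2 square above.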

Code

The data processing and calculations were done in Python. Let’s do it:

import pandas as pd

df = pd.read_csv('NFLScores.csv')
df.info()

This will give us the following datatypes:

1_initial_datatypes

Though Visitor_Score and Home_Score are numeric, for squares we only need the rightmost digit of each score, so let’s re-import them as strings (aka objects).

df = pd.read_csv('NFLScores.csv', dtype=object)
df.info()

Much better:

2_final_datatypes
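If you want to see the effect of dtype=object without the real file, here is a small sketch that reads a made-up two-game CSV from memory. The CSV text is hypothetical; only the column names come from this post:

```python
import io
import pandas as pd

# Hypothetical two-game CSV (made-up scores), standing in for NFLScores.csv.
csv_text = "Visitor_Score,Home_Score\n17,20\n27,24\n"

# Default import: pandas infers integer columns.
as_numbers = pd.read_csv(io.StringIO(csv_text))

# Import with dtype=object: every column is kept as text.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=object)

print(as_numbers.dtypes)
print(as_text.dtypes)

# String slicing via .str works on the text version.
last_digits = as_text["Visitor_Score"].str[-1:]
print(list(last_digits))
```

Another option, if you would rather keep the numeric import, is to convert a column on the fly with .astype(str) before slicing.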

Let’s capture the total number of games, which we’ll use later in our probability calculations:

total_games = df.shape[0]

Let’s parse the last digit of the visitor’s score:

df['Visitor_Score'].str[-1:]

3_last_digit

With the last digit successfully parsed, let’s establish new columns in our dataframe to hold the visitor and home last digits, named Visitor_Square and Home_Square:

df['Visitor_Square'] = df['Visitor_Score'].str[-1:]
df['Home_Square'] = df['Home_Score'].str[-1:]
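As an aside, the same last digit can be computed without strings by taking the score modulo 10. This is an alternative to the approach used in this post, sketched on made-up scores (df_demo is a hypothetical stand-in for df with numeric columns):

```python
import pandas as pd

# Hypothetical toy scores (not taken from NFLScores.csv), kept as integers.
df_demo = pd.DataFrame({
    "Visitor_Score": [7, 24, 30],
    "Home_Score":    [13, 20, 31],
})

# The remainder after dividing by 10 is the rightmost digit.
df_demo["Visitor_Square"] = df_demo["Visitor_Score"] % 10
df_demo["Home_Square"] = df_demo["Home_Score"] % 10

print(df_demo)
```

With this version the square columns are already integers, so no int() conversion is needed later.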

Let’s create a final dataframe with 0-9 rows and 0-9 columns to hold each probability that corresponds to those digits:

df_squares = pd.DataFrame(0, index=[0,1,2,3,4,5,6,7,8,9], columns=[0,1,2,3,4,5,6,7,8,9], dtype=float)

Now we loop through our data and increment the corresponding cell in df_squares. Note that each game adds 1 / total_games rather than 1, so the cells end up holding probabilities instead of counts:

for i in range(len(df)):
    visitor_index = int(df.iloc[i]['Visitor_Square'])
    home_index = int(df.iloc[i]['Home_Square'])

    df_squares.at[visitor_index,home_index] = df_squares.at[visitor_index,home_index] + (1 / total_games)

df_squares

5_final_probabilities
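The loop is easy to follow, but the same table can also be built in one step with pd.crosstab. This is a sketch of that alternative, not the code used for the results in this post, and it runs on made-up digits (df_demo is hypothetical):

```python
import pandas as pd

# Hypothetical last digits for four games (not real data), stored as strings
# the same way Visitor_Square and Home_Square are stored above.
df_demo = pd.DataFrame({
    "Visitor_Square": ["7", "0", "7", "3"],
    "Home_Square":    ["0", "3", "0", "4"],
})

digits = list(range(10))

squares_demo = (
    # normalize=True divides every count by the total number of games.
    pd.crosstab(
        df_demo["Visitor_Square"].astype(int),
        df_demo["Home_Square"].astype(int),
        normalize=True,
    )
    # Make sure all ten digits appear, even ones that never occurred.
    .reindex(index=digits, columns=digits, fill_value=0.0)
)

print(squares_demo)
```

The reindex step matters because crosstab only creates rows and columns for digits that actually show up in the data.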

We have our probabilities. Let’s make them a little more human readable by multiplying by 100:

df_squares = df_squares * 100
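A quick sanity check at this point is that all 100 cells add up to 100%, and it can be handy to list the squares from best to worst. Here is a minimal sketch of both, using a made-up grid (the values in grid are hypothetical placeholders for df_squares):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_squares after scaling to percentages.
grid = pd.DataFrame(0.0, index=range(10), columns=range(10))
grid.at[7, 0] = 50.0
grid.at[0, 3] = 25.0
grid.at[3, 4] = 25.0

# All cells together should cover 100% of games (allowing for float error).
assert np.isclose(grid.to_numpy().sum(), 100.0)

# Flatten to (visitor digit, home digit) pairs and sort, best squares first.
ranked = grid.stack().sort_values(ascending=False)
best_pair = ranked.index[0]

print(ranked.head(3))
```

If the sum comes out as 10,000 instead of 100, the table was multiplied by 100 more than once.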

And let’s make it look a little nicer using Seaborn:

import seaborn as sns
import matplotlib.pyplot as plt

# df_squares already holds percentages (it was multiplied by 100 above),
# so it is not multiplied again here

fig, ax = plt.subplots(figsize=(14,7))  
sns.heatmap(df_squares, cmap="coolwarm", annot=True, cbar=False, annot_kws={"fontsize":16})

plt.xlabel('Home Number', fontsize=12)
ax.xaxis.set_ticks_position('top')
ax.xaxis.set_label_position('top')


plt.ylabel('Away Number', fontsize=12)

# Workaround for the top and bottom heatmap rows being cut in half
# (seen with some matplotlib versions, e.g. 3.1.1)
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)

fig.suptitle('SuperBowl Squares Probabilities', fontsize=20)

NFLSquaresProbabilities

Cheers!
