If you’ve bought Super Bowl squares this year, you may be wondering… are my numbers any good? 7 is good, right? 3? What about the rest?

I looked at the last 3 seasons of NFL scores and calculated the distribution with Python. Here are your chances:

### Data

The data behind this is NFL scores from the 2017, 2018, and 2019 seasons: 800 games in total, including playoffs but not preseason games. The data is easy enough to find online.

**Note** In the Super Bowl there’s arguably no true home-field advantage; however, this analysis uses regular-season scores, where there is one.

**Note 2** This is the distribution of actual game scores, so it keeps whatever correlation exists between the home and visitor scores within a single game. It is not a distribution of all the home team scores paired with all the visitor scores. For example, in the last 3 seasons no game ended with a home-visitor last-digit combination of 2-2, so that square's probability is 0%.
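To make the difference concrete, here's a minimal sketch using three made-up games (hypothetical numbers, not real NFL results). It compares the joint probability of a 2-2 square, counted from actual games, with the product of the separate home and visitor digit probabilities, which is what you'd get if you treated the two scores as independent.

```python
import pandas as pd

# Hypothetical toy data: three made-up games, NOT real NFL results.
games = pd.DataFrame({
    "Home_Score": [22, 17, 20],
    "Visitor_Score": [17, 12, 10],
})

# Last digit of each score (home digits: 2, 7, 0; visitor digits: 7, 2, 0).
home_digit = games["Home_Score"] % 10
visitor_digit = games["Visitor_Score"] % 10

# Joint probability: fraction of games whose (home, visitor) digits are exactly (2, 2).
joint_2_2 = ((home_digit == 2) & (visitor_digit == 2)).mean()

# Product of marginals: treats the home and visitor digits as independent.
marginal_product_2_2 = (home_digit == 2).mean() * (visitor_digit == 2).mean()

print(joint_2_2)             # 0.0, no game ended 2-2
print(marginal_product_2_2)  # 1/3 * 1/3, about 0.111
```

Both teams end in a 2 once, but never in the same game, so the game-by-game count gives 0% while the independent version does not.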

### Code

The data processing and calculations were done in Python. Let’s do it:

```python
import pandas as pd

df = pd.read_csv('NFLScores.csv')
df.info()
```

This will give us the following datatypes:

Though *Visitor_Score* and *Home_Score* are numeric, for squares we need to parse the rightmost digit, so let’s re-import them as strings (pandas stores these with the *object* dtype).
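To see why the re-import matters, here's a small sketch with a hypothetical two-game CSV that only mimics the column names of NFLScores.csv. pandas' *.str* accessor refuses integer columns, but it works once the same columns are read as strings.

```python
import io

import pandas as pd

# Hypothetical two-game CSV; only the column names mirror NFLScores.csv.
csv_text = "Visitor_Score,Home_Score\n17,24\n3,10\n"

# Default read: pandas infers integer columns.
numeric = pd.read_csv(io.StringIO(csv_text))

numeric_str_failed = False
try:
    numeric["Home_Score"].str[-1:]
except AttributeError:
    # The .str accessor only works on string-like values.
    numeric_str_failed = True

# Reading with dtype=object keeps the values as text, so slicing works.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=object)
home_last_digit = as_text["Home_Score"].str[-1:]

print(numeric_str_failed)        # True
print(home_last_digit.tolist())  # ['4', '0']
```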

```python
df = pd.read_csv('NFLScores.csv', dtype=object)
df.info()
```

Much better:

Let’s capture the total number of games, which we’ll use later in our probability calculations:

```python
total_games = df.shape[0]
```

Let’s parse the last digit of the visitor’s score:

```python
df['Visitor_Score'].str[-1:]
```
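If you'd rather keep the scores numeric, the last digit of a non-negative integer is just its remainder after dividing by 10. The sketch below (hypothetical scores, not from the dataset) checks that this modulo approach agrees with the string-slicing approach used here.

```python
import pandas as pd

# Hypothetical scores, kept as integers this time.
scores = pd.Series([24, 3, 10, 0, 37])

# For non-negative integers, the last decimal digit is the remainder mod 10.
last_digit_mod = scores % 10

# String route, as in the article: convert to text, slice the last character,
# then convert back to integers for comparison.
last_digit_str = scores.astype(str).str[-1:].astype("int64")

print(last_digit_mod.tolist())                 # [4, 3, 0, 0, 7]
print((last_digit_mod == last_digit_str).all())  # True
```

Either works for football scores, since they are never negative; the string version just matches the *dtype=object* import above.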

With the last digit successfully parsed, let’s establish new columns in our dataframe to hold the visitor and home last digits, named *Visitor_Square* and *Home_Square*:

```python
df['Visitor_Square'] = df['Visitor_Score'].str[-1:]
df['Home_Square'] = df['Home_Score'].str[-1:]
```

Let’s create a final dataframe with rows and columns labeled 0-9 to hold the probability for each pair of digits:

```python
df_squares = pd.DataFrame(0, index=[0,1,2,3,4,5,6,7,8,9], columns=[0,1,2,3,4,5,6,7,8,9], dtype=float)
```

Now we loop through our data and increment the corresponding cell in *df_squares*. Note that each game adds 1/*total_games* rather than 1, so the cells end up holding probabilities instead of counts:

```python
for i in range(len(df)):
    visitor_index = int(df.iloc[i]['Visitor_Square'])
    home_index = int(df.iloc[i]['Home_Square'])
    df_squares.at[visitor_index, home_index] = df_squares.at[visitor_index, home_index] + (1 / total_games)

df_squares
```
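As an alternative to the loop, pandas can build the same grid in one step with *pd.crosstab*. Here's a sketch that assumes the *Visitor_Square* and *Home_Square* columns hold single-digit strings as above; the small DataFrame is hypothetical stand-in data. *normalize="all"* divides every count by the total number of games, and *reindex* fills in any digit pairs that never occurred with 0.

```python
import pandas as pd

# Hypothetical last-digit data standing in for the Visitor_Square / Home_Square columns.
df = pd.DataFrame({
    "Visitor_Square": ["7", "0", "7", "3"],
    "Home_Square": ["0", "7", "0", "4"],
})

# Count each (visitor, home) digit pair and divide by the number of games.
counts = pd.crosstab(
    df["Visitor_Square"].astype(int),
    df["Home_Square"].astype(int),
    normalize="all",
)

# crosstab only includes digits that occur, so pad out to the full 10x10 grid.
df_squares = counts.reindex(index=range(10), columns=range(10), fill_value=0.0)

print(df_squares.loc[7, 0])  # 0.5 (2 of the 4 games)
```

Rows are visitor digits and columns are home digits, matching the loop's *df_squares.at[visitor_index, home_index]*.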

We have our probabilities; let’s make them a little more human-readable by multiplying by 100:

```python
df_squares = df_squares * 100
```
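A quick sanity check is worth running at this point: every game lands in exactly one square, so the percentages should add up to 100. If the grid is accidentally scaled twice, the total comes out to 10,000 instead. The grid below is a hypothetical uniform stand-in for *df_squares*, used only to show the check.

```python
import pandas as pd

# Hypothetical stand-in for df_squares before scaling:
# a uniform grid where every square has probability 1/100.
df_squares = pd.DataFrame(0.01, index=range(10), columns=range(10))
df_squares = df_squares * 100  # convert to percentages

# Every game lands in exactly one square, so the percentages should total 100.
total = df_squares.to_numpy().sum()
print(round(total, 6))  # 100.0
```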

And let’s make it look a little nicer using Seaborn:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# df_squares already holds percentages (multiplied by 100 above), so don't scale it again
fig, ax = plt.subplots(figsize=(14,7))
sns.heatmap(df_squares, cmap="coolwarm", annot=True, cbar=False, annot_kws={"fontsize":16})

# Put the home digits along the top, like a squares board
plt.xlabel('Home Number', fontsize=12)
ax.xaxis.set_ticks_position('top')
ax.xaxis.set_label_position('top')
plt.ylabel('Away Number', fontsize=12)

# Work around a matplotlib bug (seen in 3.1.1) that cropped half of the top and
# bottom heatmap rows; newer matplotlib versions can drop these two lines
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)

fig.suptitle('SuperBowl Squares Probabilities', fontsize=20)
plt.show()
```
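The heatmap answers "is 7 good?" visually. To list the best squares directly, one option is to flatten the grid and sort it. This is a sketch; the grid values below are made up and only a few squares are non-zero, so swap in the real *df_squares* to get actual rankings.

```python
import pandas as pd

# Hypothetical percentage grid standing in for df_squares
# (rows = away digit, columns = home digit).
df_squares = pd.DataFrame(0.0, index=range(10), columns=range(10))
df_squares.loc[7, 0] = 6.0
df_squares.loc[0, 7] = 5.0
df_squares.loc[3, 0] = 4.0

# Flatten to one entry per (away, home) square and sort by probability.
ranked = df_squares.stack().sort_values(ascending=False)
ranked.index.names = ["Away", "Home"]

print(ranked.head(3))
```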

Cheers!