# NFL SuperBowl Squares Historic Results

If you’ve bought SuperBowl squares this year, you may be wondering: are my numbers any good? 7 is good, right? 3? What about the rest?

I’ve looked at the last 3 years of football scores and calculated the distribution of last-digit combinations with Python.

### Data

The data behind this is the NFL scores from the 2017, 2018, and 2019 seasons, 800 games in total, including playoffs but not preseason games. The data is easy enough to find on the internet.

Note: There’s arguably no true home field advantage in the SuperBowl; however, this analysis uses regular season scores, where there is a home field advantage.

Note 2: This is a distribution of actual game scores. It counts each home-visitor digit pair as it occurred within a game, so any correlation between the two scores in a game is preserved. It is not a distribution of all the home team scores vs all the visitor scores. For example, in the last 3 seasons there was never a home-visitor combination of 2-2, so its probability is 0%.
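To make the difference in Note 2 concrete, here is a minimal sketch using only the standard library. The four games in it are made up for illustration and are not real NFL results; the names `games`, `joint`, and `independent` are my own. It compares the share of games with a given digit pair against what you would get by multiplying the visitor and home digit frequencies as if they were independent.

```python
from collections import Counter

# Hypothetical (visitor_score, home_score) pairs, NOT real NFL results
games = [(17, 20), (24, 27), (10, 13), (27, 24)]

# Last digit of each score
pairs = [(v % 10, h % 10) for v, h in games]
n = len(pairs)

# Joint distribution: count each (visitor digit, home digit) pair as it occurred
joint = {pair: count / n for pair, count in Counter(pairs).items()}

# Marginal counts: how often each digit shows up for visitors and for home teams
visitor_marginal = Counter(v for v, _ in pairs)
home_marginal = Counter(h for _, h in pairs)

def independent(v, h):
    """Probability of (v, h) if the two digits were treated as independent."""
    return (visitor_marginal[v] / n) * (home_marginal[h] / n)

print(joint.get((7, 7), 0.0))  # 0.0, the pair 7-7 never occurred in a game
print(independent(7, 7))       # 0.125, because 7 occurs on both sides separately
```

The joint version is the one this post uses, which is why a pair that never happened gets exactly 0%.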

### Code

The data processing and calculations were done in Python. Let’s do it:

```python
import pandas as pd

# Load the game scores and inspect the column types
df = pd.read_csv('NFLScores.csv')
df.info()
```

This prints the datatype of each column. Visitor_Score and Home_Score are read as numeric, but for squares we need to parse the rightmost digit of each score, so let’s re-import them as strings (aka objects).

```python
df = pd.read_csv('NFLScores.csv', dtype=object)
df.info()
```

Much better. Let’s capture the total number of games, which we’ll use later in our probability calculations:

```python
# df.shape is a (rows, columns) tuple; the row count is the number of games
total_games = df.shape[0]
```

Let’s parse the last digit of the visitor’s score:

```python
df['Visitor_Score'].str[-1:]
```

With the last digit successfully parsed, let’s establish new columns in our dataframe to hold the visitor and home last digits, named Visitor_Square and Home_Square:

```python
df['Visitor_Square'] = df['Visitor_Score'].str[-1:]
df['Home_Square'] = df['Home_Score'].str[-1:]
```
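As a side note, the string slice is not the only way to get the last digit. For non-negative whole-number scores, the remainder after dividing by 10 gives the same result, so with numeric columns something like `df['Visitor_Score'] % 10` should play the same role. A small sketch in plain Python, with made-up scores, to show the two approaches agree:

```python
# Hypothetical final scores, used only for illustration
scores = [7, 13, 20, 31, 0]

# String approach: take the last character, as this post does
digits_from_strings = [int(str(s)[-1:]) for s in scores]

# Numeric approach: remainder after dividing by 10
digits_from_modulo = [s % 10 for s in scores]

print(digits_from_strings)  # [7, 3, 0, 1, 0]
print(digits_from_modulo)   # [7, 3, 0, 1, 0]
```

Either route works; the string version is kept here because the scores were already re-imported as strings.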

Let’s create a final dataframe with 0-9 rows and 0-9 columns to hold each probability that corresponds to those digits:

```python
df_squares = pd.DataFrame(0, index=[0,1,2,3,4,5,6,7,8,9], columns=[0,1,2,3,4,5,6,7,8,9], dtype=float)
```

Now we loop through our data and increment the corresponding cell in df_squares. Note that each game adds 1 / total_games rather than 1, so the cells end up holding probabilities instead of counts:

```python
for i in range(len(df)):
    # Convert the string digits to integers so they match the 0-9 labels
    visitor_index = int(df.iloc[i]['Visitor_Square'])
    home_index = int(df.iloc[i]['Home_Square'])

    # Each game contributes an equal share of the total probability
    df_squares.at[visitor_index, home_index] = df_squares.at[visitor_index, home_index] + (1 / total_games)

df_squares
```

We have our probabilities. Let’s make them a little more human readable by multiplying by 100:

```python
df_squares = df_squares * 100
```
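The counting loop can also be written without an explicit loop. Here is a hedged sketch using `pd.crosstab` with `normalize='all'`, which returns each cell as a share of all rows. The `demo` dataframe and its four games are invented for illustration, and `demo_squares` is my own name, not part of the analysis above.

```python
import pandas as pd

# Hypothetical last digits for four games, NOT real NFL results
demo = pd.DataFrame({
    'Visitor_Square': ['7', '4', '0', '7'],
    'Home_Square': ['0', '7', '3', '0'],
})

digits = list(range(10))

# Share of games for each (visitor digit, home digit) pair that occurred
demo_squares = pd.crosstab(
    demo['Visitor_Square'].astype(int),
    demo['Home_Square'].astype(int),
    normalize='all',
)

# Add the digits that never occurred as zeros, then convert to percentages
demo_squares = demo_squares.reindex(index=digits, columns=digits, fill_value=0.0) * 100

print(demo_squares.loc[7, 0])          # 50.0, since 2 of the 4 games ended 7-0
print(demo_squares.to_numpy().sum())   # 100.0, all cells together cover every game
```

Checking that the whole grid sums to 100 is a quick sanity test that works for the loop-based df_squares as well.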

And let’s make it look a little nicer using Seaborn:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# df_squares already holds percentages from the previous step,
# so it is not multiplied by 100 again here

fig, ax = plt.subplots(figsize=(14,7))
sns.heatmap(df_squares, cmap="coolwarm", annot=True, cbar=False, annot_kws={"fontsize":16})

# Put the home digits along the top edge
plt.xlabel('Home Number', fontsize=12)
ax.xaxis.set_ticks_position('top')
ax.xaxis.set_label_position('top')

plt.ylabel('Away Number', fontsize=12)

# Workaround for matplotlib versions (such as 3.1.1) that cut off the
# top and bottom heatmap rows; not needed on versions without that bug
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)

fig.suptitle('SuperBowl Squares Probabilities', fontsize=20)
```

Cheers!
