⚽ Calculating Premier League Win Probabilities Using Python and the Football-Data.org API

As a football enthusiast and data science learner, I decided to analyze last season’s Premier League teams by calculating the probability of each team winning a specific number of games using the binomial distribution (the distribution of the number of successes in a series of independent Bernoulli trials). This article walks through how I used the Football-Data.org API and Python to extract standings data and model win probabilities.

📦 Tools & Tech Stack

  1. Python 🐍
  2. Requests (HTTP Library)
  3. Football-Data.org API
  4. Binomial distribution formula (n independent Bernoulli trials):

$$P(k \text{ wins}) = \binom{n}{k}\, p^{k} (1-p)^{n-k}$$

where:

k = number of games won

n = total number of games played (usually 38)

p = estimated probability of winning a game
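
To make the formula concrete, here is a minimal sketch that evaluates it directly. The helper name `binomial_pmf` and the values n = 38, p = 0.5 are illustrative assumptions, not part of the original project.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k wins in n independent games, each won with probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative values only: a team that wins half its games over a 38-game season.
n, p = 38, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(19 wins) = {pmf[19]:.4f}")
# The probabilities over k = 0..n should add up to 1 (up to rounding error).
print(f"sum over all k = {sum(pmf):.6f}")

# With n = 1 the formula reduces to a single Bernoulli trial: win with p, lose with 1 - p.
print(binomial_pmf(1, 1, p), binomial_pmf(0, 1, p))
```

A single match is a Bernoulli trial; the count of wins across a whole season is what follows the binomial distribution.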

🔑 Step 1: Getting the API Key
To use the API:

  1. Sign up at https://www.football-data.org/
  2. Get your API key from the dashboard
  3. Save it in a .env file like this:
API_KEY=your_api_key_here

πŸ” Make sure to add .env to your .gitignore so it’s never pushed to GitHub.

📡 Step 2: Fetch Premier League Standings via API
We used the /competitions/PL/standings endpoint for the 2024/2025 season:

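import os

import requests
from dotenv import load_dotenv

# Read API_KEY from the local .env file into the environment
load_dotenv()
api_key = os.getenv("API_KEY")

url = "https://api.football-data.org/v4/competitions/PL/standings"
headers = {"X-Auth-Token": api_key}
# The API identifies a season by its starting year, so 2024 selects 2024/2025
params = {"season": 2024}

response = requests.get(url, headers=headers, params=params, timeout=10)
response.raise_for_status()  # fail early on 4xx/5xx (bad key, rate limit)
data = response.json()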
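
The model only needs each team's name, win count, and games played. The sketch below shows one way to pull those out of the response. It assumes the payload has a `standings` list whose `TOTAL` entry holds a `table` of rows with `team.name`, `won`, and `playedGames` fields; check this against the actual response. The `sample` data and the `extract_wins` helper are hypothetical.

```python
# Hypothetical, trimmed-down payload shaped like the standings response (made-up teams).
sample = {
    "standings": [
        {
            "type": "TOTAL",
            "table": [
                {"team": {"name": "Team A"}, "playedGames": 38, "won": 24},
                {"team": {"name": "Team B"}, "playedGames": 38, "won": 11},
            ],
        }
    ]
}

def extract_wins(data):
    """Return (team_name, wins, games_played) tuples from the overall table."""
    for standing in data.get("standings", []):
        # Skip home-only or away-only tables if the response includes them.
        if standing.get("type") == "TOTAL":
            return [
                (row["team"]["name"], row["won"], row["playedGames"])
                for row in standing["table"]
            ]
    return []

for name, wins, played in extract_wins(sample):
    print(f"{name}: {wins} wins in {played} games")
```

Passing the real `data` object to `extract_wins` in place of `sample` should give one tuple per club.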

πŸ“ Step 3: Calculate Win Probability
We used the Bernoulli distribution to calculate the probability of each team winning k games out of n = 38:

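import math

def calculate_win_probability(team_name, wins, total_games=38):
    """Return (team_name, P(exactly `wins` wins in `total_games` games))."""
    # Estimate the per-game win probability from the team's own record
    p = wins / total_games
    # Binomial formula: C(n, k) * p^k * (1 - p)^(n - k)
    probability = math.comb(total_games, wins) * (p ** wins) * ((1 - p) ** (total_games - wins))
    return team_name, round(probability, 6)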
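
Because p is estimated from the same win count that is then plugged in as k, the function above always evaluates the distribution at its peak. The sketch below computes the whole distribution to show this, along with a cumulative "at least k wins" probability. `win_distribution` and `prob_at_least` are hypothetical helpers, and the 19-win team is a made-up example.

```python
import math

def win_distribution(wins, total_games=38):
    """Binomial probabilities for every win count 0..total_games, with p = wins / total_games."""
    p = wins / total_games
    return [
        math.comb(total_games, k) * p**k * (1 - p) ** (total_games - k)
        for k in range(total_games + 1)
    ]

def prob_at_least(k, dist):
    """P(k or more wins) under the given distribution."""
    return sum(dist[k:])

dist = win_distribution(wins=19)  # hypothetical team with 19 wins in 38 games

# The most likely win count coincides with the observed one.
most_likely = max(range(len(dist)), key=dist.__getitem__)
print("most likely win count:", most_likely)
print("P(exactly 19 wins):", round(dist[19], 6))
print("P(at least 19 wins):", round(prob_at_least(19, dist), 6))
```

Read this way, the "exactly k wins" figure says more about how spread out the distribution is than about how strong a team was. Cumulative probabilities are often easier to interpret.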

📈 Results
This gave us a probabilistic view of how likely it is that a team would win exactly the number of games they did, based on a binomial model.

| Team | Wins | Win Probability |
| --- | --- | --- |
| Manchester City | 28 | 0.048129 |
| Arsenal | 26 | 0.060201 |

🤔 Limitations
The binomial model assumes that every match is independent and that a team has the same win probability in each one, which isn’t realistic in football.
It does not account for home/away advantage, injuries, transfers, or form.
Still, it’s a fun and mathematically sound way to get started with sports analytics!
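
To illustrate the equal-probability limitation, the sketch below lets every match have its own win probability and builds the distribution of total wins one match at a time (a Poisson binomial distribution). It still assumes the matches are independent. The helper `wins_distribution` and the home/away probabilities are made up for illustration, not estimated from data.

```python
def wins_distribution(match_probs):
    """Distribution of total wins when each match has its own win probability.

    Returns a list where entry k is P(exactly k wins), built match by match.
    """
    dist = [1.0]  # before any match is played: 0 wins with certainty
    for p in match_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)  # this match is not won
            nxt[k + 1] += mass * p    # this match is won
        dist = nxt
    return dist

# Hypothetical split: 19 home games at p = 0.7 and 19 away games at p = 0.5.
probs = [0.7] * 19 + [0.5] * 19
dist = wins_distribution(probs)

expected_wins = sum(k * mass for k, mass in enumerate(dist))
print("expected wins:", round(expected_wins, 2))
print("most likely win count:", max(range(len(dist)), key=dist.__getitem__))
```

When every entry in `match_probs` is the same value, this reduces to the binomial model used above.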

✅ Conclusion
This project was a great exercise in:

  1. Consuming real-world APIs
  2. Using statistical methods like the binomial distribution
  3. Thinking probabilistically about sports performance
