⚽ Calculating Premier League Win Probabilities Using Python and the Football-Data.org API
As a football enthusiast and data science learner, I decided to analyze last season's Premier League teams by calculating the probability of each team winning a specific number of games. I treated every match as a Bernoulli trial (win or no win), which makes the season's win count binomially distributed. This article walks through how I used the Football-Data.org API and Python to extract standings data and model win probabilities.
📦 Tools & Tech Stack
- Python
- Requests (HTTP Library)
- Football-Data.org API
- Binomial distribution formula (each match is a Bernoulli trial):
$$P(k \text{ wins}) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}$$
where:
k = number of games won
n = total number of games played (usually 38)
p = estimated probability of winning a game
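As a quick sanity check on the formula, the sketch below implements it with Python's `math.comb` and confirms that the probabilities over all possible win counts add up to 1. The function name `binomial_pmf` and the value p = 0.5 are illustrative choices, not part of the original project.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n matches with per-match win probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n = 38   # matches in a Premier League season
p = 0.5  # illustrative win probability (an assumption, not real data)

# Summing over every possible number of wins should give 1 (up to rounding).
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(round(total, 6))
```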
Step 1: Getting the API Key
To use the API:
- Sign up at https://www.football-data.org/
- Get your API key from the dashboard
- Save it in a .env file like this:
API_KEY=your_api_key_here
Make sure to add .env to your .gitignore so it’s never pushed to GitHub.
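A small guard can make a missing key obvious before any request is sent. The helper below is a hypothetical sketch (the name `get_api_key` is not from the original project) and assumes the variables in .env have already been loaded into the environment, for example by python-dotenv's `load_dotenv()`.

```python
import os

def get_api_key(env_var="API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # Failing here is clearer than a 4xx response from the API later on.
        raise RuntimeError(f"{env_var} is not set; add it to your .env file")
    return key
```

Calling `get_api_key()` in place of a bare `os.getenv("API_KEY")` means a typo in the .env file surfaces as a readable error instead of a rejected request.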
📡 Step 2: Fetch Premier League Standings via API
I used the /competitions/PL/standings endpoint for the 2024/2025 season:
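import os
import requests
from dotenv import load_dotenv
# Read API_KEY from the .env file into the environment
load_dotenv()
api_key = os.getenv("API_KEY")
# No season filter is passed, so the API returns the current season
url = "https://api.football-data.org/v4/competitions/PL/standings"
headers = {"X-Auth-Token": api_key}
response = requests.get(url, headers=headers, timeout=10)
# Fail loudly on an invalid key or a rate-limit response
response.raise_for_status()
data = response.json()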
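The next step needs each team's win count, so the response has to be flattened first. The sketch below assumes the v4 standings payload has a `standings` list whose `TOTAL` entry holds a `table` of rows with `team.name`, `won`, and `playedGames` fields; check these names against the response you actually receive. The helper name `extract_team_wins` and the sample payload are hypothetical.

```python
def extract_team_wins(data):
    """Return (team name, wins, games played) tuples from a standings payload."""
    rows = []
    for standing in data.get("standings", []):
        # Assumed: the overall table is labelled "TOTAL" (as opposed to HOME/AWAY).
        if standing.get("type") != "TOTAL":
            continue
        for entry in standing.get("table", []):
            rows.append((entry["team"]["name"], entry["won"], entry["playedGames"]))
    return rows

# Hypothetical payload with made-up numbers, shaped like the assumed response.
sample = {
    "standings": [
        {"type": "TOTAL", "table": [
            {"team": {"name": "Example FC"}, "won": 20, "playedGames": 38},
        ]},
    ]
}
print(extract_team_wins(sample))  # [('Example FC', 20, 38)]
```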
Step 3: Calculate Win Probability
I used the binomial distribution to calculate the probability of each team winning exactly k games out of n = 38, estimating p as the team's observed win rate (wins / n):
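import math
def calculate_win_probability(team_name, wins, total_games=38):
    # Estimate the per-match win probability from the observed win rate
    p = wins / total_games
    # Binomial PMF: C(n, k) * p^k * (1 - p)^(n - k)
    probability = math.comb(total_games, wins) * (p ** wins) * ((1 - p) ** (total_games - wins))
    return team_name, round(probability, 6)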
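To show how the function might be applied across a table, here is a minimal usage sketch. The team names and win counts are hypothetical placeholders; in practice they would come from the standings data fetched in Step 2.

```python
import math

def calculate_win_probability(team_name, wins, total_games=38):
    # Estimate the per-match win probability from the observed win rate
    p = wins / total_games
    # Binomial PMF: C(n, k) * p^k * (1 - p)^(n - k)
    probability = math.comb(total_games, wins) * (p ** wins) * ((1 - p) ** (total_games - wins))
    return team_name, round(probability, 6)

# Hypothetical (team, wins) pairs; real values would come from the standings data.
teams = [("Example FC", 20), ("Sample United", 12)]

results = [calculate_win_probability(name, wins) for name, wins in teams]
# Print teams from highest to lowest probability.
for name, probability in sorted(results, key=lambda r: r[1], reverse=True):
    print(f"{name}: {probability}")
```

Because p is estimated from the same win count that is being evaluated, the result is the binomial probability at its most likely outcome. It therefore says more about how spread out the distribution is than about how strong a team is.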
Results
This gave us a probabilistic view of how likely it is that a team would win exactly the number of games they did, based on a binomial model.
| Team | Wins | Win Probability |
| --- | --- | --- |
| Manchester City | 28 | 0.048129 |
| Arsenal | 26 | 0.060201 |
🤔 Limitations
The binomial model assumes that every match is independent and that a team has the same win probability p in each one, which isn't realistic in football.
It does not account for home/away advantage, injuries, transfers, or form.
Still, it's a fun and mathematically sound way to get started with sports analytics!
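To illustrate the equal-probability limitation, the sketch below relaxes it slightly by giving home and away matches different win probabilities and combining two binomial distributions of 19 matches each. This is not part of the original analysis; the function names and the probabilities 0.7 and 0.5 are hypothetical, and matches are still assumed to be independent.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n matches with win probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def season_win_pmf(k, p_home, p_away, games_each=19):
    """P(exactly k wins) when home and away matches have different win probabilities."""
    total = 0.0
    # Split the k wins into h home wins and k - h away wins.
    for h in range(max(0, k - games_each), min(k, games_each) + 1):
        total += binomial_pmf(h, games_each, p_home) * binomial_pmf(k - h, games_each, p_away)
    return total

# Hypothetical probabilities, chosen only for illustration.
p_home, p_away = 0.7, 0.5
print(round(season_win_pmf(23, p_home, p_away), 6))
```

When `p_home` and `p_away` are equal, this reduces to the single binomial model used in Step 3.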
✅ Conclusion
This project was a great exercise in:
- Consuming real-world APIs
- Using statistical methods like the binomial distribution
- Thinking probabilistically about sports performance