Maple XI
Football betting insights platform with statistical analysis engine comparing model probabilities against bookmaker odds.
Maple XI is a data-driven football analysis platform that identifies value bets by comparing its own probability engine against bookmaker odds. The system ingests historical match data, calculates expected outcomes using statistical models, and highlights markets where the bookmaker's implied probability diverges significantly from the model's assessment. The analysis engine is written in Python, and Streamlit provides rapid, interactive visualisation. This project demonstrates full-stack data engineering expertise, combining machine learning model development, statistical analysis, data pipeline architecture and UI design into a cohesive platform. For sports analytics and betting businesses, the ability to execute on this entire stack, from data ingestion through model training to interactive visualisation, is what separates viable products from proof-of-concept projects that never reach market.
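To illustrate what "implied probability" means, the sketch below converts decimal odds for a home/draw/away market into implied probabilities and removes the bookmaker's margin (the overround) by proportional normalisation. The use of decimal odds, the function names and the normalisation method are assumptions for illustration. Maple XI's own margin handling isn't described here.

```python
def implied_probabilities(decimal_odds: dict[str, float]) -> dict[str, float]:
    """Raw implied probability for each outcome: 1 / decimal odds."""
    return {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}


def overround(decimal_odds: dict[str, float]) -> float:
    """Bookmaker margin: how far the raw implied probabilities sum above 1."""
    return sum(implied_probabilities(decimal_odds).values()) - 1.0


def fair_probabilities(decimal_odds: dict[str, float]) -> dict[str, float]:
    """Remove the margin by scaling raw implied probabilities to sum to 1.

    Proportional normalisation is one simple method; others exist
    (e.g. Shin's method) and may suit some markets better.
    """
    raw = implied_probabilities(decimal_odds)
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


if __name__ == "__main__":
    # Hypothetical 1X2 prices: home win, draw, away win.
    odds = {"home": 2.10, "draw": 3.40, "away": 3.60}
    print(f"Margin: {overround(odds):.2%}")
    for outcome, p in fair_probabilities(odds).items():
        print(f"{outcome}: {p:.3f}")
```

Comparing the model against margin-free probabilities rather than raw ones avoids treating the bookmaker's built-in profit as model edge.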
Python + Streamlit
Statistical Football Analysis Engine.
A Python-powered statistical platform that models football match probabilities and compares them against bookmaker odds to surface value opportunities, presented through an interactive Streamlit dashboard. Streamlit was chosen for its rapid development velocity and its ability to turn Python scripts into interactive web apps without dedicated frontend engineering. The analysis engine processes millions of data points: historical match results, team statistics, player performance data, injury reports, weather conditions and historical odds. This data feeds statistical models, including logistic regression, random forests and ensemble methods, that predict match outcomes. The models are trained on historical data, validated on holdout test sets and refit as new matches complete. This machine learning approach is expected to be considerably more accurate than simpler heuristic scoring systems, and the platform's historical accuracy tracking allows that expectation to be checked against real results.
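To make the logistic regression component concrete, here is a minimal sketch that fits a one-feature model of home-win probability by batch gradient descent on log loss, using only the standard library. The feature (pre-match xG difference), the toy data and the learning-rate and epoch settings are hypothetical; a production engine would more likely rely on a library such as scikit-learn and many more features.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: list[float], ys: list[int], lr: float = 0.1,
                 epochs: int = 5000) -> tuple[float, float]:
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # d(log loss)/d(score) for one sample
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Hypothetical training data: pre-match xG difference (home minus away)
    # and whether the home side won (1) or not (0).
    xg_diff = [-1.2, -0.8, -0.5, -0.1, 0.0, 0.2, 0.4, 0.7, 1.0, 1.5]
    home_win = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
    w, b = fit_logistic(xg_diff, home_win)
    for x in (-1.0, 0.0, 1.0):
        print(f"xG diff {x:+.1f} -> P(home win) = {sigmoid(w * x + b):.2f}")
```

Logistic regression outputs calibrated-style probabilities directly, which is what a value comparison against implied odds needs; tree-based models and ensembles usually require a separate calibration check before their scores can be read as probabilities.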

Project Overview
Finding edge through statistical modelling.
The Maple XI engine processes historical match data, including goals scored, expected goals (xG), possession stats and form metrics, to build probability distributions for upcoming matches. These probabilities are then compared against odds from major bookmakers: when the model assigns a significantly higher probability to an outcome than the bookmaker's odds imply, the outcome is flagged as a potential value bet. The Streamlit dashboard allows filtering by league, date range and confidence threshold, presenting results as sortable tables with visual indicators showing the magnitude of disagreement between model and market. The system also tracks historical accuracy, showing how the model's predictions have performed over time against actual results.
The core insight driving Maple XI is that betting markets aren't perfectly efficient. Bookmakers set prices from aggregate market opinion plus their own profit margin, not from pure statistical probability. When the statistical model disagrees significantly with bookmaker pricing, that's a signal. If the model assigns a 60% probability to a team winning but the bookmaker's odds imply only 40% (decimal odds of 2.50), that's a 20 percentage-point gap and a potential value opportunity. If the model's 60% estimate is accurate, each unit staked at 2.50 returns 0.60 × 2.50 = 1.50 on average, an expected profit of 0.50 per unit. Over many bets, consistently taking positive-expected-value positions generates profit even though individual bets lose. This is the fundamental principle of value betting.
The data infrastructure supporting this is substantial. Match data comes from multiple sources: official league sites, specialised football statistics platforms such as Opta and StatsBomb, and bookmaker APIs. This data is ingested into a data warehouse, cleaned and normalised, and made available to the modelling pipeline. The models themselves are trained on years of historical data, enough matches that the training set spans diverse conditions: different leagues, seasons, weather conditions, team strengths, home/away effects and rest periods.
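The value-betting arithmetic above fits in a few lines. This sketch computes expected profit per unit stake from a model probability and decimal odds, and flags selections whose model probability exceeds the implied probability by a minimum edge, loosely mirroring the dashboard's league and confidence filters. The `Selection` type, its field names and the 5-percentage-point default threshold are illustrative assumptions, not Maple XI's actual schema or settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Selection:
    """One betting outcome (hypothetical schema for illustration)."""
    league: str
    match: str
    outcome: str
    model_prob: float    # model's probability for this outcome
    decimal_odds: float  # bookmaker price in decimal format


def implied_prob(decimal_odds: float) -> float:
    """Probability implied by decimal odds (bookmaker margin not removed)."""
    return 1.0 / decimal_odds


def expected_value(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per 1 unit staked, assuming model_prob is the true probability."""
    return model_prob * decimal_odds - 1.0


def edge(s: Selection) -> float:
    """Gap between model probability and implied probability."""
    return s.model_prob - implied_prob(s.decimal_odds)


def value_bets(selections: list[Selection], min_edge: float = 0.05,
               league: Optional[str] = None) -> list[Selection]:
    """Selections with edge >= min_edge (optionally one league only),
    sorted so the largest model/market disagreement comes first."""
    flagged = [
        s for s in selections
        if (league is None or s.league == league) and edge(s) >= min_edge
    ]
    return sorted(flagged, key=edge, reverse=True)


if __name__ == "__main__":
    # Worked example from the text: model says 60%, odds of 2.50 imply 40%.
    print(f"EV per unit: {expected_value(0.60, 2.50):+.2f}")
    candidates = [
        Selection("League A", "Team A v Team B", "home", 0.60, 2.50),
        Selection("League A", "Team C v Team D", "away", 0.30, 3.00),
    ]
    for s in value_bets(candidates):
        print(s.match, s.outcome, f"edge {edge(s):.2f}")
```

Positive expected value here holds only if the model's probability is right, which is why the accuracy tracking matters as much as the flagging logic.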
The validation methodology is strictly chronological. Historical data is split by date: models are trained on data up to a specific cut-off, validated on the following month and tested on later, unseen data. Because a model never sees matches played after its training cut-off, this temporal validation avoids look-ahead leakage and exposes overfitting, where a model learns historical patterns that don't carry over to future matches. Only models that generalise to future data are deployed. Once deployed, the model's predictions are tracked against actual results, allowing continuous accuracy measurement and retraining as new data accumulates.
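The chronological split and ongoing accuracy tracking can be sketched as a walk-forward evaluation: for each month, a model is fitted only on matches played before that month and scored on that month's results. To keep the example self-contained, the "model" is just the smoothed historical frequency of home wins, draws and away wins, and accuracy is measured with the multi-class Brier score. Both choices, and the `(date, result)` record layout, are assumptions for illustration rather than Maple XI's actual models or metrics.

```python
from collections import Counter
from datetime import date

OUTCOMES = ("H", "D", "A")  # home win, draw, away win


def brier(probs: dict[str, float], actual: str) -> float:
    """Multi-class Brier score for one match (0 is perfect, 2 is worst)."""
    return sum((probs[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES)


def fit_base_rates(history: list[tuple[date, str]]) -> dict[str, float]:
    """Toy 'model': outcome frequencies in the training window (add-one smoothed)."""
    counts = Counter(result for _, result in history)
    total = len(history) + len(OUTCOMES)
    return {o: (counts[o] + 1) / total for o in OUTCOMES}


def walk_forward(matches: list[tuple[date, str]]) -> dict[tuple[int, int], float]:
    """For each calendar month, train on strictly earlier matches only and
    return the mean Brier score on that month's matches."""
    matches = sorted(matches)
    months = sorted({(d.year, d.month) for d, _ in matches})
    scores = {}
    for year, month in months[1:]:  # the first month has no training data
        cutoff = date(year, month, 1)
        train = [m for m in matches if m[0] < cutoff]  # no look-ahead
        test = [m for m in matches if (m[0].year, m[0].month) == (year, month)]
        probs = fit_base_rates(train)
        scores[(year, month)] = sum(brier(probs, r) for _, r in test) / len(test)
    return scores


if __name__ == "__main__":
    # Hypothetical results: (match date, outcome code).
    results = [
        (date(2023, 8, 12), "H"), (date(2023, 8, 19), "D"),
        (date(2023, 9, 2), "H"), (date(2023, 9, 16), "A"),
        (date(2023, 10, 7), "H"),
    ]
    for (year, month), score in walk_forward(results).items():
        print(f"{year}-{month:02d}: mean Brier {score:.3f}")
```

Swapping `fit_base_rates` for a real model keeps the same guarantee: every score comes from predictions made without access to the month being scored.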

