The StatsUpdateFM pipeline collects soccer statistics from FotMob, aggregates player/team performance data, and generates player projections for betting markets.
This pipeline replaced the legacy SportMonks-based system with a FotMob-first approach that falls back to HTML scraping when the API is blocked.
Purpose: Fetch upcoming fixtures and store them in the fixtures database
Collections Updated:
fixtures_fm
fixture_url_mapping_fm (for HTML scraping fallback)
Date Range: Next 14 days
Implementation:
    from datetime import date, timedelta

    today = date.today()

    # date_range is assumed to be defined elsewhere; it yields one date per day
    for day in date_range(today, today + timedelta(days=14)):
        matches = fotmob_client.get_matches_by_date(
            date=day.strftime('%Y%m%d')
        )

        for league in matches.get('leagues', []):
            for match in league.get('matches', []):
                fixture_doc = {
                    'match_id': match['id'],
                    'home_team': match['home']['name'],
                    'away_team': match['away']['name'],
                    'league_name': league['name'],
                    'kick_off_time': match['status']['utcTime'],
                    'page_url': match.get('pageUrl')  # For HTML fallback
                }

                # Upsert so re-running the fetch updates existing fixtures
                fixtures_fm.update_one(
                    {'match_id': match['id']},
                    {'$set': fixture_doc},
                    upsert=True
                )
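The loop above relies on a `date_range` helper that this document does not show. A minimal sketch of one possible version follows, assuming a half-open range (start included, end excluded) so that a 14-day window yields exactly 14 dates. The real helper may use a different convention.

```python
from datetime import date, timedelta
from typing import Iterator


def date_range(start: date, end: date) -> Iterator[date]:
    """Yield each date from start (inclusive) up to end (exclusive).

    Assumption: the pipeline's own helper is not shown in this document,
    so the half-open convention here is a guess.
    """
    current = start
    while current < end:
        yield current
        current += timedelta(days=1)


# Usage: build the YYYYMMDD strings passed to get_matches_by_date.
start = date(2026, 3, 15)
keys = [d.strftime('%Y%m%d') for d in date_range(start, start + timedelta(days=14))]
```

With a half-open range, two consecutive windows never request the same date twice.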
Purpose: Scrape detailed player/team stats from completed fixtures
Collections Updated:
fixture_api_cache_fm (raw FotMob data)
player_stats_fm (aggregated player stats)
team_stats_fm (aggregated team stats)
Stats Scraped:

Player Stats:
Goals, Assists
Shots, Shots on Target
Passes, Pass Accuracy
Tackles, Interceptions
Dribbles, Fouls
Cards (Yellow/Red)
Expected Goals (xG)
Shot locations (Inside/Outside Box, Headers)

Team Stats:
Possession %
Total Shots, Shots on Target
Corners
Fouls, Cards
Offsides
Pass Accuracy
Expected Goals (xG)

Scraping Logic:
    for fixture in recent_fixtures:
        # Try API first
        try:
            data = fotmob_client.get_match_details(fixture['match_id'])
        except CloudflareBlockError:
            # Fallback to HTML scraping
            mapping = fixture_url_mapping_fm.find_one(
                {'fixture_id': str(fixture['match_id'])}
            )
            if mapping is None or not mapping.get('page_url'):
                continue  # No stored URL, so this fixture cannot be scraped
            data = fotmob_client.scrape_match_html(mapping['page_url'])

        # Extract player stats
        lineup = data.get('content', {}).get('lineup', {})

        for side in ['homeTeam', 'awayTeam']:
            for player in lineup.get(side, {}).get('starters', []):
                stats = player.get('stats', [])
                # Process and store...
    from collections import defaultdict
    from datetime import datetime

    def predict_starting_lineup(team_id, fixture_id):
        """
        Predict starting XI based on recent lineups and player availability
        """
        # Get last 5 matches
        recent_games = fixtures_fm.find({
            '$or': [{'home_team_id': team_id}, {'away_team_id': team_id}],
            'status': 'Finished'
        }).sort('kick_off_time', -1).limit(5)

        # Count player appearances in starting XI
        starter_counts = defaultdict(int)
        for game in recent_games:
            lineup = fixture_api_cache_fm.find_one({'match_id': game['match_id']})
            if lineup is None:
                continue  # No cached data for this match
            side = 'homeTeam' if lineup['home_team_id'] == team_id else 'awayTeam'
            for player in lineup['content']['lineup'][side]['starters']:
                starter_counts[player['id']] += 1

        # Select top 11 players by appearance frequency
        predicted_starters = sorted(
            starter_counts.items(),
            key=lambda x: x[1],
            reverse=True
        )[:11]

        # Store prediction
        Predicted_Lines_FM.update_one(
            {'fixture_id': fixture_id, 'team_id': team_id},
            {'$set': {
                'predicted_starters': [p[0] for p in predicted_starters],
                'confidence': calculate_confidence(starter_counts),
                'generated_at': datetime.now()
            }},
            upsert=True
        )
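The selection step in `predict_starting_lineup` can be tried without a database. The sketch below is an in-memory illustration only: `most_frequent_starters` is a hypothetical name, and the explicit tie-break on player id is an assumption added here, since with only five matches several players will often share the same start count.

```python
from collections import Counter
from typing import Iterable, List


def most_frequent_starters(lineups: Iterable[Iterable[int]], n: int = 11) -> List[int]:
    """Return the n player ids that started most often.

    Ties are broken by ascending player id so the result is deterministic
    (an assumption; the original sort leaves tie order to insertion order).
    """
    counts = Counter()
    for starters in lineups:
        counts.update(starters)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [player_id for player_id, _ in ranked[:n]]


# Three recent matches, each with a 3-player "lineup" to keep the example short.
recent = [[10, 7, 9], [10, 7, 4], [10, 9, 4]]
top_two = most_frequent_starters(recent, n=2)
```

Here player 10 started all three matches, and players 4, 7, and 9 started two each, so the second slot is decided purely by the tie-break rule. A production version would likely want a more meaningful tie-break, such as most recent start.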
    from PROPPR.StatsUpdateFM.services.fotmob.client import MobFot

    client = MobFot()

    # Fetch matches by date
    matches = client.get_matches_by_date(
        date='20260315',  # YYYYMMDD format
        time_zone='America/New_York'
    )

    # Get match details
    match_data = client.get_match_details(match_id=4832195)

    # Get league standings
    standings = client.get_league(
        id=47,  # Premier League
        tab='table',
        type='league'
    )

    # Search for teams/players
    results = client.search(term='Manchester City')
FotMob API blocks requests without valid Cloudflare Turnstile cookies. These must be refreshed periodically.
Cookie Loading
    import json

    # Load cookies from PROPPR config
    from PROPPR.config import get_turnstile_cookies_path

    cookie_path = get_turnstile_cookies_path()
    with open(cookie_path, 'r') as f:
        data = json.load(f)

    cookies = data.get('cookies', {})
    SESSION.cookies.update(cookies)
Cookie Refresh: Cookies expire after ~24 hours. Use the Turnstile solver to regenerate them.
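Because the cookies last roughly a day, it can help to check their age before a run instead of waiting for a block. A minimal sketch follows, assuming the cookie file's modification time reflects the last refresh. `cookies_are_stale` is a hypothetical helper, not part of PROPPR.

```python
import os
import time
from typing import Optional


def cookies_are_stale(path: str, max_age_hours: float = 24.0,
                      now: Optional[float] = None) -> bool:
    """Return True if the cookie file is missing or older than max_age_hours.

    Assumption: the file's modification time marks the last refresh.
    """
    if not os.path.exists(path):
        return True
    current = time.time() if now is None else now
    age_seconds = current - os.path.getmtime(path)
    return age_seconds > max_age_hours * 3600
```

A scheduler could call this with the path from `get_turnstile_cookies_path()` and trigger the solver when it returns True. Using a threshold a little under 24 hours would leave a safety margin.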
Symptoms: CloudflareBlockError during API calls
Solutions:
Refresh Turnstile cookies
Enable HTML scraping fallback
Use VPN/proxy rotation
Add delays between requests
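The "add delays" option is often combined with retries. The sketch below shows one way to retry a blocked call with exponentially growing waits. `BlockedError` is a hypothetical stand-in for the pipeline's `CloudflareBlockError`, and the retry count and base delay are illustrative values, not settings taken from the pipeline.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


class BlockedError(Exception):
    """Hypothetical stand-in for the pipeline's CloudflareBlockError."""


def call_with_backoff(func: Callable[[], T], retries: int = 3,
                      base_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, waiting base_delay * 2**attempt seconds after each block.

    Re-raises BlockedError once all retries have been used up.
    """
    if retries < 0:
        raise ValueError('retries must be >= 0')
    for attempt in range(retries + 1):
        try:
            return func()
        except BlockedError:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is injectable so the function can be exercised in tests without real waiting. If every retry is blocked, the error propagates and the caller can switch to the HTML scraping fallback.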
Missing player stats
Issue: Player projections show 0.0 for all stats
Fix: Check that player_stats_fm has recent data:
    player = player_stats_fm.find_one({'player_id': 823154})
    print(player.get('avg'))  # Should have stats
    print(player.get('last_aggregated'))  # Should be recent
If either is empty, re-run stats_collector.py and aggregate_stats.py.
Slow pipeline execution
Cause: FotMob API rate limiting
Solutions:
Increase delays between requests
Use multiprocessing for parallel fixture processing
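Parallel fixture processing could look roughly like the sketch below. Note the substitution: it uses a thread pool from `concurrent.futures` rather than the `multiprocessing` module, on the assumption that the work is dominated by network waits. `ProcessPoolExecutor` exposes the same interface if CPU-bound parsing turns out to matter. `process_fixtures` and `fake_worker` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_fixtures(match_ids: Iterable[int],
                     worker: Callable[[int], dict],
                     max_workers: int = 4) -> Dict[int, dict]:
    """Run worker(match_id) for each id on a small pool; results keyed by id.

    executor.map preserves input order and re-raises a worker's exception
    when its result is reached.
    """
    ids = list(match_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(worker, ids))
    return dict(zip(ids, results))


def fake_worker(match_id: int) -> dict:
    # Hypothetical placeholder for fetching and storing one fixture's stats.
    return {'match_id': match_id, 'processed': True}


summary = process_fixtures([101, 102, 103], fake_worker, max_workers=2)
```

Keeping `max_workers` small matters here: more concurrent requests can make rate limiting worse, so the pool size and the per-request delay need to be tuned together.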
    # MongoDB
    MONGO_CONNECTION_STRING="mongodb://localhost:27017"
    MONGO_DATABASE="proppr"

    # FotMob
    TURNSTILE_COOKIES_PATH="/opt/PROPPR/config/turnstile_cookies.json"
    FOTMOB_ENABLE_DATE_PREFETCH=0  # Disable for HTML-first mode

    # Pipeline
    STATS_LOOKBACK_DAYS=90  # How far back to aggregate stats
    PROJECTION_CONFIDENCE_THRESHOLD=0.6  # Min confidence for alerts
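These variables arrive as strings, so code that consumes them needs to convert types. The sketch below shows one way to read them into a typed object. `PipelineSettings` and `load_settings` are hypothetical names, and using the example values above as defaults is an assumption, as is treating `FOTMOB_ENABLE_DATE_PREFETCH=1` as "enabled".

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineSettings:
    mongo_connection_string: str
    mongo_database: str
    turnstile_cookies_path: str
    enable_date_prefetch: bool
    stats_lookback_days: int
    projection_confidence_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> PipelineSettings:
    """Build settings from a mapping of environment variables.

    Defaults mirror the example values in this document (an assumption).
    """
    return PipelineSettings(
        mongo_connection_string=env.get('MONGO_CONNECTION_STRING', 'mongodb://localhost:27017'),
        mongo_database=env.get('MONGO_DATABASE', 'proppr'),
        turnstile_cookies_path=env.get(
            'TURNSTILE_COOKIES_PATH', '/opt/PROPPR/config/turnstile_cookies.json'),
        enable_date_prefetch=env.get('FOTMOB_ENABLE_DATE_PREFETCH', '0') == '1',
        stats_lookback_days=int(env.get('STATS_LOOKBACK_DAYS', '90')),
        projection_confidence_threshold=float(
            env.get('PROJECTION_CONFIDENCE_THRESHOLD', '0.6')),
    )
```

Passing the mapping as a parameter keeps the loader easy to test: a plain dict can stand in for the process environment.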