Overview
OddsEngine’s data model is designed to centralize tennis-specific information from multiple sources and structure it for probabilistic analysis. The platform organizes data around three primary entities: Players, Matches, and Tournaments. An additional structure represents Bet Combinations.
The data architecture supports both real-time analysis and persistence of historical data for statistical modeling.
Core Entities
Player Model
The Player entity captures comprehensive information about tennis players from both the ATP and WTA tours.
Key Attributes:
- Identity: Player ID, name, nationality
- Ranking Data: Current ATP/WTA ranking, ranking points, career-high ranking
- Performance Metrics: Win/loss records, surface-specific statistics, head-to-head records
- Historical Data: Tournament results, match history, form indicators
Player data is synchronized with the API-Tennis service, which specializes in tennis-specific data and provides coverage of both ATP and WTA tours.
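A minimal Python sketch of the Player entity follows, assuming dataclasses as the in-memory model. The field names (player_id, surface_records, head_to_head, and so on) and the SurfaceRecord helper are hypothetical illustrations of the attribute groups listed above, not OddsEngine's actual schema or the API-Tennis response format.

```python
from dataclasses import dataclass, field
from typing import Dict, Literal, Optional, Tuple

Tour = Literal["ATP", "WTA"]


@dataclass
class SurfaceRecord:
    """Win/loss record for one surface (e.g. "clay")."""
    wins: int = 0
    losses: int = 0

    @property
    def win_rate(self) -> Optional[float]:
        played = self.wins + self.losses
        return self.wins / played if played else None  # None when no matches played


@dataclass
class Player:
    # Identity
    player_id: int
    name: str
    nationality: str
    tour: Tour
    # Ranking data (ranking is None for unranked players)
    ranking: Optional[int] = None
    ranking_points: int = 0
    career_high_ranking: Optional[int] = None
    # Performance metrics, keyed by surface name
    surface_records: Dict[str, SurfaceRecord] = field(default_factory=dict)
    # Head-to-head: opponent player_id -> (wins, losses) against that opponent
    head_to_head: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def record_result(self, opponent_id: int, surface: str, won: bool) -> None:
        """Update surface and head-to-head statistics after a completed match."""
        record = self.surface_records.setdefault(surface, SurfaceRecord())
        wins, losses = self.head_to_head.get(opponent_id, (0, 0))
        if won:
            record.wins += 1
            wins += 1
        else:
            record.losses += 1
            losses += 1
        self.head_to_head[opponent_id] = (wins, losses)
```

Keeping one record per surface makes surface-specific win rates cheap to derive; a fuller model would also hold the tournament results and form indicators listed under Historical Data.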
Match Model
The Match entity represents individual tennis matches with detailed metadata for analysis.
Key Attributes:
- Match Identity: Unique match ID, tournament reference, round
- Participants: Player references (player_1, player_2)
- Match Details: Date, time, court surface, match type (best of 3/5)
- Contextual Data: Tournament category, round importance, weather conditions
- Results: Score, match statistics, duration
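The Match entity can be sketched the same way. This is an illustration under assumptions: the field names are hypothetical, and the checks in __post_init__ (best of 3 or 5, two distinct players) are example invariants derived from the attribute list rather than documented OddsEngine behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Match:
    # Match identity
    match_id: int
    tournament_id: int          # reference to a Tournament
    round_name: str             # e.g. "R32", "QF", "F"
    # Participants (references to Players)
    player_1_id: int
    player_2_id: int
    # Match details
    scheduled_at: datetime
    surface: str
    best_of: int = 3
    # Contextual data
    weather: Optional[str] = None
    # Results, filled in once the match has been played
    score: Optional[str] = None
    duration_minutes: Optional[int] = None

    def __post_init__(self) -> None:
        if self.best_of not in (3, 5):
            raise ValueError(f"best_of must be 3 or 5, got {self.best_of}")
        if self.player_1_id == self.player_2_id:
            raise ValueError("player_1 and player_2 must be different players")

    @property
    def is_completed(self) -> bool:
        return self.score is not None

    @property
    def sets_to_win(self) -> int:
        # Best of 3 is won with 2 sets, best of 5 with 3.
        return self.best_of // 2 + 1
```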
Tournament Model
The Tournament entity organizes matches within competitive structures and provides contextual metadata.
Key Attributes:
- Tournament Identity: Tournament ID, name, year
- Classification: Category (Grand Slam, Masters 1000, ATP 500, etc.)
- Location Data: Country, city, venue
- Tournament Details: Surface, indoor/outdoor, prize money, ranking points
- Schedule: Start date, end date, match schedule
| Category | ATP Points (Winner) | WTA Points (Winner) |
|---|---|---|
| Grand Slam | 2000 | 2000 |
| Masters 1000 / WTA 1000 | 1000 | 1000 |
| ATP 500 / WTA 500 | 500 | 500 |
| ATP 250 / WTA 250 | 250 | 250 |
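The category table maps naturally onto an enum plus a lookup of the winner's points. In this sketch, Category, WINNER_POINTS, and the Tournament field names are hypothetical; location data, prize money, and the match schedule are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Category(Enum):
    GRAND_SLAM = "Grand Slam"
    TIER_1000 = "Masters 1000 / WTA 1000"
    TIER_500 = "ATP 500 / WTA 500"
    TIER_250 = "ATP 250 / WTA 250"


# Winner's ranking points from the table above; ATP and WTA values are identical.
WINNER_POINTS = {
    Category.GRAND_SLAM: 2000,
    Category.TIER_1000: 1000,
    Category.TIER_500: 500,
    Category.TIER_250: 250,
}


@dataclass
class Tournament:
    tournament_id: int
    name: str
    year: int
    category: Category
    surface: str
    indoor: bool
    start_date: date
    end_date: date

    def __post_init__(self) -> None:
        if self.end_date < self.start_date:
            raise ValueError("end_date must not be earlier than start_date")

    @property
    def winner_points(self) -> int:
        return WINNER_POINTS[self.category]
```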
Bet Combination Model
Bet combinations represent user-defined selections for probabilistic analysis.
Key Attributes:
- Combination ID: Unique identifier
- Individual Bets: Array of bet selections
- Bet Details: Match reference, prediction type, selected outcome
- Calculated Probability: Combined probability percentage
- Timestamp: Creation and calculation timestamps
The platform computes a combined probability by multiplying the probabilities of the individual selections, which is valid when the selections are independent events, and adjusts the result with a correlation factor when they are not.
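A minimal sketch of the calculation, assuming each selection already carries a model probability. For independent selections the multiplication rule gives P(A and B) = P(A) · P(B); for example, selections at 0.6 and 0.5 combine to 0.30, or 30%. The correlation handling shown here is a hypothetical single multiplicative correlation_factor, clamped to [0, 1]; this document does not specify how OddsEngine actually adjusts for correlation, and all class and field names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from math import prod
from typing import List, Optional


@dataclass(frozen=True)
class BetSelection:
    match_id: int           # reference to a Match
    prediction_type: str    # e.g. "match_winner"
    selected_outcome: str   # e.g. "player_1"
    probability: float      # model probability of this outcome, in [0, 1]


@dataclass
class BetCombination:
    combination_id: str
    bets: List[BetSelection]
    # Hypothetical adjustment; 1.0 means "treat the selections as independent".
    correlation_factor: float = 1.0
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    calculated_at: Optional[datetime] = None

    def combined_probability(self) -> float:
        if not self.bets:
            raise ValueError("a combination needs at least one bet")
        for bet in self.bets:
            if not 0.0 <= bet.probability <= 1.0:
                raise ValueError(f"probability out of range: {bet.probability}")
        # Multiplication rule for independent events: P(A and B) = P(A) * P(B)
        p = prod(bet.probability for bet in self.bets) * self.correlation_factor
        self.calculated_at = datetime.now(timezone.utc)
        return min(max(p, 0.0), 1.0)  # keep the adjusted value a valid probability

    def combined_percentage(self) -> float:
        return round(self.combined_probability() * 100, 2)
```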
Data Flow Architecture
Data Pipeline
- Data Ingestion: External tennis data is consumed from API-Tennis using asynchronous HTTPX requests
- Data Validation: Incoming data is validated and mapped to internal models
- Data Persistence: Validated data is stored in Oracle Database for historical analysis
- Data Processing: Statistical models process stored data to extract patterns
- Data Presentation: Processed data is served to the frontend for user interaction
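The five pipeline stages can be sketched as follows. Only the ingestion step uses HTTPX (httpx.AsyncClient), and it is imported lazily so the rest of the sketch runs without the package installed. The URL and params, the "result" response envelope, the raw field names (id, player_1, player_2, surface), and the in-memory store standing in for Oracle Database are all assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional

VALID_SURFACES = {"clay", "hard", "grass", "carpet"}


async def fetch_with_httpx(url: str, params: Dict[str, str]) -> List[dict]:
    """1. Ingestion: asynchronous GET with HTTPX (imported lazily)."""
    import httpx

    async with httpx.AsyncClient(timeout=10.0) as client:
        response = await client.get(url, params=params)
        response.raise_for_status()
        payload = response.json()
    return payload.get("result", [])  # assumed response envelope


@dataclass(frozen=True)
class MatchRecord:
    match_id: int
    player_1: str
    player_2: str
    surface: str


def validate(raw: dict) -> Optional[MatchRecord]:
    """2. Validation: map a raw record to the internal model, or reject it."""
    try:
        record = MatchRecord(
            match_id=int(raw["id"]),
            player_1=str(raw["player_1"]),
            player_2=str(raw["player_2"]),
            surface=str(raw["surface"]).lower(),
        )
    except (KeyError, TypeError, ValueError):
        return None
    return record if record.surface in VALID_SURFACES else None


class InMemoryStore:
    """3. Persistence: stand-in for the Oracle Database layer."""

    def __init__(self) -> None:
        self.matches: Dict[int, MatchRecord] = {}

    def upsert(self, records: List[MatchRecord]) -> None:
        for record in records:
            self.matches[record.match_id] = record


def process(store: InMemoryStore) -> Dict[str, int]:
    """4. Processing: a trivial aggregate (matches per surface) as a placeholder."""
    counts: Dict[str, int] = {}
    for record in store.matches.values():
        counts[record.surface] = counts.get(record.surface, 0) + 1
    return counts


async def run_pipeline(
    fetch: Callable[[], Awaitable[List[dict]]], store: InMemoryStore
) -> Dict[str, int]:
    raw_records = await fetch()
    valid = [r for r in map(validate, raw_records) if r is not None]
    store.upsert(valid)
    # 5. Presentation: a JSON-serializable summary a frontend could render.
    return process(store)
```

In production the fetch argument might be `lambda: fetch_with_httpx(API_URL, params)`, with API_URL and params hypothetical. Passing the fetch step in as a coroutine factory keeps validation and persistence testable offline.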
Data Storage Strategy
OddsEngine uses Oracle Database for persistent storage, enabling complex queries and historical analysis required for probability calculations.
- Historical Data: Match results and player statistics are retained for trend analysis
- Performance Optimization: Frequently accessed data (current rankings, recent matches) is indexed for fast retrieval
- Data Freshness: External data is synchronized on a configurable schedule to balance API limits with data currency
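One way to make the sync schedule configurable is to derive the shortest safe interval from the request budget. This sketch assumes a 30-day month and a hypothetical cost of 4 requests per sync: with the 1,000 requests/month limit noted under Fallback Strategy, that allows 1,000 / 4 = 250 syncs, or one sync every 30 days / 250 = 2 h 52 min 48 s. min_sync_interval and SyncSchedule are illustrative names, not part of OddsEngine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def min_sync_interval(
    monthly_budget: int, requests_per_sync: int, days_per_month: int = 30
) -> timedelta:
    """Shortest interval between syncs that keeps a month of syncs within budget."""
    if requests_per_sync <= 0 or monthly_budget < requests_per_sync:
        raise ValueError("budget must cover at least one sync")
    syncs_per_month = monthly_budget // requests_per_sync
    return timedelta(days=days_per_month) / syncs_per_month


@dataclass
class SyncSchedule:
    interval: timedelta
    last_sync: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # The first sync is always due; afterwards wait at least one interval.
        return self.last_sync is None or now - self.last_sync >= self.interval

    def mark_synced(self, now: datetime) -> None:
        self.last_sync = now
```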
Fallback Strategy
To handle API failures or rate limiting:
- Mock Data Provider: A mock_data_provider.py module simulates network responses with local data
- Graceful Degradation: The system continues functioning with cached data when external APIs are unavailable
- Rate Limit Management: Request throttling ensures the 1,000 requests/month limit is respected
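The three mechanisms can be combined in a single provider, sketched here under stated assumptions: the live and mock fetchers are plain callables (the real interface of mock_data_provider.py is not shown in this document), the cache is an in-memory dict, and the 1,000-request budget resets each calendar month rather than over a rolling 30-day window. MonthlyRequestBudget and FallbackProvider are hypothetical names.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

Fetcher = Callable[[str], List[dict]]


class MonthlyRequestBudget:
    """Counts live requests per calendar month and refuses them past the cap."""

    def __init__(self, limit_per_month: int = 1000) -> None:
        self.limit = limit_per_month
        self.used = 0
        self.month: Optional[Tuple[int, int]] = None

    def try_consume(self, now: datetime) -> bool:
        current = (now.year, now.month)
        if current != self.month:  # new calendar month: reset the counter
            self.month, self.used = current, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


class FallbackProvider:
    """Live API first, then the last cached response, then mock data."""

    def __init__(self, live_fetch: Fetcher, mock_fetch: Fetcher,
                 budget: MonthlyRequestBudget) -> None:
        self.live_fetch = live_fetch
        self.mock_fetch = mock_fetch
        self.budget = budget
        self.cache: Dict[str, List[dict]] = {}

    def get(self, key: str, now: datetime) -> Tuple[List[dict], str]:
        if self.budget.try_consume(now):
            try:
                data = self.live_fetch(key)
            except Exception:  # any live failure (network, HTTP status) degrades
                pass
            else:
                self.cache[key] = data
                return data, "live"
        if key in self.cache:
            return self.cache[key], "cache"
        return self.mock_fetch(key), "mock"
```

The sketch is synchronous for brevity; an asynchronous version would await the live fetch. A failed live call still counts against the budget, since the request was actually sent.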
Schema Relationships
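The reference attributes under Core Entities imply the relationships: a Tournament has many Matches, each Match references two Players, and each bet in a Bet Combination references a Match. In Oracle these would presumably be enforced as foreign-key constraints. The hypothetical check below (MatchRef and dangling_references are illustrative names) reports the references such constraints would reject, which can be useful before persisting externally sourced data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class MatchRef:
    match_id: int
    tournament_id: int   # Match -> Tournament (many-to-one)
    player_1_id: int     # Match -> Player
    player_2_id: int     # Match -> Player


def dangling_references(
    tournament_ids: Set[int],
    player_ids: Set[int],
    matches: Iterable[MatchRef],
    bet_match_ids: Iterable[int],
) -> List[str]:
    """Report references that point at missing rows."""
    matches = list(matches)
    known_matches = {m.match_id for m in matches}
    problems: List[str] = []
    for m in matches:
        if m.tournament_id not in tournament_ids:
            problems.append(f"match {m.match_id}: unknown tournament {m.tournament_id}")
        for player_id in (m.player_1_id, m.player_2_id):
            if player_id not in player_ids:
                problems.append(f"match {m.match_id}: unknown player {player_id}")
    # Bet -> Match
    for match_id in bet_match_ids:
        if match_id not in known_matches:
            problems.append(f"bet: unknown match {match_id}")
    return problems
```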
Data Quality and Validation
Validation Rules:
- Player rankings must be positive integers
- Match dates cannot be in the past for scheduled matches
- Probability values must be between 0 and 1
- Surface types are restricted to valid values (clay, hard, grass, carpet)
- Tournament categories follow official ATP/WTA classifications
Data validation occurs at multiple layers: API response validation, model-level validation, and database constraints.
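A sketch of the model-level layer, encoding the rules listed above as one function that returns every violation instead of stopping at the first. validation_errors is a hypothetical name; the category set is taken from the Tournament Model table, and the past-date check assumes timezone-aware datetimes.

```python
from datetime import datetime, timezone
from typing import List, Optional

VALID_SURFACES = frozenset({"clay", "hard", "grass", "carpet"})
# Categories as listed in the Tournament Model table.
VALID_CATEGORIES = frozenset({
    "Grand Slam", "Masters 1000", "WTA 1000",
    "ATP 500", "WTA 500", "ATP 250", "WTA 250",
})


def validation_errors(
    *,
    ranking: Optional[int] = None,
    scheduled_at: Optional[datetime] = None,
    probability: Optional[float] = None,
    surface: Optional[str] = None,
    category: Optional[str] = None,
    now: Optional[datetime] = None,
) -> List[str]:
    """Check the listed rules; fields left as None are skipped."""
    errors: List[str] = []
    # bool is a subclass of int, so reject it explicitly.
    if ranking is not None and (
        isinstance(ranking, bool) or not isinstance(ranking, int) or ranking <= 0
    ):
        errors.append("ranking must be a positive integer")
    if scheduled_at is not None:
        # Expects timezone-aware datetimes; mixing naive and aware raises TypeError.
        reference = now or datetime.now(timezone.utc)
        if scheduled_at < reference:
            errors.append("scheduled match date is in the past")
    if probability is not None and not 0.0 <= probability <= 1.0:
        errors.append("probability must be between 0 and 1")
    if surface is not None and surface not in VALID_SURFACES:
        errors.append(f"invalid surface: {surface}")
    if category is not None and category not in VALID_CATEGORIES:
        errors.append(f"invalid tournament category: {category}")
    return errors
```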
Next Steps
To understand how this data is used for analysis:
- See Probability Engine for calculation methods
- See Tennis Data for tennis-specific data details