Domain models are the heart of the IMDb Scraper’s business logic. They represent real-world concepts (movies, actors) and encapsulate validation rules to ensure data integrity.
Domain models are pure Python classes with zero dependencies on frameworks, databases, or external services. They only contain business logic.
The Movie entity represents a film with comprehensive validation:
domain/models/movie.py
    from dataclasses import dataclass, field
    from typing import Optional, List
    import re

    from domain.models.actor import Actor


    @dataclass
    class Movie:
        """
        Domain model that represents a movie and validates its own integrity.
        """
        id: Optional[int]
        imdb_id: str
        title: str
        year: int
        rating: float
        duration_minutes: Optional[int]
        metascore: Optional[int]
        actors: List[Actor] = field(default_factory=list)

        def __post_init__(self):
            """
            Validates the data after the object is created.
            """
            # Data cleanup
            self.title = self.title.strip()
            self.imdb_id = self.imdb_id.strip()

            # Validation rules
            if not re.match(r"^tt\d{7,}$", self.imdb_id):
                raise ValueError(f"Invalid IMDb ID: '{self.imdb_id}'")

            if not self.title:
                raise ValueError("Title cannot be empty.")

            if not (1888 <= self.year <= 2030):
                raise ValueError(f"Invalid year: {self.year}. Must be between 1888 and 2030.")

            if not (0.0 <= self.rating <= 10.0):
                raise ValueError(f"Invalid rating: {self.rating}. Must be between 0.0 and 10.0.")

            if self.duration_minutes is not None and self.duration_minutes <= 0:
                raise ValueError("Duration must be a positive number.")

            if self.metascore is not None and not (0 <= self.metascore <= 100):
                raise ValueError(f"Invalid metascore: {self.metascore}. Must be between 0 and 100.")
IMDb ID Validation
    if not re.match(r"^tt\d{7,}$", self.imdb_id):
        raise ValueError(f"Invalid IMDb ID: '{self.imdb_id}'")
Must start with tt
Followed by at least 7 digits
Examples: tt0111161, tt0068646, tt0468569
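As a quick check of the rule above, the following sketch applies the same pattern to a few candidate strings. The helper name `is_valid_imdb_id` and the `candidates` list are hypothetical and not part of the project; the helper only wraps the regular expression and the `strip()` call shown in the model.

```python
import re

# Same pattern as in Movie.__post_init__: "tt" followed by seven or more digits.
IMDB_ID_PATTERN = re.compile(r"^tt\d{7,}$")


def is_valid_imdb_id(raw: str) -> bool:
    """Hypothetical helper: strip whitespace, then test the pattern."""
    return IMDB_ID_PATTERN.match(raw.strip()) is not None


# Illustrative inputs: two valid IDs (one padded), one too short,
# one with the wrong prefix, and one with eight digits.
candidates = ["tt0111161", " tt0068646 ", "tt123", "nm0000138", "tt12345678"]
results = {c: is_valid_imdb_id(c) for c in candidates}
```

Because the pattern sets only a minimum of seven digits, longer IDs such as the eight-digit one also pass.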
Title Validation
    self.title = self.title.strip()
    if not self.title:
        raise ValueError("Title cannot be empty.")
Automatically trims whitespace
Cannot be empty after trimming
Year Validation
    if not (1888 <= self.year <= 2030):
        raise ValueError(f"Invalid year: {self.year}")
1888: Year of the oldest surviving motion picture (Roundhay Garden Scene)
2030: Reasonable upper bound for unreleased films
Rating Validation
    if not (0.0 <= self.rating <= 10.0):
        raise ValueError(f"Invalid rating: {self.rating}")
IMDb ratings use a 10-point scale; the model accepts values from 0.0 to 10.0
Enforces this constraint in the domain
Optional Field Validation
    if self.duration_minutes is not None and self.duration_minutes <= 0:
        raise ValueError("Duration must be a positive number.")

    if self.metascore is not None and not (0 <= self.metascore <= 100):
        raise ValueError(f"Invalid metascore: {self.metascore}")
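To make the `None` handling concrete, here is a minimal sketch that lifts the two optional-field checks into a standalone function. The names `check_optional_fields` and `is_accepted` are hypothetical and exist only for this illustration; in the project the checks live inside `Movie.__post_init__`.

```python
from typing import Optional


def check_optional_fields(duration_minutes: Optional[int], metascore: Optional[int]) -> None:
    """Hypothetical standalone version of the two optional-field checks."""
    # None means "value unknown" and is accepted without further checks.
    if duration_minutes is not None and duration_minutes <= 0:
        raise ValueError("Duration must be a positive number.")
    if metascore is not None and not (0 <= metascore <= 100):
        raise ValueError(f"Invalid metascore: {metascore}")


def is_accepted(duration_minutes: Optional[int], metascore: Optional[int]) -> bool:
    """Return True when both checks pass, False when either raises."""
    try:
        check_optional_fields(duration_minutes, metascore)
    except ValueError:
        return False
    return True
```

Note the asymmetry between the two bounds: a duration of 0 is rejected, while a metascore of 0 is allowed.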
The Actor entity represents an actor with minimal but essential validation:
domain/models/actor.py
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class Actor:
        """
        Domain model that represents an actor and validates its own integrity.
        """
        id: Optional[int]
        name: str

        def __post_init__(self):
            """Validates the actor data after initialization."""
            self.name = self.name.strip()
            if not self.name:
                raise ValueError("Actor name cannot be empty.")
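A short usage sketch may help here. The class is repeated (with English messages) only so the snippet runs on its own, and the actor names are illustrative values rather than data from the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Actor:
    """Copy of the Actor model, repeated so the sketch is self-contained."""
    id: Optional[int]
    name: str

    def __post_init__(self):
        self.name = self.name.strip()
        if not self.name:
            raise ValueError("Actor name cannot be empty.")


# Surrounding whitespace is removed during construction.
actor = Actor(id=None, name="  Morgan Freeman  ")

# A whitespace-only name is empty after trimming, so it is rejected.
try:
    Actor(id=None, name="   ")
    rejected = False
except ValueError:
    rejected = True

# Dataclasses compare field by field, so trimming also normalizes equality.
same = Actor(id=None, name="Tim Robbins") == Actor(id=None, name=" Tim Robbins ")
```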
The MovieActor entity represents the many-to-many relationship between movies and actors:
domain/models/movie_actor.py
    from dataclasses import dataclass


    @dataclass
    class MovieActor:
        """
        Model that represents the N:M relationship and validates its integrity.
        """
        movie_id: int
        actor_id: int

        def __post_init__(self):
            """Validates the relationship data after initialization."""
            if not isinstance(self.movie_id, int) or self.movie_id <= 0:
                raise ValueError("movie_id must be a positive integer.")
            if not isinstance(self.actor_id, int) or self.actor_id <= 0:
                raise ValueError("actor_id must be a positive integer.")
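One subtlety of the `isinstance(..., int)` check: in Python, `bool` is a subclass of `int`, so `True` passes as a positive integer. Whether that matters for the scraper is an open question. The sketch below compares the check used above with a stricter variant; both function names are hypothetical and neither is part of the project.

```python
def passes_original_check(value: object) -> bool:
    """The condition MovieActor uses, extracted for comparison."""
    return isinstance(value, int) and value > 0


def is_positive_int(value: object) -> bool:
    """Hypothetical stricter variant: a positive int that is not a bool."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0
```

Both functions accept `1` and reject `0`, `"1"`, and `1.0`; they differ only on `True`, which the original check accepts and the stricter one rejects.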
    import logging

    from domain.models.movie import Movie

    logger = logging.getLogger(__name__)

    # Invalid data is rejected before persistence
    try:
        movie = Movie(
            id=None,
            imdb_id="tt123",       # Too short
            title="Test",
            year=1800,             # Too old
            rating=11.0,           # Out of range
            duration_minutes=-10,  # Negative
            metascore=150,         # Out of range
        )
    except ValueError as e:
        # Validation stops at the first failing rule (here, the IMDb ID)
        logger.error(f"Invalid movie data: {e}")
        # Data never reaches database
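Because each check raises as soon as it fails, the example above surfaces only the IMDb ID error even though five fields are invalid. If reporting every problem at once were useful (for instance when logging scraped rows), one option is a helper that gathers messages instead of raising. This is a sketch under that assumption; `collect_movie_errors` is a hypothetical name and not part of the project.

```python
import re
from typing import List, Optional


def collect_movie_errors(
    imdb_id: str,
    title: str,
    year: int,
    rating: float,
    duration_minutes: Optional[int] = None,
    metascore: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: return every violated rule instead of raising on the first."""
    errors: List[str] = []
    if not re.match(r"^tt\d{7,}$", imdb_id.strip()):
        errors.append(f"Invalid IMDb ID: '{imdb_id}'")
    if not title.strip():
        errors.append("Title cannot be empty.")
    if not (1888 <= year <= 2030):
        errors.append(f"Invalid year: {year}")
    if not (0.0 <= rating <= 10.0):
        errors.append(f"Invalid rating: {rating}")
    if duration_minutes is not None and duration_minutes <= 0:
        errors.append("Duration must be a positive number.")
    if metascore is not None and not (0 <= metascore <= 100):
        errors.append(f"Invalid metascore: {metascore}")
    return errors


# Same invalid values as the example above; the title is the only valid field.
errors = collect_movie_errors("tt123", "Test", 1800, 11.0, -10, 150)
```

The trade-off is duplication: the rules would then exist in two places unless `__post_init__` were rewritten to call the helper and raise when the list is non-empty.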
1. Validate in __post_init__
Perform validation in __post_init__() to fail fast at object creation time.
    @dataclass
    class Movie:
        # ...

        def __post_init__(self):
            # Validation happens immediately
            if not self.title:
                raise ValueError("Title required")
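Failing fast covers construction only: a dataclass runs `__post_init__` once, and later attribute assignments are not re-checked. The sketch below uses a reduced, hypothetical stand-in for `Movie` (title rule only) to show that behavior and one simple way to revalidate by hand.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    """Reduced stand-in for the Movie model: only the title rule is kept."""
    title: str

    def __post_init__(self):
        self.title = self.title.strip()
        if not self.title:
            raise ValueError("Title required")


movie = Movie(title="Test")

# Plain attribute assignment does not call __post_init__ again,
# so this invalid value is stored without complaint.
movie.title = ""

# Calling the checks again by hand is one way to revalidate after a change.
try:
    movie.__post_init__()
    revalidation_failed = False
except ValueError:
    revalidation_failed = True
```

If entities are mutated after creation, revalidating before persistence (or treating entities as immutable) keeps the guarantee intact.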
2. Keep Entities Pure
Domain entities should have zero dependencies on frameworks or infrastructure.
    # ✅ Good: Only standard library imports
    from dataclasses import dataclass
    import re

    # ❌ Bad: Infrastructure dependencies
    from sqlalchemy import Column, Integer
    from requests import get
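One way to keep that boundary is to put knowledge of storage or scraping formats in a small adapter outside the domain package, so the entity never needs to import it. The following is a sketch under that assumption: `movie_from_row`, the column names, and the reduced `Movie` stand-in are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Movie:
    """Reduced stand-in for the domain model; it imports nothing from infrastructure."""
    id: Optional[int]
    imdb_id: str
    title: str

    def __post_init__(self):
        self.title = self.title.strip()
        if not self.title:
            raise ValueError("Title cannot be empty.")


def movie_from_row(row: Mapping[str, Any]) -> Movie:
    """Hypothetical adapter that would live outside the domain package.

    It knows the storage layout (column names); the entity does not.
    """
    return Movie(id=row.get("id"), imdb_id=row["imdb_id"], title=row["title"])


# Illustrative row; constructing the entity still triggers its validation.
movie = movie_from_row({"id": 7, "imdb_id": "tt0111161", "title": " Example "})
```

The dependency points one way: the adapter imports the entity, never the reverse.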
3. Make Validation Explicit
Use clear error messages that explain business rules.
    # ✅ Good: Descriptive error message
    if not (0.0 <= self.rating <= 10.0):
        raise ValueError(
            f"Rating must be 0.0-10.0, got {self.rating}"
        )

    # ❌ Bad: Vague error
    if not (0.0 <= self.rating <= 10.0):
        raise ValueError("Invalid rating")
4. Use Type Hints
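Combine type hints with validation for maximum safety. Type hints are checked by static tools such as mypy; dataclasses do not enforce them at runtime, so the checks in __post_init__() remain necessary.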
    @dataclass
    class Movie:
        id: Optional[int]  # Type checker knows this can be None
        title: str         # Type checker enforces string
        year: int          # Type checker enforces integer
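Since annotations are not enforced when the object is built, a wrongly typed value (say, a rating scraped as a string) reaches the range comparison and fails with Python's own `TypeError` rather than a descriptive message. If clearer runtime errors are wanted, an explicit type check can precede the range check. The sketch below uses a hypothetical one-field class, `Rated`, and a hypothetical helper, `outcome`, purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rated:
    """Hypothetical one-field model used only to show runtime behavior."""
    rating: float

    def __post_init__(self):
        # Explicit runtime type check; bool is excluded because it subclasses int.
        if isinstance(self.rating, bool) or not isinstance(self.rating, (int, float)):
            raise TypeError(f"rating must be a number, got {type(self.rating).__name__}")
        if not (0.0 <= self.rating <= 10.0):
            raise ValueError(f"Rating must be 0.0-10.0, got {self.rating}")


def outcome(value):
    """Return the name of the exception raised, or 'ok' when construction succeeds."""
    try:
        Rated(rating=value)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"
```

Here `outcome("9.3")` reports a `TypeError` with a readable message, while `outcome(11.0)` still reports the `ValueError` from the range rule.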