Introduction

Dependency Injection (DI) is a design pattern that enables loose coupling by injecting dependencies into objects rather than having them create their own. The IMDb Scraper uses a DependencyContainer to centralize dependency creation and lifecycle management.
The DependencyContainer acts as the Composition Root - the single place where all application dependencies are wired together.

The DependencyContainer Pattern

What is a Dependency Container?

A Dependency Container (also called an Inversion of Control container) is responsible for:

Object Creation

Creates and configures objects with their dependencies

Lifecycle Management

Manages object lifecycles (singleton, transient, scoped)

Dependency Resolution

Resolves dependency graphs automatically

Resource Cleanup

Ensures resources are properly released
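The four responsibilities above can be sketched as a minimal, generic container. This is an illustrative toy (MiniContainer and its method names are hypothetical, not part of the IMDb Scraper); the project's DependencyContainer shown below hand-wires the same ideas with explicit factory methods.

```python
class MiniContainer:
    """Toy DI container illustrating creation, lifecycle, resolution, cleanup."""

    def __init__(self):
        self._factories = {}   # name -> (factory callable, is_singleton)
        self._singletons = {}  # name -> cached instance

    def register(self, name, factory, singleton=False):
        """Object creation: record how to build each dependency."""
        self._factories[name] = (factory, singleton)

    def resolve(self, name):
        """Lifecycle management + dependency resolution."""
        factory, singleton = self._factories[name]
        if singleton:
            if name not in self._singletons:
                self._singletons[name] = factory(self)  # cache once
            return self._singletons[name]
        return factory(self)  # transient: fresh instance per call

    def close(self):
        """Resource cleanup: release anything that exposes close()."""
        for obj in self._singletons.values():
            if hasattr(obj, "close"):
                obj.close()
        self._singletons.clear()


c = MiniContainer()
c.register("conn", lambda c: object(), singleton=True)
# "repo" depends on "conn"; the factory resolves it through the container
c.register("repo", lambda c: (c.resolve("conn"),))
assert c.resolve("conn") is c.resolve("conn")      # singleton: same object
assert c.resolve("repo") is not c.resolve("repo")  # transient: new each time
```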

DependencyContainer Implementation

infrastructure/factory/dependency_container.py
from domain.interfaces.use_case_interface import UseCaseInterface
from domain.interfaces.scraper_interface import ScraperInterface
from domain.interfaces.proxy_interface import ProxyProviderInterface
from domain.interfaces.tor_interface import TorInterface

from application.use_cases.save_movie_with_actors_csv_use_case import SaveMovieWithActorsCsvUseCase
from application.use_cases.save_movie_with_actors_postgres_use_case import SaveMovieWithActorsPostgresUseCase
from application.use_cases.composite_save_movie_with_actors_use_case import CompositeSaveMovieWithActorsUseCase
from infrastructure.persistence.csv.repositories.movie_csv_repository import MovieCsvRepository
from infrastructure.persistence.csv.repositories.actor_csv_repository import ActorCsvRepository
from infrastructure.persistence.csv.repositories.movie_actor_csv_repository import MovieActorCsvRepository
from infrastructure.persistence.postgres.repositories.movie_postgres_repository import MoviePostgresRepository
from infrastructure.persistence.postgres.repositories.actor_postgres_repository import ActorPostgresRepository
from infrastructure.persistence.postgres.repositories.movie_actor_postgres_repository import MovieActorPostgresRepository
from infrastructure.scraper.imdb_scraper import ImdbScraper
from infrastructure.persistence.postgres.postgres_connection import connection_pool
from infrastructure.network.proxy_provider import ProxyProvider
from infrastructure.network.tor_rotator import TorRotator

class DependencyContainer:
    """
    A centralized container for dependency injection.
    Manages the creation and lifecycle of the application's services.
    """
    def __init__(self, config):
        self.config = config
        self._db_connection = None
        self.proxy_provider = ProxyProvider()
        self.tor_rotator = TorRotator()

    def get_db_connection(self):
        """Manages the DB connection so it is created only once."""
        if self._db_connection is None and connection_pool:
            self._db_connection = connection_pool.getconn()
        return self._db_connection

    def close_db_connection(self):
        """Closes the connection and returns it to the pool."""
        if self._db_connection and connection_pool:
            connection_pool.putconn(self._db_connection)
            self._db_connection = None
            print("Database connection closed and returned to the pool.")

    def get_csv_use_case(self) -> UseCaseInterface:
        """Builds and returns the CSV use case."""
        return SaveMovieWithActorsCsvUseCase(
            movie_repository=MovieCsvRepository(),
            actor_repository=ActorCsvRepository(),
            movie_actor_repository=MovieActorCsvRepository()
        )

    def get_postgres_use_case(self) -> UseCaseInterface:
        """Builds and returns the PostgreSQL use case."""
        conn = self.get_db_connection()
        return SaveMovieWithActorsPostgresUseCase(
            movie_repository=MoviePostgresRepository(conn),
            actor_repository=ActorPostgresRepository(conn),
            movie_actor_repository=MovieActorPostgresRepository(conn)
        )

    def get_composite_use_case(self) -> UseCaseInterface:
        """Builds the composite use case."""
        use_cases = [self.get_csv_use_case(), self.get_postgres_use_case()]
        return CompositeSaveMovieWithActorsUseCase(use_cases)

    def get_proxy_provider(self) -> ProxyProviderInterface:
        """Factory for the proxy provider."""
        return ProxyProvider()

    def get_tor_rotator(self) -> TorInterface:
        """Factory for the TOR rotator."""
        return TorRotator()

    def get_scraper(self) -> ScraperInterface:
        """
        Builds and returns the main scraper, injecting ALL of its dependencies.
        """
        use_case = self.get_composite_use_case()
        proxy_provider = self.get_proxy_provider()
        tor_rotator = self.get_tor_rotator()

        engine = self.config.SCRAPER_ENGINE.lower()

        if engine == "requests":
            return ImdbScraper(
                use_case=use_case,
                proxy_provider=proxy_provider,
                tor_rotator=tor_rotator,
                engine=engine
            )
        elif engine == "playwright":
            raise NotImplementedError("The 'playwright' engine is not implemented yet.")
        else:
            raise ValueError(f"Scraping engine '{engine}' is not supported.")

How It Works

Dependency Graph

The container wires the following graph (root at the scraper):
  • ImdbScraper
  •   CompositeSaveMovieWithActorsUseCase
  •     SaveMovieWithActorsCsvUseCase → MovieCsvRepository, ActorCsvRepository, MovieActorCsvRepository
  •     SaveMovieWithActorsPostgresUseCase → MoviePostgresRepository, ActorPostgresRepository, MovieActorPostgresRepository (all sharing the singleton DB connection)
  •   ProxyProvider
  •   TorRotator

Lifecycle Management

The container manages different object lifecycles:
Database Connection - Created once and reused
def get_db_connection(self):
    """Singleton pattern - creates connection only once."""
    if self._db_connection is None and connection_pool:
        self._db_connection = connection_pool.getconn()
    return self._db_connection
The database connection is expensive to create, so it’s reused across all repositories.
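The lazy-singleton behaviour is easy to verify in isolation. The sketch below mirrors the caching logic with a stand-in pool, since the real connection_pool requires a live PostgreSQL instance (FakePool and Holder are illustrative names, not project code):

```python
class FakePool:
    """Stand-in for a psycopg2-style connection pool."""

    def __init__(self):
        self.checked_out = 0

    def getconn(self):
        self.checked_out += 1
        return object()  # stands in for a real DB connection

    def putconn(self, conn):
        self.checked_out -= 1


class Holder:
    """Mirrors the container's connection-caching logic."""

    def __init__(self, pool):
        self._pool = pool
        self._db_connection = None

    def get_db_connection(self):
        # Created lazily, then cached for every subsequent caller
        if self._db_connection is None and self._pool:
            self._db_connection = self._pool.getconn()
        return self._db_connection


pool = FakePool()
holder = Holder(pool)
first = holder.get_db_connection()
second = holder.get_db_connection()
assert first is second        # same connection object reused
assert pool.checked_out == 1  # only one checkout from the pool
```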

Factory Pattern Usage

Factory Methods

Each get_*() method is a factory method that encapsulates object creation:
Creates the CSV persistence use case with all its repository dependencies.
def get_csv_use_case(self) -> UseCaseInterface:
    """Builds and returns the CSV use case."""
    return SaveMovieWithActorsCsvUseCase(
        movie_repository=MovieCsvRepository(),
        actor_repository=ActorCsvRepository(),
        movie_actor_repository=MovieActorCsvRepository()
    )
Dependencies created:
  • MovieCsvRepository
  • ActorCsvRepository
  • MovieActorCsvRepository
Creates the PostgreSQL persistence use case with database connection.
def get_postgres_use_case(self) -> UseCaseInterface:
    """Builds and returns the PostgreSQL use case."""
    conn = self.get_db_connection()  # Reuse singleton connection
    return SaveMovieWithActorsPostgresUseCase(
        movie_repository=MoviePostgresRepository(conn),
        actor_repository=ActorPostgresRepository(conn),
        movie_actor_repository=MovieActorPostgresRepository(conn)
    )
Dependencies created:
  • Database connection (singleton)
  • MoviePostgresRepository
  • ActorPostgresRepository
  • MovieActorPostgresRepository
Creates a composite use case that executes multiple persistence strategies.
def get_composite_use_case(self) -> UseCaseInterface:
    """Builds the composite use case."""
    use_cases = [
        self.get_csv_use_case(),
        self.get_postgres_use_case()
    ]
    return CompositeSaveMovieWithActorsUseCase(use_cases)
Dependencies created:
  • CSV use case (and all its dependencies)
  • PostgreSQL use case (and all its dependencies)
  • Composite wrapper
Creates the scraper with all network and persistence dependencies.
def get_scraper(self) -> ScraperInterface:
    use_case = self.get_composite_use_case()
    proxy_provider = self.get_proxy_provider()
    tor_rotator = self.get_tor_rotator()
    
    engine = self.config.SCRAPER_ENGINE.lower()
    
    if engine == "requests":
        return ImdbScraper(
            use_case=use_case,
            proxy_provider=proxy_provider,
            tor_rotator=tor_rotator,
            engine=engine
        )
    elif engine == "playwright":
        raise NotImplementedError("The 'playwright' engine is not implemented yet.")
    else:
        raise ValueError(f"Scraping engine '{engine}' is not supported.")
Dependencies created:
  • Composite use case (entire persistence layer)
  • ProxyProvider
  • TorRotator
  • ImdbScraper configured with all dependencies

Benefits of This Approach

1. Centralized Configuration

All dependency wiring happens in one place, making it easy to understand and modify:
# Instead of scattered instantiation:
# scraper.py: proxy = ProxyProvider()
# use_case.py: repo = MovieCsvRepository()
# ...

# Everything is in DependencyContainer:
container = DependencyContainer(config)
scraper = container.get_scraper()  # All dependencies resolved

2. Easy Testing

You can create test containers with mock dependencies:
class TestDependencyContainer(DependencyContainer):
    def get_postgres_use_case(self):
        """Override to return mock use case."""
        return MockPostgresUseCase()
    
    def get_proxy_provider(self):
        """Override to return mock proxy."""
        return MockProxyProvider()

# Test with mocks
container = TestDependencyContainer(test_config)
scraper = container.get_scraper()  # Uses mocks automatically
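For one-off tests, unittest.mock offers an alternative to subclassing: hand the code under test a Mock shaped like the use case and assert on how it was called (a sketch; mock_use_case and the sample movie dict are illustrative):

```python
from unittest.mock import Mock

# A Mock stands in for any UseCaseInterface implementation: it accepts
# execute() calls and records them for later assertions.
mock_use_case = Mock()
mock_use_case.execute.return_value = None

# Code under test would receive this mock from an overridden container
# method; here we call it directly to show the recording behaviour.
mock_use_case.execute({"title": "Heat", "year": 1995})

mock_use_case.execute.assert_called_once()
args, _kwargs = mock_use_case.execute.call_args
assert args[0]["title"] == "Heat"
```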

3. Flexible Configuration

Switch implementations via configuration:
# config.py
SCRAPER_ENGINE = "requests"  # or "playwright"

# DependencyContainer automatically selects implementation
def get_scraper(self):
    engine = self.config.SCRAPER_ENGINE.lower()
    
    if engine == "requests":
        return ImdbScraper(...)  # requests-based scraper
    elif engine == "playwright":
        return ImdbScraperPlaywright(...)  # Playwright-based scraper

4. Resource Management

Proper cleanup of resources:
container = DependencyContainer(config)
try:
    scraper = container.get_scraper()
    scraper.scrape()
finally:
    container.close_db_connection()  # Ensures connection is returned to pool
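The same try/finally pairing can be packaged as a context manager so cleanup runs automatically even when scraping raises. This is a sketch, not something the project's container currently provides; StubContainer stands in for the real one:

```python
from contextlib import contextmanager


@contextmanager
def managed_container(container):
    """Yields the container and guarantees cleanup on exit."""
    try:
        yield container
    finally:
        container.close_db_connection()


class StubContainer:
    """Illustrative stand-in exposing the same cleanup hook."""

    def __init__(self):
        self.closed = False

    def close_db_connection(self):
        self.closed = True


stub = StubContainer()
try:
    with managed_container(stub):
        raise RuntimeError("scrape failed")  # simulate a mid-run crash
except RuntimeError:
    pass
assert stub.closed  # cleanup ran despite the exception
```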

Usage in Application Entry Point

presentation/cli/run_scraper.py
from infrastructure.factory.dependency_container import DependencyContainer
from shared.config import config
import logging

logger = logging.getLogger(__name__)

def main():
    logger.info("Initializing dependency container...")
    container = DependencyContainer(config)
    
    try:
        logger.info("Building scraper...")
        scraper = container.get_scraper()  # All dependencies wired automatically
        
        logger.info("Starting scraping process...")
        scraper.scrape()
        logger.info("Scraping process finished successfully.")

    except Exception as e:
        logger.critical(f"Fatal error: {e}", exc_info=True)
    finally:
        logger.info("Closing resources...")
        container.close_db_connection()  # Cleanup

if __name__ == "__main__":
    main()
The entry point is minimal - it just creates the container and gets the scraper. All complexity is encapsulated in the container.

Dependency Injection Principles

Constructor Injection

Dependencies are passed via constructor (recommended approach):
# Use case receives repositories via constructor
class SaveMovieWithActorsPostgresUseCase(UseCaseInterface):
    def __init__(
        self,
        movie_repository: MovieRepository,
        actor_repository: ActorRepository,
        movie_actor_repository: MovieActorRepository
    ):
        self.movie_repository = movie_repository
        self.actor_repository = actor_repository
        self.movie_actor_repository = movie_actor_repository
Constructor injection makes dependencies explicit and testable. You can see exactly what a class needs.
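A quick illustration: because the repositories arrive through the constructor, the same use-case shape works with trivial in-memory fakes in a unit test. FakeRepo and the stand-in SaveMovieUseCase below are illustrative, mirroring the constructor shown above:

```python
class FakeRepo:
    """In-memory repository fake that records what was saved."""

    def __init__(self):
        self.saved = []

    def save(self, entity):
        self.saved.append(entity)


class SaveMovieUseCase:
    """Stand-in mirroring the constructor-injection shape above."""

    def __init__(self, movie_repository, actor_repository, movie_actor_repository):
        self.movie_repository = movie_repository
        self.actor_repository = actor_repository
        self.movie_actor_repository = movie_actor_repository

    def execute(self, movie):
        self.movie_repository.save(movie)


movies = FakeRepo()
use_case = SaveMovieUseCase(movies, FakeRepo(), FakeRepo())
use_case.execute({"title": "Alien"})
assert movies.saved == [{"title": "Alien"}]  # dependency observed the call
```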

Depend on Abstractions

Classes depend on interfaces, not concrete implementations:
# ✅ Good: Depends on interface
class ImdbScraper(ScraperInterface):
    def __init__(
        self,
        use_case: UseCaseInterface,  # Interface, not concrete class
        proxy_provider: ProxyProviderInterface,
        tor_rotator: TorInterface
    ):
        self.use_case = use_case
        self.proxy_provider = proxy_provider
        self.tor_rotator = tor_rotator

# ❌ Bad: Depends on concrete implementation
class ImdbScraper:
    def __init__(self):
        self.use_case = SaveMovieWithActorsPostgresUseCase(...)  # Hardcoded
        self.proxy_provider = ProxyProvider()  # Can't swap

Single Responsibility

The container’s only job is dependency creation and wiring:
# ✅ Container creates and wires
class DependencyContainer:
    def get_scraper(self):
        return ImdbScraper(
            use_case=self.get_composite_use_case(),
            proxy_provider=self.get_proxy_provider(),
            tor_rotator=self.get_tor_rotator()
        )

# ✅ Scraper focuses on scraping logic
class ImdbScraper:
    def scrape(self):
        # Business logic only, no dependency creation
        movies = self._fetch_movies()
        for movie in movies:
            self.use_case.execute(movie)

Advanced Patterns

Composite Use Case Pattern

The container creates a composite that executes multiple strategies:
def get_composite_use_case(self) -> UseCaseInterface:
    """Executes both CSV and PostgreSQL persistence."""
    use_cases = [
        self.get_csv_use_case(),
        self.get_postgres_use_case()
    ]
    return CompositeSaveMovieWithActorsUseCase(use_cases)
The composite implements the same interface as individual use cases:
application/use_cases/composite_save_movie_with_actors_use_case.py
from concurrent.futures import ThreadPoolExecutor
from typing import List

class CompositeSaveMovieWithActorsUseCase(UseCaseInterface):
    def __init__(self, use_cases: List[UseCaseInterface]):
        self.use_cases = use_cases
        self.max_workers = len(use_cases)

    def execute(self, movie: Movie) -> None:
        """Executes all use cases in parallel."""
        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            list(executor.map(lambda uc: uc.execute(movie), self.use_cases))
The scraper doesn’t know it’s using a composite - it just calls use_case.execute(movie) and both CSV and PostgreSQL persistence happen automatically.
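The composite idea is small enough to run end to end with stubs in place of the real CSV/PostgreSQL use cases (Composite and StubUseCase below are illustrative stand-ins):

```python
from concurrent.futures import ThreadPoolExecutor


class StubUseCase:
    """Records each execute() call into a shared log."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def execute(self, movie):
        self.log.append((self.name, movie))


class Composite:
    """Same shape as the composite above: fan out execute() in parallel."""

    def __init__(self, use_cases):
        self.use_cases = use_cases

    def execute(self, movie):
        with ThreadPoolExecutor(max_workers=len(self.use_cases)) as executor:
            # list() forces all tasks to complete (and re-raises errors)
            list(executor.map(lambda uc: uc.execute(movie), self.use_cases))


log = []
composite = Composite([StubUseCase("csv", log), StubUseCase("pg", log)])
composite.execute("Blade Runner")
assert sorted(name for name, _ in log) == ["csv", "pg"]  # both ran
```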

Strategy Pattern

The container selects scraper implementation based on configuration:
def get_scraper(self) -> ScraperInterface:
    engine = self.config.SCRAPER_ENGINE.lower()
    
    # Strategy selection based on config
    if engine == "requests":
        return ImdbScraper(...)
    elif engine == "playwright":
        return ImdbScraperPlaywright(...)
    elif engine == "selenium":
        return ImdbScraperSelenium(...)
    else:
        raise ValueError(f"Unsupported engine: {engine}")
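If more engines are added, the if/elif chain grows linearly; a dict-based registry is a common refactor (a sketch with stand-in string factories, not how the project currently selects engines):

```python
# Registry mapping engine names to factory callables. The engine names match
# the document; the factories here are stand-ins returning strings.
SCRAPER_FACTORIES = {
    "requests": lambda deps: f"requests-scraper({deps})",
    "playwright": lambda deps: f"playwright-scraper({deps})",
}


def get_scraper(engine, deps="wired"):
    """Looks up the factory for the configured engine."""
    engine = engine.lower()
    try:
        return SCRAPER_FACTORIES[engine](deps)
    except KeyError:
        raise ValueError(f"Unsupported engine: {engine}") from None


assert get_scraper("REQUESTS") == "requests-scraper(wired)"  # case-insensitive
```

Adding an engine then means registering one entry instead of editing the branch logic.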

Best Practices

The container should only create and wire objects, not contain business logic.
# ✅ Good: Pure factory
def get_scraper(self):
    return ImdbScraper(
        use_case=self.get_composite_use_case(),
        proxy_provider=self.get_proxy_provider()
    )

# ❌ Bad: Business logic in container
def get_scraper(self):
    scraper = ImdbScraper(...)
    scraper.scrape()  # Don't execute here!
    return scraper
Factory methods should return interfaces, not concrete types.
# ✅ Good: Returns interface
def get_scraper(self) -> ScraperInterface:
    return ImdbScraper(...)

# ❌ Bad: Returns concrete type
def get_scraper(self) -> ImdbScraper:
    return ImdbScraper(...)
Be explicit about object lifecycles and resource cleanup.
# ✅ Good: Explicit lifecycle management
def __init__(self, config):
    self._db_connection = None  # Lazy initialization

def close_db_connection(self):
    if self._db_connection:
        connection_pool.putconn(self._db_connection)
        self._db_connection = None
Don’t pass the container itself as a dependency (Service Locator anti-pattern).
# ✅ Good: Inject dependencies
scraper = ImdbScraper(
    use_case=container.get_composite_use_case()
)

# ❌ Bad: Pass container (Service Locator)
scraper = ImdbScraper(container=container)
# Inside scraper:
# self.use_case = container.get_composite_use_case()  # Hidden dependency

Real-World Benefits

The DependencyContainer has enabled:
  1. Hybrid Persistence: Seamlessly save to both CSV and PostgreSQL simultaneously
  2. Network Resilience: Easily wire together proxy, TOR, and VPN layers
  3. Testability: Swap real dependencies with mocks for testing
  4. Configurability: Change scraper engine via environment variable
  5. Maintainability: Single place to update dependency wiring
The container transforms complex dependency graphs into simple, manageable factory methods.

Further Reading

Clean Architecture

Understand the overall architectural principles

Domain Models

Learn about the entities being created

Use Cases

Explore the application layer

Environment Variables

Configure the dependency container
