
Overview

The Meta-Data Tag Generator uses Docker Compose to orchestrate multiple services including the FastAPI backend, Next.js frontend, PostgreSQL database, MinIO object storage, and Redis for job management.
Estimated installation time: 10-15 minutes (including Docker image downloads)

Prerequisites

System Requirements

  • OS: Linux, macOS, or Windows with WSL2
  • RAM: 4GB minimum, 8GB recommended
  • Storage: 5GB minimum, 20GB recommended
  • CPU: 2+ cores

Required Software

  • Docker 20.10+
  • Docker Compose 2.0+
  • Git
  • Port availability: 3001, 8000, 5432, 9000, 9001, 6379
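
Before installing, you can confirm the required ports are free. The sketch below uses bash's /dev/tcp pseudo-device: a port that accepts a connection is already in use.

```shell
#!/usr/bin/env bash
# Report which of the required ports are already in use on localhost.
check_ports() {
  local port
  for port in "$@"; do
    # A successful connect means something is already listening there.
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "port $port: IN USE"
    else
      echo "port $port: free"
    fi
  done
}

check_ports 3001 8000 5432 9000 9001 6379
```

Any port reported as IN USE needs either the conflicting process stopped or the port mapping changed in docker-compose.yml.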

Install Docker & Docker Compose

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Add your user to docker group
sudo usermod -aG docker $USER
newgrp docker

# Verify installation (the convenience script installs Compose v2 as a
# plugin; if the standalone docker-compose binary is absent, substitute
# "docker compose" for "docker-compose" in the commands below)
docker --version
docker compose version

Installation Steps

1. Clone the Repository

git clone https://github.com/your-org/Meta-Data-Tag-Generator.git
cd Meta-Data-Tag-Generator/source

2. Configure Environment Variables (Optional)

Create a .env file in the backend directory to customize settings:
.env
# Database Settings
DATABASE_URL=postgresql://metatag:metatag_secret@postgres:5432/metatag_db
DB_HOST=postgres
DB_PORT=5432
DB_USER=metatag
DB_PASSWORD=metatag_secret
DB_NAME=metatag_db

# MinIO Object Storage
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin123
MINIO_BUCKET=metatag-files

# Redis
REDIS_URL=redis://redis:6379/0

# JWT Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key-change-in-production
JWT_ALGORITHM=HS256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=30
JWT_REFRESH_TOKEN_EXPIRE_DAYS=7

# OpenRouter API (optional defaults)
DEFAULT_MODEL=openai/gpt-4o-mini
API_CONNECT_TIMEOUT=30
API_READ_TIMEOUT=90
API_MAX_RETRIES=3

# Processing Limits
MAX_PDF_SIZE_MB=50
MAX_PAGES_TO_EXTRACT=10
MAX_TAGS=15
MIN_TAGS=3
Production deployment: Change JWT_SECRET_KEY, DB_PASSWORD, and MinIO credentials before deploying to production!
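
One way to replace the placeholder secrets is to generate random values rather than editing them by hand; for example (assumes openssl is available):

```shell
# Generate strong secrets for the .env file. Hex output avoids characters
# that would need escaping inside the DATABASE_URL connection string.
JWT_SECRET_KEY=$(openssl rand -hex 32)   # 64 hex characters
DB_PASSWORD=$(openssl rand -hex 16)      # 32 hex characters
echo "JWT_SECRET_KEY=$JWT_SECRET_KEY"
echo "DB_PASSWORD=$DB_PASSWORD"
```

If you change DB_PASSWORD, remember that the same value also appears inside DATABASE_URL, so keep the two in sync.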

3. Review Docker Compose Configuration

The docker-compose.yml defines the following services:
docker-compose.yml
services:
  # PostgreSQL - User data and job history
  postgres:
    image: postgres:15-alpine
    ports: ["5432:5432"]
    environment:
      POSTGRES_USER: metatag
      POSTGRES_PASSWORD: metatag_secret
      POSTGRES_DB: metatag_db
  
  # MinIO - Object storage for PDFs
  minio:
    image: minio/minio:latest
    ports: ["9000:9000", "9001:9001"]
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin123
  
  # Redis - Job state and pub/sub
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
  
  # Backend API - FastAPI application
  backend:
    build: ./backend
    ports: ["8000:8000"]
    depends_on:
      - postgres
      - minio
      - redis
  
  # Frontend - Next.js application
  frontend:
    build: ./frontend
    ports: ["3001:3000"]
    depends_on:
      - backend
All services include health checks to ensure proper startup order. The backend waits for the database to be ready before starting.
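
In compose syntax, health-check wiring of this kind might look like the sketch below; the actual probe commands and intervals in the project's docker-compose.yml may differ:

```yaml
postgres:
  image: postgres:15-alpine
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U metatag -d metatag_db"]
    interval: 5s
    timeout: 3s
    retries: 10

backend:
  build: ./backend
  depends_on:
    postgres:
      condition: service_healthy
```

With condition: service_healthy, the backend container is not started until the PostgreSQL health check passes, rather than merely after the container is created.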

4. Build and Start Services

# Build all Docker images (first time only)
docker-compose build

# Start all services in detached mode
docker-compose up -d
First-time build takes 5-10 minutes as it downloads base images and installs dependencies including Tesseract OCR, EasyOCR models, and PyTorch.
Expected output:
[+] Running 6/6
 ✔ Container meta-tag-postgres   Healthy
 ✔ Container meta-tag-minio      Healthy
 ✔ Container meta-tag-redis      Healthy
 ✔ Container meta-tag-backend    Healthy
 ✔ Container meta-tag-frontend   Started
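
If you are scripting the startup (in CI, for instance), a small retry helper can block until a service responds. wait_for below is a hypothetical helper, not part of the project:

```shell
#!/usr/bin/env bash
# wait_for: retry a command every 2s until it succeeds or TIMEOUT seconds pass.
wait_for() {
  local timeout=${TIMEOUT:-120} elapsed=0
  until "$@" >/dev/null 2>&1; do
    sleep 2
    elapsed=$((elapsed + 2))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
  done
}
```

For example, `wait_for curl -fsS http://localhost:8000/api/health && echo "backend is up"` (assuming the default port mapping).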

5. Verify Installation

Check that all services are running:
docker-compose ps
All services should show Up or Up (healthy) status.

Test individual services:
curl http://localhost:8000/api/health
Expected response:
{
  "status": "healthy",
  "version": "2.0.0",
  "message": "Document Meta-Tagging API is running"
}
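
When scripting this check, asserting on the JSON fields is more robust than eyeballing the output. The resp variable below holds the sample payload from above; in practice substitute resp=$(curl -s http://localhost:8000/api/health):

```shell
# Verify the health response programmatically (sample payload shown).
resp='{"status":"healthy","version":"2.0.0","message":"Document Meta-Tagging API is running"}'
printf '%s' "$resp" | python3 -c '
import json, sys
data = json.load(sys.stdin)
assert data["status"] == "healthy", data
print("OK: version", data["version"])
'
```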

6. Initialize Database Schema

The database schema is automatically created on first startup via the init script:
# Verify tables were created
docker-compose exec postgres psql -U metatag -d metatag_db -c "\dt"
Expected tables:
  • users
  • refresh_tokens
  • processing_jobs
  • documents
The schema creation SQL is located at backend/app/database/schema.sql and is automatically executed via Docker volume mount.

7. Create First User Account

Register a user account via the API:
curl -X POST http://localhost:8000/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "email": "[email protected]",
    "password": "SecurePassword123!",
    "full_name": "Admin User"
  }'
Expected response:
{
  "access_token": "eyJhbGc...",
  "refresh_token": "eyJhbGc...",
  "token_type": "bearer",
  "user": {
    "id": "uuid-here",
    "email": "[email protected]",
    "full_name": "Admin User"
  }
}
Save the access_token for API requests.
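
To use the token in scripts, extract it from the JSON response. The response variable below holds an abbreviated sample matching the shape above; in practice, capture the curl output with response=$(curl -s -X POST ...):

```shell
# Pull access_token out of the registration response. The value here is
# the abbreviated sample from this guide, not a real token.
response='{"access_token":"eyJhbGc...","refresh_token":"eyJhbGc...","token_type":"bearer"}'
TOKEN=$(printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["access_token"])')
echo "$TOKEN"
```

Pass the saved value as an Authorization: Bearer $TOKEN header on subsequent API requests.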

Service Architecture

Port Mapping

Service         Internal Port   External Port   Purpose
Frontend        3000            3001            Web UI
Backend         8000            8000            REST API & WebSocket
PostgreSQL      5432            5432            Database
MinIO API       9000            9000            S3-compatible storage
MinIO Console   9001            9001            Admin interface
Redis           6379            6379            Job queue
For production deployments, do not expose PostgreSQL, MinIO, and Redis ports publicly. Use a reverse proxy (nginx) and firewall rules.

Volume Management

Docker Compose creates persistent volumes for data:
# List volumes
docker volume ls | grep meta-tag

# Inspect volume
docker volume inspect source_postgres_data

# Backup database
docker-compose exec postgres pg_dump -U metatag metatag_db > backup.sql

# Restore database
cat backup.sql | docker-compose exec -T postgres psql -U metatag -d metatag_db
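
If you schedule the pg_dump command above (via cron, for instance), a retention step keeps the backup directory bounded. A sketch, assuming dumps are written as backup_*.sql files; rotate_backups is a hypothetical helper:

```shell
#!/usr/bin/env bash
# rotate_backups DIR KEEP: delete all but the KEEP most recent
# backup_*.sql files in DIR (newest-first by modification time).
rotate_backups() {
  local dir=$1 keep=${2:-7}
  ls -1t "$dir"/backup_*.sql 2>/dev/null | tail -n +$((keep + 1)) |
  while IFS= read -r old; do
    rm -- "$old"
  done
}
```

For example, `rotate_backups ./backups 7` after each dump keeps a rolling week of daily backups.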

Volume Locations

Volume           Purpose                       Typical Size
postgres_data    User data, jobs, documents    100MB - 10GB
minio_data       Uploaded PDFs, results        1GB - 100GB
redis_data       Job state, cache              10MB - 1GB
easyocr_models   Pre-trained OCR models        400MB - 2GB
EasyOCR models are downloaded on first use and cached in the easyocr_models volume. This prevents re-downloading on container restart.

Resource Limits

The backend service has resource limits defined:
deploy:
  resources:
    limits:
      memory: 4G
    reservations:
      memory: 1G
Why 4GB limit? EasyOCR requires significant memory when loading models for complex scripts. For production with heavy OCR workloads, increase to 8GB.

Adjusting Resources

Edit docker-compose.yml:
backend:
  deploy:
    resources:
      limits:
        memory: 8G  # Increase for heavy OCR
        cpus: '4'   # Limit CPU usage
      reservations:
        memory: 2G  # Minimum guaranteed
Apply changes:
docker-compose up -d --force-recreate backend

Troubleshooting

Check logs:
docker-compose logs -f backend
docker-compose logs -f postgres
Common issues:
  • Ports already in use: Change port mappings in docker-compose.yml
  • Insufficient memory: Increase Docker Desktop memory allocation
  • Database not ready: Wait for health checks to pass
Backend cannot connect to the database

Symptoms: Backend logs show connection refused or database not ready.

Solutions:
# Check PostgreSQL health
docker-compose ps postgres

# View PostgreSQL logs
docker-compose logs postgres

# Restart PostgreSQL
docker-compose restart postgres

# Verify connection
docker-compose exec postgres pg_isready -U metatag
MinIO bucket not found

Create the bucket manually:
  1. Open http://localhost:9001
  2. Login with minioadmin / minioadmin123
  3. Create bucket named metatag-files
  4. Set policy to public (or configure access policies)
Or via CLI:
docker-compose exec minio mc alias set local http://localhost:9000 minioadmin minioadmin123
docker-compose exec minio mc mb local/metatag-files
EasyOCR models not downloaded

Symptoms: The first OCR request takes very long or fails.

Solution: Pre-download the models:
docker-compose exec backend python -c "import easyocr; reader = easyocr.Reader(['en', 'hi'])"
This downloads models to the easyocr_models volume for future use.
Frontend cannot reach the backend

Check the environment variable: inside the Docker network, the frontend expects NEXT_PUBLIC_BACKEND_URL=http://backend:8000. For local development outside Docker:
# In frontend/.env.local
NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
Rebuild frontend:
docker-compose up -d --build frontend
Permission errors (Linux)

Linux users may hit a Docker volume permissions issue:
# Fix ownership
sudo chown -R $USER:$USER .

# Or run with sudo
sudo docker-compose up -d

Development Mode

For active development, enable live code reloading:

1. Edit docker-compose.yml

Uncomment the volume mounts:
backend:
  volumes:
    # Mount code for development (comment out for production)
    - ./backend/app:/app/app

2. Restart services

docker-compose restart backend
Now code changes are immediately reflected without rebuilding.

3. View logs

docker-compose logs -f backend frontend

Stopping Services

docker-compose stop
Stops containers but preserves data in volumes. To remove the containers as well (named volumes are still kept), use docker-compose down; adding the -v flag also deletes the volumes and all stored data.

Updating the Application


1. Pull latest changes

git pull origin main

2. Rebuild images

docker-compose build --no-cache

3. Restart services

docker-compose down
docker-compose up -d

4. Run database migrations

If the schema changed:
docker-compose exec backend alembic upgrade head

Production Deployment

For production deployment, see:

Architecture Overview

Understand the system architecture and components

Environment Variables

Configure your deployment with environment variables

Processing Workflow

Learn the document processing workflow

AI Models

Choose and configure AI models

Next Steps

Installation complete! Your Meta-Data Tag Generator is ready to use.
Now you can:
  1. Process your first document - Follow the quick start guide
  2. Configure API settings - Customize processing parameters
  3. Explore the API - Integrate with your applications
Join our community for support and updates:
  • GitHub Issues: Report bugs and request features
  • Discussions: Ask questions and share use cases
