The midPilot Connector Generator uses environment variables for configuration. All settings are defined in src/config.py and can be overridden using a .env file or environment variables.

Configuration File Setup

1. Copy the example configuration:

cp .env-example .env

2. Edit the configuration. Open .env in your editor and configure the required variables:

nano .env
# or
vim .env

3. For testing environments, create a separate test configuration:

cp .env.test-example .env.test

Environment Variable Naming

Configuration uses nested environment variables with a double-underscore (__) delimiter:
# Pattern: SECTION__PARAMETER=value
APP__TITLE="My Custom Title"
LOGGING__LEVEL=debug
DATABASE__HOST=localhost
DATABASE__PORT=5432
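The service loads these variables via Pydantic (see Validation and Loading below); conceptually, the delimiter splits each variable name into a section and a parameter. The following stdlib-only sketch illustrates that mapping — the `nested_env` helper is purely illustrative, not part of the codebase:

```python
def nested_env(env: dict[str, str]) -> dict:
    """Group SECTION__PARAMETER variables into a nested dict,
    mirroring the double-underscore delimiter convention."""
    config: dict = {}
    for key, value in env.items():
        section, _, param = key.partition("__")
        config.setdefault(section.lower(), {})[param.lower()] = value
    return config

env = {
    "APP__TITLE": "My Custom Title",
    "LOGGING__LEVEL": "debug",
    "DATABASE__HOST": "localhost",
    "DATABASE__PORT": "5432",
}
config = nested_env(env)
# config["app"]["title"] == "My Custom Title"
# config["database"]["port"] == "5432"  (values stay strings here;
# the real settings layer coerces types per field)
```
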

Core Configuration Sections

Application Settings

Configure the FastAPI application and Uvicorn server:
# Application metadata
APP__TITLE="Smart Integration Microservice (DEV)"
APP__VERSION="0.1.0"
APP__DESCRIPTION="Smart Integration Microservice for scraping, digester and CodeGen"

# Server configuration
APP__HOST=0.0.0.0
APP__PORT=8090
APP__LIVE_RELOAD=true
APP__WORKERS=1

# Proxy and forwarding
APP__PROXY_HEADERS=true
APP__FORWARDED_ALLOW_IPS=*

# Timeouts
APP__TIMEOUT_KEEP_ALIVE=10
APP__TIMEOUT_GRACEFUL_SHUTDOWN=15

APP__TITLE (string, default: "Smart Integration Microservice"): API title displayed in documentation
APP__PORT (integer, default: 8090): Port number for the API server
APP__LIVE_RELOAD (boolean, default: false): Enable hot reload during development
APP__WORKERS (integer, default: 1): Number of Uvicorn worker processes

Logging Configuration

Control logging behavior and verbosity:
LOGGING__LEVEL=info
LOGGING__ACCESS_LOG=true
LOGGING__COLORS=true

LOGGING__LEVEL (enum, default: "info"): Log level: debug, info, warning, error, or critical
LOGGING__ACCESS_LOG (boolean, default: true): Enable HTTP access logs
LOGGING__COLORS (boolean, default: false): Enable colored log output (useful for development)
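The LOGGING__LEVEL values correspond to the standard Python log levels. A minimal sketch of how such a value can be mapped onto the logging module (illustrative only; the service wires this into Uvicorn's logger internally):

```python
import logging
import os

# Read the configured level, falling back to the documented default.
level_name = os.environ.get("LOGGING__LEVEL", "info")

# Resolve "debug"/"info"/... to logging.DEBUG/logging.INFO/...;
# unknown values fall back to INFO rather than crashing.
level = getattr(logging, level_name.upper(), logging.INFO)

logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")
logging.getLogger("demo").info("logging configured at %s", level_name)
```
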

Database Configuration

Configure PostgreSQL database connection:
# Individual connection parameters
DATABASE__HOST=localhost
DATABASE__PORT=5432
DATABASE__NAME=db
DATABASE__USER=user
DATABASE__PASSWORD=password

# Full connection URL (auto-constructed if not provided)
DATABASE__URL=postgresql+asyncpg://${DATABASE__USER}:${DATABASE__PASSWORD}@${DATABASE__HOST}:${DATABASE__PORT}/${DATABASE__NAME}

# Connection pool settings
DATABASE__POOL_SIZE=10
DATABASE__MAX_OVERFLOW=20
DATABASE__ECHO=false

DATABASE__HOST (string, default: ""): Database server hostname or IP address
DATABASE__PORT (integer, default: 5432): PostgreSQL port number
DATABASE__NAME (string, required): Database name
DATABASE__USER (string, required): Database username
DATABASE__PASSWORD (string, required): Database password
DATABASE__URL (string): Full database connection URL. If not provided, it is auto-constructed from the individual parameters.
DATABASE__POOL_SIZE (integer, default: 10): SQLAlchemy connection pool size
DATABASE__MAX_OVERFLOW (integer, default: 20): Maximum overflow connections beyond pool_size
DATABASE__ECHO (boolean, default: false): Enable SQL query logging (useful for debugging)
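When DATABASE__URL is left unset, the URL is derived from the individual parameters, as in the ${...} expansion shown above. A sketch of that derivation (the `build_db_url` helper is illustrative; note that credentials containing reserved characters such as @ or / must be percent-encoded):

```python
from urllib.parse import quote

def build_db_url(host: str, port: int, name: str, user: str, password: str) -> str:
    """Assemble the SQLAlchemy async URL from the individual
    DATABASE__* parameters, percent-encoding the credentials."""
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )

url = build_db_url("localhost", 5432, "db", "user", "p@ss word")
print(url)  # postgresql+asyncpg://user:p%40ss%20word@localhost:5432/db
```
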

LLM Configuration

Configure OpenAI-compatible LLM providers:
# Primary LLM
LLM__OPENAI_API_KEY=your-api-key-here
LLM__OPENAI_API_BASE=https://openrouter.ai/api/v1
LLM__MODEL_NAME=openai/gpt-oss-20b
LLM__REQUEST_TIMEOUT=120

# Alternative LLM models
LLM_SMALL1__OPENAI_API_KEY=your-api-key-here
LLM_SMALL2__OPENAI_API_KEY=your-api-key-here
EMBEDDINGS__OPENAI_API_KEY=your-api-key-here

LLM__OPENAI_API_KEY (string, required): API key for an OpenAI-compatible service (e.g., OpenRouter, vLLM)
LLM__OPENAI_API_BASE (string, default: "https://openrouter.ai/api/v1"): Base URL for the LLM API endpoint
LLM__MODEL_NAME (string, default: "openai/gpt-oss-20b"): Model identifier to use for requests
LLM__REQUEST_TIMEOUT (integer, default: 120): Request timeout in seconds

Langfuse Tracing

Configure Langfuse for LLM observability and tracing:
LANGFUSE__HOST=https://cloud.langfuse.com
LANGFUSE__SECRET_KEY=sk-lf-...
LANGFUSE__PUBLIC_KEY=pk-lf-...
LANGFUSE__TRACING_ENABLED=true
LANGFUSE__ENVIRONMENT=dev-myname

LANGFUSE__TRACING_ENABLED (boolean, default: false): Enable or disable Langfuse tracing
LANGFUSE__SECRET_KEY (string, default: "emptykey"): Langfuse project secret key
LANGFUSE__PUBLIC_KEY (string, default: "emptykey"): Langfuse project public key
LANGFUSE__HOST (string, default: ""): Langfuse host URL
LANGFUSE__ENVIRONMENT (string, default: "dev-whoami"): Environment identifier for traces (e.g., dev-john, staging, production)

Search Configuration

Configure web search capabilities:
SEARCH__METHOD_NAME=ddgs

# Brave Search (alternative)
BRAVE__API_KEY=your-brave-api-key
BRAVE__ENDPOINT=https://api.search.brave.com/res/v1/web/search

SEARCH__METHOD_NAME (string, default: ""): Search method: ddgs (DuckDuckGo) or brave
BRAVE__API_KEY (string, default: ""): Brave Search API key (required if using Brave)

Scrape and Process Settings

Configure web scraping and content processing:
# Scraper controls
SCRAPE_AND_PROCESS__MAX_SCRAPER_ITERATIONS=4
SCRAPE_AND_PROCESS__MAX_ITERATIONS_FILTER_IRRELEVANT=5

# Chunking controls
SCRAPE_AND_PROCESS__CHUNK_LENGTH=20000
SCRAPE_AND_PROCESS__MAX_CONCURRENT=20

# Metadata thresholds
SCRAPE_AND_PROCESS__UNKNOWN_VERSION_THRESHOLD=0.9
SCRAPE_AND_PROCESS__METADATA_UNCERTAINTY_THRESHOLD=0.05

SCRAPE_AND_PROCESS__MAX_SCRAPER_ITERATIONS (integer, default: 4): Maximum outer iterations of the scraper loop
SCRAPE_AND_PROCESS__CHUNK_LENGTH (integer, default: 20000): Maximum tokens per chunk for LLM processing
SCRAPE_AND_PROCESS__MAX_CONCURRENT (integer, default: 20): Maximum concurrent chunk-processing tasks
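The chunking and concurrency settings work together: content is split into pieces of at most CHUNK_LENGTH, and at most MAX_CONCURRENT pieces are processed at once. A simplified sketch of that pattern, assuming character-based splitting (the service chunks by tokens) and a placeholder in place of the real LLM call:

```python
import asyncio

CHUNK_LENGTH = 20_000   # SCRAPE_AND_PROCESS__CHUNK_LENGTH
MAX_CONCURRENT = 20     # SCRAPE_AND_PROCESS__MAX_CONCURRENT

def split_into_chunks(text: str, chunk_length: int) -> list[str]:
    """Split text into fixed-size pieces (character-based stand-in
    for the token-based chunking the service performs)."""
    return [text[i:i + chunk_length] for i in range(0, len(text), chunk_length)]

async def process_all(chunks: list[str]) -> list[str]:
    """Process chunks concurrently, capped by MAX_CONCURRENT."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def process(chunk: str) -> str:
        async with semaphore:
            await asyncio.sleep(0)  # placeholder for an LLM call
            return chunk.upper()

    return await asyncio.gather(*(process(c) for c in chunks))

chunks = split_into_chunks("a" * 50_000, CHUNK_LENGTH)
print(len(chunks))  # 3
```
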

Configuration Best Practices

- Use .env for local development: keep local settings in .env (git-ignored).
- Use environment variables in production: set environment variables directly in Docker/Kubernetes.
- Never commit secrets: keep API keys and passwords out of version control.
- Document custom settings: add comments to your .env file for team clarity.

Validation and Loading

The configuration is validated on application startup using Pydantic. If required fields are missing or invalid, the application will fail to start with a clear error message:
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
database.user
  Field required [type=missing, input_value={}, input_type=dict]
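The same fail-fast behaviour can be sketched without Pydantic. In this illustrative snippet, the `check_required` helper and the variable list are assumptions mirroring the fields marked required above; the real check is performed by the Pydantic model in src/config.py:

```python
# Variables documented as required above (illustrative list).
REQUIRED = (
    "DATABASE__NAME",
    "DATABASE__USER",
    "DATABASE__PASSWORD",
    "LLM__OPENAI_API_KEY",
)

def check_required(env: dict[str, str]) -> list[str]:
    """Return the required variables missing or empty in the environment."""
    return [name for name in REQUIRED if not env.get(name)]

missing = check_required({})  # simulate an empty environment
print(missing)  # ['DATABASE__NAME', 'DATABASE__USER', 'DATABASE__PASSWORD', 'LLM__OPENAI_API_KEY']
```
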

Viewing Current Configuration

To inspect the current configuration at runtime, access the /health endpoint or check application logs during startup.

Next Steps

- Database Setup: configure and initialize PostgreSQL
- Docker Setup: deploy with Docker Compose
