The midPilot Connector Generator uses environment variables for configuration. All settings are defined in src/config.py and can be overridden using a .env file or environment variables.

Configuration File Setup

1. Copy the example configuration:

cp .env-example .env

2. Edit the configuration. Open .env in your editor and configure the required variables:

nano .env
# or
vim .env

3. For testing environments, create a separate test configuration:

cp .env.test-example .env.test

Environment Variable Naming

Configuration uses nested environment variables with a double-underscore (__) delimiter:
# Pattern: SECTION__PARAMETER=value
APP__TITLE="My Custom Title"
LOGGING__LEVEL=debug
DATABASE__HOST=localhost
DATABASE__PORT=5432
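The service loads these variables via Pydantic (see Validation and Loading below); conceptually, the delimiter splits each variable name into a section and a parameter. The following stdlib-only sketch illustrates that mapping — the `nested_env` helper is purely illustrative, not part of the codebase:

```python
def nested_env(env: dict[str, str]) -> dict:
    """Group SECTION__PARAMETER variables into a nested dict,
    mirroring the double-underscore delimiter convention."""
    config: dict = {}
    for key, value in env.items():
        section, _, param = key.partition("__")
        config.setdefault(section.lower(), {})[param.lower()] = value
    return config

env = {
    "APP__TITLE": "My Custom Title",
    "LOGGING__LEVEL": "debug",
    "DATABASE__HOST": "localhost",
    "DATABASE__PORT": "5432",
}
config = nested_env(env)
# config["app"]["title"] == "My Custom Title"
# config["database"]["port"] == "5432"  (values stay strings here;
# the real settings layer coerces types per field)
```
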

Core Configuration Sections

Application Settings

Configure the FastAPI application and Uvicorn server:
# Application metadata
APP__TITLE="Smart Integration Microservice (DEV)"
APP__VERSION="0.1.0"
APP__DESCRIPTION="Smart Integration Microservice for scraping, digester and CodeGen"

# Server configuration
APP__HOST=0.0.0.0
APP__PORT=8090
APP__LIVE_RELOAD=true
APP__WORKERS=1

# Proxy and forwarding
APP__PROXY_HEADERS=true
APP__FORWARDED_ALLOW_IPS=*

# Timeouts
APP__TIMEOUT_KEEP_ALIVE=10
APP__TIMEOUT_GRACEFUL_SHUTDOWN=15

APP__TITLE (string, default: "Smart Integration Microservice"): API title displayed in documentation
APP__PORT (integer, default: 8090): Port number for the API server
APP__LIVE_RELOAD (boolean, default: false): Enable hot reload during development
APP__WORKERS (integer, default: 1): Number of Uvicorn worker processes

Logging Configuration

Control logging behavior and verbosity:
LOGGING__LEVEL=info
LOGGING__ACCESS_LOG=true
LOGGING__COLORS=true

LOGGING__LEVEL (enum, default: "info"): Log level: debug, info, warning, error, or critical
LOGGING__ACCESS_LOG (boolean, default: true): Enable HTTP access logs
LOGGING__COLORS (boolean, default: false): Enable colored log output (useful for development)
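The LOGGING__LEVEL values correspond to the standard Python log levels. A minimal sketch of how such a value can be mapped onto the logging module (illustrative only; the service wires this into Uvicorn's logger internally):

```python
import logging
import os

# Read the configured level, falling back to the documented default.
level_name = os.environ.get("LOGGING__LEVEL", "info")

# Resolve "debug"/"info"/... to logging.DEBUG/logging.INFO/...;
# unknown values fall back to INFO rather than crashing.
level = getattr(logging, level_name.upper(), logging.INFO)

logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")
logging.getLogger("demo").info("logging configured at %s", level_name)
```
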

Database Configuration

Configure PostgreSQL database connection:
# Individual connection parameters
DATABASE__HOST=localhost
DATABASE__PORT=5432
DATABASE__NAME=db
DATABASE__USER=user
DATABASE__PASSWORD=password

# Full connection URL (auto-constructed if not provided)
DATABASE__URL=postgresql+asyncpg://${DATABASE__USER}:${DATABASE__PASSWORD}@${DATABASE__HOST}:${DATABASE__PORT}/${DATABASE__NAME}

# Connection pool settings
DATABASE__POOL_SIZE=10
DATABASE__MAX_OVERFLOW=20
DATABASE__ECHO=false

DATABASE__HOST (string, default: ""): Database server hostname or IP address
DATABASE__PORT (integer, default: 5432): PostgreSQL port number
DATABASE__NAME (string, required): Database name
DATABASE__USER (string, required): Database username
DATABASE__PASSWORD (string, required): Database password
DATABASE__URL (string): Full database connection URL. If not provided, it is auto-constructed from the individual parameters.
DATABASE__POOL_SIZE (integer, default: 10): SQLAlchemy connection pool size
DATABASE__MAX_OVERFLOW (integer, default: 20): Maximum overflow connections beyond pool_size
DATABASE__ECHO (boolean, default: false): Enable SQL query logging (useful for debugging)
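When DATABASE__URL is left unset, the URL is derived from the individual parameters, as in the ${...} expansion shown above. A sketch of that derivation (the `build_db_url` helper is illustrative; note that credentials containing reserved characters such as @ or / must be percent-encoded):

```python
from urllib.parse import quote

def build_db_url(host: str, port: int, name: str, user: str, password: str) -> str:
    """Assemble the SQLAlchemy async URL from the individual
    DATABASE__* parameters, percent-encoding the credentials."""
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )

url = build_db_url("localhost", 5432, "db", "user", "p@ss word")
print(url)  # postgresql+asyncpg://user:p%40ss%20word@localhost:5432/db
```
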

LLM Configuration

Configure OpenAI-compatible LLM providers:
# Primary LLM
LLM__OPENAI_API_KEY=your-api-key-here
LLM__OPENAI_API_BASE=https://openrouter.ai/api/v1
LLM__MODEL_NAME=openai/gpt-oss-20b
LLM__REQUEST_TIMEOUT=120

# Alternative LLM models
LLM_SMALL1__OPENAI_API_KEY=your-api-key-here
LLM_SMALL2__OPENAI_API_KEY=your-api-key-here
EMBEDDINGS__OPENAI_API_KEY=your-api-key-here

LLM__OPENAI_API_KEY (string, required): API key for an OpenAI-compatible service (e.g., OpenRouter, vLLM)
LLM__OPENAI_API_BASE (string, default: "https://openrouter.ai/api/v1"): Base URL for the LLM API endpoint
LLM__MODEL_NAME (string, default: "openai/gpt-oss-20b"): Model identifier to use for requests
LLM__REQUEST_TIMEOUT (integer, default: 120): Request timeout in seconds

Langfuse Tracing

Configure Langfuse for LLM observability and tracing:
LANGFUSE__HOST=https://cloud.langfuse.com
LANGFUSE__SECRET_KEY=sk-lf-...
LANGFUSE__PUBLIC_KEY=pk-lf-...
LANGFUSE__TRACING_ENABLED=true
LANGFUSE__ENVIRONMENT=dev-myname

LANGFUSE__TRACING_ENABLED (boolean, default: false): Enable or disable Langfuse tracing
LANGFUSE__SECRET_KEY (string, default: "emptykey"): Langfuse project secret key
LANGFUSE__PUBLIC_KEY (string, default: "emptykey"): Langfuse project public key
LANGFUSE__HOST (string, default: ""): Langfuse host URL
LANGFUSE__ENVIRONMENT (string, default: "dev-whoami"): Environment identifier for traces (e.g., dev-john, staging, production)

Search Configuration

Configure web search capabilities:
SEARCH__METHOD_NAME=ddgs

# Brave Search (alternative)
BRAVE__API_KEY=your-brave-api-key
BRAVE__ENDPOINT=https://api.search.brave.com/res/v1/web/search

SEARCH__METHOD_NAME (string, default: ""): Search method: ddgs (DuckDuckGo) or brave
BRAVE__API_KEY (string, default: ""): Brave Search API key (required if using Brave)

Scrape and Process Settings

Configure web scraping and content processing:
# Scraper controls
SCRAPE_AND_PROCESS__MAX_SCRAPER_ITERATIONS=4
SCRAPE_AND_PROCESS__MAX_ITERATIONS_FILTER_IRRELEVANT=5

# Chunking controls
SCRAPE_AND_PROCESS__CHUNK_LENGTH=20000
SCRAPE_AND_PROCESS__MAX_CONCURRENT=20

# Metadata thresholds
SCRAPE_AND_PROCESS__UNKNOWN_VERSION_THRESHOLD=0.9
SCRAPE_AND_PROCESS__METADATA_UNCERTAINTY_THRESHOLD=0.05

SCRAPE_AND_PROCESS__MAX_SCRAPER_ITERATIONS (integer, default: 4): Maximum outer iterations of the scraper loop
SCRAPE_AND_PROCESS__CHUNK_LENGTH (integer, default: 20000): Maximum tokens per chunk for LLM processing
SCRAPE_AND_PROCESS__MAX_CONCURRENT (integer, default: 20): Maximum concurrent chunk-processing tasks
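The chunking and concurrency settings work together: content is split into pieces of at most CHUNK_LENGTH, and at most MAX_CONCURRENT pieces are processed at once. A simplified sketch of that pattern, assuming character-based splitting (the service chunks by tokens) and a placeholder in place of the real LLM call:

```python
import asyncio

CHUNK_LENGTH = 20_000   # SCRAPE_AND_PROCESS__CHUNK_LENGTH
MAX_CONCURRENT = 20     # SCRAPE_AND_PROCESS__MAX_CONCURRENT

def split_into_chunks(text: str, chunk_length: int) -> list[str]:
    """Split text into fixed-size pieces (character-based stand-in
    for the token-based chunking the service performs)."""
    return [text[i:i + chunk_length] for i in range(0, len(text), chunk_length)]

async def process_all(chunks: list[str]) -> list[str]:
    """Process chunks concurrently, capped by MAX_CONCURRENT."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def process(chunk: str) -> str:
        async with semaphore:
            await asyncio.sleep(0)  # placeholder for an LLM call
            return chunk.upper()

    return await asyncio.gather(*(process(c) for c in chunks))

chunks = split_into_chunks("a" * 50_000, CHUNK_LENGTH)
print(len(chunks))  # 3
```
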

Configuration Best Practices

- Use .env for local development: keep local settings in .env (git-ignored).
- Use environment variables in production: set environment variables directly in Docker/Kubernetes.
- Never commit secrets: keep API keys and passwords out of version control.
- Document custom settings: add comments to your .env file for team clarity.

Validation and Loading

The configuration is validated on application startup using Pydantic. If required fields are missing or invalid, the application will fail to start with a clear error message:
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
database.user
  Field required [type=missing, input_value={}, input_type=dict]
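The same fail-fast behaviour can be sketched without Pydantic. In this illustrative snippet, the `check_required` helper and the variable list are assumptions mirroring the fields marked required above; the real check is performed by the Pydantic model in src/config.py:

```python
# Variables documented as required above (illustrative list).
REQUIRED = (
    "DATABASE__NAME",
    "DATABASE__USER",
    "DATABASE__PASSWORD",
    "LLM__OPENAI_API_KEY",
)

def check_required(env: dict[str, str]) -> list[str]:
    """Return the required variables missing or empty in the environment."""
    return [name for name in REQUIRED if not env.get(name)]

missing = check_required({})  # simulate an empty environment
print(missing)  # ['DATABASE__NAME', 'DATABASE__USER', 'DATABASE__PASSWORD', 'LLM__OPENAI_API_KEY']
```
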

Viewing Current Configuration

To inspect the current configuration at runtime, access the /health endpoint or check application logs during startup.

Next Steps

- Database Setup: configure and initialize PostgreSQL
- Docker Setup: deploy with Docker Compose
