InterviewGuide uses PostgreSQL with the pgvector extension for storing both relational data and vector embeddings. This page covers database setup, connection configuration, and schema management.

Overview

The database serves two critical functions:
  1. Relational Storage - User data, resumes, interview sessions, and metadata
  2. Vector Storage - Document embeddings for RAG (Retrieval-Augmented Generation) knowledge base
The pgvector extension is required. Use the pgvector/pgvector:pg16 Docker image or install the extension manually.

Quick Setup

The Docker Compose setup handles everything automatically:
docker-compose up -d postgres
This creates:
  • PostgreSQL 16 with pgvector extension
  • Database named interview_guide
  • Automatic extension initialization
  • Data persistence in postgres_data volume
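To confirm the container came up with pgvector enabled, connect with psql (for example, docker exec -it interview-postgres psql -U postgres -d interview_guide, using the container name from the Docker Compose configuration below) and check the extension catalog:

```sql
-- Should return one row with the installed pgvector version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
```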

Manual Setup

If you’re running PostgreSQL locally:
  1. Install the pgvector extension - Follow the pgvector installation guide for your platform.
  2. Create the database:
     CREATE DATABASE interview_guide;
  3. Enable pgvector:
     \c interview_guide
     CREATE EXTENSION IF NOT EXISTS vector;
  4. Verify the installation:
     SELECT * FROM pg_extension WHERE extname = 'vector';

Connection Configuration

Spring Boot Properties

Database connection is configured in application.yml using environment variable substitution:
spring:
  datasource:
    url: jdbc:postgresql://${POSTGRES_HOST:localhost}:${POSTGRES_PORT:5432}/${POSTGRES_DB:interview_guide}
    username: ${POSTGRES_USER:postgres}
    password: ${POSTGRES_PASSWORD:123456}
    driver-class-name: org.postgresql.Driver

Environment Variables

POSTGRES_HOST
string
default:"localhost"
PostgreSQL server hostname or IP address. Docker: use the service name postgres for container-to-container communication. Local: use localhost or 127.0.0.1.
POSTGRES_PORT
integer
default:"5432"
PostgreSQL server port number.
POSTGRES_DB
string
default:"interview_guide"
Database name. Must exist before starting the application.
POSTGRES_USER
string
default:"postgres"
Database username for authentication.
POSTGRES_PASSWORD
string
default:"123456"
Database password for authentication.
Change the default password in production environments!
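These variables can be supplied through the shell or a .env file; for example (the values below are illustrative placeholders for a local, non-Docker run):

```shell
# Example environment for a local (non-Docker) run
export POSTGRES_HOST=localhost
export POSTGRES_PORT=5432
export POSTGRES_DB=interview_guide
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=change-me-in-production
```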

JPA and Hibernate Configuration

Schema Management

The ddl-auto setting controls how Hibernate manages the database schema:
spring:
  jpa:
    hibernate:
      ddl-auto: create  # Change to 'update' after first run
    show-sql: false
    properties:
      hibernate:
        dialect: org.hibernate.dialect.PostgreSQLDialect
        format_sql: true
spring.jpa.hibernate.ddl-auto
string
default:"create"
Schema generation strategy. Options:
  • create - Drop and recreate tables on startup. Use only for the first run or throwaway development databases.
  • update - Update the schema to match entities without dropping data. Convenient during development.
  • validate - Validate that the schema matches entities without making changes. Recommended for production.
  • none - Do nothing. Requires manual schema management.
Using create in production will delete all data on application restart. Switch to update or validate after initial setup.
spring.jpa.show-sql
boolean
default:"false"
Log all SQL statements to the console. Useful for debugging but verbose in production. Set to true during development.
spring.jpa.properties.hibernate.dialect
string
default:"org.hibernate.dialect.PostgreSQLDialect"
SQL dialect for PostgreSQL-specific features. This enables PostgreSQL optimizations and native data types.
spring.jpa.properties.hibernate.format_sql
boolean
default:"true"
Pretty-print SQL statements in logs when show-sql is enabled.

Development vs Production

For development, enable SQL logging and let Hibernate manage the schema:
spring:
  jpa:
    hibernate:
      ddl-auto: create  # Or update after initial setup
    show-sql: true
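A production counterpart might look like this (the application-prod.yml profile file name is an assumption; adjust to your deployment):

```yaml
# application-prod.yml -- production profile (sketch)
spring:
  jpa:
    hibernate:
      ddl-auto: validate   # never create or update in production
    show-sql: false
```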

Vector Store Configuration

Spring AI’s pgvector integration is configured for the RAG knowledge base:
spring:
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024
        initialize-schema: true
        remove-existing-vector-store-table: false

Vector Store Properties

spring.ai.vectorstore.pgvector.index-type
string
default:"HNSW"
Vector index type for similarity search. Options:
  • HNSW - Hierarchical Navigable Small World (recommended). Fast approximate nearest neighbor search.
  • IVFFlat - Inverted File Flat index. Good for smaller datasets.
HNSW provides better performance for most use cases.
spring.ai.vectorstore.pgvector.distance-type
string
default:"COSINE_DISTANCE"
Distance metric for vector similarity. Options:
  • COSINE_DISTANCE - Cosine similarity (recommended for text embeddings)
  • EUCLIDEAN_DISTANCE - L2 distance
  • INNER_PRODUCT - Dot product similarity
Cosine distance works best with normalized embeddings from text-embedding models.
spring.ai.vectorstore.pgvector.dimensions
integer
default:"1024"
Vector dimensionality. Must match the embedding model output. For Aliyun’s text-embedding-v3 model, this is 1024 dimensions.
Changing this value requires recreating the vector store table.
spring.ai.vectorstore.pgvector.initialize-schema
boolean
default:"true"
Automatically create vector store tables on startup. Development: set to true for automatic setup. Production: set to false and manage the schema manually to prevent unexpected changes.
spring.ai.vectorstore.pgvector.remove-existing-vector-store-table
boolean
default:"false"
Drop and recreate vector store table on startup.
Setting this to true will delete all embeddings on restart. Only use during development or data migration.
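If you set initialize-schema: false in production, the table must be created manually. A sketch of the schema Spring AI’s PgVectorStore expects, matching the dimensions and index settings above (column names and required extensions follow Spring AI’s documented defaults; verify against your Spring AI version):

```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_store (
    id        uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
    content   text,
    metadata  json,
    embedding vector(1024)   -- must match spring.ai.vectorstore.pgvector.dimensions
);

-- HNSW index with cosine distance, matching index-type and distance-type above
CREATE INDEX ON vector_store USING hnsw (embedding vector_cosine_ops);
```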

Connection Pooling

Spring Boot uses HikariCP for connection pooling by default. HikariCP’s defaults are adequate for development, but you can tune the pool in application.yml:
spring:
  datasource:
    hikari:
      maximum-pool-size: 10
      minimum-idle: 5
      connection-timeout: 20000
      idle-timeout: 300000
      max-lifetime: 1200000
maximum-pool-size
integer
default:"10"
Maximum number of connections in the pool. A common sizing formula: connections = ((core_count * 2) + effective_spindle_count). For most applications, 10-20 connections is sufficient.
minimum-idle
integer
default:"5"
Minimum number of idle connections maintained. Set to the same value as maximum-pool-size for a fixed-size pool.
connection-timeout
integer
default:"20000"
Maximum milliseconds to wait for a connection from the pool.
idle-timeout
integer
default:"300000"
Maximum milliseconds a connection can sit idle (5 minutes).
max-lifetime
integer
default:"1200000"
Maximum lifetime of a connection in the pool (20 minutes). Should be shorter than any connection time limit imposed by the database or intermediate infrastructure.
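As a worked example of the sizing formula: a 4-core server with SSD storage (effective_spindle_count of roughly 1) gives (4 * 2) + 1 = 9 connections, so a pool of about 9-10:

```yaml
spring:
  datasource:
    hikari:
      maximum-pool-size: 9   # (4 cores * 2) + 1 spindle
      minimum-idle: 9        # same as maximum-pool-size for a fixed-size pool
```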

Database Initialization

The Docker setup includes an initialization script that runs on first startup:
-- docker/postgres/init.sql
CREATE EXTENSION IF NOT EXISTS vector;
This is executed automatically when the PostgreSQL container starts for the first time. The script is mounted at /docker-entrypoint-initdb.d/init.sql in the container.

Docker Compose Configuration

postgres:
  image: pgvector/pgvector:pg16
  container_name: interview-postgres
  environment:
    POSTGRES_USER: postgres
    POSTGRES_PASSWORD: password
    POSTGRES_DB: interview_guide
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./docker/postgres/init.sql:/docker-entrypoint-initdb.d/init.sql
  ports:
    - "5432:5432"
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 5s
    timeout: 5s
    retries: 5
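The healthcheck lets dependent services wait until PostgreSQL actually accepts connections. If your Compose file defines the application service (shown here as app, an assumed service name), it can gate its startup on the healthcheck:

```yaml
app:
  depends_on:
    postgres:
      condition: service_healthy   # waits until pg_isready succeeds
```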

Manual SQL Setup

If you’re not using Docker, run these SQL commands:
CREATE DATABASE interview_guide
  WITH
  ENCODING = 'UTF8'
  LC_COLLATE = 'en_US.UTF-8'
  LC_CTYPE = 'en_US.UTF-8'
  TEMPLATE = template0;

Production Checklist

  1. Use strong credentials - Change the default POSTGRES_PASSWORD to a strong, unique password.
  2. Set ddl-auto to validate - Never use create or update in production. Use validate or none.
  3. Disable schema auto-initialization - Set spring.ai.vectorstore.pgvector.initialize-schema: false.
  4. Configure connection pooling - Tune HikariCP settings based on your load.
  5. Enable SSL connections - Add ?ssl=true&sslmode=require to the JDBC URL for encrypted connections.
  6. Set up backups - Implement regular database backups using pg_dump or your hosting provider’s tools.
  7. Monitor performance - Use pg_stat_statements and connection pool metrics to track database performance.
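For the SSL step, the full JDBC URL would look like this (db.example.com is a placeholder hostname):

```yaml
spring:
  datasource:
    url: jdbc:postgresql://db.example.com:5432/interview_guide?ssl=true&sslmode=require
```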

Troubleshooting

Vector store table not found: the table hasn’t been created yet. Solutions:
  1. Set spring.ai.vectorstore.pgvector.initialize-schema: true
  2. Or manually create the table using Spring AI’s schema
  3. Check that pgvector extension is installed: SELECT * FROM pg_extension WHERE extname = 'vector';
pgvector extension missing: the extension isn’t installed in your database:
  1. Use the pgvector/pgvector:pg16 Docker image, OR
  2. Install pgvector manually following the official guide
  3. Run CREATE EXTENSION IF NOT EXISTS vector; in your database
Connection refused: PostgreSQL isn’t reachable:
  1. Verify PostgreSQL is running: docker ps or systemctl status postgresql
  2. Check firewall rules allow port 5432
  3. Verify POSTGRES_HOST matches your setup (use postgres in Docker, localhost locally)
  4. Check database logs for startup errors
Data disappears on restart: you’re using ddl-auto: create:
  1. Change to ddl-auto: update in application.yml
  2. Restart the application
  3. Consider using validate in production
Empty RAG search results. Possible causes:
  1. Documents haven’t been embedded yet - upload files to knowledge base
  2. Wrong embedding dimensions - verify dimensions: 1024 matches your model
  3. Distance threshold too strict - check min-score settings in RAG configuration
  4. Index not built - allow time for HNSW index creation on large datasets
