InterviewGuide uses PostgreSQL with the pgvector extension for storing both relational data and vector embeddings. This page covers database setup, connection configuration, and schema management.

Overview

The database serves two critical functions:
  1. Relational Storage - User data, resumes, interview sessions, and metadata
  2. Vector Storage - Document embeddings for RAG (Retrieval-Augmented Generation) knowledge base
The pgvector extension is required. Use the pgvector/pgvector:pg16 Docker image or install the extension manually.

Quick Setup

The Docker Compose setup handles everything automatically:
docker-compose up -d postgres
This creates:
  • PostgreSQL 16 with pgvector extension
  • Database named interview_guide
  • Automatic extension initialization
  • Data persistence in postgres_data volume
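To confirm the container came up with pgvector enabled, connect with psql (for example, docker exec -it interview-postgres psql -U postgres -d interview_guide, using the container name from the Docker Compose configuration below) and check the extension catalog:

```sql
-- Should return one row with the installed pgvector version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
```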

Manual Setup

If you’re running PostgreSQL locally:
  1. Install the pgvector extension - Follow the pgvector installation guide for your platform.
  2. Create the database:
     CREATE DATABASE interview_guide;
  3. Enable pgvector:
     \c interview_guide
     CREATE EXTENSION IF NOT EXISTS vector;
  4. Verify the installation:
     SELECT * FROM pg_extension WHERE extname = 'vector';

Connection Configuration

Spring Boot Properties

Database connection is configured in application.yml using environment variable substitution:
spring:
  datasource:
    url: jdbc:postgresql://${POSTGRES_HOST:localhost}:${POSTGRES_PORT:5432}/${POSTGRES_DB:interview_guide}
    username: ${POSTGRES_USER:postgres}
    password: ${POSTGRES_PASSWORD:123456}
    driver-class-name: org.postgresql.Driver

Environment Variables

POSTGRES_HOST
string
default:"localhost"
PostgreSQL server hostname or IP address. Docker: use the service name postgres for container-to-container communication. Local: use localhost or 127.0.0.1.
POSTGRES_PORT
integer
default:"5432"
PostgreSQL server port number.
POSTGRES_DB
string
default:"interview_guide"
Database name. Must exist before starting the application.
POSTGRES_USER
string
default:"postgres"
Database username for authentication.
POSTGRES_PASSWORD
string
default:"123456"
Database password for authentication.
Change the default password in production environments!
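These variables can be supplied through the shell or a .env file; for example (the values below are illustrative placeholders for a local, non-Docker run):

```shell
# Example environment for a local (non-Docker) run
export POSTGRES_HOST=localhost
export POSTGRES_PORT=5432
export POSTGRES_DB=interview_guide
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=change-me-in-production
```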

JPA and Hibernate Configuration

Schema Management

The ddl-auto setting controls how Hibernate manages the database schema:
spring:
  jpa:
    hibernate:
      ddl-auto: create  # Change to 'update' after first run
    show-sql: false
    properties:
      hibernate:
        dialect: org.hibernate.dialect.PostgreSQLDialect
        format_sql: true
spring.jpa.hibernate.ddl-auto
string
default:"create"
Schema generation strategy. Options:
  • create - Drop and recreate tables on startup. Use only for the first run or throwaway development databases.
  • update - Update the schema to match entities without dropping data. Convenient during development.
  • validate - Validate that the schema matches entities without making changes. Recommended for production.
  • none - Do nothing. Requires manual schema management.
Using create in production will delete all data on application restart. Switch to update or validate after initial setup.
spring.jpa.show-sql
boolean
default:"false"
Log all SQL statements to the console. Useful for debugging but verbose in production. Set to true during development.
spring.jpa.properties.hibernate.dialect
string
default:"org.hibernate.dialect.PostgreSQLDialect"
SQL dialect for PostgreSQL-specific features. This enables PostgreSQL optimizations and native data types.
spring.jpa.properties.hibernate.format_sql
boolean
default:"true"
Pretty-print SQL statements in logs when show-sql is enabled.

Development vs Production

For development, enable SQL logging and let Hibernate manage the schema:
spring:
  jpa:
    hibernate:
      ddl-auto: create  # Or update after initial setup
    show-sql: true
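A production counterpart might look like this (the application-prod.yml profile file name is an assumption; adjust to your deployment):

```yaml
# application-prod.yml -- production profile (sketch)
spring:
  jpa:
    hibernate:
      ddl-auto: validate   # never create or update in production
    show-sql: false
```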

Vector Store Configuration

Spring AI’s pgvector integration is configured for the RAG knowledge base:
spring:
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024
        initialize-schema: true
        remove-existing-vector-store-table: false

Vector Store Properties

spring.ai.vectorstore.pgvector.index-type
string
default:"HNSW"
Vector index type for similarity search. Options:
  • HNSW - Hierarchical Navigable Small World (recommended). Fast approximate nearest neighbor search.
  • IVFFlat - Inverted File Flat index. Good for smaller datasets.
HNSW provides better performance for most use cases.
spring.ai.vectorstore.pgvector.distance-type
string
default:"COSINE_DISTANCE"
Distance metric for vector similarity. Options:
  • COSINE_DISTANCE - Cosine similarity (recommended for text embeddings)
  • EUCLIDEAN_DISTANCE - L2 distance
  • INNER_PRODUCT - Dot product similarity
Cosine distance works best with normalized embeddings from text-embedding models.
spring.ai.vectorstore.pgvector.dimensions
integer
default:"1024"
Vector dimensionality. Must match the embedding model output. For Aliyun’s text-embedding-v3 model, this is 1024 dimensions.
Changing this value requires recreating the vector store table.
spring.ai.vectorstore.pgvector.initialize-schema
boolean
default:"true"
Automatically create vector store tables on startup. Development: set to true for automatic setup. Production: set to false and manage the schema manually to prevent unexpected changes.
spring.ai.vectorstore.pgvector.remove-existing-vector-store-table
boolean
default:"false"
Drop and recreate vector store table on startup.
Setting this to true will delete all embeddings on restart. Only use during development or data migration.
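If you set initialize-schema: false in production, the table must be created manually. A sketch of the schema Spring AI’s PgVectorStore expects, matching the dimensions and index settings above (column names and required extensions follow Spring AI’s documented defaults; verify against your Spring AI version):

```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_store (
    id        uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
    content   text,
    metadata  json,
    embedding vector(1024)   -- must match spring.ai.vectorstore.pgvector.dimensions
);

-- HNSW index with cosine distance, matching index-type and distance-type above
CREATE INDEX ON vector_store USING hnsw (embedding vector_cosine_ops);
```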

Connection Pooling

Spring Boot uses HikariCP for connection pooling by default. HikariCP’s defaults are adequate for development, but you can tune the pool in application.yml:
spring:
  datasource:
    hikari:
      maximum-pool-size: 10
      minimum-idle: 5
      connection-timeout: 20000
      idle-timeout: 300000
      max-lifetime: 1200000
maximum-pool-size
integer
default:"10"
Maximum number of connections in the pool. A common sizing formula: connections = ((core_count * 2) + effective_spindle_count). For most applications, 10-20 connections is sufficient.
minimum-idle
integer
default:"5"
Minimum number of idle connections maintained. Set to the same value as maximum-pool-size for a fixed-size pool.
connection-timeout
integer
default:"20000"
Maximum milliseconds to wait for a connection from the pool.
idle-timeout
integer
default:"300000"
Maximum milliseconds a connection can sit idle (5 minutes).
max-lifetime
integer
default:"1200000"
Maximum lifetime of a connection in the pool (20 minutes). Should be shorter than any connection time limit imposed by the database or intermediate infrastructure.
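As a worked example of the sizing formula: a 4-core server with SSD storage (effective_spindle_count of roughly 1) gives (4 * 2) + 1 = 9 connections, so a pool of about 9-10:

```yaml
spring:
  datasource:
    hikari:
      maximum-pool-size: 9   # (4 cores * 2) + 1 spindle
      minimum-idle: 9        # same as maximum-pool-size for a fixed-size pool
```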

Database Initialization

The Docker setup includes an initialization script that runs on first startup:
-- docker/postgres/init.sql
CREATE EXTENSION IF NOT EXISTS vector;
This is executed automatically when the PostgreSQL container starts for the first time. The script is mounted at /docker-entrypoint-initdb.d/init.sql in the container.

Docker Compose Configuration

postgres:
  image: pgvector/pgvector:pg16
  container_name: interview-postgres
  environment:
    POSTGRES_USER: postgres
    POSTGRES_PASSWORD: password
    POSTGRES_DB: interview_guide
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./docker/postgres/init.sql:/docker-entrypoint-initdb.d/init.sql
  ports:
    - "5432:5432"
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 5s
    timeout: 5s
    retries: 5
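The healthcheck lets dependent services wait until PostgreSQL actually accepts connections. If your Compose file defines the application service (shown here as app, an assumed service name), it can gate its startup on the healthcheck:

```yaml
app:
  depends_on:
    postgres:
      condition: service_healthy   # waits until pg_isready succeeds
```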

Manual SQL Setup

If you’re not using Docker, run these SQL commands:
CREATE DATABASE interview_guide
  WITH
  ENCODING = 'UTF8'
  LC_COLLATE = 'en_US.UTF-8'
  LC_CTYPE = 'en_US.UTF-8'
  TEMPLATE = template0;

Production Checklist

  1. Use strong credentials - Change the default POSTGRES_PASSWORD to a strong, unique password.
  2. Set ddl-auto to validate - Never use create or update in production. Use validate or none.
  3. Disable schema auto-initialization - Set spring.ai.vectorstore.pgvector.initialize-schema: false.
  4. Configure connection pooling - Tune HikariCP settings based on your load.
  5. Enable SSL connections - Add ?ssl=true&sslmode=require to the JDBC URL for encrypted connections.
  6. Set up backups - Implement regular database backups using pg_dump or your hosting provider’s tools.
  7. Monitor performance - Use pg_stat_statements and connection pool metrics to track database performance.
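For the SSL step, the full JDBC URL would look like this (db.example.com is a placeholder hostname):

```yaml
spring:
  datasource:
    url: jdbc:postgresql://db.example.com:5432/interview_guide?ssl=true&sslmode=require
```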

Troubleshooting

Vector store table not found: the table hasn’t been created yet. Solutions:
  1. Set spring.ai.vectorstore.pgvector.initialize-schema: true
  2. Or manually create the table using Spring AI’s schema
  3. Check that pgvector extension is installed: SELECT * FROM pg_extension WHERE extname = 'vector';
pgvector extension missing: the extension isn’t installed in your database:
  1. Use the pgvector/pgvector:pg16 Docker image, OR
  2. Install pgvector manually following the official guide
  3. Run CREATE EXTENSION IF NOT EXISTS vector; in your database
Connection refused: PostgreSQL isn’t reachable:
  1. Verify PostgreSQL is running: docker ps or systemctl status postgresql
  2. Check firewall rules allow port 5432
  3. Verify POSTGRES_HOST matches your setup (use postgres in Docker, localhost locally)
  4. Check database logs for startup errors
Data disappears on restart: you’re using ddl-auto: create:
  1. Change to ddl-auto: update in application.yml
  2. Restart the application
  3. Consider using validate in production
Empty RAG search results. Possible causes:
  1. Documents haven’t been embedded yet - upload files to knowledge base
  2. Wrong embedding dimensions - verify dimensions: 1024 matches your model
  3. Distance threshold too strict - check min-score settings in RAG configuration
  4. Index not built - allow time for HNSW index creation on large datasets
