MySQL MCP Server provides tools for loading documents into vector stores and performing semantic search with Retrieval-Augmented Generation (RAG).

Overview

Vector stores enable semantic search by storing text embeddings as VECTOR columns. The server supports:
  • Document Loading: Ingest documents from local filesystem or OCI Object Storage
  • Vector Embeddings: Automatically generate embeddings using ragify_column
  • RAG Queries: Natural language queries that retrieve relevant context and generate answers
  • Dual Storage: Default vector store or custom InnoDB tables

Loading Documents

From Local Filesystem (MySQL AI)

For MySQL AI connections, load documents from the secure_file_priv directory.

1. List Available Files

"List local files for vector store ingestion"
Tool: list_vector_store_files_local

2. Load Documents

"Load documents from /var/lib/mysql-files/product_manuals.pdf"
Tool: load_vector_store_local
Parameters:
  • connection_id: Database connection
  • file_path: Path within secure_file_priv directory
Supported Formats:
  • PDF documents
  • Text files (.txt, .md)
  • HTML files
The load_vector_store_local tool only works with MySQL AI connections. It will fail with a clear error message on HeatWave connections.
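As with the OCI loader, parameters can be supplied as a JSON payload. A hypothetical example (the connection_id value is a placeholder):

```json
{
  "connection_id": "local_mysql_ai",
  "file_path": "/var/lib/mysql-files/product_manuals.pdf"
}
```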

From OCI Object Storage (MySQL HeatWave)

For MySQL HeatWave connections, load documents from OCI Object Storage buckets.

1. List Buckets and Objects

"List all buckets in the production compartment"
"List objects in my 'documentation' bucket"
Tools: object_storage_list_buckets, object_storage_list_objects

2. Load Documents with Prefix Filter

"Load all documents with prefix 'manuals/' from bucket 'company-docs' namespace 'mytenancy' into vector store"
Tool: load_vector_store_oci
Parameters:
  • connection_id: HeatWave database connection
  • namespace: OCI Object Storage namespace
  • bucket_name: Source bucket name
  • document_prefix: Prefix to filter objects (e.g., "manuals/", "docs/2024/")
Example:
{
  "connection_id": "prod_heatwave",
  "namespace": "mycompany",
  "bucket_name": "knowledge-base",
  "document_prefix": "support/faq/"
}
The load_vector_store_oci tool only works with MySQL HeatWave connections. For MySQL AI, use load_vector_store_local.
Requires a valid OCI configuration file (~/.oci/config). See Configuration for details.
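Conceptually, document_prefix acts as a string prefix filter over object names: only objects whose names begin with the prefix are loaded. A minimal Python sketch (the object names are invented):

```python
def filter_by_prefix(object_names, prefix):
    """Return only the objects whose names start with the given prefix."""
    return [name for name in object_names if name.startswith(prefix)]

objects = [
    "support/faq/returns.pdf",
    "support/faq/shipping.md",
    "manuals/setup.pdf",
]
print(filter_by_prefix(objects, "support/faq/"))
# ['support/faq/returns.pdf', 'support/faq/shipping.md']
```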

Creating Vector Embeddings

Using ragify_column

Convert text columns into vector embeddings for semantic search.
"Add embeddings for 'description' column into 'description_vec' column in products table"
Tool: ragify_column
Parameters:
  • connection_id: Database connection
  • table_name: Target table
  • input_column_name: Source text column
  • embedding_column_name: Target VECTOR column (created if it doesn't exist)
What It Does:
  1. Creates the embedding column if it doesn’t exist (as VECTOR type)
  2. Generates embeddings for all rows in the text column
  3. Populates the VECTOR column with embeddings
  4. Returns the number of rows processed
Example Workflow:
-- Create a table with text content
CREATE TABLE product_docs (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(255),
  body TEXT
);

-- Insert documents
INSERT INTO product_docs (title, body) VALUES
  ('Installation Guide', 'Follow these steps to install...'),
  ('Troubleshooting', 'If you encounter errors...');
Then use ragify_column:
"Add embeddings for 'body' column into 'body_embedding' column in product_docs table"
Result:
-- Table now has:
CREATE TABLE product_docs (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(255),
  body TEXT,
  body_embedding VECTOR  -- Added automatically
);
The ragify_column tool works on both MySQL AI and HeatWave connections with identical behavior.
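The four steps above can be sketched in Python. Here embed() is a toy stand-in for the server's real embedding model, and the dicts stand in for table rows:

```python
def embed(text):
    # Toy embedding: vowel-frequency vector (illustration only, not the
    # real model used by ragify_column).
    return [text.count(c) / max(len(text), 1) for c in "aeiou"]

def ragify_column(rows, input_column, embedding_column):
    """Mimic steps 1-4: create the column, embed every row, report count."""
    processed = 0
    for row in rows:
        # Steps 1-3: "create" the embedding column (a new dict key here)
        # and populate it with the generated embedding.
        row[embedding_column] = embed(row[input_column])
        processed += 1
    return processed  # Step 4: number of rows processed

docs = [{"body": "Follow these steps to install..."},
        {"body": "If you encounter errors..."}]
print(ragify_column(docs, "body", "body_embedding"))  # 2
```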

Performing RAG Queries

Query Default Vector Store

Use the default vector store created by document loading tools.
"Ask ml_rag: What is the refund policy?"
"Show me information about installation requirements from the vector store"
Tool: ask_ml_rag_vector_store
Parameters:
  • connection_id: Database connection
  • question: Natural language question
  • context_size (optional): Number of context chunks to retrieve
How It Works:
  1. Converts your question into a vector embedding
  2. Performs similarity search in the vector store
  3. Retrieves the top N most relevant text segments
  4. Generates an answer using retrieved context and a language model
Example:
User: "Ask ml_rag: What payment methods are supported?"

Server:
1. Searches vector store for relevant segments
2. Finds: "We accept credit cards, PayPal, and bank transfers..."
3. Generates: "According to the documentation, supported payment methods include credit cards, PayPal, and bank transfers."
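The retrieval-then-generation flow can be sketched with toy two-dimensional embeddings (the vectors and segment texts are invented, and a real server would call an LLM in the final step):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rag_query(question_vec, store, context_size=3):
    """Retrieve the context_size most similar segments, then 'generate'."""
    ranked = sorted(store, key=lambda s: cosine(question_vec, s["vec"]),
                    reverse=True)
    context = [s["text"] for s in ranked[:context_size]]
    # Stand-in for the LLM generation step.
    return "According to the documentation: " + " ".join(context)

store = [
    {"text": "We accept credit cards and PayPal.", "vec": [1.0, 0.0]},
    {"text": "Shipping takes 3-5 days.", "vec": [0.0, 1.0]},
]
print(rag_query([0.9, 0.1], store, context_size=1))
# According to the documentation: We accept credit cards and PayPal.
```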

Query Custom InnoDB Tables

Perform RAG queries on specific InnoDB tables with custom columns.
"Search my product_docs table using 'body' and 'body_embedding' columns for warranty information"
Tool: ask_ml_rag_innodb
Parameters:
  • connection_id: Database connection
  • question: Natural language question
  • segment_col: Column containing text segments
  • embedding_col: Column containing VECTOR embeddings
  • context_size (optional): Number of context chunks
When to Use:
  • You have multiple tables with embeddings
  • You want to restrict search to a specific table
  • You need fine-grained control over which embeddings to search
Performance Considerations:
  • MySQL AI: Vector search executes within the database service instance
  • HeatWave: Table is loaded into the HeatWave cluster for distributed processing (much faster for large datasets)

Retrieve Segments Without Generation

Get raw text segments without LLM-generated answers.
"Retrieve relevant segments about pricing from vector store without generating an answer"
Tool: ask_ml_rag_vector_store (with the skip_generate parameter)
Use Cases:
  • Debugging vector search results
  • Building custom generation pipelines
  • Analyzing retrieved context quality
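With generation skipped, the tool effectively stops after the similarity ranking and returns raw segments. A sketch of that retrieval-only step, using invented segments and toy vectors:

```python
def retrieve_segments(question_vec, store, context_size=3):
    """Return raw segments ranked by dot-product similarity; no LLM step."""
    def score(segment):
        return sum(q * v for q, v in zip(question_vec, segment["vec"]))
    ranked = sorted(store, key=score, reverse=True)
    return [s["text"] for s in ranked[:context_size]]

store = [
    {"text": "Standard plan costs $10/month.", "vec": [1.0, 0.0]},
    {"text": "Refunds within 30 days.", "vec": [0.0, 1.0]},
]
print(retrieve_segments([1.0, 0.2], store, context_size=1))
# ['Standard plan costs $10/month.']
```

Inspecting these raw segments is often the fastest way to tell whether a poor answer comes from retrieval (wrong segments) or generation (right segments, bad summary).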

Vector Store Architecture

MySQL AI

┌─────────────────┐
│  MySQL AI       │
│  Database       │
│                 │
│  ┌───────────┐  │
│  │  InnoDB   │  │  Vector operations
│  │  Tables   │  │  within database
│  │           │  │  service instance
│  │  VECTOR   │  │
│  │  columns  │  │
│  └───────────┘  │
└─────────────────┘

       │ Load from
       │ local filesystem

┌──────────────────┐
│ secure_file_priv │
│   directory      │
└──────────────────┘

MySQL HeatWave

┌─────────────────────────────┐
│  MySQL HeatWave Cluster     │
│                             │
│  ┌───────────────────────┐  │
│  │  Distributed          │  │
│  │  Vector Processing    │  │
│  │                       │  │
│  │  In-Memory Analytics  │  │
│  └───────────────────────┘  │
└─────────────────────────────┘

       │ Load from
       │ Object Storage

┌──────────────────┐
│  OCI Object      │
│  Storage         │
│                  │
│  Buckets/Objects │
└──────────────────┘

Complete Workflow Example

MySQL AI Workflow

# Step 1: Check available files
"List local files for vector store ingestion"

# Step 2: Load documents
"Load documents from /var/lib/mysql-files/support_articles.pdf"

# Step 3: Create custom embeddings (optional)
"Add embeddings for 'content' column into 'content_vec' column in articles table"

# Step 4: Query the vector store
"Ask ml_rag: How do I reset my password?"

# Step 5: Query custom table
"Search articles table using 'content' and 'content_vec' for password reset instructions"

MySQL HeatWave Workflow

# Step 1: List OCI resources
"List all compartments in my tenancy"
"Show all buckets in the production compartment"
"List objects in the 'knowledge-base' bucket"

# Step 2: Load from Object Storage
"Load all documents with prefix 'kb/2024/' from bucket 'knowledge-base' namespace 'mycompany'"

# Step 3: Query the vector store
"Ask ml_rag: What are the new features in version 2.0?"

# Step 4: Create custom embeddings for existing tables
"Add embeddings for 'faq_answer' column into 'faq_vec' column in faqs table"

# Step 5: Query custom table
"Search faqs table using 'faq_answer' and 'faq_vec' for billing questions"

Best Practices

Document Preparation

  • Structure: Use clear headings and sections
  • Chunk Size: Keep paragraphs focused (200-500 words)
  • Metadata: Include titles and categories for better retrieval
  • Format: PDF and text formats work best
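A simple word-count chunker illustrating the 200-500 word guideline (a sketch only, not the server's actual chunking logic):

```python
def chunk_words(text, target_words=300):
    """Split text into chunks of roughly target_words words each."""
    words = text.split()
    return [" ".join(words[i:i + target_words])
            for i in range(0, len(words), target_words)]

doc = "word " * 700  # a 700-word stand-in document
chunks = chunk_words(doc, target_words=300)
print([len(c.split()) for c in chunks])  # [300, 300, 100]
```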

Vector Search Optimization

  • Context Size: Start with 3-5 chunks, adjust based on results
  • Table Design: Create separate tables for different document types
  • Embeddings: Re-run ragify_column when source text changes
  • Indexing: Consider adding indexes on metadata columns
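One way to decide when re-running ragify_column is needed is to track a hash of the source text. This sketch assumes you store the hash alongside each row (a pattern suggestion, not a server feature):

```python
import hashlib

def needs_reembedding(text, stored_hash):
    """Return True when the source text no longer matches its stored hash."""
    current = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return current != stored_hash

h = hashlib.sha256(b"old description").hexdigest()
print(needs_reembedding("old description", h))  # False
print(needs_reembedding("new description", h))  # True
```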

Performance Tips

  • MySQL AI: Best for moderate data sizes (< 1M rows)
  • HeatWave: Use for large datasets requiring high-performance search
  • Batch Loading: Load multiple documents at once using prefix filters
  • Regular Updates: Schedule periodic document refreshes

Common Questions

Can I use the same document loading method for both connection types?
No. MySQL AI connections use local filesystem loading, while HeatWave connections use OCI Object Storage. The server automatically detects your connection type and provides the appropriate tools.
How do I update documents after they are loaded?
For the default vector store, re-run the load command to refresh documents. For custom InnoDB tables, update the source text column and re-run ragify_column to regenerate embeddings.
What happens if I run a RAG query before loading any documents?
The RAG tools will return an error or an empty result indicating that no documents are available. Load documents first using the appropriate loading tool for your connection type.
Can I maintain multiple vector stores?
Yes. The default vector store is managed automatically, but you can create multiple custom InnoDB tables with VECTOR columns using ragify_column for different document collections.
How can I improve answer quality?
  1. Increase context_size to retrieve more segments
  2. Improve source document quality and structure
  3. Use more specific questions
  4. Create separate vector stores for different topics
  5. Regularly update embeddings when source content changes

Next Steps

ML & GenAI

Explore text generation and NL2SQL features

API Tools

View complete tool reference
