MySQL MCP Server provides tools for loading documents into vector stores and performing semantic search with Retrieval-Augmented Generation (RAG).

Overview

Vector stores enable semantic search by storing text embeddings as VECTOR columns. The server supports:
  • Document Loading: Ingest documents from local filesystem or OCI Object Storage
  • Vector Embeddings: Automatically generate embeddings using ragify_column
  • RAG Queries: Natural language queries that retrieve relevant context and generate answers
  • Dual Storage: Default vector store or custom InnoDB tables

Loading Documents

From Local Filesystem (MySQL AI)

For MySQL AI connections, load documents from the secure_file_priv directory.

1. List Available Files

"List local files for vector store ingestion"
Tool: list_vector_store_files_local

2. Load Documents

"Load documents from /var/lib/mysql-files/product_manuals.pdf"
Tool: load_vector_store_local
Parameters:
  • connection_id: Database connection
  • file_path: Path within secure_file_priv directory
Supported Formats:
  • PDF documents
  • Text files (.txt, .md)
  • HTML files
The load_vector_store_local tool only works with MySQL AI connections. It will fail with a clear error message on HeatWave connections.
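As with the OCI loader, parameters can be supplied as a JSON payload. A hypothetical example (the connection_id value is a placeholder):

```json
{
  "connection_id": "local_mysql_ai",
  "file_path": "/var/lib/mysql-files/product_manuals.pdf"
}
```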

From OCI Object Storage (MySQL HeatWave)

For MySQL HeatWave connections, load documents from OCI Object Storage buckets.

1. List Buckets and Objects

"List all buckets in the production compartment"
"List objects in my 'documentation' bucket"
Tools: object_storage_list_buckets, object_storage_list_objects

2. Load Documents with Prefix Filter

"Load all documents with prefix 'manuals/' from bucket 'company-docs' namespace 'mytenancy' into vector store"
Tool: load_vector_store_oci
Parameters:
  • connection_id: HeatWave database connection
  • namespace: OCI Object Storage namespace
  • bucket_name: Source bucket name
  • document_prefix: Prefix to filter objects (e.g., "manuals/", "docs/2024/")
Example:
{
  "connection_id": "prod_heatwave",
  "namespace": "mycompany",
  "bucket_name": "knowledge-base",
  "document_prefix": "support/faq/"
}
The load_vector_store_oci tool only works with MySQL HeatWave connections. For MySQL AI, use load_vector_store_local.
Requires a valid OCI configuration file (~/.oci/config). See Configuration for details.
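Conceptually, document_prefix acts as a string prefix filter over object names: only objects whose names begin with the prefix are loaded. A minimal Python sketch (the object names are invented):

```python
def filter_by_prefix(object_names, prefix):
    """Return only the objects whose names start with the given prefix."""
    return [name for name in object_names if name.startswith(prefix)]

objects = [
    "support/faq/returns.pdf",
    "support/faq/shipping.md",
    "manuals/setup.pdf",
]
print(filter_by_prefix(objects, "support/faq/"))
# ['support/faq/returns.pdf', 'support/faq/shipping.md']
```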

Creating Vector Embeddings

Using ragify_column

Convert text columns into vector embeddings for semantic search.
"Add embeddings for 'description' column into 'description_vec' column in products table"
Tool: ragify_column
Parameters:
  • connection_id: Database connection
  • table_name: Target table
  • input_column_name: Source text column
  • embedding_column_name: Target VECTOR column (created if it doesn't exist)
What It Does:
  1. Creates the embedding column if it doesn’t exist (as VECTOR type)
  2. Generates embeddings for all rows in the text column
  3. Populates the VECTOR column with embeddings
  4. Returns the number of rows processed
Example Workflow:
-- Create a table with text content
CREATE TABLE product_docs (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(255),
  body TEXT
);

-- Insert documents
INSERT INTO product_docs (title, body) VALUES
  ('Installation Guide', 'Follow these steps to install...'),
  ('Troubleshooting', 'If you encounter errors...');
Then use ragify_column:
"Add embeddings for 'body' column into 'body_embedding' column in product_docs table"
Result:
-- Table now has:
CREATE TABLE product_docs (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(255),
  body TEXT,
  body_embedding VECTOR  -- Added automatically
);
The ragify_column tool works on both MySQL AI and HeatWave connections with identical behavior.
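The four steps above can be sketched in Python. Here embed() is a toy stand-in for the server's real embedding model, and the dicts stand in for table rows:

```python
def embed(text):
    # Toy embedding: vowel-frequency vector (illustration only, not the
    # real model used by ragify_column).
    return [text.count(c) / max(len(text), 1) for c in "aeiou"]

def ragify_column(rows, input_column, embedding_column):
    """Mimic steps 1-4: create the column, embed every row, report count."""
    processed = 0
    for row in rows:
        # Steps 1-3: "create" the embedding column (a new dict key here)
        # and populate it with the generated embedding.
        row[embedding_column] = embed(row[input_column])
        processed += 1
    return processed  # Step 4: number of rows processed

docs = [{"body": "Follow these steps to install..."},
        {"body": "If you encounter errors..."}]
print(ragify_column(docs, "body", "body_embedding"))  # 2
```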

Performing RAG Queries

Query Default Vector Store

Use the default vector store created by document loading tools.
"Ask ml_rag: What is the refund policy?"
"Show me information about installation requirements from the vector store"
Tool: ask_ml_rag_vector_store
Parameters:
  • connection_id: Database connection
  • question: Natural language question
  • context_size (optional): Number of context chunks to retrieve
How It Works:
  1. Converts your question into a vector embedding
  2. Performs similarity search in the vector store
  3. Retrieves the top N most relevant text segments
  4. Generates an answer using retrieved context and a language model
Example:
User: "Ask ml_rag: What payment methods are supported?"

Server:
1. Searches vector store for relevant segments
2. Finds: "We accept credit cards, PayPal, and bank transfers..."
3. Generates: "According to the documentation, supported payment methods include credit cards, PayPal, and bank transfers."
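The retrieval-then-generation flow can be sketched with toy two-dimensional embeddings (the vectors and segment texts are invented, and a real server would call an LLM in the final step):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rag_query(question_vec, store, context_size=3):
    """Retrieve the context_size most similar segments, then 'generate'."""
    ranked = sorted(store, key=lambda s: cosine(question_vec, s["vec"]),
                    reverse=True)
    context = [s["text"] for s in ranked[:context_size]]
    # Stand-in for the LLM generation step.
    return "According to the documentation: " + " ".join(context)

store = [
    {"text": "We accept credit cards and PayPal.", "vec": [1.0, 0.0]},
    {"text": "Shipping takes 3-5 days.", "vec": [0.0, 1.0]},
]
print(rag_query([0.9, 0.1], store, context_size=1))
# According to the documentation: We accept credit cards and PayPal.
```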

Query Custom InnoDB Tables

Perform RAG queries on specific InnoDB tables with custom columns.
"Search my product_docs table using 'body' and 'body_embedding' columns for warranty information"
Tool: ask_ml_rag_innodb
Parameters:
  • connection_id: Database connection
  • question: Natural language question
  • segment_col: Column containing text segments
  • embedding_col: Column containing VECTOR embeddings
  • context_size (optional): Number of context chunks
When to Use:
  • You have multiple tables with embeddings
  • You want to restrict search to a specific table
  • You need fine-grained control over which embeddings to search
Performance Considerations:
  • MySQL AI: Vector search executes within the database service instance
  • HeatWave: Table is loaded into the HeatWave cluster for distributed processing (much faster for large datasets)

Retrieve Segments Without Generation

Get raw text segments without LLM-generated answers.
"Retrieve relevant segments about pricing from vector store without generating an answer"
Tool: ask_ml_rag_vector_store (with the skip_generate parameter)
Use Cases:
  • Debugging vector search results
  • Building custom generation pipelines
  • Analyzing retrieved context quality
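With generation skipped, the tool effectively stops after the similarity ranking and returns raw segments. A sketch of that retrieval-only step, using invented segments and toy vectors:

```python
def retrieve_segments(question_vec, store, context_size=3):
    """Return raw segments ranked by dot-product similarity; no LLM step."""
    def score(segment):
        return sum(q * v for q, v in zip(question_vec, segment["vec"]))
    ranked = sorted(store, key=score, reverse=True)
    return [s["text"] for s in ranked[:context_size]]

store = [
    {"text": "Standard plan costs $10/month.", "vec": [1.0, 0.0]},
    {"text": "Refunds within 30 days.", "vec": [0.0, 1.0]},
]
print(retrieve_segments([1.0, 0.2], store, context_size=1))
# ['Standard plan costs $10/month.']
```

Inspecting these raw segments is often the fastest way to tell whether a poor answer comes from retrieval (wrong segments) or generation (right segments, bad summary).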

Vector Store Architecture

MySQL AI

┌─────────────────┐
│  MySQL AI       │
│  Database       │
│                 │
│  ┌───────────┐  │
│  │  InnoDB   │  │  Vector operations
│  │  Tables   │  │  within database
│  │           │  │  service instance
│  │  VECTOR   │  │
│  │  columns  │  │
│  └───────────┘  │
└─────────────────┘

       │ Load from
       │ local filesystem

┌──────────────────┐
│ secure_file_priv │
│   directory      │
└──────────────────┘

MySQL HeatWave

┌─────────────────────────────┐
│  MySQL HeatWave Cluster     │
│                             │
│  ┌───────────────────────┐  │
│  │  Distributed          │  │
│  │  Vector Processing    │  │
│  │                       │  │
│  │  In-Memory Analytics  │  │
│  └───────────────────────┘  │
└─────────────────────────────┘

       │ Load from
       │ Object Storage

┌──────────────────┐
│  OCI Object      │
│  Storage         │
│                  │
│  Buckets/Objects │
└──────────────────┘

Complete Workflow Example

MySQL AI Workflow

# Step 1: Check available files
"List local files for vector store ingestion"

# Step 2: Load documents
"Load documents from /var/lib/mysql-files/support_articles.pdf"

# Step 3: Create custom embeddings (optional)
"Add embeddings for 'content' column into 'content_vec' column in articles table"

# Step 4: Query the vector store
"Ask ml_rag: How do I reset my password?"

# Step 5: Query custom table
"Search articles table using 'content' and 'content_vec' for password reset instructions"

MySQL HeatWave Workflow

# Step 1: List OCI resources
"List all compartments in my tenancy"
"Show all buckets in the production compartment"
"List objects in the 'knowledge-base' bucket"

# Step 2: Load from Object Storage
"Load all documents with prefix 'kb/2024/' from bucket 'knowledge-base' namespace 'mycompany'"

# Step 3: Query the vector store
"Ask ml_rag: What are the new features in version 2.0?"

# Step 4: Create custom embeddings for existing tables
"Add embeddings for 'faq_answer' column into 'faq_vec' column in faqs table"

# Step 5: Query custom table
"Search faqs table using 'faq_answer' and 'faq_vec' for billing questions"

Best Practices

Document Preparation

  • Structure: Use clear headings and sections
  • Chunk Size: Keep paragraphs focused (200-500 words)
  • Metadata: Include titles and categories for better retrieval
  • Format: PDF and text formats work best
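A simple word-count chunker illustrating the 200-500 word guideline (a sketch only, not the server's actual chunking logic):

```python
def chunk_words(text, target_words=300):
    """Split text into chunks of roughly target_words words each."""
    words = text.split()
    return [" ".join(words[i:i + target_words])
            for i in range(0, len(words), target_words)]

doc = "word " * 700  # a 700-word stand-in document
chunks = chunk_words(doc, target_words=300)
print([len(c.split()) for c in chunks])  # [300, 300, 100]
```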

Vector Search Optimization

  • Context Size: Start with 3-5 chunks, adjust based on results
  • Table Design: Create separate tables for different document types
  • Embeddings: Re-run ragify_column when source text changes
  • Indexing: Consider adding indexes on metadata columns
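One way to decide when re-running ragify_column is needed is to track a hash of the source text. This sketch assumes you store the hash alongside each row (a pattern suggestion, not a server feature):

```python
import hashlib

def needs_reembedding(text, stored_hash):
    """Return True when the source text no longer matches its stored hash."""
    current = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return current != stored_hash

h = hashlib.sha256(b"old description").hexdigest()
print(needs_reembedding("old description", h))  # False
print(needs_reembedding("new description", h))  # True
```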

Performance Tips

  • MySQL AI: Best for moderate data sizes (< 1M rows)
  • HeatWave: Use for large datasets requiring high-performance search
  • Batch Loading: Load multiple documents at once using prefix filters
  • Regular Updates: Schedule periodic document refreshes

Common Questions

Can I use the same document loading method for both connection types?
No. MySQL AI connections use local filesystem loading, while HeatWave connections use OCI Object Storage. The server automatically detects your connection type and provides the appropriate tools.
How do I update documents after they are loaded?
For the default vector store, re-run the load command to refresh documents. For custom InnoDB tables, update the source text column and re-run ragify_column to regenerate embeddings.
What happens if I run a RAG query before loading any documents?
The RAG tools will return an error or an empty result indicating that no documents are available. Load documents first using the appropriate loading tool for your connection type.
Can I maintain multiple vector stores?
Yes. The default vector store is managed automatically, but you can create multiple custom InnoDB tables with VECTOR columns using ragify_column for different document collections.
How can I improve answer quality?
  1. Increase context_size to retrieve more segments
  2. Improve source document quality and structure
  3. Use more specific questions
  4. Create separate vector stores for different topics
  5. Regularly update embeddings when source content changes

Next Steps

ML & GenAI

Explore text generation and NL2SQL features

API Tools

View complete tool reference
