
Overview

Query one or multiple knowledge bases using RAG (Retrieval-Augmented Generation). The system performs semantic vector search to find relevant document chunks, then uses an LLM to generate a contextual answer based on the retrieved information.
Both endpoints require knowledge bases to have vectorStatus: COMPLETED. Ensure vectorization is finished before querying.

RAG Query Workflow

  1. Semantic Search - Your question is embedded and compared against stored document vectors
  2. Relevance Ranking - Most relevant chunks are retrieved based on cosine similarity
  3. Context Assembly - Retrieved chunks are formatted as context for the LLM
  4. Answer Generation - LLM generates a natural language answer grounded in the retrieved content
  5. Response Delivery - Answer is returned either as complete JSON or streamed via SSE
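The relevance-ranking step (2) orders chunks by cosine similarity between the question embedding and each stored chunk embedding. A minimal sketch in Python, assuming embeddings are plain float vectors (function names are illustrative, not part of the API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_chunks(question_vec, chunks):
    """Return chunks sorted by similarity to the question, best first.
    `chunks` is a list of (text, embedding) pairs."""
    return sorted(chunks,
                  key=lambda c: cosine_similarity(question_vec, c[1]),
                  reverse=True)

# Toy example with 2-dimensional vectors
chunks = [("irrelevant chunk", [0.0, 1.0]),
          ("relevant chunk", [1.0, 0.1])]
ranked = rank_chunks([1.0, 0.0], chunks)
print(ranked[0][0])  # the chunk pointing in the question's direction wins
```

The top-ranked chunks would then be concatenated into the prompt context for step 3.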

Standard Query (Non-Streaming)

Endpoint: POST /api/knowledgebase/query
Rate Limit: 10 requests per time window (Global + IP-based)

Returns the complete answer in a single JSON response after processing is finished.

Request

knowledgeBaseIds
array
required
Array of knowledge base IDs to query. Supports querying multiple knowledge bases simultaneously for broader context.
Type: integer[]
Example: [1, 2, 5]
Validation: At least one ID is required
question
string
required
The question to answer based on the knowledge base content.
Example: "What are the system requirements for deployment?"
Validation: Cannot be blank

Response

code
integer
Response status code. 200 indicates success.
message
string
Response message. "success" on successful query.
data
object
Query result containing the generated answer.

Examples

curl -X POST 'http://localhost:8080/api/knowledgebase/query' \
  -H 'Content-Type: application/json' \
  -d '{
    "knowledgeBaseIds": [1, 2],
    "question": "What are the system requirements for deployment?"
  }'

Response Example

{
  "code": 200,
  "message": "success",
  "data": {
    "answer": "Based on the deployment documentation, the system requirements are:\n\n1. **Java Runtime**: Java 21 or higher with virtual threads support\n2. **Database**: PostgreSQL 15+ with pgvector extension enabled\n3. **Memory**: Minimum 4GB RAM, recommended 8GB for production\n4. **Storage**: At least 10GB available disk space\n5. **Redis**: Version 6.2+ for async processing with Streams support\n\nThe application is containerized and can be deployed using Docker Compose for simplified setup.",
    "knowledgeBaseId": 1,
    "knowledgeBaseName": "Technical Documentation"
  }
}
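Client code typically unwraps the `{code, message, data}` envelope before using the answer. A small helper sketch (the function name is illustrative; field names match the example above):

```python
def unwrap_answer(resp: dict) -> str:
    """Extract the generated answer from a /query response envelope,
    raising on non-200 application codes."""
    if resp.get("code") != 200:
        raise RuntimeError(f"query failed: {resp.get('message')}")
    return resp["data"]["answer"]

# Example envelope, trimmed from the response above
resp = {"code": 200, "message": "success",
        "data": {"answer": "Java 21 or higher", "knowledgeBaseId": 1}}
print(unwrap_answer(resp))
```

Note that the HTTP status and the application-level `code` field can differ, so checking `code` explicitly is safer than relying on the transport status alone.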

Streaming Query (SSE)

Endpoint: POST /api/knowledgebase/query/stream
Rate Limit: 5 requests per time window (Global + IP-based)
Content-Type: text/event-stream

Returns the answer incrementally as Server-Sent Events (SSE), enabling real-time streaming of the LLM response for a better user experience.

Request

Request body is identical to the standard query:
knowledgeBaseIds
array
required
Array of knowledge base IDs to query.
question
string
required
The question to answer.

Response

The response is a stream of Server-Sent Events. Each event contains a chunk of the generated answer.

Event Format:
data: {chunk_text}

SSE Format Notes:
  • Newlines in the answer are escaped as \n to maintain SSE protocol compatibility
  • Carriage returns are escaped as \r
  • Each event is followed by two newlines
  • The stream ends when answer generation is complete
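The escaping described above can be reversed client-side by replacing the two-character sequences `\n` and `\r` with real newlines and carriage returns. A minimal sketch (the function name is illustrative):

```python
def decode_sse_chunk(data: str) -> str:
    """Reverse the server's escaping: the literal two-character
    sequences '\\n' and '\\r' become real newline / carriage return."""
    return data.replace("\\n", "\n").replace("\\r", "\r")

# A chunk as it appears on the wire vs. its decoded form
print(decode_sse_chunk("requirements are:\\n\\n1. Java"))
```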

Examples

curl -X POST 'http://localhost:8080/api/knowledgebase/query/stream' \
  -H 'Content-Type: application/json' \
  -H 'Accept: text/event-stream' \
  -d '{
    "knowledgeBaseIds": [1, 2],
    "question": "What are the system requirements for deployment?"
  }' \
  --no-buffer

Stream Response Example

data: Based on

data:  the deployment

data:  documentation

data: , the system

data:  requirements are:

data: \n\n1

data: . **Java

data:  Runtime**:

data:  Java 21

data:  or higher

data:  with virtual

data:  threads support

data: \n2.

data:  **Database

data: **: PostgreSQL

data:  15+

data:  with pgvector

data:  extension


Error Responses

Validation Errors

{
  "code": 400,
  "message": "At least one knowledge base must be selected"
}
{
  "code": 400,
  "message": "Question cannot be blank"
}

Knowledge Base Not Found

{
  "code": 404,
  "message": "Knowledge base does not exist"
}

Vectorization Not Complete

{
  "code": 400,
  "message": "Knowledge base vectorization not complete, current status: PROCESSING"
}

Rate Limit Exceeded

{
  "code": 429,
  "message": "Too many requests, please try again later"
}
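Because both endpoints are rate-limited, clients should back off and retry when the API answers with code 429. A generic sketch with exponential backoff (the wrapper name and the simulated responses are illustrative):

```python
import time

def query_with_backoff(send, max_retries=3, base_delay=1.0):
    """Retry a query callable while the API returns code 429.
    `send` returns the parsed JSON envelope; the delay doubles
    on each rate-limited attempt."""
    for attempt in range(max_retries + 1):
        resp = send()
        if resp.get("code") != 429:
            return resp
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    return resp

# Simulated: first call is rate-limited, the retry succeeds
responses = iter([{"code": 429}, {"code": 200, "message": "success"}])
result = query_with_backoff(lambda: next(responses), base_delay=0.01)
print(result["code"])  # 200
```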

Multi-Knowledge Base Querying

Both endpoints support querying multiple knowledge bases simultaneously:
{
  "knowledgeBaseIds": [1, 2, 5, 12],
  "question": "How do I configure the database connection?"
}
Benefits:
  • Search across multiple documents in a single query
  • Combine information from different sources
  • Improve answer quality with broader context
Considerations:
  • All specified knowledge bases must have vectorStatus: COMPLETED
  • Retrieval considers chunks from all knowledge bases
  • Response may synthesize information from multiple sources
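When assembling a multi-knowledge-base request, it is worth mirroring the server's validation rules client-side to fail fast before spending a rate-limited call. A sketch (the helper name is illustrative; the checks correspond to the 400 errors listed above):

```python
def build_query(knowledge_base_ids, question):
    """Build a query payload, applying the same checks the server
    enforces: at least one knowledge base, non-blank question."""
    if not knowledge_base_ids:
        raise ValueError("At least one knowledge base must be selected")
    if not question.strip():
        raise ValueError("Question cannot be blank")
    return {"knowledgeBaseIds": list(knowledge_base_ids),
            "question": question}

payload = build_query([1, 2, 5, 12],
                      "How do I configure the database connection?")
print(payload)
```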

Best Practices

When to Use Streaming

Use streaming (/query/stream) when:
  • Building real-time chat interfaces
  • Answers are expected to be long
  • User experience benefits from incremental display
  • You need to show progress during generation
Use standard query (/query) when:
  • Building APIs or batch processing
  • You need the complete answer before proceeding
  • Simpler client-side implementation is preferred
  • Logging or storing complete responses

Question Quality

For best results:
  • Be specific and clear in your questions
  • Include relevant context or keywords
  • Ask one question at a time
  • Refer to concepts likely present in the documents
Good: "What are the database migration steps for PostgreSQL?"
Less effective: "How do I set things up?"
