POST /api/v1/endpoints/{slug}/query
curl --request POST \
  --url http://localhost:8080/api/v1/endpoints/legal-qa/query \
  --header 'Authorization: Bearer SYFT_HUB_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "similarity_threshold": 0.5,
    "limit": 5,
    "max_tokens": 100,
    "temperature": 0.7
  }'
{
  "summary": {
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "model": "gpt-4",
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris.",
      "tokens": 8
    },
    "finish_reason": "stop",
    "usage": {
      "prompt_tokens": 10,
      "completion_tokens": 8,
      "total_tokens": 18
    },
    "cost": 0.0025,
    "provider_info": {
      "api_version": "v1",
      "response_time_ms": 150
    }
  },
  "references": {
    "documents": [
      {
        "document_id": "doc1",
        "content": "Paris is the capital of France.",
        "metadata": {
          "source": "wikipedia"
        },
        "similarity_score": 0.95
      }
    ],
    "provider_info": {
      "search_engine": "weaviate",
      "response_time_ms": 50
    },
    "cost": 0.001
  }
}
Query an endpoint to get responses from your RAG system. This is the core endpoint that orchestrates dataset search, model chat, and policy enforcement.
This is a public endpoint that requires a SyftHub satellite token for authentication. The user’s identity is automatically extracted from the token.

Authentication

Send a valid SyftHub satellite token as a Bearer token in the Authorization header. The user’s email is automatically extracted and verified from this token.

Path parameters

slug
string
required
The unique slug of the endpoint to query.
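As a rough illustration of how the slug and the token fit into a request, the sketch below builds (but does not send) the same call as the curl example using only the Python standard library. The helper name `build_query_request` is hypothetical, and percent-encoding the slug is a defensive assumption rather than documented behaviour.

```python
import json
import urllib.request
from urllib.parse import quote


def build_query_request(base_url: str, slug: str, token: str, body: dict) -> urllib.request.Request:
    """Build a POST request for /api/v1/endpoints/{slug}/query without sending it."""
    # Assumption: percent-encode the slug so unexpected characters cannot alter the path.
    url = f"{base_url.rstrip('/')}/api/v1/endpoints/{quote(slug, safe='')}/query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer scheme, as in the curl example.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_query_request(
    "http://localhost:8080",
    "legal-qa",
    "SYFT_HUB_TOKEN",  # placeholder; substitute a real satellite token
    {"messages": "What is the capital of France?"},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request.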

Request body

messages
string | array
required
Either a simple string query or an array of chat messages. In chat format, each message has a role (user, assistant, or system) and a content field.
similarity_threshold
number
default:"0.5"
Minimum similarity score (0.0-1.0) for dataset search results. Higher values return only more relevant documents.
limit
number
default:"5"
Maximum number of documents to return from dataset search.
include_metadata
boolean
default:"true"
Whether to include document metadata in the response.
max_tokens
number
default:"100"
Maximum number of tokens to generate in the model response.
temperature
number
default:"0.7"
Sampling temperature for model generation (0.0-2.0). Higher values make output more random.
stop_sequences
array
default:"[\"\\n\"]"
Array of strings that will stop generation when encountered.
stream
boolean
default:"false"
Whether to stream the response (for real-time generation).
presence_penalty
number
default:"0.0"
Penalty applied to tokens that have already appeared in the text, regardless of how often (-2.0 to 2.0).
frequency_penalty
number
default:"0.0"
Penalty applied to tokens in proportion to how often they have already appeared in the text (-2.0 to 2.0).
extras
object
default:"{}"
Additional configuration options for advanced use cases. Can include reference_options and summarize_options.
transaction_token
string
default:"null"
Optional transaction token for accounting and billing purposes.
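The parameters above can be assembled into a request body with a small client-side check of the documented ranges. This is only a sketch: `build_query_body` is a hypothetical helper, the server remains the authority on what is valid, and options that are omitted are assumed to fall back to the defaults listed above.

```python
from typing import Any, Union

Messages = Union[str, list[dict[str, str]]]

# (min, max) bounds copied from the parameter descriptions above.
_RANGES = {
    "similarity_threshold": (0.0, 1.0),
    "temperature": (0.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}
_ROLES = {"user", "assistant", "system"}


def build_query_body(messages: Messages, **options: Any) -> dict[str, Any]:
    """Return a JSON-serialisable body, raising ValueError on out-of-range values."""
    if not isinstance(messages, str):
        # Chat format: every message needs a known role and a content field.
        for message in messages:
            if message.get("role") not in _ROLES or "content" not in message:
                raise ValueError(f"invalid chat message: {message!r}")
    for name, (low, high) in _RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # Only explicitly passed options are included in the body.
    return {"messages": messages, **options}


body = build_query_body(
    [{"role": "user", "content": "What is the capital of France?"}],
    similarity_threshold=0.5,
    limit=5,
    max_tokens=100,
    temperature=0.7,
)
print(body)
```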

Response

summary
object
Generated response from the model (only present if a model is configured).
references
object
Reference documents from dataset search (only present if a dataset is configured).
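Because either object may be absent, client code should not assume both keys exist. The sketch below shows one way to read a parsed response defensively; `extract_answer` is a hypothetical helper, and the nested field names (`message.content`, `documents`) are taken from the example response rather than from a formal schema.

```python
from typing import Any, Optional


def extract_answer(response: dict[str, Any]) -> tuple[Optional[str], list[dict[str, Any]]]:
    """Return (answer text or None, list of reference documents)."""
    # "summary" is only present when a model is configured.
    summary = response.get("summary")
    answer = summary["message"]["content"] if summary else None
    # "references" is only present when a dataset is configured.
    references = response.get("references")
    documents = references.get("documents", []) if references else []
    return answer, documents


sample = {
    "summary": {
        "message": {"role": "assistant", "content": "The capital of France is Paris."}
    },
    "references": {
        "documents": [{"document_id": "doc1", "similarity_score": 0.95}]
    },
}
answer, documents = extract_answer(sample)
print(answer)
print([doc["document_id"] for doc in documents])
```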
This endpoint enforces all policies attached to it. Queries may be rejected based on rate limits, access controls, or other policy rules.
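Callers may want to surface such rejections explicitly. The page does not say how a rejection is signalled, so the sketch below assumes it arrives as a non-2xx HTTP status and simply wraps the status code and response body in an exception. `QueryRejected` and `send_query` are hypothetical names, and no specific status codes are assumed.

```python
import urllib.error
import urllib.request


class QueryRejected(Exception):
    """Raised when the server answers a query with an HTTP error status."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"query rejected with HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def send_query(req: urllib.request.Request, opener=urllib.request.urlopen) -> bytes:
    """Send a prepared request and return the raw response body.

    Assumption: policy rejections surface as HTTP error statuses, which
    urllib reports by raising HTTPError.
    """
    try:
        with opener(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # Keep the server's explanation so the caller can log or display it.
        detail = err.read().decode("utf-8", errors="replace")
        raise QueryRejected(err.code, detail) from err
```

The `opener` argument exists only so the function can be exercised without a live server; in normal use the default `urllib.request.urlopen` applies.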
