Models represent configured instances of AI language models that generate text responses. Each model is backed by a model type that defines how to communicate with the underlying AI service.
## Model entity

A model is defined by the following properties:

```python
class Model:
    id: UUID             # Unique identifier
    tenant_id: UUID      # Tenant isolation
    name: str            # Unique name per tenant
    dtype: str           # Model type (e.g., "openai", "vllm_local")
    configuration: dict  # Type-specific config (API keys, URLs)
    summary: str         # Brief description
    tags: str            # Comma-separated tags
    created_at: datetime
    updated_at: datetime
```

Location: `backend/syft_space/components/models/entities.py:15`
## Model types

Model types implement the `BaseModelType` protocol and provide:

### Configuration schema

Each type defines its required fields:
```python
@classmethod
def configuration_schema(cls) -> dict[str, Any]:
    """Return configuration schema for this model type."""
    return {
        "api_key": {"type": "string", "required": True, "secret": True},
        "model": {"type": "string", "required": True},
        "base_url": {"type": "string", "required": False},
    }
```
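As a sketch of how a schema like this can be enforced, the hypothetical helper below checks a configuration dict against the declared fields (the actual validation lives in the backend handlers, not in this function):

```python
from typing import Any


def validate_configuration(schema: dict[str, Any], config: dict[str, Any]) -> list[str]:
    """Return a list of validation errors for `config` against a type's schema."""
    errors = []
    # Every field marked required must be present.
    for field, rules in schema.items():
        if rules.get("required") and field not in config:
            errors.append(f"missing required field: {field}")
    # Reject fields the schema does not declare.
    for field in config:
        if field not in schema:
            errors.append(f"unknown field: {field}")
    return errors
```

An empty list means the configuration satisfies the schema.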
### Chat interface

All model types implement chat functionality:
```python
async def chat(
    self,
    ctx: ChatContext,
    messages: list[ChatMessage],
    params: ChatParameters | None = None,
) -> ChatResult:
    """Generate a response from the model."""
```

Location: `backend/syft_space/components/model_types/interfaces.py:129`
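To illustrate the calling convention, here is a toy model type exercised against simplified stand-ins for `ChatMessage` and `ChatResult` (the real `ChatResult` carries a message list, usage, and metadata; this sketch flattens it to a single `content` field):

```python
import asyncio
from dataclasses import dataclass


# Simplified stand-ins for the interface types, for illustration only.
@dataclass
class ChatMessage:
    role: str
    content: str


@dataclass
class ChatResult:
    id: str
    model: str
    content: str
    finish_reason: str


class EchoModelType:
    """A toy model type that echoes the last message back."""

    async def chat(self, ctx, messages, params=None) -> ChatResult:
        last = messages[-1].content
        return ChatResult(id="cmpl-1", model="echo",
                          content=f"echo: {last}", finish_reason="stop")
```

A real model type would forward `messages` and `params` to the underlying AI service instead of echoing.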
## Chat data models

### ChatMessage

Input messages to the model:

```python
class ChatMessage:
    role: str     # "user", "assistant", or "system"
    content: str  # Message text
```

Location: `backend/syft_space/components/model_types/interfaces.py:18`
### ChatParameters

Control generation behavior:

```python
class ChatParameters:
    temperature: float = 0.7        # Randomness (0.0-2.0)
    max_tokens: int = 100           # Maximum response length
    stop_sequences: list[str] = []  # Stop generation at these strings
    presence_penalty: float = 0.0   # Penalize repeated topics (-2.0 to 2.0)
    frequency_penalty: float = 0.0  # Penalize repeated tokens (-2.0 to 2.0)
    top_p: float = 1.0              # Nucleus sampling (0.0-1.0)
    extra_options: dict = {}        # Type-specific options
```

Location: `backend/syft_space/components/model_types/interfaces.py:26`
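If `ChatParameters` were modeled as a plain Python dataclass, the mutable defaults (`[]`, `{}`) would need `field(default_factory=...)`. A runnable sketch along those lines, plus a hypothetical `with_overrides` helper for per-request tuning:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ChatParameters:
    temperature: float = 0.7
    max_tokens: int = 100
    stop_sequences: list[str] = field(default_factory=list)
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    top_p: float = 1.0
    extra_options: dict = field(default_factory=dict)


def with_overrides(base: ChatParameters, **overrides) -> ChatParameters:
    """Build a new ChatParameters with selected fields overridden."""
    merged = {**asdict(base), **overrides}
    return ChatParameters(**merged)
```

This keeps the tenant-wide defaults intact while letting a single request lower the temperature or raise the token limit.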
### ChatResult

Model response:

```python
class ChatResult:
    id: str                            # Unique completion ID
    model: str                         # Model name used
    messages: list[ChatMessageResult]  # Generated messages
    finish_reason: str                 # "stop", "length", "error", etc.
    usage: TokenUsage                  # Token consumption details
    metadata: dict                     # Additional info

class ChatMessageResult:
    role: str     # Message role
    content: str  # Generated text
    tokens: int   # Tokens in this message

class TokenUsage:
    prompt_tokens: int      # Tokens in input
    completion_tokens: int  # Tokens in output
    total_tokens: int       # Sum of both
```

Location: `backend/syft_space/components/model_types/interfaces.py:68`
## Available model types

### OpenAI

Type name: `openai`

Connects to OpenAI’s API (GPT-4, GPT-3.5, etc.).

Configuration:

```json
{
  "api_key": "sk-...",
  "model": "gpt-4",
  "base_url": "https://api.openai.com/v1"  // optional
}
```
Use cases:

- Production-grade chat completions
- Function calling
- Advanced reasoning tasks
### vLLM (local)

Type name: `vllm_local`

Connects to a locally hosted vLLM inference server.

Configuration:

```json
{
  "base_url": "http://localhost:8000",
  "model": "meta-llama/Llama-2-7b-chat-hf"
}
```
Use cases:

- Privacy-preserving inference (data never leaves your infrastructure)
- Custom fine-tuned models
- Cost optimization for high-volume use
## Model operations

### Create model

```python
async def create_model(
    request: CreateModelRequest,
    tenant: Tenant,
) -> ModelResponse:
    """
    1. Validates model type exists
    2. Validates configuration against schema
    3. Creates model entity
    """
```

Location: `backend/syft_space/components/models/handlers.py:86`
Request schema:

```python
class CreateModelRequest:
    name: str            # Unique name per tenant
    dtype: str           # Model type name
    configuration: dict  # Type-specific config
    summary: str = ""    # Optional description
    tags: str = ""       # Comma-separated tags
```
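A hypothetical client-side helper that assembles this request body and catches missing required fields before the API call is ever made (`build_create_request` is an illustration, not part of the backend):

```python
REQUIRED_FIELDS = ("name", "dtype", "configuration")


def build_create_request(name: str, dtype: str, configuration: dict,
                         summary: str = "", tags: str = "") -> dict:
    """Assemble the JSON body for POST /api/v1/models."""
    payload = {
        "name": name,
        "dtype": dtype,
        "configuration": configuration,
        "summary": summary,
        "tags": tags,
    }
    # Fail fast on empty required fields rather than waiting for a 422.
    missing = [f for f in REQUIRED_FIELDS if not payload[f]]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return payload
```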
### Update model

Partial updates (name, summary, and tags only):

```python
async def update_model(
    name: str,
    request: UpdateModelRequest,
    tenant: Tenant,
) -> ModelResponse:
    """
    Updates metadata fields.
    Configuration cannot be updated (delete + recreate instead).
    """
```

Model configuration (API keys, URLs) cannot be updated. To change configuration, delete and recreate the model.

Location: `backend/syft_space/components/models/handlers.py:162`
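Because configuration is immutable, rotating an API key amounts to delete + recreate. A sketch of that sequence with stand-in client callables (`get_model`, `delete_model`, `create_model` here are assumed async client functions, not the real client API); keep in mind that deleting a model cascades to its connected endpoints, which would then need recreating too:

```python
import asyncio


async def replace_model_configuration(name, new_configuration, *,
                                      get_model, delete_model, create_model):
    """'Update' an immutable configuration by deleting and recreating the model."""
    existing = await get_model(name)       # snapshot metadata before deletion
    await delete_model(name)               # note: cascades to connected endpoints
    return await create_model({**existing, "configuration": new_configuration})
```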
### Delete model

```python
async def delete_model(name: str, tenant: Tenant) -> dict:
    """Deletes model and cascades to connected endpoints."""
```

Location: `backend/syft_space/components/models/handlers.py:197`
### Healthcheck

Verify model connectivity:

```python
async def healthcheck(name: str, tenant: Tenant) -> HealthcheckResponse:
    """
    Returns:
    - status: HEALTHY or UNHEALTHY
    - message: Details about connection state
    """
```

Location: `backend/syft_space/components/models/handlers.py:217`
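One plausible way such a check can work is to attempt a tiny completion and map the outcome to a status. The sketch below assumes only the async `chat` interface described earlier and is not the actual backend implementation:

```python
import asyncio

HEALTHY, UNHEALTHY = "HEALTHY", "UNHEALTHY"


async def probe_model(model) -> tuple[str, str]:
    """Attempt a minimal completion; report (status, message)."""
    try:
        await model.chat(None, [{"role": "user", "content": "ping"}], None)
        return HEALTHY, "model responded"
    except Exception as exc:
        # Bad API keys, unreachable hosts, etc. all surface here.
        return UNHEALTHY, f"connection failed: {exc}"
```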
## RAG integration

When a model is used in an endpoint with a dataset (response type "both"), search results are automatically injected as context:

```python
# Endpoint handler combines dataset + model
if references and references.documents:
    # Build context from top 3 search results
    context_content = "\n\n".join([
        f"[{doc.document_id}] {doc.content}"
        for doc in references.documents[:3]
    ])

    # Inject as system message
    context_message = ChatMessage(
        role="system",
        content=f"Use the following context to answer:\n{context_content}",
    )
    messages.insert(0, context_message)

# Chat with model
chat_result = await model_instance.chat(ctx, messages, params)
```

Location: `backend/syft_space/components/endpoints/handlers.py:481`
This implements the retrieval-augmented generation pattern:

1. The query searches the dataset for relevant documents
2. Top results are formatted as context
3. Context + user message are sent to the model
4. The model generates an answer grounded in the retrieved documents
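The context-building step can be condensed into a small pure function. This is a sketch mirroring the handler logic, using plain dicts for messages and documents rather than the real `ChatMessage` type:

```python
def build_rag_messages(documents, user_messages, top_k=3):
    """Prepend the top-k documents as a system message, RAG-style."""
    if not documents:
        return list(user_messages)
    context = "\n\n".join(
        f"[{d['document_id']}] {d['content']}" for d in documents[:top_k]
    )
    system = {
        "role": "system",
        "content": f"Use the following context to answer:\n{context}",
    }
    return [system, *user_messages]
```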
When querying an endpoint, model responses follow this structure:

```python
class SummaryResponse:
    id: str                      # Completion ID
    model: str                   # Model name used
    message: MessageResponse     # Generated message
    finish_reason: str           # Completion reason
    usage: TokenUsage            # Token consumption
    cost: float                  # Generation cost
    provider_info: ProviderInfo  # API version, response time

class MessageResponse:
    role: str     # "assistant"
    content: str  # Generated text
    tokens: int   # Token count
```

Location: `backend/syft_space/components/endpoints/schemas.py:336`
## Relationships

- **Tenant**: Each model belongs to one tenant
- **Endpoints**: One model can be used by multiple endpoints
## Context injection

The `ChatContext` object tracks model usage:

```python
class ChatContext(Context):
    sender: str     # Email of user making request (from auth token)
    model_id: UUID  # Model being used
```

This enables:

- Audit logging (who used which model, and when)
- Usage tracking per sender
- Policy enforcement based on sender identity

Location: `backend/syft_space/components/model_types/interfaces.py:11`
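A minimal sketch of the per-sender accounting this enables, using a simplified stand-in for `ChatContext` (the `UsageTracker` class is illustrative, not part of the backend):

```python
from collections import defaultdict
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class ChatContext:
    sender: str
    model_id: UUID


class UsageTracker:
    """Per-sender token tally, the kind of accounting ChatContext enables."""

    def __init__(self):
        self._totals = defaultdict(int)

    def record(self, ctx: ChatContext, total_tokens: int) -> None:
        self._totals[ctx.sender] += total_tokens

    def total_for(self, sender: str) -> int:
        return self._totals[sender]
```

The same `sender` / `model_id` pair is what a policy layer would key on for rate limiting.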
## Example workflow

1. **Create an OpenAI model**: `POST /api/v1/models` with OpenAI credentials:

   ```json
   {
     "name": "gpt-4-assistant",
     "dtype": "openai",
     "configuration": {
       "api_key": "sk-...",
       "model": "gpt-4"
     }
   }
   ```

2. **Test the healthcheck**: `GET /api/v1/models/gpt-4-assistant/healthcheck` verifies the API key and connectivity.

3. **Create an endpoint**: link the model to an endpoint (with or without a dataset):

   ```json
   {
     "slug": "qa-bot",
     "model_id": "<model-uuid>",
     "response_type": "summary"
   }
   ```

4. **Query the endpoint**: `POST /api/v1/endpoints/qa-bot/query` returns the generated response:

   ```json
   {
     "messages": [{"role": "user", "content": "What is RAG?"}],
     "temperature": 0.7,
     "max_tokens": 150
   }
   ```
## Best practices

- Name models by their purpose: `customer-support-gpt4`, `legal-qa-llama2`.
- Store API keys in the model configuration rather than hardcoding them; they are encrypted in the database.
- Always run a healthcheck after creating a model to verify connectivity before using it in endpoints.
- Track `usage.total_tokens` in responses to understand costs and optimize prompts.
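For example, token counts can be turned into a rough cost estimate. The per-1K-token prices below are made-up placeholders for illustration, not real provider pricing:

```python
# Illustrative per-1K-token prices; real prices vary by provider and model.
PRICES = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Rough dollar cost of one completion from its TokenUsage counts."""
    p = PRICES[model]
    return (prompt_tokens / 1000) * p["prompt"] + (completion_tokens / 1000) * p["completion"]
```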
## Next steps

- **Endpoints**: combine models with datasets to create RAG endpoints
- **Policies**: apply rate limiting and access controls to model usage