Understanding model types
Syft Space currently supports OpenAI and OpenAI-compatible model providers.
OpenAI
Connect to OpenAI’s chat completion API or any OpenAI-compatible endpoint.
Key features:
- Full support for OpenAI’s chat completion API
- Compatible with OpenAI alternatives (Anthropic via proxy, local models)
- Custom base URL for self-hosted models
- Configurable system prompts
Configuration parameters:
- api_key - Your OpenAI API key (required)
- model - Model identifier (e.g., gpt-4, gpt-3.5-turbo)
- base_url - Custom base URL for OpenAI-compatible APIs (optional)
- system_prompt - Default system prompt for completions (optional)
Supported models:
- GPT-4 and GPT-4 Turbo
- GPT-3.5 Turbo
- Any OpenAI-compatible model (Ollama, vLLM, LM Studio)
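The configuration fields above can be sketched as a plain dictionary; the key value below is a placeholder, and the field names are the ones listed on this page:

```python
# Hypothetical model configuration using the fields described above.
# The api_key value is a placeholder, not a real key.
model_config = {
    "api_key": "sk-placeholder",                      # required
    "model": "gpt-4",                                 # model identifier
    "base_url": "https://api.openai.com/v1",          # optional; the default
    "system_prompt": "You are a helpful assistant.",  # optional
}

# Only api_key and model are required; the others fall back to defaults.
required = [k for k in ("api_key", "model") if k in model_config]
print(required)  # → ['api_key', 'model']
```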
Creating a model
To create a model, provide:
- An API key (OpenAI keys begin with sk-)
- A model identifier (e.g., gpt-3.5-turbo)
- A base URL (optional):
  - Default: https://api.openai.com/v1
  - Example for local Ollama: http://localhost:11434/v1
  - Example for vLLM: http://localhost:8000/v1
Model configuration examples
OpenAI GPT-4
GPT-3.5 Turbo (cost-effective)
Local Ollama model
Self-hosted vLLM
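A sketch of these four configurations, using the fields described earlier. The API keys are placeholders, and the local base URLs assume the default Ollama and vLLM ports:

```python
# Hypothetical configurations for the four examples above.
configs = {
    "openai-gpt4": {
        "api_key": "sk-placeholder",
        "model": "gpt-4",
    },
    "gpt35-turbo": {
        "api_key": "sk-placeholder",
        "model": "gpt-3.5-turbo",
    },
    "local-ollama": {
        "api_key": "not-validated",  # Ollama accepts any string
        "model": "llama3:8b",
        "base_url": "http://localhost:11434/v1",
    },
    "self-hosted-vllm": {
        "api_key": "my-auth-token",  # only if your server requires one
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "base_url": "http://localhost:8000/v1",
    },
}
print(len(configs))  # → 4
```

Note that the OpenAI-hosted configurations omit base_url entirely and fall back to the default endpoint.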
Using OpenAI-compatible services
The OpenAI model type works with any service that implements the OpenAI chat completion API.
Ollama (local models)
Run open-source models locally:
- Install Ollama from ollama.ai
- Pull a model: ollama pull llama3:8b
- Configure the model in Syft Space:
  - Base URL: http://localhost:11434/v1
  - Model: llama3:8b
  - API Key: Use any string (not validated)
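After those steps, you can confirm the endpoint responds before wiring it into Syft Space. A sketch, assuming Ollama is running on its default port and the llama3:8b model has been pulled:

```shell
# List available models via the OpenAI-compatible endpoint
curl http://localhost:11434/v1/models

# Send a minimal chat completion request; the API key can be any string
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{"model": "llama3:8b", "messages": [{"role": "user", "content": "Hello"}]}'
```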
vLLM (high-performance inference)
Deploy models with optimized inference:
- Start the vLLM server with your model
- Configure the model in Syft Space:
  - Base URL: Your vLLM server URL
  - Model: Full model identifier (e.g., mistralai/Mistral-7B-Instruct-v0.2)
  - API Key: Your authentication token, if required
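Starting the server for the example model above might look like this (a sketch, assuming vLLM is installed and the default port 8000):

```shell
# Launch vLLM's OpenAI-compatible server; serves http://localhost:8000/v1
vllm serve mistralai/Mistral-7B-Instruct-v0.2 --port 8000
```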
Other OpenAI-compatible providers
Any service implementing the OpenAI API format:
- Together AI
- Anyscale Endpoints
- LM Studio
- Text Generation WebUI
Understanding model parameters
When querying an endpoint, you can override the default model parameters:
Temperature
Controls randomness in responses (0.0 to 2.0):
- 0.0 - Deterministic; always picks the most likely token
- 0.7 - Balanced creativity and consistency (default)
- 1.5+ - More creative and varied responses
Max tokens
Maximum number of tokens to generate:
- Default: 100
- Higher values allow longer responses but increase cost and latency
Stop sequences
Text patterns that stop generation:
- Default: ["\n"]
- Example: ["\n\n", "END", "---"]
Presence penalty
Reduces repetition of topics (-2.0 to 2.0):
- Positive values encourage discussing new topics
- Negative values allow repeating topics
Frequency penalty
Reduces repetition of exact phrases (-2.0 to 2.0):
- Positive values discourage repeating words
- Negative values allow more repetition
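Taken together, these overrides form part of a chat-completion request body. A sketch using the parameters and defaults described above:

```python
# Hypothetical per-query overrides, using the parameters documented above.
overrides = {
    "temperature": 0.7,        # 0.0 (deterministic) to 2.0 (very random)
    "max_tokens": 100,         # default shown above
    "stop": ["\n"],            # default stop sequence
    "presence_penalty": 0.0,   # -2.0 to 2.0; positive favors new topics
    "frequency_penalty": 0.0,  # -2.0 to 2.0; positive discourages repeats
}

# Range checks mirroring the documented bounds
assert 0.0 <= overrides["temperature"] <= 2.0
assert -2.0 <= overrides["presence_penalty"] <= 2.0
assert -2.0 <= overrides["frequency_penalty"] <= 2.0
print(sorted(overrides))
```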
Checking model health
Before using a model in an endpoint, verify that it’s working by testing the connection from the model detail page.
Managing API keys
Rotating API keys
To update an API key:
- Generate a new API key from your provider
- Navigate to the model detail page
- Click Edit Configuration
- Update the api_key field
- Save changes
- Test the connection to verify
Updating a model’s configuration affects all endpoints using that model. Test thoroughly after making changes.
Security best practices
- Never commit API keys to version control
- Use separate keys for development and production
- Rotate keys regularly (every 90 days recommended)
- Monitor usage to detect unauthorized access
- Set rate limits at the provider level when possible
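One way to keep keys out of version control is to read them from the environment rather than hard-coding them. A minimal sketch (the helper and variable names are illustrative, not a Syft Space API):

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read an API key from an environment variable instead of source code."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting")
    return key

# Separate variables make it easy to keep dev and prod keys apart,
# e.g. OPENAI_API_KEY_DEV vs OPENAI_API_KEY_PROD.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
print(load_api_key().startswith("sk-"))  # → True
```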
Model costs and optimization
Cost management
Syft Space tracks token usage for each query:
- Prompt tokens - Input text including context from datasets
- Completion tokens - Generated response text
- Total tokens - Sum of prompt and completion tokens
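These token counts translate to cost via per-token prices. A sketch with illustrative prices (check your provider’s current pricing; the numbers below are assumptions):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_price_per_1k: float,
                  completion_price_per_1k: float) -> float:
    """Estimate query cost in dollars from token counts.

    Prices are per 1,000 tokens and vary by provider and model;
    the values used in the example below are illustrative only.
    """
    return (prompt_tokens / 1000 * prompt_price_per_1k
            + completion_tokens / 1000 * completion_price_per_1k)

# Example: 1,200 prompt tokens + 300 completion tokens
# at hypothetical rates of $0.01 / $0.03 per 1k tokens
cost = estimate_cost(1200, 300, 0.01, 0.03)
print(round(cost, 4))  # → 0.021
```

Note that prompt tokens include dataset context, which is why limiting context size (below) directly reduces cost.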
Optimization tips
- Choose appropriate models
  - Use GPT-3.5 for simple queries
  - Reserve GPT-4 for complex reasoning
- Limit context size
  - Reduce the limit parameter in searches to retrieve fewer documents
  - Use a higher similarity_threshold to filter out less relevant results
- Set max tokens appropriately
  - Don’t request more tokens than needed
  - Typical values: 100-500 for most use cases
- Cache responses when possible
  - Use consistent queries to benefit from provider caching
  - Consider implementing your own caching layer
- Use local models for development
  - Test with Ollama locally before deploying
  - Switch to paid APIs for production
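The caching tip can be sketched as a small in-memory cache keyed by the query and its parameters (a hypothetical helper, not a Syft Space API; the body is a placeholder for a real endpoint call):

```python
import functools

@functools.lru_cache(maxsize=256)
def cached_completion(query: str, temperature: float = 0.7) -> str:
    """Call the model only on a cache miss; repeated identical
    queries are served from memory instead of re-billing tokens.

    In a real deployment, this body would call your endpoint; here it
    returns a placeholder so the caching behaviour is visible.
    """
    return f"response to {query!r} (temperature={temperature})"

first = cached_completion("What is Syft Space?")
second = cached_completion("What is Syft Space?")  # served from cache
print(first == second, cached_completion.cache_info().hits)  # → True 1
```

Because the cache key includes the temperature, changing any override produces a fresh request, which matches how provider-side caching behaves.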
Updating models
You can update certain model properties after creation. However, you cannot change the model type or core configuration (such as the API key or base URL) through the UI. To change these, edit the configuration directly or create a new model.
Deleting models
Deleting a model removes the configuration from Syft Space. Before deleting, verify that no endpoints are using the model; the model detail page lists all connected endpoints.
Troubleshooting
Authentication errors
Symptom: “Invalid API key” or 401 errors
Solutions:
- Verify the API key is correct and hasn’t expired
- Check the base URL matches your provider
- Ensure the API key has appropriate permissions
Model not found
Symptom: “Model not found” or 404 errors
Solutions:
- Verify the model identifier is correct (e.g., gpt-4, not GPT-4)
- Check that you have access to the specified model
- For local models, ensure the model is pulled and running
Connection timeouts
Symptom: Requests time out or hang
Solutions:
- Check network connectivity to the API endpoint
- Verify firewall rules allow outbound connections
- For local models, ensure the service is running
- Increase timeout values if using slow models
Rate limiting
Symptom: “Rate limit exceeded” errors
Solutions:
- Implement rate limiting policies on your endpoints
- Upgrade your provider plan for higher limits
- Use caching to reduce duplicate requests
- Consider using multiple API keys with load balancing
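On the client side, rate-limit errors are commonly handled with exponential backoff. A minimal sketch; the request function here is a stand-in that raises a generic error, where real code would catch the provider SDK’s specific rate-limit exception:

```python
import time

def with_backoff(request_fn, max_retries: int = 3, base_delay: float = 0.01):
    """Retry request_fn with exponential backoff on rate-limit errors.

    request_fn is any callable that raises RuntimeError when the
    provider returns a 429; delays double on each retry.
    """
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RuntimeError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s, ...

# Simulated endpoint that is rate-limited twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky))  # → ok
```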
Next steps
Build endpoints
Combine models and datasets into queryable endpoints
Set policies
Control access and rate limits for your endpoints