Ollama Plugin
The `genkitx-ollama` plugin enables you to run AI models locally using Ollama. This is ideal for development, testing, and applications that need to run models on-premises without external API calls.
Installation
Prerequisites
- Install Ollama: Download and install from ollama.ai
- Pull models: Download the models you want to use
- Start Ollama server: The server runs automatically after installation, or start it manually with `ollama serve`. By default it listens on `http://localhost:11434`.
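The prerequisites above can be sketched as shell commands (the model names are examples; pull whichever models you plan to use):

```shell
# Pull an example chat model and an embedding model
ollama pull llama3.1
ollama pull nomic-embed-text

# Start the server manually if it is not already running
ollama serve

# Verify the server is reachable
curl http://localhost:11434/api/tags
```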
Basic Setup
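A minimal setup might look like the following (a sketch assuming a recent Genkit release and the `llama3.1` model; adjust the model list to whatever you have pulled):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

// Register the Ollama plugin and declare which local models Genkit may use.
const ai = genkit({
  plugins: [
    ollama({
      models: [{ name: 'llama3.1', type: 'chat' }], // must already be pulled
      serverAddress: 'http://localhost:11434',      // default Ollama address
    }),
  ],
});
```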
Configuration
Plugin Options
Model Configuration
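The main plugin options are sketched below (field names reflect my understanding of the plugin's typings; verify `requestHeaders` and the embedder shape against your installed version):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [
    ollama({
      // Where the local Ollama server listens.
      serverAddress: 'http://localhost:11434',
      // Models to expose to Genkit. `type` selects the Ollama endpoint:
      // 'chat' uses /api/chat (required for tool calling),
      // 'generate' uses /api/generate.
      models: [
        { name: 'llama3.1', type: 'chat' },
        { name: 'gemma', type: 'generate' },
      ],
      // Embedding models, with their output dimensionality.
      embedders: [{ name: 'nomic-embed-text', dimensions: 768 }],
    }),
  ],
});
```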
Popular Models
Chat Models
Code Models
Embedding Models
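As a starting point, the following pulls one commonly used model of each kind (examples only; browse the Ollama model library for alternatives):

```shell
ollama pull llama3.1          # chat
ollama pull codellama         # code
ollama pull nomic-embed-text  # embeddings
```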
Usage Examples
Text Generation
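A basic generation call might look like this (a sketch; plugin models are addressed as `ollama/<name>`):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ models: [{ name: 'llama3.1', type: 'chat' }] })],
});

// Plugin models are referenced as 'ollama/<model name>'.
const { text } = await ai.generate({
  model: 'ollama/llama3.1',
  prompt: 'Why is the sky blue?',
});
console.log(text);
```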
Multi-turn Conversation
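For multi-turn use, prior turns can be passed explicitly as a `messages` array (a sketch, assuming `llama3.1` registered as a `chat` model):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ models: [{ name: 'llama3.1', type: 'chat' }] })],
});

// Pass the conversation history explicitly with each request.
const { text } = await ai.generate({
  model: 'ollama/llama3.1',
  messages: [
    { role: 'user', content: [{ text: 'My name is Ada.' }] },
    { role: 'model', content: [{ text: 'Nice to meet you, Ada!' }] },
    { role: 'user', content: [{ text: 'What is my name?' }] },
  ],
});
console.log(text);
```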
Tool Calling
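A tool-calling sketch follows; note that the model must be registered with `type: 'chat'` and that tool input schemas must be objects. The `getWeather` tool and its stubbed body are illustrative, not part of any library:

```typescript
import { genkit, z } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  // Tool calling requires a model registered with type 'chat'.
  plugins: [ollama({ models: [{ name: 'llama3.1', type: 'chat' }] })],
});

// Tools must take an object input schema.
const getWeather = ai.defineTool(
  {
    name: 'getWeather',
    description: 'Gets the current weather for a city',
    inputSchema: z.object({ city: z.string() }),
    outputSchema: z.string(),
  },
  async ({ city }) => `It is sunny in ${city}.` // stubbed implementation
);

const { text } = await ai.generate({
  model: 'ollama/llama3.1',
  prompt: 'What is the weather in Paris?',
  tools: [getWeather],
});
console.log(text);
```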
Image Input (Multimodal)
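Image input can be passed as a media content part alongside the text prompt (a sketch; `llava` is an example vision-capable model and `photo.jpg` a placeholder path):

```typescript
import { readFileSync } from 'node:fs';
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  // llava is an example multimodal model; it must be pulled first.
  plugins: [ollama({ models: [{ name: 'llava', type: 'chat' }] })],
});

// Encode a local image as a base64 data URL.
const imageBase64 = readFileSync('photo.jpg').toString('base64');

const { text } = await ai.generate({
  model: 'ollama/llava',
  prompt: [
    { text: 'Describe this image.' },
    {
      media: {
        url: `data:image/jpeg;base64,${imageBase64}`,
        contentType: 'image/jpeg',
      },
    },
  ],
});
console.log(text);
```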
Embeddings
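Embedding works the same way, with embedders addressed as `ollama/<name>` (a sketch; the exact return shape varies across Genkit versions, so inspect the result):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [
    ollama({ embedders: [{ name: 'nomic-embed-text', dimensions: 768 }] }),
  ],
});

const result = await ai.embed({
  embedder: 'ollama/nomic-embed-text',
  content: 'The quick brown fox jumps over the lazy dog.',
});
// Embedding vector(s); exact shape depends on the Genkit version.
console.log(result);
```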
Using in Flows
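Wrapping the model call in a flow gives it typed input/output and makes it visible in the Genkit developer UI. A sketch (the flow name and prompt are illustrative):

```typescript
import { genkit, z } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ models: [{ name: 'llama3.1', type: 'chat' }] })],
});

// A flow adds typed input/output and tracing around the model call.
export const summarizeFlow = ai.defineFlow(
  {
    name: 'summarizeFlow',
    inputSchema: z.string(),
    outputSchema: z.string(),
  },
  async (document) => {
    const { text } = await ai.generate({
      model: 'ollama/llama3.1',
      prompt: `Summarize in one paragraph:\n\n${document}`,
    });
    return text;
  }
);
```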
Direct Model Usage
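Outside of a flow, the model can also be called directly from any server-side code (a sketch, reusing the setup shown earlier):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ models: [{ name: 'llama3.1', type: 'chat' }] })],
});

// No flow wrapper: call generate directly wherever you need a completion.
const response = await ai.generate({
  model: 'ollama/llama3.1',
  prompt: 'Write a haiku about local inference.',
});
console.log(response.text);
```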
Advanced Configuration
Custom Server Address
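If Ollama runs on another host or port, point the plugin at it via `serverAddress` (the address below is an example):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [
    ollama({
      // Point the plugin at a non-default Ollama host/port.
      serverAddress: 'http://192.168.1.100:11434', // example address
      models: [{ name: 'llama3.1', type: 'chat' }],
    }),
  ],
});
```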
Custom Request Headers
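Headers can be attached to each request, which is useful when Ollama sits behind an authenticating proxy (a sketch; verify the `requestHeaders` option against your plugin version, and replace the placeholder token):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [
    ollama({
      serverAddress: 'https://ollama.example.com', // example proxy address
      models: [{ name: 'llama3.1', type: 'chat' }],
      // Static headers sent with every request; the plugin may also accept
      // a function here if headers must be computed per request.
      requestHeaders: { Authorization: 'Bearer <token>' },
    }),
  ],
});
```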
Model-specific Settings
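Per-request generation settings go in `config` (a sketch using common Genkit config fields; supported fields may vary by version):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ models: [{ name: 'llama3.1', type: 'chat' }] })],
});

const { text } = await ai.generate({
  model: 'ollama/llama3.1',
  prompt: 'Summarize the plot of Hamlet in two sentences.',
  // Common generation knobs; tune per model.
  config: {
    temperature: 0.3,
    topK: 40,
    topP: 0.9,
    maxOutputTokens: 256,
  },
});
console.log(text);
```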
Model Management
List Available Models
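To see which models are installed locally, use the Ollama CLI (not the plugin):

```shell
ollama list
```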
Pull New Models
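Download a model from the Ollama library (the model name is an example):

```shell
ollama pull llama3.1
```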
Remove Models
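Delete a model to free disk space (the model name is an example):

```shell
ollama rm llama3.1
```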
Show Model Info
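Inspect a model's details, such as its parameters and template (the model name is an example):

```shell
ollama show llama3.1
```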
Best Practices
Choose Appropriate Model Size
- 7B models - fast, good for most tasks; ~8 GB RAM
- 13B models - better quality; 16 GB RAM recommended
- 70B+ models - highest quality; requires 32 GB+ RAM
Optimize Performance
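A quick way to check performance basics is to see which models are resident and where they run; `ollama ps` reports loaded models and their CPU/GPU placement:

```shell
# Show which models are loaded and whether they run on CPU or GPU
ollama ps
```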
Handle Errors
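Wrap generation calls in a try/catch so a stopped server or missing model fails gracefully (a sketch; the error-handling strategy is up to your application):

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ models: [{ name: 'llama3.1', type: 'chat' }] })],
});

try {
  const { text } = await ai.generate({
    model: 'ollama/llama3.1',
    prompt: 'Hello!',
  });
  console.log(text);
} catch (err) {
  // ECONNREFUSED usually means the Ollama server is not running.
  console.error('Generation failed:', err);
}
```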
Pre-load Models
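One way to warm a model at startup is to send a generate request with only the model name, which loads it into memory (`llama3.1` is an example):

```shell
# A generate request with no prompt loads the model into memory.
curl http://localhost:11434/api/generate -d '{"model": "llama3.1"}'
```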
Pre-load models to reduce first-request latency.

Limitations
- Tool calling: Only available on the `chat` API, not `generate`
- Input schema: Tools must have object input schemas
- Performance: Depends on local hardware
- Model size: Larger models require more RAM and are slower
Troubleshooting
Server Not Running
Error: `ECONNREFUSED`
Solution: Start the Ollama server with `ollama serve`.
Model Not Found
Error: Model not available
Solution: Pull the model first, e.g. `ollama pull llama3.1`.
Out of Memory
Solution: Use a smaller model or increase system RAM.
Slow Performance
Solutions:
- Use smaller models (7B instead of 13B)
- Reduce `maxOutputTokens`
- Use GPU acceleration if available
- Close other applications