# Local AI Models

Run AI models locally for privacy, cost savings, and offline operation. Includes LLM inference, image generation, and speech-to-text.

## Available Services
### Ollama

Port: 11434 | Memory: 2048 MB | Maturity: Stable

Run large language models locally with an easy-to-use API. Supports Llama, Mistral, Gemma, and many more open-source models.

Features:
- 100+ open-source models
- Simple REST API
- Model management CLI
- Streaming responses
- OpenAI-compatible API
- CPU and GPU support

Supported models include:
- Llama 3.3, Llama 3.2, Llama 3.1
- Mistral, Mixtral
- Gemma 2, CodeGemma
- Phi-3, Qwen 2.5
- DeepSeek-Coder
- Skill: `ollama-local-llm`
- Environment: `OLLAMA_HOST`, `OLLAMA_PORT`
### ComfyUI

Port: 8188 | Memory: 4096 MB | Maturity: Experimental

Node-based visual workflow editor for Stable Diffusion and other generative AI models. Design complex image/video generation pipelines.

Features:
- Node-based workflow editor
- Stable Diffusion support
- ControlNet, LoRA, VAE support
- Custom nodes ecosystem
- REST API
- Batch processing

Requirements:
- NVIDIA GPU with CUDA
- nvidia-docker2 installed
- Minimum 4 GB VRAM (8 GB+ recommended)
- Skill: `comfyui-generate`
- Environment: `COMFYUI_HOST`, `COMFYUI_PORT`
### Stable Diffusion WebUI

Port: 7860 | Memory: 4096 MB | Maturity: Experimental

Local AI image generation with a web interface. Generate images from text prompts using Stable Diffusion.

Features:
- Text-to-image generation
- Image-to-image transformation
- Inpainting and outpainting
- Model management
- Extensions support
- Batch processing

Requirements:
- NVIDIA GPU with CUDA
- nvidia-docker2 installed
- Minimum 4 GB VRAM
### Faster Whisper Server

Port: 8001 | Memory: 1024 MB | Maturity: Beta

Self-hosted speech-to-text transcription service using the Faster Whisper engine for high-performance audio transcription.

Features:
- OpenAI Whisper models
- Fast inference (CTranslate2)
- Multiple languages
- OpenAI-compatible API
- Timestamp support
- CPU and GPU support

Available model sizes:
- tiny, base, small, medium, large
- Multilingual and English-only variants
- Skill: `whisper-transcribe`
- Environment: `WHISPER_HOST`, `WHISPER_PORT`
## Usage Examples
### Local LLM Stack
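One possible shape for this stack as a Docker Compose fragment. `ollama/ollama` is the official image; the volume name and memory limit (matching the 2048 MB listed above) are illustrative:

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama   # persist pulled models across restarts
    mem_limit: 2048m                  # matches the memory figure listed above

volumes:
  ollama-models:
```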
### Image Generation Stack (GPU Required)
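A sketch of the GPU wiring for this stack. The image name is a placeholder (there is no single official ComfyUI image), but the `deploy.resources` device-reservation syntax is standard Compose:

```yaml
services:
  comfyui:
    image: your-registry/comfyui:latest   # placeholder: build or choose an image
    ports:
      - "8188:8188"
    volumes:
      - comfyui-models:/models            # checkpoint storage (see Model Management)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia              # requires nvidia-docker2 on the host
              count: 1
              capabilities: [gpu]

volumes:
  comfyui-models:
```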
### Complete Local AI Stack
### Audio Transcription Stack
## Model Management

### Ollama Models

Pull models into Ollama:
### ComfyUI Models

Download Stable Diffusion checkpoints to the `comfyui-models` volume:
## Hardware Requirements
### CPU-Only (LLMs)
| Model Size | RAM Required | Performance |
|---|---|---|
| 7B params | 8 GB | Good |
| 13B params | 16 GB | Moderate |
| 34B params | 32 GB | Slow |
| 70B params | 64 GB+ | Very Slow |
### GPU (Image Generation)
| VRAM | Supported Models | Performance |
|---|---|---|
| 4 GB | SD 1.5 | Slow |
| 6 GB | SD 1.5, SDXL (low res) | Moderate |
| 8 GB | SDXL | Good |
| 12 GB+ | SDXL, SD3 | Excellent |
## Performance Tips
### Ollama Optimization
- Model Selection: Start with smaller models (7B) for faster inference
- Context Length: Reduce context window for speed
- Quantization: Use Q4 or Q5 quantized models
- GPU Acceleration: Add NVIDIA GPU support with `nvidia-docker2`
### Image Generation Optimization
- GPU Required: CPU inference is extremely slow (minutes per image)
- VRAM Management: Close other GPU applications
- Batch Size: Reduce for lower VRAM usage
- Resolution: Start with 512x512, then scale up
## Privacy & Offline Operation

Local models provide:

- Data Privacy: All processing happens on your infrastructure
- Offline Operation: No internet required after model download
- Cost Savings: No API costs for inference
- Customization: Fine-tune models for specific tasks
- Low Latency: No network round trips