Ollama Models
MoneyPrinter uses local Ollama models for AI-powered script generation and metadata creation.
Why Ollama?
Ollama provides local LLM inference with several advantages:
Privacy: All AI processing happens locally; no data is sent to cloud APIs.
Cost: Zero per-token charges; run unlimited generations.
Speed: Low-latency inference with local models.
Flexibility: Choose from dozens of open-source models.
Installing Ollama
Download and Install
curl -fsSL https://ollama.com/install.sh | sh
Verify installation:
ollama --version
Start Ollama Service
Ollama runs on http://localhost:11434 by default.
On macOS/Windows, Ollama runs as a background service after installation. No need to manually run ollama serve.
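Before generating anything, it can help to confirm the service is actually reachable. A minimal check (a sketch; the /api/tags endpoint is part of Ollama's standard HTTP API, the helper name is illustrative):

```python
import urllib.request
import urllib.error

def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server responds at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# An unreachable address simply yields False rather than raising:
print(ollama_is_up("http://127.0.0.1:1", timeout=0.5))  # False (nothing listening)
```

This mirrors what the backend does implicitly on its first request; running it at startup surfaces a misconfigured `OLLAMA_BASE_URL` early.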
Pulling Models
Basic Usage
Pull a model:
ollama pull llama3.1:8b
List installed models:
ollama list
Expected output:
NAME ID SIZE MODIFIED
llama3.1:8b 42182419e950 4.7 GB 2 hours ago
mistral:7b f974a74358d6 4.1 GB 1 day ago
Remove a model:
ollama rm mistral:7b
Recommended Models
For Script Generation
Model         Size   RAM Required  Speed      Quality      Best For
llama3.1:8b   4.7GB  8GB           Fast       Excellent    General purpose, balanced
mistral:7b    4.1GB  8GB           Very Fast  Good         Quick iterations
llama3.1:70b  40GB   64GB          Slow       Outstanding  Best quality (GPU required)
qwen2.5:7b    4.7GB  8GB           Fast       Good         Multilingual content
phi3:mini     2.3GB  4GB           Very Fast  Fair         Low-resource environments
Selecting a Model
Consider these factors:
Available RAM: model size + ~2GB overhead
GPU: CUDA/Metal acceleration for larger models
Script quality: larger models produce better scripts
Generation speed: smaller models are faster
Start with llama3.1:8b for the best balance of quality and performance.
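The RAM rule of thumb above (model size + ~2GB overhead) can be turned into a small helper for picking a model automatically. A sketch, with sizes taken from the table and a hypothetical helper name:

```python
from typing import Optional

# Approximate on-disk sizes in GB, from the table above.
MODEL_SIZES_GB = {
    "llama3.1:70b": 40.0,
    "llama3.1:8b": 4.7,
    "qwen2.5:7b": 4.7,
    "mistral:7b": 4.1,
    "phi3:mini": 2.3,
}

OVERHEAD_GB = 2.0  # rule of thumb: model size + ~2GB of working memory

def largest_model_that_fits(available_ram_gb: float) -> Optional[str]:
    """Pick the biggest model whose size plus overhead fits in RAM."""
    candidates = [
        name for name, size in MODEL_SIZES_GB.items()
        if size + OVERHEAD_GB <= available_ram_gb
    ]
    return max(candidates, key=lambda n: MODEL_SIZES_GB[n], default=None)

print(largest_model_that_fits(64.0))  # llama3.1:70b
print(largest_model_that_fits(4.0))   # None: even phi3:mini needs ~4.3GB
```

With 8GB of RAM this selects one of the 4.7GB models, matching the recommendation to start with llama3.1:8b.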
Configuring MoneyPrinter
Default Model
Set the fallback model in .env:
OLLAMA_MODEL="llama3.1:8b"
This model is used when the frontend doesn’t specify one:
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.1:8b")

def generate_response(prompt: str, ai_model: str) -> str:
    model_name = (ai_model or "").strip() or OLLAMA_MODEL
    # ...
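The fallback is easy to exercise in isolation: an empty or whitespace-only model name falls through to the default. A minimal sketch of the same expression:

```python
OLLAMA_MODEL = "llama3.1:8b"  # value normally read from the environment

def resolve_model(ai_model):
    """Mirror of the fallback expression used in generate_response."""
    return (ai_model or "").strip() or OLLAMA_MODEL

print(resolve_model("mistral:7b"))  # mistral:7b
print(resolve_model("   "))         # llama3.1:8b
print(resolve_model(None))          # llama3.1:8b
```

The `or ""` guard matters: without it, a `None` from the frontend would raise an `AttributeError` on `.strip()`.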
Remote Ollama Server
Run Ollama on a different machine:
OLLAMA_BASE_URL="http://192.168.1.100:11434"
Or use a GPU server:
OLLAMA_BASE_URL="http://gpu-server:11434"
Docker Configuration
When running MoneyPrinter in Docker with Ollama on the host:
OLLAMA_BASE_URL="http://host.docker.internal:11434"
This works on macOS, Windows, and Linux (via extra_hosts in docker-compose.yml).
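On Linux, host.docker.internal is not defined by default; a docker-compose.yml fragment like the following maps it to the host gateway (illustrative, with an assumed service name):

```yaml
services:
  backend:
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

The `host-gateway` value is resolved by Docker itself (Docker 20.10+), so the same compose file works unchanged on macOS and Windows, where the hostname already exists.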
Model Selection in UI
The frontend fetches available models from the API:
async function loadOllamaModels() {
  const response = await apiRequest('/api/models');
  if (response.status === 'success') {
    const modelSelect = document.getElementById('aiModel');
    modelSelect.innerHTML = '';
    response.models.forEach(model => {
      const option = document.createElement('option');
      option.value = model;
      option.textContent = model;
      if (model === response.default) {
        option.selected = true;
      }
      modelSelect.appendChild(option);
    });
  }
}
The API endpoint:
@app.route ( "/api/models" , methods = [ "GET" ])
def models ():
try :
available_models, default_model = list_ollama_models()
return jsonify(
{
"status" : "success" ,
"models" : available_models,
"default" : default_model,
}
)
except Exception as err:
log( f "[-] Error fetching Ollama models: { str (err) } " , "error" )
return jsonify(
{
"status" : "error" ,
"message" : "Could not fetch Ollama models. Is Ollama running?" ,
"models" : [os.getenv( "OLLAMA_MODEL" , "llama3.1:8b" )],
"default" : os.getenv( "OLLAMA_MODEL" , "llama3.1:8b" ),
}
)
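Note that both branches return the same shape: `models` is always a non-empty list and `default` is always set, so a client never needs a special case for the error path. A minimal consumer (a sketch; the payloads are illustrative):

```python
def pick_model(payload: dict) -> str:
    """Choose the default model if it is in the list, else the first model."""
    models = payload.get("models") or []
    default = payload.get("default")
    if default in models:
        return default
    if models:
        return models[0]
    raise ValueError("API returned no models")

ok = {"status": "success", "models": ["llama3.1:8b", "mistral:7b"], "default": "llama3.1:8b"}
err = {"status": "error", "models": ["llama3.1:8b"], "default": "llama3.1:8b"}

print(pick_model(ok))   # llama3.1:8b
print(pick_model(err))  # llama3.1:8b (the fallback payload still carries a model)
```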
Model Usage in Pipeline
Script Generation
def generate_script(
    video_subject: str,
    paragraph_number: int,
    ai_model: str,
    voice: str,
    customPrompt: str,
) -> Optional[str]:
    prompt = f"""
    Generate a script for a video about {video_subject}.
    Number of paragraphs: {paragraph_number}
    Language: {voice}
    """
    response = generate_response(prompt, ai_model)
    # Clean and return script
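The cleaning step elided above typically strips markdown fences and surrounding quotes that chat models tend to add around their output. One possible implementation (a hypothetical helper; MoneyPrinter's actual cleaning logic may differ):

```python
import re

def clean_script(raw: str) -> str:
    """Strip code fences, surrounding quotes, and stray whitespace from LLM output."""
    text = raw.strip()
    # Remove a surrounding ``` fence, with or without a language tag.
    fence = re.match(r"^```[a-zA-Z]*\n(.*?)\n?```$", text, re.DOTALL)
    if fence:
        text = fence.group(1).strip()
    # Remove matching surrounding quotes.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    return text

print(clean_script('```\nHello world.\n```'))  # Hello world.
print(clean_script('"A quoted script."'))      # A quoted script.
```

Without a step like this, stray backticks and quote marks end up being read aloud by the TTS stage.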
Search Terms
def get_search_terms(
    video_subject: str, amount: int, script: str, ai_model: str
) -> List[str]:
    prompt = f"""
    Generate {amount} search terms for stock videos.
    Subject: {video_subject}
    Return as JSON array: ["term1", "term2", ...]
    """
    response = generate_response(prompt, ai_model)
    return json.loads(response)
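Calling `json.loads` directly on raw model output is fragile: models often wrap the array in prose or a code fence. A more defensive parse (a sketch, not the project's actual code) extracts the first JSON array from the response before decoding it:

```python
import json
import re

def parse_search_terms(response: str) -> list:
    """Extract the first JSON array from an LLM response, tolerating extra text."""
    match = re.search(r"\[.*?\]", response, re.DOTALL)
    if not match:
        raise ValueError(f"No JSON array found in response: {response!r}")
    return json.loads(match.group(0))

clean = '["cats", "kittens"]'
noisy = 'Sure! Here are your terms:\n```json\n["cats", "kittens"]\n```'

print(parse_search_terms(clean))  # ['cats', 'kittens']
print(parse_search_terms(noisy))  # ['cats', 'kittens']
```

The non-greedy `\[.*?\]` works for the flat string arrays requested here; nested arrays would need a proper scan.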
def generate_metadata(
    video_subject: str, script: str, ai_model: str
) -> Tuple[str, str, List[str]]:
    # Generate title
    title_prompt = f"Generate a catchy YouTube title for: {video_subject}"
    title = generate_response(title_prompt, ai_model)

    # Generate description
    desc_prompt = f"Write a YouTube description for: {script}"
    description = generate_response(desc_prompt, ai_model)

    # Generate keywords
    keywords = get_search_terms(video_subject, 6, script, ai_model)

    return title, description, keywords
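With generate_response stubbed out, the same three-call flow can be exercised without a running model. A self-contained sketch mirroring the structure above (the stub responses are illustrative):

```python
import json
from typing import List, Tuple

def generate_response(prompt: str, ai_model: str) -> str:
    # Stand-in for the real Ollama call, keyed on the prompt wording.
    if "search terms" in prompt:
        return '["space", "rockets"]'
    if "title" in prompt:
        return "5 Mind-Blowing Space Facts"
    return "A short description about space."

def get_search_terms(video_subject: str, amount: int, script: str, ai_model: str) -> List[str]:
    prompt = f"Generate {amount} search terms for stock videos. Subject: {video_subject}"
    return json.loads(generate_response(prompt, ai_model))

def generate_metadata(video_subject: str, script: str, ai_model: str) -> Tuple[str, str, List[str]]:
    title = generate_response(f"Generate a catchy YouTube title for: {video_subject}", ai_model)
    description = generate_response(f"Write a YouTube description for: {script}", ai_model)
    keywords = get_search_terms(video_subject, 6, script, ai_model)
    return title, description, keywords

title, description, keywords = generate_metadata("space", "A script about space.", "llama3.1:8b")
print(title)     # 5 Mind-Blowing Space Facts
print(keywords)  # ['space', 'rockets']
```

This kind of stub is also useful for testing the pipeline's plumbing without paying model-inference latency.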
Benchmarks
Approximate generation times on Apple M2 Pro:
Model         Script (1 para)  Search Terms  Metadata  Total
llama3.1:8b   5s               3s            8s        ~16s
mistral:7b    3s               2s            5s        ~10s
llama3.1:70b  25s              15s           35s       ~75s
phi3:mini     2s               1s            3s        ~6s
Timings exclude video download, TTS, and rendering. GPU acceleration significantly improves performance.
Troubleshooting
No models in frontend dropdown
Verify Ollama is running:
curl http://localhost:11434/api/tags
Expected response:
{
  "models": [
    { "name": "llama3.1:8b", "model": "llama3.1:8b", "size": 4661211648 }
  ]
}
If the connection is refused, start Ollama: run ollama serve (Linux), or relaunch the Ollama app (macOS/Windows).
Model not installed
Error message:
RuntimeError: Ollama model 'llama3.1:8b' is not installed.
Available models: mistral:7b
Install it with: ollama pull llama3.1:8b
Solution: Pull the missing model as the message suggests. The backend checks available models and raises a clear error:
except ResponseError as fallback_err:
    if fallback_err.status_code == 404 and "not found" in str(fallback_err).lower():
        available_models, _ = list_ollama_models()
        available = ", ".join(available_models) if available_models else "none"
        raise RuntimeError(
            f"Ollama model '{model_name}' is not installed. "
            f"Available models: {available}. "
            f"Install it with: ollama pull {model_name}"
        ) from fallback_err
Ollama connection refused (Docker)
Check OLLAMA_BASE_URL:
OLLAMA_BASE_URL="http://host.docker.internal:11434"
Test connectivity from the container:
docker exec -it backend curl http://host.docker.internal:11434/api/tags
Linux: Verify host.docker.internal resolves:
docker exec -it backend ping host.docker.internal
Out of memory
Symptoms:
Ollama crashes during generation
System becomes unresponsive
"Out of memory" errors
Solutions:
Use a smaller model (e.g., ollama pull phi3:mini)
Limit concurrent generations (single worker)
Add swap space (Linux):
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Advanced Configuration
Model Parameters
Customize generation parameters (requires code changes):
response = client.chat(
    model=model_name,
    messages=[{"role": "user", "content": prompt}],
    stream=False,
    options={
        "temperature": 0.7,  # Creativity (0.0-1.0)
        "top_p": 0.9,        # Nucleus sampling
        "top_k": 40,         # Token selection
        "num_predict": 500,  # Max tokens
    },
)
Timeout Configuration
Adjust the timeout (in seconds) for slow models:
OLLAMA_TIMEOUT=300  # 5 minutes
Used by the client:
OLLAMA_TIMEOUT = float(os.getenv("OLLAMA_TIMEOUT", "180"))

def _ollama_client() -> Client:
    return Client(host=OLLAMA_BASE_URL, timeout=OLLAMA_TIMEOUT)
Next Steps
Generating Videos: Create your first video
Pipeline: Understand the generation process
Configuration: All environment variables
Troubleshooting: Common issues and solutions