Cluely includes a built-in chat interface that lets you interact with AI models for code assistance, problem-solving, and general questions without needing to capture screenshots.

How it works

The chat feature provides a direct text interface to Cluely’s language models, supporting multiple AI providers:
  • Google Gemini (default) - Fast, capable vision and language model
  • Ollama - Local, privacy-focused models
  • OpenRouter - Access to various cloud models
  • K2 Think - Specialized reasoning models

Starting a chat

Access the chat interface through the Solutions view after processing screenshots or voice input. The chat maintains context from your current session:
const response = await window.electronAPI.geminiChat(message)
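In practice you will want to guard this call, since the underlying provider can reject (missing API key, rate limit, network failure). The sketch below is illustrative only: the `ChatAPI` interface and the stub stand in for the real `window.electronAPI` bridge, which is only available inside Cluely's renderer process.

```typescript
// Illustrative only: `ChatAPI` mirrors the geminiChat call on window.electronAPI.
interface ChatAPI {
  geminiChat(message: string): Promise<string>
}

// Send a message and surface provider errors instead of failing silently.
async function sendChatMessage(api: ChatAPI, message: string): Promise<string> {
  if (!message.trim()) {
    throw new Error('Cannot send an empty message')
  }
  try {
    return await api.geminiChat(message)
  } catch (error: any) {
    // Re-wrap so the UI can show a readable error to the user
    throw new Error(`Chat request failed: ${error?.message ?? error}`)
  }
}

// Usage with a stub standing in for window.electronAPI:
const stub: ChatAPI = {
  geminiChat: async (message) => `echo: ${message}`
}
sendChatMessage(stub, 'Explain closures in JavaScript').then(reply => {
  console.log(reply)  // "echo: Explain closures in JavaScript"
})
```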

Chat with context

The AI assistant uses the same system prompt across all features:
private readonly systemPrompt = `You are Wingman AI, a helpful, proactive assistant for any kind of problem or situation. 

CRITICAL: You MUST use Markdown for all responses.
1. Use headers (#, ##), lists (* or 1.), and bold text to organize information clearly.
2. Use LaTeX for ALL mathematical formulas and equations.
3. Use code blocks with language specification for any code snippets.
4. For any user input, analyze the situation, provide a clear problem statement, 
   relevant context, and suggest several possible responses or actions.
5. Always explain your reasoning.`
All responses are formatted in Markdown with support for LaTeX equations, code blocks, and structured formatting.

Response formatting

Code blocks

Code in responses is automatically syntax-highlighted:
```javascript
function fibonacci(n) {
  if (n <= 1) return n
  return fibonacci(n - 1) + fibonacci(n - 2)
}
```

Mathematical equations

Math is rendered using LaTeX:
  • Inline: $E=mc^2$
  • Block: $$x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}$$

Structured information

Responses use headers, lists, and bold text for clarity:
## Problem Analysis

**Issue**: Array index out of bounds

**Possible causes**:
* Loop condition is incorrect
* Array size not validated
* Off-by-one error in iteration

AI provider management

Checking current provider

See which AI provider is currently active:
const config = await window.electronAPI.getCurrentLLMConfig()
// Returns:
// {
//   provider: "gemini" | "ollama" | "openrouter" | "k2think",
//   model: "models/gemini-2.5-flash",
//   isOllama: false,
//   isOpenRouter: false
// }

Switching providers

await window.electronAPI.switchToGemini(
  apiKey,  // Optional if already configured
  "models/gemini-2.5-flash"  // Optional model override
)

Testing connection

Verify your AI provider is working:
const result = await window.electronAPI.testLLMConnection()
// Returns: { success: true } or { success: false, error: "..." }
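Because chat history does not survive a provider switch (see the note below), it is worth verifying the new provider before discarding the old conversation. The following sketch combines the two calls above; the `ProviderAPI` interface and `switchAndVerify` helper are illustrative names, not part of Cluely's API:

```typescript
// Illustrative only: `ProviderAPI` mirrors the window.electronAPI calls shown above.
interface ProviderAPI {
  switchToGemini(apiKey?: string, model?: string): Promise<void>
  testLLMConnection(): Promise<{ success: boolean; error?: string }>
}

// Switch provider, then confirm it responds before starting a fresh conversation.
async function switchAndVerify(
  api: ProviderAPI,
  apiKey?: string,
  model?: string
): Promise<boolean> {
  await api.switchToGemini(apiKey, model)
  const result = await api.testLLMConnection()
  if (!result.success) {
    console.error(`Provider check failed: ${result.error}`)
  }
  return result.success
}
```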
If you switch providers during a conversation, the chat history won’t carry over. Start fresh conversations after switching.

Local models with Ollama

Setup

  1. Install Ollama from ollama.ai
  2. Pull a model: ollama pull llama3.2
  3. Set environment variables:
USE_OLLAMA=true
OLLAMA_MODEL=llama3.2
OLLAMA_URL=http://localhost:11434

Auto-detection

Cluely automatically detects available Ollama models:
const models = await window.electronAPI.getAvailableOllamaModels()
// Returns: ["llama3.2", "codellama", "mistral", ...]
If your specified model doesn’t exist, Cluely uses the first available model.
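The fallback rule described above can be sketched as a small pure function: prefer the configured model if it is installed, otherwise take the first model Ollama reports. The function name is illustrative, not Cluely's actual implementation:

```typescript
// Sketch of the fallback rule: use the requested model if installed,
// otherwise the first available one. Throws if nothing is installed.
function resolveOllamaModel(requested: string, available: string[]): string {
  if (available.length === 0) {
    throw new Error('No Ollama models installed; run `ollama pull <model>` first')
  }
  return available.includes(requested) ? requested : available[0]
}

console.log(resolveOllamaModel('llama3.2', ['llama3.2', 'mistral']))   // "llama3.2"
console.log(resolveOllamaModel('gemma', ['codellama', 'mistral']))     // "codellama"
```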

Benefits

  • Privacy - All processing happens locally on your machine
  • No API costs - Use unlimited tokens without spending money
  • Offline support - Works without an internet connection
  • Model choice - Choose from dozens of open-source models

Error handling

Rate limiting

Cluely automatically handles rate limits with exponential backoff:
private async generateContentWithRetry(contents: any): Promise<any> {
  const maxRetries = 3
  let delay = 1000

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await this.model.generateContent(contents)
    } catch (error: any) {
      const message = error?.message ?? ''
      if (message.includes('429') || message.includes('RATE_LIMIT')) {
        // Switch to the fallback API key if one is configured
        if (this.fallbackGeminiApiKey) {
          this.geminiApiKey = this.fallbackGeminiApiKey
          this.model = new GoogleGenAI({ apiKey: this.fallbackGeminiApiKey })
          continue
        }
      }
      if (message.includes('503') && attempt < maxRetries - 1) {
        // Model overloaded: wait, then retry with exponential backoff
        await new Promise(resolve => setTimeout(resolve, delay))
        delay *= 2
        continue
      }
      throw error  // Non-retryable error, or retries exhausted
    }
  }
  throw new Error('generateContent failed after maximum retries')
}
Configure a fallback API key with GEMINI_FALLBACK_API_KEY to automatically switch when rate limits are hit.

Model overload

When models are overloaded (503 errors), Cluely retries with exponential backoff up to 3 times.

Best practices

Be specific about what you need:
  • ❌ “Fix this code”
  • ✅ “Why does this function return undefined instead of the sum?”
Share relevant details:
  • Language/framework you’re using
  • What you’ve already tried
  • Error messages you’re seeing
For multi-step problems:
  • Ask one question at a time
  • Build on previous responses
  • Verify understanding before moving forward

Choosing a provider:
  • Gemini: Best for vision + chat, fast responses
  • Ollama: Best for privacy, offline work
  • OpenRouter: Best for accessing specific models
  • K2 Think: Best for complex reasoning tasks

Environment configuration

Set these in your .env file:
# Gemini (default)
GEMINI_API_KEY=your_key_here
GEMINI_FALLBACK_API_KEY=backup_key  # Optional

# Ollama (local)
USE_OLLAMA=true
OLLAMA_MODEL=llama3.2
OLLAMA_URL=http://localhost:11434

# OpenRouter
OPENROUTER_API_KEY=your_key
OPENROUTER_MODEL=google/gemini-2.5-flash

# K2 Think
USE_K2_THINK=true
K2_THINK_API_KEY=your_key
