K2 Think V2 (MBZUAI-IFM/K2-Think-v2) is a specialized AI model focused on advanced reasoning capabilities. It uses local OCR for image processing to minimize costs while maintaining strong analytical abilities.

Features

  • Advanced Reasoning: Optimized for complex problem-solving
  • Cost-Effective Vision: Uses free local OCR (Tesseract.js) instead of expensive vision APIs
  • High-Level Analysis: Excellent for analytical and technical tasks
  • ChatGPT-Compatible API: Standard OpenAI-compatible endpoint

Setup

1. Get API Key

Obtain your K2 Think API key from the K2 Think service provider.
K2 Think uses a specialized API endpoint separate from OpenAI or other providers.
2. Configure Environment

Add K2 Think settings to your .env file:
K2_THINK_API_KEY=your_k2_think_api_key_here
USE_K2_THINK=true
Set USE_K2_THINK=true to enable K2 Think mode. Otherwise, it won’t activate even with the API key.
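The gating behavior can be sketched as a small check (a minimal illustration; `shouldUseK2Think` is a hypothetical helper, not part of Cluely's actual API):

```typescript
// Hypothetical sketch of the activation check: both the flag and the key
// must be present for K2 Think mode to turn on.
function shouldUseK2Think(env: Record<string, string | undefined>): boolean {
  const enabled = env.USE_K2_THINK === "true";   // flag must be exactly "true"
  const hasKey = Boolean(env.K2_THINK_API_KEY);  // key must be non-empty
  return enabled && hasKey;
}

// Example: flag set but key missing, so K2 Think stays disabled
shouldUseK2Think({ USE_K2_THINK: "true" }); // → false
```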
3. Verify Configuration

Start Cluely and check the console:
[LLMHelper] Using K2 Think V2 with model: MBZUAI-IFM/K2-Think-v2

Configuration

Environment Variables

Variable         | Required | Description           | Default
-----------------|----------|-----------------------|--------
K2_THINK_API_KEY | Yes      | Your K2 Think API key | -
USE_K2_THINK     | Yes      | Enable K2 Think mode  | false

Default Model

Cluely uses the K2-Think-v2 model:
// K2 Think model (source/electron/LLMHelper.ts:35)
private k2ThinkModel: string = "MBZUAI-IFM/K2-Think-v2"
The model name is fixed in the current implementation. Custom model selection will be added in future versions.

API Implementation

Endpoint

K2 Think uses this API endpoint:
POST https://api.k2think.ai/v1/chat/completions

Request Configuration

// K2 Think API call (source/electron/LLMHelper.ts:282-298)
const response = await fetch("https://api.k2think.ai/v1/chat/completions", {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${this.k2ThinkApiKey}`,
    'accept': 'application/json',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: this.k2ThinkModel,
    messages: [
      {
        role: "user",
        content: prompt
      }
    ],
    stream: false
  }),
})

Headers

Header        | Value            | Purpose
--------------|------------------|----------------
Authorization | Bearer {API_KEY} | Authentication
accept        | application/json | Response format
Content-Type  | application/json | Request format
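Because the endpoint is OpenAI-compatible, the reply text lives at `choices[0].message.content` in the parsed JSON. A minimal sketch of extracting it (the response shape shown is the standard chat-completions format; `extractReply` is a hypothetical helper):

```typescript
// Minimal shape of an OpenAI-compatible chat completion response
interface ChatCompletion {
  choices: Array<{ message: { role: string; content: string } }>;
}

// Pull out the assistant's reply, failing loudly on a malformed payload
function extractReply(data: ChatCompletion): string {
  const content = data.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("Unexpected K2 Think response shape");
  }
  return content;
}
```

In the actual call this would follow `const data = await response.json()` on the fetch above.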

Local OCR for Images

K2 Think uses free local OCR (Tesseract.js) to extract text from images before analysis. This eliminates vision API costs.

How It Works

1. Screenshot Capture

User takes a screenshot using Cluely’s hotkey (Cmd/Ctrl + H).
2. Local OCR Processing

Tesseract.js extracts text from the image locally:
// OCR extraction (source/electron/LLMHelper.ts:361)
const ocrResults = await Promise.all(
  imagePaths.map(path => Tesseract.recognize(path, 'eng'))
)
const extractedText = ocrResults.map(r => r.data.text).join("\n---\n")
3. Text Analysis

Extracted text is sent to K2 Think for analysis:
const prompt = `CONTEXT FROM SCREENSHOTS (EXTRACTED VIA LOCAL OCR):
"""
${extractedText}
"""

Analyze the extracted content and provide solutions...`
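The prompt assembly above can be sketched as a small helper (hypothetical name; the template text mirrors the snippet above, but the exact wording in `LLMHelper.ts` may differ):

```typescript
// Wrap OCR output in the analysis prompt before sending it to K2 Think
function buildOcrPrompt(extractedText: string): string {
  return [
    "CONTEXT FROM SCREENSHOTS (EXTRACTED VIA LOCAL OCR):",
    '"""',
    extractedText,
    '"""',
    "",
    "Analyze the extracted content and provide solutions...",
  ].join("\n");
}
```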

Benefits of Local OCR

Zero Vision Costs

No expensive vision API calls - OCR runs entirely on your machine

Privacy

Images processed locally before text is sent to API

Fast Processing

Tesseract.js is optimized for speed

Offline Capable

OCR works without internet (API call still requires connection)

OCR Accuracy

OCR works best with:
  • Clear, high-resolution screenshots
  • Text-based content (code, documents, error messages)
  • Standard fonts and layouts
May struggle with:
  • Handwritten text
  • Complex visual layouts
  • Low-resolution or blurry images

Use Cases

K2 Think V2 excels at:
  • Multi-step reasoning tasks
  • Technical analysis
  • Algorithm design
  • System architecture planning
  • Debugging error messages (via OCR)
  • Code review and optimization
  • Architecture discussions
  • Technical documentation
  • Academic problem solving
  • Mathematical reasoning
  • Logical deduction
  • Comparative analysis

Limitations

Image Analysis

K2 Think uses OCR, not native vision:
  • Good for: Text-heavy screenshots, code, error messages, documents
  • Not ideal for: Charts, diagrams, photos, visual designs
For native vision capabilities, use Gemini instead.

Audio Processing

Audio features are not supported. Cluely uses Gemini for all voice functionality, even when K2 Think is enabled.

Streaming

Current implementation uses non-streaming responses:
stream: false  // K2 Think API configuration
Responses arrive all at once after processing completes.

Switching to K2 Think

At Startup

Set environment variables:
K2_THINK_API_KEY=your_api_key_here
USE_K2_THINK=true

At Runtime

Switch from another provider:
// Switch to K2 Think V2
await llmHelper.switchToK2Think(
  'your_api_key_here',
  'MBZUAI-IFM/K2-Think-v2'
)

// Or use existing key from environment
await llmHelper.switchToK2Think()

// Verify current provider
const provider = llmHelper.getCurrentProvider()
console.log(provider)  // "k2think"

Priority Order

K2 Think has highest priority when enabled:
// Provider selection logic (source/electron/LLMHelper.ts:666-669)
if (this.useK2Think) return "k2think";
if (this.useOpenRouter) return "openrouter";
return this.useOllama ? "ollama" : "gemini";

Testing Connection

Verify K2 Think is configured correctly:
const result = await llmHelper.testConnection()

if (result.success) {
  console.log('K2 Think connected successfully')
} else {
  console.error('Connection failed:', result.error)
}
The test sends a simple “Hello” prompt to verify API connectivity.

Troubleshooting

Issue: Still using Gemini/other provider despite configuration

Solutions:
  1. Verify USE_K2_THINK=true is set in .env
  2. Check K2_THINK_API_KEY is not empty
  3. Restart Cluely after updating .env
  4. Check console for: [LLMHelper] Using K2 Think V2...
Error: K2 Think API key is not configured or K2 Think API key is required

Solutions:
  1. Verify API key is set in .env
  2. Check for extra spaces or quotes
  3. Ensure key is valid and active
  4. Test with: llmHelper.testConnection()
Issue: Images not being analyzed properly

Solutions:
  1. Ensure screenshots are high-resolution
  2. Check console for: [LLMHelper] Starting Local OCR...
  3. Verify Tesseract.js is installed: Check package.json dependencies
  4. Try with clear, text-heavy screenshots first
Error: K2 Think API error: 401/403/500

Solutions:
  1. 401 Unauthorized: Check API key is correct
  2. 403 Forbidden: Verify account has access to K2-Think-v2 model
  3. 500 Server Error: K2 Think service may be down, try again later
  4. Rate Limiting: Wait before retrying
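Those status codes can be translated into the advice above with a small lookup (an illustrative helper, not Cluely's actual error handling):

```typescript
// Map K2 Think HTTP status codes to actionable troubleshooting messages
function describeK2Error(status: number): string {
  switch (status) {
    case 401:
      return "Unauthorized: check that the API key is correct";
    case 403:
      return "Forbidden: verify account access to the K2-Think-v2 model";
    case 429:
      return "Rate limited: wait before retrying";
    case 500:
      return "Server error: the K2 Think service may be down, try again later";
    default:
      return `K2 Think API error: ${status}`;
  }
}
```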

OCR Best Practices

1. Optimize Screenshots

  • Use high resolution (1920x1080 or higher)
  • Ensure good contrast between text and background
  • Avoid heavy compression or artifacts
2. Content Type

Best for:
  • Code snippets
  • Error messages
  • Text documents
  • Console output
Avoid:
  • Heavily styled text
  • Cursive fonts
  • Overlapping elements
3. Monitor OCR Output

Check console for OCR results:
[LLMHelper] Starting Local OCR for 2 image(s)...
[LLMHelper] Local OCR complete. Total extracted text length: 1247
If extracted text length is unexpectedly low, screenshot quality may be poor.
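That check can be automated with a simple guard (the threshold and helper name are illustrative assumptions, not part of Cluely):

```typescript
// Flag OCR results with suspiciously little text, a sign of a poor screenshot
function ocrLooksSparse(extractedText: string, minChars: number = 50): boolean {
  return extractedText.trim().length < minChars;
}

// Example: a 1247-character extraction like the log above would pass this check
```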

Advanced Configuration

Custom System Prompts

K2 Think uses Cluely’s standard system prompt (source/electron/LLMHelper.ts:12-22):
You are Wingman AI, a helpful, proactive assistant...
The prompt is automatically prepended to all requests.

Response Processing

K2 Think responses are cleaned of markdown formatting:
const text = this.cleanJsonResponse(result)
const parsed = JSON.parse(text)
This ensures consistent JSON parsing for structured outputs.
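The cleaning step can be sketched as stripping markdown code fences before parsing (an illustrative reimplementation; the real `cleanJsonResponse` in `LLMHelper.ts` may do more):

```typescript
// Strip surrounding ```json ... ``` fences so JSON.parse receives raw JSON
function cleanJsonResponse(raw: string): string {
  return raw
    .replace(/^\s*```(?:json)?\s*/i, "") // leading fence, optional "json" tag
    .replace(/\s*```\s*$/, "")           // trailing fence
    .trim();
}

const parsed = JSON.parse(cleanJsonResponse('```json\n{"answer": 42}\n```'));
// parsed.answer === 42
```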

Comparison with Other Providers

Feature   | K2 Think V2       | Gemini    | Ollama     | OpenRouter
----------|-------------------|-----------|------------|----------------
Reasoning | Advanced          | Excellent | Good       | Varies
Vision    | OCR-based         | Native    | None       | Model-dependent
Privacy   | Cloud + local OCR | Cloud     | 100% local | Cloud
Cost      | API costs         | API costs | Free       | API costs
Speed     | Moderate          | Very fast | Fast       | Fast

Next Steps

Provider Overview

Compare all AI providers

Gemini Setup

Add Gemini for native vision capabilities
