Screen capture is the core technology that powers Interview Copilot. It enables Tabby to “see” the coding problem on your screen and provide intelligent analysis.

How Screen Capture Works

When you trigger screen capture, Tabby:
  1. Captures the Active Window: Takes a screenshot of your current screen using Electron's native capture API
  2. Encodes the Image: Converts the screenshot to a base64-encoded image for transmission
  3. Sends to Vision Model: Transmits the image to an AI vision model (GPT-4 Vision, Claude 3.5 Sonnet, etc.)
  4. Analyzes Content: The AI extracts the problem statement, constraints, examples, and requirements
  5. Generates Structured Response: Creates comprehensive analysis across all tabs (Idea, Code, Walkthrough, etc.)
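The first two steps can be sketched in isolation. Here is a minimal, dependency-free illustration of turning a captured image buffer into a base64 data URL ready for JSON transport (in the real app the buffer would come from Electron's capture API; the names here are illustrative, not Tabby's internals):

```typescript
// Illustrative sketch: encode a captured PNG buffer as a base64 data URL.
// A data URL keeps the binary image safe inside a JSON request body.
function toDataUrl(pngBuffer: Buffer): string {
  return `data:image/png;base64,${pngBuffer.toString("base64")}`;
}

// Stand-in for a real screenshot buffer (just the PNG magic bytes).
const fakePng = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
const dataUrl = toDataUrl(fakePng);
// dataUrl starts with "data:image/png;base64,iVBORw..."
```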

Triggering Screen Capture

Analyze New Problem

Press Alt+X to capture and analyze a coding problem:
const handleAnalyze = async () => {
  const screenshot = await window.electron.captureScreen();
  sendMessage(
    "Analyze this coding problem. Provide Idea, Code, Walkthrough, and Test Cases.",
    { screenshot }
  );
};
Make sure the coding problem is clearly visible on your screen before pressing Alt+X. The AI needs to see the problem statement, constraints, and examples.

Update with New Constraints

Press Alt+Shift+X to update the analysis when constraints change:
const handleUpdate = async () => {
  const screenshot = await window.electron.captureScreen();
  sendMessage(
    "The interviewer added new constraints. Update the analysis for all sections.",
    { screenshot }
  );
};

Get Code Suggestions

Press Alt+N to get improvement suggestions:
const handleCodeSuggestion = async () => {
  const screenshot = await window.electron.captureScreen();
  sendMessage(
    "Suggest improvements to the current code approach. Focus on optimization and clean code.",
    { screenshot }
  );
};
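Conceptually, the three hotkeys above form a dispatch table from accelerator strings to handlers. A minimal sketch of that mapping (in the real app Electron's globalShortcut module would do the registration; the `register`/`trigger` helpers here are illustrative stand-ins):

```typescript
// Illustrative dispatch table: accelerator string -> handler.
type Handler = () => void;
const shortcuts = new Map<string, Handler>();

function register(accelerator: string, handler: Handler): void {
  shortcuts.set(accelerator, handler);
}

// Returns false when no handler is bound to the accelerator.
function trigger(accelerator: string): boolean {
  const handler = shortcuts.get(accelerator);
  if (!handler) return false;
  handler();
  return true;
}

// Wire up the three actions (handlers stubbed for this sketch).
const fired: string[] = [];
register("Alt+X", () => fired.push("analyze"));
register("Alt+Shift+X", () => fired.push("update"));
register("Alt+N", () => fired.push("suggest"));

trigger("Alt+X");
trigger("Alt+N");
```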

What the AI Sees

The vision model analyzes your screenshot for:

  • Problem Statement: The main problem description and what you need to solve
  • Constraints: Time/space complexity requirements and input limits
  • Examples: Sample inputs and expected outputs
  • Follow-ups: Additional questions or edge cases mentioned

Custom Prompts with Screenshots

You can also provide custom prompts with optional screenshot capture:
// With screenshot
handleCustomPrompt(
  "Focus on the time complexity optimization for this problem",
  { includeScreenshot: true }
);

// Without screenshot
handleCustomPrompt(
  "Explain the trade-offs between HashMap and TreeMap here",
  { includeScreenshot: false }
);
Use the prompt input at the bottom of the Interview Copilot panel to enter custom instructions. Toggle the camera icon to include/exclude screenshots.
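A plausible shape for the toggle logic behind `handleCustomPrompt` is shown below. This is a sketch, not Tabby's actual implementation: the capture and send functions are synchronous stand-ins (the real Electron capture call is async), and the message shape is assumed.

```typescript
// Sketch of the custom-prompt flow; names and shapes are illustrative.
type SentMessage = { text: string; screenshot?: string };
const sent: SentMessage[] = [];

// Stand-in for window.electron.captureScreen (async in the real app).
const captureScreen = (): string => "base64-image-data";

// Stand-in for the chat layer's sendMessage.
const sendMessage = (text: string, opts: { screenshot?: string } = {}): void => {
  sent.push({ text, ...opts });
};

function handleCustomPrompt(
  prompt: string,
  opts: { includeScreenshot: boolean }
): void {
  if (opts.includeScreenshot) {
    // Capture only when the camera toggle is on.
    sendMessage(prompt, { screenshot: captureScreen() });
  } else {
    sendMessage(prompt);
  }
}

handleCustomPrompt("Focus on time complexity", { includeScreenshot: true });
handleCustomPrompt("Explain HashMap vs TreeMap", { includeScreenshot: false });
```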

Technical Implementation

Backend API Route

The screen capture is processed by the Interview Copilot API:
export async function POST(req: Request) {
  const { messages, conversationId, screenshot } = await req.json();
  // Convert chat history into the model's message format
  const modelMessages = [...messages];
  
  // Attach screenshot to the last message
  if (screenshot) {
    const lastMsg = modelMessages[modelMessages.length - 1];
    if (lastMsg && lastMsg.role === 'user') {
      lastMsg.content = [
        { type: 'text', text: lastMsg.content },
        { type: 'image', image: screenshot }
      ];
    }
  }
  
  // Stream response with structured analysis
  return streamText({
    model: myProvider.languageModel(defaultModel),
    messages: modelMessages,
    output: Output.object({ schema: analysisSchema }),
    // ...
  });
}

Analysis Schema

The AI generates structured output matching this schema:
const analysisSchema = z.object({
  idea: z.string().describe('Problem understanding, key observations, approaches'),
  code: z.string().describe('Clean, well-commented implementation code'),
  walkthrough: z.string().describe('Step-by-step explanation of the solution'),
  testCases: z.array(z.object({
    input: z.string(),
    output: z.string(),
    reason: z.string(),
  })).describe('Edge cases and test inputs'),
  mistakes: z.array(z.object({
    mistake: z.string(),
    correction: z.string(),
    pattern: z.string(),
  })).describe('Common mistakes for this problem type'),
  memories: z.array(z.object({
    memory: z.string(),
    createdAt: z.string(),
  })).describe('Relevant memories about user preferences'),
});
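For reference, the inferred TypeScript shape of that output, plus a small consumer. Plain interfaces are used here to keep the sketch dependency-free; in the app itself the type would come from `z.infer<typeof analysisSchema>`, and the formatter is a hypothetical example, not Tabby code:

```typescript
// Plain TypeScript mirror of analysisSchema's inferred type.
interface TestCase { input: string; output: string; reason: string; }
interface Analysis {
  idea: string;
  code: string;
  walkthrough: string;
  testCases: TestCase[];
  mistakes: { mistake: string; correction: string; pattern: string }[];
  memories: { memory: string; createdAt: string }[];
}

// Example consumer: render entries for the Test Cases tab.
function formatTestCases(analysis: Analysis): string[] {
  return analysis.testCases.map(
    (tc) => `${tc.input} -> ${tc.output} (${tc.reason})`
  );
}

const sample: Analysis = {
  idea: "Two-pointer scan",
  code: "// ...",
  walkthrough: "Move pointers inward...",
  testCases: [{ input: "[]", output: "0", reason: "empty input" }],
  mistakes: [],
  memories: [],
};
const lines = formatTestCases(sample);
// lines[0] === "[] -> 0 (empty input)"
```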

Best Practices

For Clear Capture

  1. Full Problem Visible: Ensure the entire problem statement fits on screen before capturing
  2. Remove Distractions: Close unnecessary windows or notifications that might confuse the AI
  3. Good Contrast: Use a readable theme with good contrast between text and background
  4. Zoom if Needed: If text is too small, zoom in before capturing

Common Issues

Tabby captures the currently active window. Make sure the coding problem window is focused before pressing Alt+X.
Try:
  • Re-capture with better visibility
  • Use the Chat tab to clarify specific points
  • Provide a custom prompt with more context
Screen capture speed depends on:
  • Your screen resolution (lower = faster)
  • Network speed (if using cloud AI)
  • AI model choice (faster models available in settings)

Privacy & Security

Important Privacy Information
  • Screenshots are sent to your configured AI provider (OpenAI, Anthropic, etc.)
  • Images are not stored locally after processing
  • Screenshots are not saved to your conversation history by default
  • Be mindful of sensitive information visible on screen
  • Consider using local AI models for maximum privacy

Data Flow

  1. Local Capture: Screenshot taken on your machine
  2. Encoding: Converted to base64 for transmission
  3. API Call: Sent to AI provider’s vision endpoint
  4. Processing: AI analyzes and generates response
  5. Deletion: Screenshot discarded after analysis
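Step 2's base64 encoding inflates the payload by roughly 4/3, which matters for upload time on slow connections. A quick way to estimate the transmitted size from the raw screenshot size (an illustrative helper, not part of Tabby):

```typescript
// Estimate base64-encoded size: every 3 input bytes become 4 output
// characters, with padding rounding up to a multiple of 4.
function base64Size(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

// A ~1 MB PNG screenshot grows to ~1.33 MB on the wire.
const encoded = base64Size(1_000_000);
```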
Only the text analysis results are saved to your conversation history, not the original screenshots.

Performance Tips

Optimize Capture Speed

  • Use faster models: Select models optimized for vision in Settings
  • Reduce resolution: Lower screen resolution = smaller images = faster upload
  • Local models: Use Ollama or LM Studio for zero network latency
  • Limit content: Focus on the problem area, not full screen

API Cost Optimization

GPT-4 Vision charges per image token. Costs vary by resolution:
  • 1024x1024: ~$0.01 per analysis
  • 512x512: ~$0.003 per analysis
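Using the rough per-analysis figures above, a back-of-envelope monthly estimate is straightforward. Note these prices are the approximations quoted above, not current provider pricing, so check your provider's pricing page before budgeting:

```typescript
// Back-of-envelope monthly cost from the approximate figures above
// (~$0.01 per analysis at 1024x1024, ~$0.003 at 512x512).
const costPerAnalysis: Record<string, number> = {
  "1024x1024": 0.01,
  "512x512": 0.003,
};

function monthlyCost(resolution: string, analysesPerDay: number): number {
  return costPerAnalysis[resolution] * analysesPerDay * 30;
}

// 20 analyses/day at 512x512 is roughly $1.80/month.
const estimate = monthlyCost("512x512", 20);
```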

Next Steps

  • Explore Tabs: Learn about all seven analysis tabs
  • Memory System: How memories enhance screen analysis
  • Settings: Configure vision models and providers
  • Troubleshooting: Common issues and solutions
