Testing AI applications requires different strategies than traditional software testing. Genkit provides tools and patterns for testing flows, evaluating model outputs, and ensuring your AI features work reliably.

Testing Approaches

Flow Testing

Flows are the core testable units in Genkit applications. You can test flows using:
  1. Interactive Testing - Developer UI
  2. Command-Line Testing - CLI commands
  3. Automated Testing - Unit and integration tests
  4. Batch Testing - Testing with datasets

Interactive Testing with Developer UI

The Developer UI provides the fastest way to test flows during development:
genkit start -- npm run dev
Benefits:
  • Immediate visual feedback
  • Trace inspection for debugging
  • Easy input modification
  • Streaming output support
Example Workflow:
  1. Open the Developer UI (typically http://localhost:4000)
  2. Navigate to the Flows section
  3. Select your flow (e.g., simpleGreeting)
  4. Enter test input:
    {"customerName": "Sam"}
    
  5. Click “Run” and inspect the output
  6. Review the trace for detailed execution steps

Command-Line Testing

Running Individual Flows

Test flows from the command line with specific inputs:
genkit flow:run simpleGreeting '{"customerName":"Sam"}'
With Output Streaming:
genkit flow:run menuQuestion '{"question":"What drinks do you have?"}' --stream
Saving Results:
genkit flow:run simpleGreeting '{"customerName":"Sam"}' --output result.json

Batch Testing

Test flows with multiple inputs using batch runs. First, create a test dataset (test-inputs.json):
[
  {"customerName": "Alice"},
  {"customerName": "Bob"},
  {"customerName": "Charlie"}
]
Run the batch test:
genkit flow:batchRun simpleGreeting test-inputs.json --output results.json
Label batch runs for tracking:
genkit flow:batchRun simpleGreeting test-inputs.json --label "regression-test-v1"
This creates labeled traces that can be filtered in the Developer UI and extracted later for evaluation.
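Hand-maintaining large input files gets tedious as coverage grows. A minimal sketch that generates test-inputs.json programmatically with Node's fs module (the names array is illustrative):

```typescript
import { writeFileSync } from 'node:fs';

// Illustrative input values; replace with names relevant to your flow.
const names = ['Alice', 'Bob', 'Charlie'];

// Each entry must match the flow's input schema ({ customerName: string }).
const inputs = names.map((customerName) => ({ customerName }));

// Write the dataset in the shape genkit flow:batchRun expects: a JSON array.
writeFileSync('test-inputs.json', JSON.stringify(inputs, null, 2));
```

Regenerating the file from a list keeps the dataset and the flow's input schema from drifting apart.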

Creating Testable Flows

Design flows with testing in mind:
import { genkit, z } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({
  plugins: [googleAI()],
});

// Define clear input and output schemas
const CustomerNameSchema = z.object({
  customerName: z.string(),
});

// Define the prompt once, at module scope, so repeated flow
// invocations don't attempt to re-register it
const greetingPrompt = ai.definePrompt(
  {
    name: 'greetingPrompt',
    model: googleAI.model('gemini-flash-latest'),
    input: { schema: CustomerNameSchema },
  },
  `You're a barista at a coffee shop.
   A customer named {{customerName}} enters.
   Greet them in one sentence.`
);

// Create a testable flow
export const simpleGreetingFlow = ai.defineFlow(
  {
    name: 'simpleGreeting',
    inputSchema: CustomerNameSchema,
    outputSchema: z.string(),
  },
  async (input) => {
    const result = await greetingPrompt(input);
    return result.text;
  }
);
Testing this flow:
genkit flow:run simpleGreeting '{"customerName":"Sam"}'

Self-Testing Flows

Create flows that test other flows:
export const testAllCoffeeFlows = ai.defineFlow(
  {
    name: 'testAllCoffeeFlows',
    outputSchema: z.object({
      pass: z.boolean(),
      error: z.string().optional(),
    }),
  },
  async () => {
    try {
      // Test flow 1
      const test1 = await simpleGreetingFlow({ 
        customerName: 'Sam' 
      });
      
      // Test flow 2 with different inputs
      const test2 = await greetingWithHistoryFlow({
        customerName: 'Sam',
        currentTime: '09:45am',
        previousOrder: 'Caramel Macchiato',
      });
      
      // Verify results
      if (!test1 || !test2) {
        return { pass: false, error: 'Empty response' };
      }
      
      return { pass: true };
    } catch (error) {
      return {
        pass: false,
        // The caught value is untyped, so narrow it before reading .message
        error: error instanceof Error ? error.message : String(error),
      };
    }
  }
);
Run the test flow:
genkit flow:run testAllCoffeeFlows
View the trace in the Developer UI to see the results of all nested flow executions.

Evaluation-Based Testing

Evaluation goes beyond simple pass/fail testing by measuring quality metrics.

Running Evaluations

Evaluate a flow with a dataset:
genkit eval:flow simpleGreeting --input test-dataset.json --evaluators answer-relevance,faithfulness
Evaluate a standalone dataset:
genkit eval:run evaluation-dataset.json --evaluators answer-quality
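The evaluator names passed to --evaluators must come from a registered evaluator plugin. One possible configuration is sketched below, assuming the @genkit-ai/evaluator plugin and its GenkitMetric exports; check the version you have installed for the exact package and metric names:

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';
// Assumed plugin; verify the package name against your Genkit version.
import { genkitEval, GenkitMetric } from '@genkit-ai/evaluator';

const ai = genkit({
  plugins: [
    googleAI(),
    // Registers LLM-judged evaluators that eval:flow can reference by name.
    genkitEval({
      judge: googleAI.model('gemini-flash-latest'), // model that scores outputs
      metrics: [GenkitMetric.ANSWER_RELEVANCY, GenkitMetric.FAITHFULNESS],
    }),
  ],
});
```

Without a plugin like this registered, the eval:flow and eval:run commands have no evaluators to dispatch to.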

Creating Test Datasets

Test datasets should include input, expected output, and context:
[
  {
    "testCaseId": "greeting-1",
    "input": {"customerName": "Alice"},
    "reference": "A friendly greeting mentioning Alice by name",
    "context": ["Coffee shop setting", "Morning time"]
  },
  {
    "testCaseId": "greeting-2",
    "input": {"customerName": "Bob"},
    "reference": "A friendly greeting mentioning Bob by name",
    "context": ["Coffee shop setting", "Afternoon time"]
  }
]
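Datasets drift as flows evolve, so it helps to validate their shape before a run. A minimal hand-rolled check, with field names mirroring the example above (adjust to your own schema):

```typescript
// Shape of one test case in the dataset above.
interface TestCase {
  testCaseId: string;
  input: unknown;
  reference?: string;
  context?: string[];
}

// Returns identifiers of entries missing required fields.
function findInvalidCases(dataset: unknown[]): string[] {
  const bad: string[] = [];
  dataset.forEach((entry, i) => {
    const tc = entry as Partial<TestCase>;
    if (typeof tc.testCaseId !== 'string' || tc.input === undefined) {
      // Fall back to the array index when the id itself is missing.
      bad.push(tc.testCaseId ?? `index ${i}`);
    }
  });
  return bad;
}
```

Running a check like this in CI before eval:flow catches malformed entries early, instead of mid-evaluation.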

Extracting Test Data from Traces

Generate test datasets from production traces:
genkit eval:extractData simpleGreeting --output extracted-dataset.json --maxRows 50
This extracts:
  • Actual inputs used in production
  • Outputs generated
  • Context information
  • Trace IDs for reference
Extract data from labeled runs:
genkit eval:extractData simpleGreeting --label "production-v1" --maxRows 100

Integration Testing

Test flows in integration with external services:
// Test with real model API
export const integrationTestFlow = ai.defineFlow(
  {
    name: 'integrationTest',
    outputSchema: z.object({ success: z.boolean() }),
  },
  async () => {
    const result = await ai.generate({
      model: googleAI.model('gemini-flash-latest'),
      prompt: 'Say hello',
    });
    
    return { 
      success: result.text.length > 0 
    };
  }
);
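Real model calls can fail transiently (rate limits, timeouts), so integration tests often wrap them in a retry. A small generic helper, shown as an illustration rather than a Genkit API:

```typescript
// Retries an async operation with a fixed delay between attempts,
// rethrowing the last error if every attempt fails.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage: const result = await withRetry(() => integrationTestFlow());
```

Keep the attempt count low in CI so genuinely broken flows still fail fast.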

Mock Testing

While Genkit doesn’t provide built-in mocking, you can implement mocks for testing:
// Create a mock model for testing
const mockModel = ai.defineModel(
  {
    name: 'mock-model',
  },
  async (input) => {
    // Return deterministic responses for testing
    return {
      message: { role: 'model', content: [{ text: 'Mock response' }] },
      finishReason: 'stop',
    };
  }
);

// Use in test flows
const testFlow = ai.defineFlow(
  { name: 'testWithMock' },
  async () => {
    const result = await ai.generate({
      model: mockModel,
      prompt: 'Test prompt',
    });
    return result.text;
  }
);

Unit Testing with Jest/Vitest

Write traditional unit tests for your flows:
import { describe, test, expect } from '@jest/globals';
import { simpleGreetingFlow } from './index';

describe('simpleGreetingFlow', () => {
  test('should greet customer by name', async () => {
    const result = await simpleGreetingFlow({ 
      customerName: 'Alice' 
    });
    
    expect(result).toBeTruthy();
    expect(result.toLowerCase()).toContain('alice');
  });
  
  test('should handle empty customer name', async () => {
    await expect(
      simpleGreetingFlow({ customerName: '' })
    ).rejects.toThrow();
  });
});
Example from Genkit source (cloud-sql-pg/test/index.test.ts):
describe('configurePostgresRetriever Integration Tests', () => {
  test('should retrieve relevant documents based on a query', async () => {
    const retriever = configurePostgresRetriever({
      embedder: mockEmbedder,
      engine: testEngine,
      tableName: TEST_TABLE,
    });
    
    const results = await retriever.retrieve({
      query: 'test query',
      k: 5,
    });
    
    expect(results).toBeDefined();
    expect(results.length).toBeGreaterThan(0);
  });
  
  test('should handle empty query text gracefully', async () => {
    const retriever = configurePostgresRetriever({
      embedder: mockEmbedder,
      engine: testEngine,
    });
    
    const results = await retriever.retrieve({
      query: '',
      k: 5,
    });
    
    expect(results).toEqual([]);
  });
});

Best Practices

1. Use Clear Schemas

Define explicit input and output schemas for all flows:
const inputSchema = z.object({
  question: z.string(),
  context: z.array(z.string()).optional(),
});

const outputSchema = z.object({
  answer: z.string(),
  confidence: z.number(),
});

2. Test Edge Cases

  • Empty inputs
  • Very long inputs
  • Special characters
  • Invalid data types
  • Missing required fields
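The edge cases above can be captured in a shared table so every flow test iterates the same list; a sketch, with illustrative values:

```typescript
// A shared table of adversarial customerName values covering the
// edge cases listed above.
function edgeCaseNames(): string[] {
  return [
    '',                       // empty input
    'A'.repeat(10_000),       // very long input
    '"; DROP TABLE users;--', // special characters / injection-style text
    '😀🎉',                   // non-ASCII input
  ];
}

// Usage with Jest:
// test.each(edgeCaseNames())('handles edge case %#', async (name) => {
//   await expect(simpleGreetingFlow({ customerName: name })).resolves.toBeDefined();
// });
```

Centralizing the table means a newly discovered edge case is exercised by every flow's tests with a one-line change.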

3. Label Test Runs

Use labels to organize test traces:
genkit flow:batchRun myFlow inputs.json --label "regression-v2.1"

4. Maintain Test Datasets

Keep versioned test datasets in your repository:
tests/
  datasets/
    greeting-v1.json
    greeting-v2.json
    menu-questions.json
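A small loader keeps tests pointed at the right dataset version; a sketch using Node's fs module, with paths mirroring the layout above:

```typescript
import { readFileSync } from 'node:fs';
import { join } from 'node:path';

// Loads a versioned dataset such as tests/datasets/greeting-v1.json.
function loadDataset(name: string, version: string): unknown[] {
  const path = join('tests', 'datasets', `${name}-${version}.json`);
  return JSON.parse(readFileSync(path, 'utf8'));
}

// Usage: const cases = loadDataset('greeting', 'v1');
```

Bumping the version string in one place lets you rerun a suite against an older dataset when chasing a regression.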

5. Automate Evaluation

Incorporate evaluation into CI/CD:
#!/bin/bash
# test.sh
# Start the app with the Genkit runtime in the background
genkit start -- npm run dev &
PID=$!
# Give the server a moment to come up before evaluating
sleep 5
genkit eval:flow myFlow --input tests/datasets/test-v1.json --force
# Shut the background server down
kill $PID

6. Review Traces

Always inspect traces for failed tests to understand why they failed:
  1. Run the test via CLI or UI
  2. Open the Developer UI
  3. Navigate to Traces
  4. Find the failed trace
  5. Inspect each step

7. Test with Real Data

Extract real usage patterns:
genkit eval:extractData myFlow --maxRows 100 --output real-data.json
Use this data to create realistic test cases.

Continuous Testing

Integrate testing into your development workflow:
  1. During Development: Use Developer UI for immediate feedback
  2. Before Commits: Run batch tests locally
  3. In CI/CD: Run automated evaluations
  4. After Deployment: Extract production data for new test cases
