Performance Optimization

Best practices for achieving low latency and high throughput with KoreShield.

KoreShield typically adds 50-150ms overhead. With proper optimization, you can minimize this impact while maintaining strong security.

Caching Strategies

Response Caching

import { createHash } from 'node:crypto';
import { Koreshield } from 'Koreshield-sdk';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);
// Name the client instance differently from the imported class to avoid a duplicate declaration
const koreshield = new Koreshield({ apiKey: process.env.Koreshield_API_KEY });

async function cachedScan(content: string) {
  // Key on a SHA-256 digest so raw prompts never appear in Redis keys
  const digest = createHash('sha256').update(content).digest('hex');
  const cacheKey = `scan:${digest}`;

  // Check cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // Scan with Koreshield
  const scan = await koreshield.scan({ content });

  // Cache result (5 minutes)
  await redis.setex(cacheKey, 300, JSON.stringify(scan));

  return scan;
}

Cache scan results for identical inputs to avoid redundant API calls. Keep the TTL short (5-10 minutes) so that cached security decisions do not outlive a policy change.

LRU Cache

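import { createHash } from 'node:crypto';
// Older lru-cache majors expose a default export; newer ones export `LRUCache` by name
import LRU from 'lru-cache';

const cache = new LRU<string, any>({
  max: 10000,
  ttl: 1000 * 60 * 5, // 5 minutes
  // Do not refresh the TTL on reads, so a verdict always expires 5 minutes after it was stored
  updateAgeOnGet: false,
});

async function lruCachedScan(content: string) {
  const key = hashContent(content);

  // A single get() avoids the window between has() and get() in which an entry can expire
  const cached = cache.get(key);
  if (cached !== undefined) {
    return cached;
  }

  const scan = await koreshield.scan({ content });
  cache.set(key, scan);

  return scan;
}

function hashContent(content: string): string {
  return createHash('sha256').update(content).digest('hex');
}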

Batch Processing

Batch Scanning

async function optimizedBatch(messages: string[]) {
  // Process in chunks
  const CHUNK_SIZE = 100;
  const results = [];

  for (let i = 0; i < messages.length; i += CHUNK_SIZE) {
    const chunk = messages.slice(i, i + CHUNK_SIZE);

    const batchResult = await koreshield.batchScan({
      items: chunk.map((content, idx) => ({
        id: `${i + idx}`,
        content,
      })),
    });

    results.push(...batchResult.results);
  }

  return results;
}

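Batch scanning spreads per-request overhead across many items, so overall throughput is higher than with individual scans, but each result only becomes available once its whole batch completes. Use it for background processing, not real-time user interactions.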

Parallel Processing

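async function parallelScan(messages: string[]) {
  const CONCURRENCY = 10;
  const results = [];

  // Await each group before starting the next, so at most CONCURRENCY scans are in flight.
  // Mapping every group through Promise.all at once would start all scans simultaneously.
  for (const batch of chunk(messages, CONCURRENCY)) {
    const batchResults = await Promise.all(
      batch.map(content => koreshield.scan({ content }))
    );
    results.push(...batchResults);
  }

  return results;
}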

function chunk<T>(array: T[], size: number): T[][] {
  return Array.from({ length: Math.ceil(array.length / size) }, (_, i) =>
    array.slice(i * size, i * size + size)
  );
}

Connection Pooling

import { Agent } from 'https';

const agent = new Agent({
  keepAlive: true,
  maxSockets: 50,
  maxFreeSockets: 10,
  timeout: 60000,
});

const koreshield = new Koreshield({
  apiKey: process.env.Koreshield_API_KEY,
  httpAgent: agent,
});

Connection pooling reduces latency by reusing TCP connections, which avoids a new TCP and TLS handshake on every request. Set keepAlive: true and size maxSockets to the concurrency you expect.

Request Deduplication

class DedupedScanner {
  private pending = new Map<string, Promise<any>>();

  async scan(content: string) {
    const key = hashContent(content);

    if (this.pending.has(key)) {
      return this.pending.get(key);
    }

    const promise = koreshield.scan({ content }).finally(() => {
      this.pending.delete(key);
    });

    this.pending.set(key, promise);

    return promise;
  }
}

const scanner = new DedupedScanner();

// Multiple simultaneous requests for same content only trigger one API call
const [result1, result2, result3] = await Promise.all([
  scanner.scan('hello'),
  scanner.scan('hello'),
  scanner.scan('hello'),
]);

Timeouts & Retries

import pRetry from 'p-retry';
import pTimeout from 'p-timeout';

async function resilientScan(content: string) {
  return pRetry(
    async () => {
      return pTimeout(
        koreshield.scan({ content }),
        {
          milliseconds: 5000,
          message: 'Scan timeout',
        }
      );
    },
    {
      retries: 3,
      factor: 2,
      minTimeout: 1000,
      onFailedAttempt: error => {
        console.log(
          `Attempt ${error.attemptNumber} failed. ${error.retriesLeft} retries left.`
        );
      },
    }
  );
}

Lazy Loading

class LazyKoreshield {
  private instance: Koreshield | null = null;

  private getInstance() {
    if (!this.instance) {
      this.instance = new Koreshield({
        apiKey: process.env.Koreshield_API_KEY,
      });
    }
    return this.instance;
  }

  async scan(content: string) {
    return this.getInstance().scan({ content });
  }
}

// Only initialize when first scan is requested
const koreshield = new LazyKoreshield();

Compression

import { gzip, gunzip } from 'node:zlib';
import { promisify } from 'node:util';

const gzipAsync = promisify(gzip);
// node:zlib exports gunzip (there is no ungzip); use it to decompress when needed
const gunzipAsync = promisify(gunzip);

async function compressedScan(content: string) {
  // Compress content
  const compressed = await gzipAsync(Buffer.from(content));

  // Send compressed
  const scan = await koreshield.scan({
    content: compressed.toString('base64'),
    encoding: 'gzip-base64',
  });

  return scan;
}

Compression helps with large prompts (10KB+) but adds CPU overhead. Use it for documents, not short messages.
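
The size guidance above can be turned into a small helper that compresses only when it is likely to pay off. This is a minimal sketch, not part of the SDK: the names encodePayload, decodePayload and the 'plain' label are hypothetical, the threshold simply encodes the "10KB+" rule of thumb, and 'gzip-base64' mirrors the encoding label used in compressedScan above.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Hypothetical threshold derived from the "10KB+" guidance.
const COMPRESSION_THRESHOLD_BYTES = 10 * 1024;

interface EncodedPayload {
  content: string;
  // 'plain' is an illustrative label for uncompressed content.
  encoding: 'plain' | 'gzip-base64';
}

function encodePayload(content: string): EncodedPayload {
  const raw = Buffer.from(content, 'utf8');

  // Short messages are sent as-is: compressing them costs CPU for little gain.
  if (raw.byteLength < COMPRESSION_THRESHOLD_BYTES) {
    return { content, encoding: 'plain' };
  }

  const encoded = gzipSync(raw).toString('base64');

  // Base64 adds roughly a third to the size, so fall back if nothing was saved.
  if (Buffer.byteLength(encoded) >= raw.byteLength) {
    return { content, encoding: 'plain' };
  }

  return { content: encoded, encoding: 'gzip-base64' };
}

function decodePayload(payload: EncodedPayload): string {
  if (payload.encoding === 'plain') {
    return payload.content;
  }
  return gunzipSync(Buffer.from(payload.content, 'base64')).toString('utf8');
}
```

The synchronous zlib calls keep the sketch short; in a request handler the promisified versions shown earlier avoid blocking the event loop.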

Edge Computing

Cloudflare Workers

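import { Koreshield } from 'Koreshield-sdk';

// Bindings configured for this Worker
interface Env {
  Koreshield_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env) {
    const koreshield = new Koreshield({ apiKey: env.Koreshield_API_KEY });

    // request.json() is untyped, so state the expected body shape explicitly
    const { message } = (await request.json()) as { message: string };

    const scan = await koreshield.scan({ content: message });

    return Response.json({ safe: !scan.threat_detected });
  },
};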

Vercel Edge Functions

export const config = { runtime: 'edge' };

const koreshield = new Koreshield({
  apiKey: process.env.Koreshield_API_KEY,
});

export default async function handler(req: Request) {
  // req.json() is untyped, so state the expected body shape explicitly
  const { message } = (await req.json()) as { message: string };

  const scan = await koreshield.scan({ content: message });

  return Response.json({ safe: !scan.threat_detected });
}

Database Optimization

Indexed Scans

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function trackScans(userId: string, content: string, scan: any) {
  await prisma.scan.create({
    data: {
      userId,
      content,
      threatDetected: scan.threat_detected,
      threatType: scan.threat_type,
      confidence: scan.confidence,
      timestamp: new Date(),
    },
  });
}

// Declare these indexes on the Scan model in schema.prisma:
// @@index([userId, timestamp])
// @@index([threatDetected])

Aggregate Queries

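async function getUserStats(userId: string) {
  const stats = await prisma.scan.aggregate({
    where: { userId },
    _count: { id: true },
    // Let the database compute the mean instead of dividing _sum by _count,
    // which breaks when the user has no scans
    _avg: { confidence: true },
  });

  return {
    totalScans: stats._count.id,
    avgConfidence: stats._avg.confidence, // null when the user has no scans
  };
}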

Load Balancing

class LoadBalancedScanner {
  private endpoints = [
    'https://api-us.Koreshield.com',
    'https://api-eu.Koreshield.com',
    'https://api-asia.Koreshield.com',
  ];

  // Create one client per endpoint up front so connections are reused across scans
  // instead of constructing a new client on every call
  private clients = this.endpoints.map(
    baseURL =>
      new Koreshield({
        apiKey: process.env.Koreshield_API_KEY,
        baseURL,
      })
  );

  private currentIndex = 0;

  // Round-robin over the pre-built clients
  private getNextClient() {
    const client = this.clients[this.currentIndex];
    this.currentIndex = (this.currentIndex + 1) % this.clients.length;
    return client;
  }

  async scan(content: string) {
    return this.getNextClient().scan({ content });
  }
}

Monitoring Performance

import { Histogram } from 'prom-client';

const scanDuration = new Histogram({
  name: 'Koreshield_scan_duration_ms',
  help: 'Koreshield scan duration in milliseconds',
  buckets: [10, 50, 100, 200, 500, 1000, 2000],
});

async function monitoredScan(content: string) {
  const start = Date.now();

  try {
    return await koreshield.scan({ content });
  } finally {
    // Record the duration for both successful and failed scans
    scanDuration.observe(Date.now() - start);
  }
}

Benchmarking

async function benchmark(iterations: number = 1000) {
  const testContent = 'Hello, world!';
  const results: number[] = [];

  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await koreshield.scan({ content: testContent });
    const duration = performance.now() - start;
    results.push(duration);
  }

  const sorted = results.sort((a, b) => a - b);

  console.log('Benchmark Results:');
  console.log(`  Iterations: ${iterations}`);
  console.log(`  P50: ${sorted[Math.floor(iterations * 0.5)].toFixed(2)}ms`);
  console.log(`  P95: ${sorted[Math.floor(iterations * 0.95)].toFixed(2)}ms`);
  console.log(`  P99: ${sorted[Math.floor(iterations * 0.99)].toFixed(2)}ms`);
  console.log(`  Max: ${sorted[iterations - 1].toFixed(2)}ms`);
}

Best Practices

Use SDK Built-in Caching

const koreshield = new Koreshield({
  apiKey: process.env.Koreshield_API_KEY,
  cache: {
    enabled: true,
    ttl: 300,
    maxSize: 10000,
  },
});

Fail Fast

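let timer: ReturnType<typeof setTimeout> | undefined;

const scan = await Promise.race([
  koreshield.scan({ content }),
  new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Timeout')), 2000);
  }),
  // Clear the timer once the race settles so it does not linger after a fast scan
]).finally(() => clearTimeout(timer));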

Optimize Payloads

// Bad: Sending entire document
const documentScan = await koreshield.scan({ content: longDocument });

// Good: Send only user input
const inputScan = await koreshield.scan({ content: userInput });

Only scan user-controlled inputs, not entire documents or system prompts. This reduces latency and API costs.

Performance Metrics

What's the typical latency overhead?

Most requests see 50-150ms of added latency:

P50: ~50ms
P95: ~150ms
P99: ~300ms

With caching and optimization, you can reduce this to sub-50ms for repeated content.
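
To compare your own deployment against the figures above, you can compute the same percentiles from recorded scan durations. The sketch below uses the nearest-rank method; percentile and summarize are illustrative names, not SDK functions, and the samples are assumed to be durations in milliseconds that you collected yourself.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile requires at least one sample');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in the range (0, 100]');
  }

  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);

  // Multiply before dividing to keep the rank computation in integers.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

function summarize(samples: number[]) {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    max: percentile(samples, 100),
  };
}
```

Measure cached and uncached requests separately; mixing them hides the cost of a cache miss behind a low median.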

How many requests can I process per second?

Throughput depends on your plan and infrastructure:

Free tier: ~10 requests/second
Pro tier: ~100 requests/second
Enterprise: 1000+ requests/second with dedicated infrastructure

Use batch scanning and connection pooling to maximize throughput.

Should I cache scan results?

Yes, but with caution:

Cache identical inputs for 5-10 minutes
Use content hashes as keys (never raw prompts)
Invalidate cache when security policies change
Don’t cache user-specific or sensitive data
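
One way to follow the first three points together is to build the cache key from a content hash plus a policy version. This is a sketch under assumptions: scanCacheKey and the policyVersion string are hypothetical, and the version is a value you would bump yourself whenever your security policy changes.

```typescript
import { createHash } from 'node:crypto';

// Builds a cache key that contains a SHA-256 digest instead of the raw prompt.
// policyVersion is a hypothetical, application-defined label such as "2024-06-01".
function scanCacheKey(content: string, policyVersion: string): string {
  const digest = createHash('sha256').update(content, 'utf8').digest('hex');
  return `scan:${policyVersion}:${digest}`;
}
```

Bumping the version changes every key, so entries written under the old policy are never read again and simply expire through their TTL.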

What's the best way to handle high traffic spikes?

Implement rate limiting at your application layer
Use Redis for distributed caching
Enable request deduplication
Consider edge deployment for global traffic
Use batch scanning for background processing
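
Application-layer rate limiting, the first item above, is often implemented as a token bucket. The class below is a minimal single-process sketch, not an SDK feature; TokenBucket and its parameters are illustrative, and the clock is injectable only to make the behaviour easy to test.

```typescript
// Token bucket: holds up to `capacity` tokens and refills at `refillPerSecond`.
// Each allowed request consumes one token.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now();
  }

  // Returns true if the request may proceed, false if it should be rejected or queued.
  tryRemove(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;

    // Add the tokens earned since the last call, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

For example, new TokenBucket(10, 10) keeps a single process at roughly the 10 requests/second quoted for the free tier. With several instances behind a load balancer, the counter would have to live in shared storage such as Redis instead.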
