
Performance Optimization

Best practices for achieving low latency and high throughput with KoreShield.
KoreShield typically adds 50-150ms of latency per scan. With proper optimization, you can minimize this impact while maintaining strong security.

Caching Strategies

Response Caching

import { createHash } from 'node:crypto';
import { Koreshield } from 'Koreshield-sdk';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);
// Lowercase instance name so it does not collide with the imported class
const koreshield = new Koreshield({ apiKey: process.env.Koreshield_API_KEY });

async function cachedScan(content: string) {
  // Key on a SHA-256 digest so raw prompts are never stored as Redis keys
  const digest = createHash('sha256').update(content).digest('hex');
  const cacheKey = `scan:${digest}`;

  // Check cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // Scan with Koreshield
  const scan = await koreshield.scan({ content });

  // Cache result (5 minutes)
  await redis.setex(cacheKey, 300, JSON.stringify(scan));

  return scan;
}
Cache scan results for identical inputs to avoid redundant API calls. Use a 5-10 minute TTL for security decisions.
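One way to build cache keys for identical inputs is sketched below: a SHA-256 digest of the content, prefixed with a policy version so that cached verdicts stop matching once the security policy changes. The `POLICY_VERSION` constant and the `scanCacheKey` helper are hypothetical names for illustration, not part of the SDK.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical identifier for the active security policy. Bump it whenever
// detection rules or thresholds change so older cache entries stop matching.
const POLICY_VERSION = 'policy-v1';

// Builds a fixed-length cache key from the policy version and a SHA-256
// digest of the content, so raw prompts never appear in the cache keyspace.
function scanCacheKey(content: string, policyVersion: string = POLICY_VERSION): string {
  const digest = createHash('sha256').update(content, 'utf8').digest('hex');
  return `scan:${policyVersion}:${digest}`;
}
```

Changing the version leaves old entries in place to expire under their TTL, which avoids an explicit cache flush.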

LRU Cache

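import { createHash } from 'node:crypto';
// lru-cache v10+ exports LRUCache by name; v7-v9 used a default export
import { LRUCache } from 'lru-cache';

const cache = new LRUCache<string, any>({
  max: 10000,
  ttl: 1000 * 60 * 5, // 5 minutes
  // Keep the TTL fixed from the time of the scan, so frequently read
  // entries still expire and get rescanned
  updateAgeOnGet: false,
});

async function lruCachedScan(content: string) {
  const key = hashContent(content);

  // A single get() avoids an entry expiring between has() and get()
  const cached = cache.get(key);
  if (cached !== undefined) {
    return cached;
  }

  const scan = await koreshield.scan({ content });
  cache.set(key, scan);

  return scan;
}

function hashContent(content: string): string {
  return createHash('sha256').update(content).digest('hex');
}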

Batch Processing

Batch Scanning

async function optimizedBatch(messages: string[]) {
  // Process in chunks
  const CHUNK_SIZE = 100;
  const results = [];

  for (let i = 0; i < messages.length; i += CHUNK_SIZE) {
    const chunk = messages.slice(i, i + CHUNK_SIZE);

    const batchResult = await koreshield.batchScan({
      items: chunk.map((content, idx) => ({
        id: `${i + idx}`,
        content,
      })),
    });

    results.push(...batchResult.results);
  }

  return results;
}
Batch scanning spreads request overhead across many items, so it is faster in aggregate than individual scans, but each result only becomes available once its whole chunk completes. Use it for background processing, not real-time user interactions.

Parallel Processing

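async function parallelScan(messages: string[]) {
  const CONCURRENCY = 10;
  const results = [];

  // Await each group before starting the next, so at most
  // CONCURRENCY scans are in flight at any time
  for (const batch of chunk(messages, CONCURRENCY)) {
    const batchResults = await Promise.all(
      batch.map(content => koreshield.scan({ content }))
    );
    results.push(...batchResults);
  }

  return results;
}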

function chunk<T>(array: T[], size: number): T[][] {
  return Array.from({ length: Math.ceil(array.length / size) }, (_, i) =>
    array.slice(i * size, i * size + size)
  );
}
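A fixed-size worker pool is another way to cap concurrency. Instead of waiting for the slowest scan in each group, every worker picks up the next message as soon as it finishes its current one. The sketch below is generic and does not depend on the SDK; `mapWithConcurrency` is a hypothetical helper name.

```typescript
// Runs `worker` over every item with at most `limit` calls in flight.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

It would be called as `mapWithConcurrency(messages, 10, scanOne)`, where `scanOne` stands in for whatever function performs a single scan. If any call rejects, the returned promise rejects as well, so wrap `scanOne` if partial results are acceptable.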

Connection Pooling

import { Agent } from 'https';

const agent = new Agent({
  keepAlive: true,
  maxSockets: 50,
  maxFreeSockets: 10,
  timeout: 60000,
});

const koreshield = new Koreshield({
  apiKey: process.env.Koreshield_API_KEY,
  httpAgent: agent,
});
Connection pooling reduces latency by reusing TCP connections instead of paying for a new TCP and TLS handshake on every request. Set keepAlive: true and choose socket limits that match your expected concurrency.

Request Deduplication

class DedupedScanner {
  private pending = new Map<string, Promise<any>>();

  async scan(content: string) {
    const key = hashContent(content);

    if (this.pending.has(key)) {
      return this.pending.get(key);
    }

    const promise = koreshield.scan({ content }).finally(() => {
      this.pending.delete(key);
    });

    this.pending.set(key, promise);

    return promise;
  }
}

const scanner = new DedupedScanner();

// Multiple simultaneous requests for same content only trigger one API call
const [result1, result2, result3] = await Promise.all([
  scanner.scan('hello'),
  scanner.scan('hello'),
  scanner.scan('hello'),
]);

Timeouts & Retries

import pRetry from 'p-retry';
import pTimeout from 'p-timeout';

async function resilientScan(content: string) {
  return pRetry(
    async () => {
      return pTimeout(
        koreshield.scan({ content }),
        {
          milliseconds: 5000,
          message: 'Scan timeout',
        }
      );
    },
    {
      retries: 3,
      factor: 2,
      minTimeout: 1000,
      onFailedAttempt: error => {
        console.log(
          `Attempt ${error.attemptNumber} failed. ${error.retriesLeft} retries left.`
        );
      },
    }
  );
}
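It helps to work out what these settings cost in the worst case. The sketch below assumes the delay before retry n is minTimeout × factor^n with no jitter, which is the usual exponential-backoff formula. Check the p-retry documentation for its exact behavior. The helper names are hypothetical.

```typescript
// Delay before each retry (0-based attempt index) under exponential backoff:
// minTimeout * factor^attempt, capped at maxTimeout. Jitter is ignored.
function backoffDelays(
  retries: number,
  factor: number,
  minTimeout: number,
  maxTimeout: number = Infinity,
): number[] {
  return Array.from({ length: retries }, (_, attempt) =>
    Math.min(minTimeout * Math.pow(factor, attempt), maxTimeout),
  );
}

// Worst case: the first try and every retry run into the per-attempt
// timeout, and every backoff delay is waited out in full.
function worstCaseLatencyMs(
  retries: number,
  factor: number,
  minTimeout: number,
  perAttemptTimeout: number,
): number {
  const waits = backoffDelays(retries, factor, minTimeout).reduce((sum, d) => sum + d, 0);
  return (retries + 1) * perAttemptTimeout + waits;
}

const delays = backoffDelays(3, 2, 1000); // [1000, 2000, 4000]
const worstCase = worstCaseLatencyMs(3, 2, 1000, 5000); // 4 * 5000 + 7000 = 27000
```

Under these assumptions a caller can wait up to 27 seconds before seeing a failure, so user-facing paths may want fewer retries or a shorter per-attempt timeout.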

Lazy Loading

class LazyKoreshield {
  private instance: Koreshield | null = null;

  private getInstance() {
    if (!this.instance) {
      this.instance = new Koreshield({
        apiKey: process.env.Koreshield_API_KEY,
      });
    }
    return this.instance;
  }

  async scan(content: string) {
    return this.getInstance().scan({ content });
  }
}

// Only initialize when first scan is requested
const koreshield = new LazyKoreshield();

Compression

// node:zlib exports gunzip (there is no ungzip)
import { gzip, gunzip } from 'node:zlib';
import { promisify } from 'node:util';

const gzipAsync = promisify(gzip);
const gunzipAsync = promisify(gunzip);

async function compressedScan(content: string) {
  // Compress content
  const compressed = await gzipAsync(Buffer.from(content));

  // Send compressed
  const scan = await koreshield.scan({
    content: compressed.toString('base64'),
    encoding: 'gzip-base64',
  });

  return scan;
}
Compression helps with large prompts (10KB+) but adds CPU overhead. Use it for documents, not short messages.
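The size guidance can be applied with a small helper that compresses only above a threshold and keeps the original text when compression does not pay off. This is a sketch: `preparePayload` and the threshold constant are hypothetical, and the `gzip-base64` encoding value is taken from the example above.

```typescript
import { gzipSync } from 'node:zlib';

// Assumed cutoff based on the "10KB+" guidance; tune it for your traffic.
const COMPRESSION_THRESHOLD_BYTES = 10 * 1024;

interface PreparedPayload {
  content: string;
  encoding?: 'gzip-base64';
}

// Returns the content unchanged for small inputs, and a gzip+base64 payload
// for large inputs when that is actually smaller than the original.
function preparePayload(
  content: string,
  threshold: number = COMPRESSION_THRESHOLD_BYTES,
): PreparedPayload {
  const raw = Buffer.from(content, 'utf8');
  if (raw.byteLength < threshold) {
    return { content };
  }

  const encoded = gzipSync(raw).toString('base64');
  // Base64 inflates the gzip output by about a third, so compare sizes first.
  return encoded.length < raw.byteLength
    ? { content: encoded, encoding: 'gzip-base64' }
    : { content };
}
```

The returned object can be spread into the scan request, so short messages skip the CPU cost entirely.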

Edge Computing

Cloudflare Workers

export default {
  async fetch(request: Request, env: Env) {
    const koreshield = new Koreshield({ apiKey: env.Koreshield_API_KEY });

    const { message } = await request.json();

    const scan = await koreshield.scan({ content: message });

    return Response.json({ safe: !scan.threat_detected });
  },
};

Vercel Edge Functions

export const config = { runtime: 'edge' };

const koreshield = new Koreshield({
  apiKey: process.env.Koreshield_API_KEY,
});

export default async function handler(req: Request) {
  const { message } = await req.json();

  const scan = await koreshield.scan({ content: message });

  return Response.json({ safe: !scan.threat_detected });
}

Database Optimization

Indexed Scans

import { Prisma, PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function trackScans(userId: string, content: string, scan: any) {
  await prisma.scan.create({
    data: {
      userId,
      content,
      threatDetected: scan.threat_detected,
      threatType: scan.threat_type,
      confidence: scan.confidence,
      timestamp: new Date(),
    },
  });
}

// In schema.prisma, add these indexes to the Scan model:
// @@index([userId, timestamp])
// @@index([threatDetected])

Aggregate Queries

async function getUserStats(userId: string) {
  const stats = await prisma.scan.aggregate({
    where: { userId },
    _count: { id: true },
    _avg: { confidence: true },
  });

  return {
    totalScans: stats._count.id,
    // _avg is null when the user has no scans; report 0 instead of NaN
    avgConfidence: stats._avg.confidence ?? 0,
  };
}

Load Balancing

class LoadBalancedScanner {
  private endpoints = [
    'https://api-us.Koreshield.com',
    'https://api-eu.Koreshield.com',
    'https://api-asia.Koreshield.com',
  ];

  // Create one client per endpoint up front so connections are reused
  // instead of building a new client on every scan
  private clients = this.endpoints.map(
    baseURL =>
      new Koreshield({
        apiKey: process.env.Koreshield_API_KEY,
        baseURL,
      })
  );

  private currentIndex = 0;

  // Round-robin over the pre-built clients
  private getNextClient() {
    const client = this.clients[this.currentIndex];
    this.currentIndex = (this.currentIndex + 1) % this.clients.length;
    return client;
  }

  async scan(content: string) {
    return this.getNextClient().scan({ content });
  }
}

Monitoring Performance

import { Histogram } from 'prom-client';

const scanDuration = new Histogram({
  name: 'Koreshield_scan_duration_ms',
  help: 'Koreshield scan duration in milliseconds',
  buckets: [10, 50, 100, 200, 500, 1000, 2000],
});

async function monitoredScan(content: string) {
  const start = Date.now();

  try {
    return await koreshield.scan({ content });
  } finally {
    // Record the duration for both successful and failed scans
    scanDuration.observe(Date.now() - start);
  }
}

Benchmarking

async function benchmark(iterations: number = 1000) {
  const testContent = 'Hello, world!';
  const results: number[] = [];

  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await koreshield.scan({ content: testContent });
    const duration = performance.now() - start;
    results.push(duration);
  }

  const sorted = results.sort((a, b) => a - b);

  console.log('Benchmark Results:');
  console.log(`  Iterations: ${iterations}`);
  console.log(`  P50: ${sorted[Math.floor(iterations * 0.5)].toFixed(2)}ms`);
  console.log(`  P95: ${sorted[Math.floor(iterations * 0.95)].toFixed(2)}ms`);
  console.log(`  P99: ${sorted[Math.floor(iterations * 0.99)].toFixed(2)}ms`);
  console.log(`  Max: ${sorted[iterations - 1].toFixed(2)}ms`);
}
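The percentile lookups in the benchmark can be factored into a reusable helper. The sketch below uses the nearest-rank definition, which can differ by one sample from the index arithmetic above. `percentile` is a hypothetical helper name.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile() needs at least one sample');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in the range (0, 100]');
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for whole-number inputs.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[rank - 1];
}
```

Tail percentiles such as P99 only become stable with enough samples. With 1000 iterations, P99 rests on the ten slowest requests.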

Best Practices

Use SDK Built-in Caching

const koreshield = new Koreshield({
  apiKey: process.env.Koreshield_API_KEY,
  cache: {
    enabled: true,
    ttl: 300,
    maxSize: 10000,
  },
});

Fail Fast

const scan = await Promise.race([
  koreshield.scan({ content }),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout')), 2000)
  ),
]);
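A bare Promise.race leaves its timer pending after the scan has finished. A small wrapper can clear it, as in this sketch. `withTimeout` is a hypothetical helper, not an SDK function.

```typescript
// Rejects with `label` if `work` has not settled within `ms` milliseconds.
// The timer is always cleared, so it cannot keep the process alive or fire
// after the result has already been returned.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string = 'Timeout'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(label)), ms);
  });

  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

Note that the underlying request keeps running after a timeout. The wrapper only stops waiting for it.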

Optimize Payloads

// Bad: Sending entire document
const documentScan = await koreshield.scan({ content: longDocument });

// Good: Send only user input
const inputScan = await koreshield.scan({ content: userInput });
Only scan user-controlled inputs, not entire documents or system prompts. This reduces latency and API costs.
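For chat-style applications, this can mean filtering a conversation down to the user-authored turns before scanning. The message shape below is an assumption for illustration. Adapt it to whatever structure your application uses.

```typescript
// Hypothetical chat message shape; not defined by the SDK.
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Collects only the user-authored turns, so system prompts and model output
// are not sent for scanning.
function userControlledText(messages: ChatMessage[]): string {
  return messages
    .filter((message) => message.role === 'user')
    .map((message) => message.content)
    .join('\n');
}
```

If earlier turns were already scanned when they arrived, scanning only the latest user message reduces the payload further.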

Performance Metrics

Latency: KoreShield typically adds 50-150ms per scan:
  • P50: ~50ms
  • P95: ~150ms
  • P99: ~300ms
With caching and optimization, you can reduce this to sub-50ms for repeated content.

Throughput depends on your plan and infrastructure:
  • Free tier: ~10 requests/second
  • Pro tier: ~100 requests/second
  • Enterprise: 1000+ requests/second with dedicated infrastructure
Use batch scanning and connection pooling to maximize throughput.

Scan results can be cached, but with caution:
  • Cache identical inputs for 5-10 minutes
  • Use content hashes as keys (never raw prompts)
  • Invalidate cache when security policies change
  • Don’t cache user-specific or sensitive data

For high-traffic deployments:
  • Implement rate limiting at your application layer
  • Use Redis for distributed caching
  • Enable request deduplication
  • Consider edge deployment for global traffic
  • Use batch scanning for background processing
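Application-layer rate limiting can be sketched with a token bucket that keeps outgoing scans under the plan's requests-per-second figure. The class below is an in-process illustration with hypothetical names. A multi-instance deployment would need shared state, for example in Redis.

```typescript
// Token bucket: holds up to `capacity` tokens and refills continuously at
// `refillPerSecond`. Each permitted request consumes one token.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true and consumes a token if one is available at time `now`
  // (milliseconds); returns false if the caller should wait or shed load.
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;

    if (this.tokens < 1) {
      return false;
    }
    this.tokens -= 1;
    return true;
  }
}
```

For the free tier figure quoted above, `new TokenBucket(10, 10)` would allow bursts of up to 10 scans and a sustained rate of about 10 per second.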
