Installation

Install the npm package:
npm
npm install @nekoimageland/retto-wasm
pnpm
pnpm add @nekoimageland/retto-wasm
yarn
yarn add @nekoimageland/retto-wasm
The package is published as @nekoimageland/retto-wasm and includes TypeScript definitions.

Loading the Module

The WASM module must be loaded asynchronously before use:
import { Retto } from '@nekoimageland/retto-wasm';

// Load the WASM module with optional progress tracking
const retto = await Retto.load((progress) => {
  console.log(`Loading: ${(progress * 100).toFixed(1)}%`);
});

Initialization

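If the package was built with embedded models, initialize them with no arguments:
await retto.init();
To confirm that embedded models are available before initializing, check is_embed_build:
if (retto.is_embed_build) {
  await retto.init();
}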

Running OCR

1. Get image data

Load an image as bytes:
const input = document.querySelector<HTMLInputElement>('input[type="file"]');
const file = input?.files?.[0];
if (!file) throw new Error('No file selected');
const imageData = await file.arrayBuffer();

2. Process with streaming

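recognize() returns an async generator that yields one result per stage, in order: detection (det), classification (cls), and recognition (rec):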
for await (const stage of retto.recognize(imageData)) {
  if (stage.stage === 'det') {
    console.log('Text detection:', stage.result);
  } else if (stage.stage === 'cls') {
    console.log('Text classification:', stage.result);
  } else if (stage.stage === 'rec') {
    console.log('Text recognition:', stage.result);
  }
}
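
The three stage events lend themselves to a small state reducer that a UI can use to show progress as each stage arrives. The following is a minimal sketch, not part of the retto-wasm API: `OcrProgress`, `initialProgress`, and `applyStage` are hypothetical names, and the `StageEvent` type is a local stand-in that only assumes the `stage`/`result` shape shown above.

```typescript
// Local stand-ins for the stage events shown above; in an application the
// result types would come from the package instead.
interface RecLine { text: string; score: number }
type StageEvent =
  | { stage: 'det'; result: unknown[] }
  | { stage: 'cls'; result: unknown }
  | { stage: 'rec'; result: RecLine[] };

// Hypothetical UI state (an assumption for illustration).
interface OcrProgress {
  completed: Array<StageEvent['stage']>; // stages seen so far, in order
  regions: number | null;                // detected text regions, once known
  lines: string[];                       // recognized text, once available
}

const initialProgress: OcrProgress = { completed: [], regions: null, lines: [] };

// Pure reducer: returns a new state object for each stage event.
function applyStage(state: OcrProgress, event: StageEvent): OcrProgress {
  const completed = [...state.completed, event.stage];
  switch (event.stage) {
    case 'det':
      return { ...state, completed, regions: event.result.length };
    case 'cls':
      return { ...state, completed };
    case 'rec':
      return { ...state, completed, lines: event.result.map((r) => r.text) };
  }
}
```

Inside the `for await` loop, each yielded stage would be passed through `applyStage` and the resulting state handed to whatever renders progress.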

3. Use the results

Extract recognized text:
let recognizedText = [];

for await (const stage of retto.recognize(imageData)) {
  if (stage.stage === 'rec') {
    recognizedText = stage.result.map(r => r.text);
  }
}

console.log('Text:', recognizedText.join('\n'));
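
Each recognition result also carries a `score`, so low-confidence lines can be dropped before display. The sketch below is one way to do that. `confidentText` is a hypothetical helper, the `RecLine` interface is a local stand-in for `RecProcessorSingleResult`, and the default threshold of 0.5 is an arbitrary assumption that presumes scores fall between 0 and 1.

```typescript
// Local stand-in for RecProcessorSingleResult from the package typings.
interface RecLine { text: string; score: number }

// Keeps lines whose score meets the threshold, trims whitespace,
// and drops lines that are empty after trimming.
// minScore = 0.5 is an arbitrary default, not a library recommendation.
function confidentText(results: RecLine[], minScore = 0.5): string[] {
  return results
    .filter((r) => r.score >= minScore)
    .map((r) => r.text.trim())
    .filter((t) => t.length > 0);
}
```

A suitable threshold depends on the models and images in use, so it is worth tuning against real samples.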

Complete Example

app.ts
import { Retto } from '@nekoimageland/retto-wasm';

class OCRApp {
  private retto: Retto | null = null;
  
  async initialize() {
    // Load WASM module
    console.log('Loading OCR engine...');
    this.retto = await Retto.load((progress) => {
      console.log(`Progress: ${(progress * 100).toFixed(0)}%`);
    });
    
    // Initialize with embedded models
    await this.retto.init();
    console.log('OCR engine ready!');
  }
  
  async processImage(file: File): Promise<string[]> {
    if (!this.retto) {
      throw new Error('OCR engine not initialized');
    }
    
    const imageData = await file.arrayBuffer();
    const results: string[] = [];
    
    for await (const stage of this.retto.recognize(imageData)) {
      if (stage.stage === 'rec') {
        results.push(...stage.result.map(r => r.text));
      }
    }
    
    return results;
  }
}

// Usage
const app = new OCRApp();
await app.initialize();

const fileInput = document.querySelector<HTMLInputElement>('input[type="file"]');
fileInput?.addEventListener('change', async (e) => {
  const file = (e.target as HTMLInputElement).files?.[0];
  if (file) {
    const text = await app.processImage(file);
    console.log('Recognized text:', text);
  }
});

TypeScript Types

The package exports these types for detection, classification, and recognition results:
export interface Point {
  x: number;
  y: number;
}

export interface PointBox {
  inner: [Point, Point, Point, Point];
}

export interface DetProcessorInnerResult {
  boxes: PointBox;
  score: number;
}

export type DetProcessorResult = DetProcessorInnerResult[];

export interface ClsPostProcessLabel {
  label: number;  // Rotation angle (0, 90, 180, 270)
  score: number;  // Confidence score
}

export interface RecProcessorSingleResult {
  text: string;   // Recognized text
  score: number;  // Confidence score
}

export type RecProcessorResult = RecProcessorSingleResult[];
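
To draw a detected region on a canvas or crop it from the source image, the four corner points of a `PointBox` often need to be reduced to an axis-aligned rectangle. The sketch below shows one way, assuming the points are image pixel coordinates (the typings above do not state the coordinate space). `Rect` and `boundingRect` are hypothetical names, and `Point` and `PointBox` are redeclared locally so the snippet stands alone.

```typescript
// Redeclared locally to keep the example self-contained.
interface Point { x: number; y: number }
interface PointBox { inner: [Point, Point, Point, Point] }

// Hypothetical helper type, not exported by the package.
interface Rect { x: number; y: number; width: number; height: number }

// Smallest axis-aligned rectangle that contains all four corners.
function boundingRect(box: PointBox): Rect {
  const xs = box.inner.map((p) => p.x);
  const ys = box.inner.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return {
    x: minX,
    y: minY,
    width: Math.max(...xs) - minX,
    height: Math.max(...ys) - minY,
  };
}
```

For rotated text the rectangle is larger than the quadrilateral itself, which is usually acceptable for highlighting but not for tight cropping.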

React Example

OCRComponent.tsx
import React, { useState, useEffect } from 'react';
import { Retto } from '@nekoimageland/retto-wasm';

export function OCRComponent() {
  const [retto, setRetto] = useState<Retto | null>(null);
  const [loading, setLoading] = useState(true);
  const [results, setResults] = useState<string[]>([]);
  
  useEffect(() => {
    async function init() {
      const instance = await Retto.load();
      await instance.init();
      setRetto(instance);
      setLoading(false);
    }
    init();
  }, []);
  
  const handleFileUpload = async (e: React.ChangeEvent<HTMLInputElement>) => {
    const file = e.target.files?.[0];
    if (!file || !retto) return;
    
    setLoading(true);
    const imageData = await file.arrayBuffer();
    const texts: string[] = [];
    
    for await (const stage of retto.recognize(imageData)) {
      if (stage.stage === 'rec') {
        texts.push(...stage.result.map(r => r.text));
      }
    }
    
    setResults(texts);
    setLoading(false);
  };
  
  return (
    <div>
      <input type="file" onChange={handleFileUpload} disabled={loading} />
      {loading && <p>Processing...</p>}
      <ul>
        {results.map((text, i) => <li key={i}>{text}</li>)}
      </ul>
    </div>
  );
}

Implementation Details

The load function (index.ts:154) downloads the WASM binary with axios and reports download progress through the optional onProgress callback:
static async load(onProgress?: (ratio: number) => void): Promise<Retto> {
  const wasmUrl = new URL("public/retto_wasm.wasm", import.meta.url).href;
  const { data } = await axios.get<ArrayBuffer>(wasmUrl, {
    responseType: "arraybuffer",
    onDownloadProgress: ({ loaded, total }) => {
      if (total && onProgress) onProgress(loaded / total);
    },
  });
  const module = await initWASI({
    wasmBinary: data,
    locateFile: () => "",
  }) as typeof RettoInner;
  return new Retto(module);
}
The streaming API uses async generators (index.ts:237). The excerpt is abridged, so ptr, len, and once appear here without their definitions:
async *recognize(
  data: Uint8Array | ArrayBuffer,
): AsyncGenerator<RettoWorkerStage, void, unknown> {
  const sessionPtr = this.module._retto_rec(ptr, len);
  const sessionId = this.module.UTF8ToString(sessionPtr);
  
  const det = await once<DetProcessorResult>("det");
  yield { stage: "det", result: det };
  
  const cls = await once<ClsProcessorResult>("cls");
  yield { stage: "cls", result: cls };
  
  const rec = await once<RecProcessorResult>("rec");
  yield { stage: "rec", result: rec };
}
Note: The WASM module runs in a web worker thread. Avoid blocking the main thread while OCR is in progress.

Browser Compatibility

Retto WASM requires:
  • WebAssembly support
  • SharedArrayBuffer support (for threading)
  • Modern ES2020+ JavaScript features
For SharedArrayBuffer to work, your server must send these headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
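
Before calling `Retto.load()`, a page can check these requirements and show a helpful message when one is missing. The sketch below is an assumption-laden example rather than something the package provides: `missingRequirements` and `EnvLike` are hypothetical names. It relies on the standard browser global `crossOriginIsolated`, which is `true` only when both headers above are in effect.

```typescript
// Minimal view of the global object; every field is optional so the check
// can also be exercised with a plain object.
interface EnvLike {
  WebAssembly?: unknown;
  SharedArrayBuffer?: unknown;
  crossOriginIsolated?: boolean;
}

// Returns a list of missing requirements; an empty list means all checks passed.
function missingRequirements(
  env: EnvLike = globalThis as unknown as EnvLike,
): string[] {
  const missing: string[] = [];
  if (typeof env.WebAssembly !== 'object') {
    missing.push('WebAssembly');
  }
  if (typeof env.SharedArrayBuffer !== 'function') {
    missing.push('SharedArrayBuffer');
  }
  if (env.crossOriginIsolated !== true) {
    missing.push('cross-origin isolation (COOP/COEP headers)');
  }
  return missing;
}
```

Calling `missingRequirements()` with no argument inspects the current global scope, so it can run once at startup before the OCR engine is loaded.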

Performance Considerations

  1. Load once, reuse: Initialize the Retto instance once and reuse it for multiple images
  2. Model size: Embedded models increase bundle size; consider loading externally for production
  3. Image size: Larger images take longer to process; consider resizing before OCR
  4. Streaming: Use the streaming API to provide real-time feedback
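
The "load once, reuse" advice can be enforced by caching the promise of the loaded instance so that concurrent callers share a single load. The sketch below is a generic helper, not part of the package. `memoizeAsync` is a hypothetical name, and the commented usage assumes the `Retto.load()` and `init()` calls shown earlier in this guide.

```typescript
// Wraps an async factory so its work runs at most once. Callers that arrive
// while the first call is still pending receive the same promise.
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached === null) {
      cached = factory().catch((err) => {
        cached = null; // clear the cache so a later call can retry
        throw err;
      });
    }
    return cached;
  };
}

// Possible usage with the API from this guide:
//
//   const getRetto = memoizeAsync(async () => {
//     const retto = await Retto.load();
//     await retto.init();
//     return retto;
//   });
//   const retto = await getRetto(); // every caller shares one instance
```

Clearing the cache on failure is a design choice: it lets a transient network error be retried instead of being remembered for the lifetime of the page.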

Next Steps

  • Model Loading: Learn about embedding vs external models
  • Rust Usage: See the underlying Rust API
