Installation
Install the package with npm, pnpm, or yarn:
npm install @nekoimageland/retto-wasm
pnpm add @nekoimageland/retto-wasm
yarn add @nekoimageland/retto-wasm
The package is published as @nekoimageland/retto-wasm and includes TypeScript definitions.
Loading the Module
The WASM module must be loaded asynchronously before use:
import { Retto } from '@nekoimageland/retto-wasm';
// Load the WASM module with optional progress tracking
const retto = await Retto.load((progress) => {
console.log(`Loading: ${(progress * 100).toFixed(1)}%`);
});
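To keep the log readable, the progress callback can be throttled so it fires only when the whole-number percentage changes. This is a minimal sketch. `makeProgressReporter` is a hypothetical helper, not part of the package. It relies only on `Retto.load` accepting a `(ratio: number) => void` callback, as shown above.

```typescript
// Hypothetical helper (not part of @nekoimageland/retto-wasm): wraps a
// reporter so it is called only when the rounded-down percentage changes.
type ProgressCallback = (ratio: number) => void;

function makeProgressReporter(
  report: (percent: number) => void,
): ProgressCallback {
  let last = -1;
  return (ratio: number) => {
    // Clamp to [0, 1] in case a download ever reports loaded > total.
    const clamped = Math.min(Math.max(ratio, 0), 1);
    const percent = Math.floor(clamped * 100);
    if (percent !== last) {
      last = percent;
      report(percent);
    }
  };
}

// Usage with the package (not executed here):
// const retto = await Retto.load(
//   makeProgressReporter((p) => console.log(`Loading: ${p}%`)),
// );
```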
Initialization
Embedded Models
If the package was built with embedded models, check for them and call init() without arguments:
if (retto.is_embed_build) {
await retto.init();
}
External Models
Otherwise, load the models from URLs and pass them to init():
const models = {
det_model: await fetch('/models/det.onnx').then(r => r.arrayBuffer()),
cls_model: await fetch('/models/cls.onnx').then(r => r.arrayBuffer()),
rec_model: await fetch('/models/rec.onnx').then(r => r.arrayBuffer()),
rec_dict: await fetch('/models/dict.txt').then(r => r.arrayBuffer()),
};
await retto.init(models);
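The two paths can be combined into one startup routine that downloads the four external files in parallel rather than one after another. This is a sketch under stated assumptions:

- `InitializableRetto` is a hypothetical structural type. It covers only `is_embed_build` and `init()` as used above.
- `RettoModels` mirrors the object literal above.
- `fetchBytes` is injected so the routine can be tested without a network.

```typescript
// Mirrors the models object above (field names taken from that snippet).
interface RettoModels {
  det_model: ArrayBuffer;
  cls_model: ArrayBuffer;
  rec_model: ArrayBuffer;
  rec_dict: ArrayBuffer;
}

// Hypothetical structural type: only the members used in this section.
interface InitializableRetto {
  is_embed_build: boolean;
  init(models?: RettoModels): Promise<void>;
}

type FetchBytes = (url: string) => Promise<ArrayBuffer>;

// Uses embedded models when available; otherwise fetches all four files
// concurrently with Promise.all and passes them to init().
async function initRetto(
  retto: InitializableRetto,
  fetchBytes: FetchBytes,
  baseUrl = "/models",
): Promise<"embedded" | "external"> {
  if (retto.is_embed_build) {
    await retto.init();
    return "embedded";
  }
  const [det_model, cls_model, rec_model, rec_dict] = await Promise.all([
    fetchBytes(`${baseUrl}/det.onnx`),
    fetchBytes(`${baseUrl}/cls.onnx`),
    fetchBytes(`${baseUrl}/rec.onnx`),
    fetchBytes(`${baseUrl}/dict.txt`),
  ]);
  await retto.init({ det_model, cls_model, rec_model, rec_dict });
  return "external";
}

// In a browser, fetchBytes could be:
// async (url) => {
//   const r = await fetch(url);
//   if (!r.ok) throw new Error(`Failed to fetch ${url}: ${r.status}`);
//   return r.arrayBuffer();
// }
```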
Running OCR
Get image data
Load an image as bytes:
const file = document.querySelector('input[type="file"]').files[0];
const imageData = await file.arrayBuffer();
Process with streaming
OCR results stream in three stages: detection (det), classification (cls), and recognition (rec):
for await (const stage of retto.recognize(imageData)) {
if (stage.stage === 'det') {
console.log('Text detection:', stage.result);
} else if (stage.stage === 'cls') {
console.log('Text classification:', stage.result);
} else if (stage.stage === 'rec') {
console.log('Text recognition:', stage.result);
}
}
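Each stage object carries a `stage` tag, so TypeScript can narrow `result` to the right type in a `switch`. This sketch uses local types copied from the TypeScript Types section below. The `Stage` union is hypothetical: the package's own `RettoWorkerStage` may differ. The cls result is also assumed here to be an array of `ClsPostProcessLabel`.

```typescript
interface Point { x: number; y: number; }
interface PointBox { inner: [Point, Point, Point, Point]; }
interface DetProcessorInnerResult { boxes: PointBox; score: number; }
interface ClsPostProcessLabel { label: number; score: number; }
interface RecProcessorSingleResult { text: string; score: number; }

// Hypothetical discriminated union modeled on the stage/result pairs above.
type Stage =
  | { stage: "det"; result: DetProcessorInnerResult[] }
  | { stage: "cls"; result: ClsPostProcessLabel[] }
  | { stage: "rec"; result: RecProcessorSingleResult[] };

function describeStage(s: Stage): string {
  switch (s.stage) {
    case "det":
      return `detected ${s.result.length} text region(s)`;
    case "cls":
      return `classified ${s.result.length} region(s)`;
    case "rec":
      return `recognized ${s.result.length} line(s)`;
    default: {
      // Compile-time exhaustiveness check: adding a stage to the union
      // without handling it here makes this assignment a type error.
      const unreachable: never = s;
      return unreachable;
    }
  }
}
```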
Use the results
Extract the recognized text from the rec stage:
let recognizedText: string[] = [];
for await (const stage of retto.recognize(imageData)) {
if (stage.stage === 'rec') {
recognizedText = stage.result.map(r => r.text);
}
}
console.log('Text:', recognizedText.join('\n'));
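Each recognized line also carries a confidence `score`, so low-confidence lines can be dropped while the text is collected. This is a minimal sketch. `collectRecognizedText` and the `StageLike` type are hypothetical. The 0.5 threshold in the usage comment is an arbitrary example value.

```typescript
interface RecProcessorSingleResult { text: string; score: number; }

// Hypothetical shape: only the "rec" stage is modeled precisely; the other
// stages are accepted and ignored.
type StageLike =
  | { stage: "rec"; result: RecProcessorSingleResult[] }
  | { stage: "det" | "cls"; result: unknown };

// Consumes the stage stream and keeps recognized lines whose score is at
// least minScore.
async function collectRecognizedText(
  stages: AsyncIterable<StageLike>,
  minScore = 0,
): Promise<string[]> {
  const lines: string[] = [];
  for await (const stage of stages) {
    if (stage.stage === "rec") {
      for (const r of stage.result) {
        if (r.score >= minScore) lines.push(r.text);
      }
    }
  }
  return lines;
}

// Usage (assuming recognize() yields objects of this shape):
// const text = await collectRecognizedText(retto.recognize(imageData), 0.5);
```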
Complete Example
import { Retto } from '@nekoimageland/retto-wasm';
class OCRApp {
private retto: Retto | null = null;
async initialize() {
// Load WASM module
console.log('Loading OCR engine...');
this.retto = await Retto.load((progress) => {
console.log(`Progress: ${(progress * 100).toFixed(0)}%`);
});
// Initialize with embedded models (embedded-model builds only; otherwise pass models to init())
await this.retto.init();
console.log('OCR engine ready!');
}
async processImage(file: File): Promise<string[]> {
if (!this.retto) {
throw new Error('OCR engine not initialized');
}
const imageData = await file.arrayBuffer();
const results: string[] = [];
for await (const stage of this.retto.recognize(imageData)) {
if (stage.stage === 'rec') {
results.push(...stage.result.map(r => r.text));
}
}
return results;
}
}
// Usage
const app = new OCRApp();
await app.initialize();
const fileInput = document.querySelector<HTMLInputElement>('input[type="file"]');
fileInput?.addEventListener('change', async (e) => {
const file = (e.target as HTMLInputElement).files?.[0];
if (file) {
const text = await app.processImage(file);
console.log('Recognized text:', text);
}
});
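If the UI can trigger several recognitions in quick succession, one option is to run them one at a time on the shared instance. This page does not say whether concurrent `recognize()` calls are supported, so treat this as an application-level choice rather than a requirement. `createSerialQueue` is a hypothetical helper.

```typescript
// Hypothetical helper: chains tasks so each starts only after the previous
// one has settled (fulfilled or rejected).
function createSerialQueue() {
  let tail: Promise<unknown> = Promise.resolve();
  return function enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = tail.then(() => task());
    // Keep the chain going even if this task rejects; the caller still
    // receives the rejection through `run`.
    tail = run.catch(() => undefined);
    return run;
  };
}
```

For example, the change handler could call `enqueue(() => app.processImage(file))` instead of calling `processImage` directly.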
TypeScript Types
The package includes TypeScript definitions for the results of each stage, including:
export interface Point {
x: number;
y: number;
}
export interface PointBox {
inner: [Point, Point, Point, Point];
}
export interface DetProcessorInnerResult {
boxes: PointBox;
score: number;
}
export type DetProcessorResult = DetProcessorInnerResult[];
export interface ClsPostProcessLabel {
label: number; // Rotation angle (0, 90, 180, 270)
score: number; // Confidence score
}
export interface RecProcessorSingleResult {
text: string; // Recognized text
score: number; // Confidence score
}
export type RecProcessorResult = RecProcessorSingleResult[];
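To draw detection results over the image, a rotated `PointBox` can be reduced to an axis-aligned rectangle. This sketch assumes the four points are pixel coordinates in the input image; the types above do not state the coordinate space. `Rect` and `boundingRect` are hypothetical names.

```typescript
interface Point { x: number; y: number; }
interface PointBox { inner: [Point, Point, Point, Point]; }

interface Rect { x: number; y: number; width: number; height: number; }

// Smallest axis-aligned rectangle containing all four corners of the box,
// e.g. for canvas strokeRect() or an absolutely positioned overlay.
function boundingRect(box: PointBox): Rect {
  const xs = box.inner.map((p) => p.x);
  const ys = box.inner.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return {
    x: minX,
    y: minY,
    width: Math.max(...xs) - minX,
    height: Math.max(...ys) - minY,
  };
}
```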
React Example
import React, { useState, useEffect } from 'react';
import { Retto } from '@nekoimageland/retto-wasm';
export function OCRComponent() {
const [retto, setRetto] = useState<Retto | null>(null);
const [loading, setLoading] = useState(true);
const [results, setResults] = useState<string[]>([]);
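useEffect(() => {
let cancelled = false;
async function init() {
const instance = await Retto.load();
// Assumes an embedded-model build; otherwise pass models to init()
await instance.init();
if (!cancelled) {
setRetto(instance);
setLoading(false);
}
}
init().catch((err) => {
console.error('Failed to initialize OCR engine:', err);
if (!cancelled) setLoading(false);
});
// Skip state updates if the component unmounts before loading finishes
return () => {
cancelled = true;
};
}, []);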
const handleFileUpload = async (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (!file || !retto) return;
setLoading(true);
try {
const imageData = await file.arrayBuffer();
const texts: string[] = [];
for await (const stage of retto.recognize(imageData)) {
if (stage.stage === 'rec') {
texts.push(...stage.result.map(r => r.text));
}
}
setResults(texts);
} finally {
// Re-enable the input even if recognition throws
setLoading(false);
}
};
return (
<div>
<input type="file" onChange={handleFileUpload} disabled={loading} />
{loading && <p>Processing...</p>}
<ul>
{results.map((text, i) => <li key={i}>{text}</li>)}
</ul>
</div>
);
}
Implementation Details
From index.ts:154, the load function downloads the WASM binary with axios, reports download progress through onProgress, and then instantiates the module:
static async load(onProgress?: (ratio: number) => void): Promise<Retto> {
const wasmUrl = new URL("public/retto_wasm.wasm", import.meta.url).href;
const { data } = await axios.get<ArrayBuffer>(wasmUrl, {
responseType: "arraybuffer",
onDownloadProgress: ({ loaded, total }) => {
if (total && onProgress) onProgress(loaded / total);
},
});
const module = await initWASI({
wasmBinary: data,
locateFile: () => "",
}) as typeof RettoInner;
return new Retto(module);
}
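The `onDownloadProgress` handler reports `loaded / total` only when the total size is known. The same logic could be applied to any chunked source, such as a `fetch` response body read chunk by chunk. `readWithProgress` is a hypothetical helper, not part of the package, and the step of obtaining chunks from a `Response` is left out.

```typescript
// Sketch: accumulates chunks and reports loaded / total, mirroring the
// `if (total && onProgress)` guard above (no progress when total is unknown).
async function readWithProgress(
  chunks: AsyncIterable<Uint8Array>,
  total: number | undefined,
  onProgress?: (ratio: number) => void,
): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  let loaded = 0;
  for await (const chunk of chunks) {
    parts.push(chunk);
    loaded += chunk.byteLength;
    if (total && onProgress) onProgress(loaded / total);
  }
  // Concatenate the chunks into one contiguous buffer.
  const out = new Uint8Array(loaded);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.byteLength;
  }
  return out;
}
```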
The streaming API uses async generators (index.ts:237):
async *recognize(
data: Uint8Array | ArrayBuffer,
): AsyncGenerator<RettoWorkerStage, void, unknown> {
// Excerpt: the setup that defines ptr, len, and once() is omitted
const sessionPtr = this.module._retto_rec(ptr, len);
const sessionId = this.module.UTF8ToString(sessionPtr);
const det = await once<DetProcessorResult>("det");
yield { stage: "det", result: det };
const cls = await once<ClsProcessorResult>("cls");
yield { stage: "cls", result: cls };
const rec = await once<RecProcessorResult>("rec");
yield { stage: "rec", result: rec };
}
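The excerpt awaits `once(...)` for each stage in a fixed order and then yields the result. Below is a self-contained sketch of that pattern. It assumes results can arrive before the generator asks for them, so they are buffered. `createStageChannel`, `emit`, and `stagesInOrder` are hypothetical stand-ins, since the package's real `once` helper is not shown above.

```typescript
type StageName = "det" | "cls" | "rec";

// Hypothetical channel: buffers early results and resolves waiters later.
function createStageChannel() {
  const values = new Map<StageName, unknown>();
  const waiters = new Map<StageName, (value: unknown) => void>();
  return {
    // Called by whatever receives results (e.g. a worker message handler).
    emit(name: StageName, value: unknown): void {
      const waiter = waiters.get(name);
      if (waiter) {
        waiters.delete(name);
        waiter(value);
      } else {
        values.set(name, value);
      }
    },
    // Resolves with the stage's value, whether it already arrived or not.
    once<T>(name: StageName): Promise<T> {
      if (values.has(name)) {
        const value = values.get(name) as T;
        values.delete(name);
        return Promise.resolve(value);
      }
      return new Promise<T>((resolve) => {
        waiters.set(name, (value) => resolve(value as T));
      });
    },
  };
}

// Yields the three stages strictly in det, cls, rec order.
async function* stagesInOrder(channel: ReturnType<typeof createStageChannel>) {
  for (const name of ["det", "cls", "rec"] as const) {
    const result = await channel.once<unknown>(name);
    yield { stage: name, result };
  }
}
```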
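The WASM module runs in a Web Worker thread, and recognize() receives each stage's result asynchronously on the main thread. Avoid long synchronous work on the main thread during OCR processing; it delays delivery of the stage results and freezes the UI.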
Browser Compatibility
Retto WASM requires:
- WebAssembly support
- SharedArrayBuffer support (for threading)
- Modern ES2020+ JavaScript features
SharedArrayBuffer is only available on cross-origin isolated pages, so your server must send these headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
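Below, the same two headers are packaged as a reusable object, along with a runtime check for the requirements listed above. `CROSS_ORIGIN_ISOLATION_HEADERS`, `EnvLike`, and `missingRequirements` are hypothetical names. The check relies on the standard `crossOriginIsolated` global, which browsers set to true only for cross-origin isolated pages.

```typescript
// The two response headers from above, for use in your server config.
const CROSS_ORIGIN_ISOLATION_HEADERS: Readonly<Record<string, string>> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Only the globals this check reads.
interface EnvLike {
  WebAssembly?: unknown;
  SharedArrayBuffer?: unknown;
  crossOriginIsolated?: boolean;
}

// Returns the requirements that are missing in the given environment
// (defaults to the current global scope).
function missingRequirements(
  env: EnvLike = globalThis as unknown as EnvLike,
): string[] {
  const missing: string[] = [];
  if (typeof env.WebAssembly !== "object" || env.WebAssembly === null) {
    missing.push("WebAssembly");
  }
  if (typeof env.SharedArrayBuffer !== "function") {
    missing.push("SharedArrayBuffer");
  }
  if (env.crossOriginIsolated !== true) {
    missing.push("cross-origin isolation (COOP/COEP headers)");
  }
  return missing;
}
```

How the header object is wired into a specific server or dev server depends on that server's configuration API.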
Performance Tips
- Load once, reuse: Initialize the Retto instance once and reuse it for multiple images.
- Model size: Embedded models increase bundle size; consider loading them externally in production.
- Image size: Larger images take longer to process; consider resizing them before OCR.
- Streaming: Use the streaming API to give users real-time feedback.
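For the image-size tip, the first step is choosing a target size. This minimal sketch caps the longer side while keeping the aspect ratio. `scaleToFit` is a hypothetical helper, and the canvas drawing and re-encoding steps are not shown. After resizing, detection coordinates will presumably refer to the resized image, so divide them by `scale` to map them back to the original.

```typescript
// Hypothetical helper: target dimensions with the longer side capped at
// maxSide pixels. Images already within the limit are not upscaled.
function scaleToFit(
  width: number,
  height: number,
  maxSide: number,
): { width: number; height: number; scale: number } {
  const longest = Math.max(width, height);
  if (longest <= maxSide) return { width, height, scale: 1 };
  const scale = maxSide / longest;
  return {
    // Round to whole pixels, never collapsing a side to zero.
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
    scale,
  };
}
```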
Next Steps
- Model Loading: learn about embedded vs. external models
- Rust Usage: see the underlying Rust API