Overview

Meikipop includes several built-in OCR providers optimized for Japanese text recognition. Each provider offers different trade-offs between accuracy, speed, cost, and resource requirements.

Provider comparison

| Provider | Type | Speed | Accuracy | Requirements | Best for |
| --- | --- | --- | --- | --- | --- |
| Dummy | Local | Instant | N/A | None | Development and testing |
| meikiocr | Local | Fast | High | GPU recommended | Offline gaming, privacy |
| Google Lens v2 | Remote | Medium | Very High | Internet | Online use, best accuracy |
| owocr | Hybrid | Medium | High | owocr daemon | Flexible deployment |
| Chrome Screen AI | Local | Fast | Medium | Chrome components | Chrome browser integration |

Built-in providers

Dummy OCR

The dummy provider is designed as a template for creating custom providers. It returns fixed mock data for testing.
src/ocr/providers/dummy/provider.py
class DummyProvider(OcrProvider):
    """
    A template for creating new OCR providers.
    
    When this provider is selected, it returns a fixed set of Japanese text
    to allow for testing of the popup window without a real OCR backend.
    """
    NAME = "Dummy OCR (Developer Template)"
Implementation highlights:
  • Returns hardcoded Japanese text with both horizontal and vertical examples
  • Demonstrates proper coordinate normalization
  • Shows character-level and word-level Word objects
  • Fully commented for educational purposes
Use cases:
  • Developing and testing UI without a real OCR backend
  • Template for creating custom providers
  • Understanding the data transformation process
Example output:
# Returns two paragraphs:
# 1. Horizontal: "これは横書きテキストです"
# 2. Vertical: "縦書き"

meikiocr (local)

The meikiocr provider uses a high-performance local model specifically optimized for Japanese video game text.
src/ocr/providers/meikiocr/provider.py
class MeikiOcrProvider(OcrProvider):
    """
    An OCR provider that uses the high-performance meikiocr library.
    This provider is specifically optimized for recognizing Japanese text from video games.
    """
    NAME = "meikiocr (local)"
    
    def __init__(self):
        self.ocr_client = MeikiOCR()
        logger.info(f"Running on: {self.ocr_client.active_provider}")
Implementation highlights:
  • Uses the meikiocr Python library
  • Converts PIL images to NumPy RGB arrays
  • Returns character-level boxes for precise lookups
  • Groups individual lines into paragraphs using postprocessing
  • Filters out non-Japanese text
Configuration:
DET_CONFIDENCE_THRESHOLD = 0.5  # Detection confidence
REC_CONFIDENCE_THRESHOLD = 0.1  # Recognition confidence
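Assuming the two thresholds gate results as their names suggest (detections below the first are discarded, recognized text below the second is dropped), the filtering amounts to something like this sketch. The result tuples here are hypothetical illustrations, not the meikiocr API:

```python
# Hypothetical result tuples: (text, detection_conf, recognition_conf).
# Illustrates threshold filtering only; this is not the meikiocr API.
DET_CONFIDENCE_THRESHOLD = 0.5
REC_CONFIDENCE_THRESHOLD = 0.1

results = [
    ("これは", 0.92, 0.85),    # confident detection and recognition -> kept
    ("ノイズ", 0.31, 0.90),    # weak detection -> dropped
    ("テキスト", 0.77, 0.05),  # detected, but recognition too uncertain -> dropped
]

kept = [
    text for text, det, rec in results
    if det >= DET_CONFIDENCE_THRESHOLD and rec >= REC_CONFIDENCE_THRESHOLD
]
print(kept)  # ['これは']
```

Lowering REC_CONFIDENCE_THRESHOLD keeps more uncertain readings; raising DET_CONFIDENCE_THRESHOLD trades recall for fewer false detections.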
Processing pipeline:
1. Initialize: creates a MeikiOCR client that handles model downloading and session management internally.
2. Convert image: converts the PIL Image to a NumPy RGB array for library compatibility.
3. Run OCR: calls run_ocr() with the confidence thresholds to get character-level results.
4. Transform results: converts [x1, y1, x2, y2] pixel coordinates to normalized BoundingBox objects.
5. Group paragraphs: uses group_lines_into_paragraphs() to combine related lines.
Key methods:
def _to_normalized_bbox(self, bbox_pixels: list, img_width: int, img_height: int) -> BoundingBox:
    """Converts an [x1, y1, x2, y2] pixel bbox to a normalized meikipop BoundingBox."""
    x1, y1, x2, y2 = bbox_pixels
    box_w, box_h = x2 - x1, y2 - y1
    
    center_x = (x1 + box_w / 2) / img_width
    center_y = (y1 + box_h / 2) / img_height
    norm_w = box_w / img_width
    norm_h = box_h / img_height
    
    return BoundingBox(center_x, center_y, norm_w, norm_h)
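To see the conversion concretely, here is a standalone version of the same arithmetic, with a plain tuple standing in for meikipop's BoundingBox (a worked example, not the project's code):

```python
def to_normalized_bbox(bbox_pixels, img_width, img_height):
    """Convert an [x1, y1, x2, y2] pixel bbox to (center_x, center_y, w, h), all in 0..1."""
    x1, y1, x2, y2 = bbox_pixels
    box_w, box_h = x2 - x1, y2 - y1
    return (
        (x1 + box_w / 2) / img_width,   # center x, normalized
        (y1 + box_h / 2) / img_height,  # center y, normalized
        box_w / img_width,              # normalized width
        box_h / img_height,             # normalized height
    )

# A 200x100 px box at (100, 50) inside a 1000x500 px screenshot:
print(to_normalized_bbox([100, 50, 300, 150], 1000, 500))
# (0.2, 0.2, 0.2, 0.2)
```

Normalized coordinates make the boxes resolution-independent, so the popup can be positioned correctly regardless of the capture size.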
Requirements:
  • Install: pip install meikiocr
  • GPU recommended for best performance
  • Models downloaded automatically on first run

Google Lens v2 (remote)

This provider sends screenshots to Google’s servers. Do not use with sensitive or private information.
src/ocr/providers/glensv2/provider.py
class GoogleLensOcrV2(OcrProvider):
    NAME = "Google Lens (remote)"
    
    def __init__(self):
        self._session = requests.Session()
        self._session.headers.update({
            'Content-Type': 'application/x-protobuf',
            'X-Goog-Api-Key': 'AIzaSyDr2UxVnv_U85AbhhY8XSHSIavUW0DC-sY',
            'User-Agent': 'Mozilla/5.0 ...'
        })
Implementation highlights:
  • Uses Google Lens API via protobuf protocol
  • Maintains persistent HTTP session for performance
  • Supports low-bandwidth mode (50% resolution, 16-color quantization)
  • Returns normalized coordinates directly (no conversion needed)
  • Filters for Japanese text using regex
Image processing:
if config.glens_low_bandwidth:
    # Reduce size by ~50%
    scale_factor = math.sqrt(0.5)
    new_width = int(image.width * scale_factor)
    new_height = int(image.height * scale_factor)
    processed_image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)
    # Reduce to 16 colors
    processed_image = processed_image.convert('L').quantize(colors=16)
    processed_image.save(bio, format='PNG')
else:
    # Standard quality
    processed_image.save(bio, format='JPEG', quality=90)
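Scaling both dimensions by sqrt(0.5) is what halves the pixel count (and roughly the payload). The arithmetic, independent of any imaging library:

```python
import math

scale_factor = math.sqrt(0.5)  # ~0.7071 per axis

width, height = 1920, 1080
new_width = int(width * scale_factor)    # 1357
new_height = int(height * scale_factor)  # 763

# The product of the two per-axis factors is 0.5, so pixel count halves.
ratio = (new_width * new_height) / (width * height)
print(f"{new_width}x{new_height}, {ratio:.1%} of original pixels")
```

The 16-color grayscale quantization then shrinks the PNG further, at the cost of some recognition accuracy on low-contrast text.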
Text direction detection:
for para in glens_response.objects_response.text.text_layout.paragraphs:
    is_vertical = para.writing_direction == WritingDirection.TOP_TO_BOTTOM
Requirements:
  • Active internet connection
  • Acceptance of Google's data processing terms
Performance:
  • Network latency: ~200-500ms typical
  • Request timeout: 10 seconds
  • Logs detailed timing information

owocr (WebSocket)

The owocr provider connects to a running owocr daemon via WebSocket, allowing flexible deployment options.
src/ocr/providers/owocr/provider.py
class OwocrWebsocketProvider(OcrProvider):
    """
    An OCR provider that connects to a running owocr instance via websockets.
    This provider uses the synchronous websockets client to maintain a
    persistent connection.
    """
    NAME = "owocr (Websocket)"
Implementation highlights:
  • Maintains persistent WebSocket connection
  • Automatic reconnection on connection loss
  • Uses direct IP (127.0.0.1) to avoid localhost resolution delays
  • Two-part response protocol (acknowledgment + JSON results)
  • Returns normalized coordinates directly
Connection handling:
OWOCR_WEBSOCKET_URI = "ws://127.0.0.1:7331"

def _connect(self) -> bool:
    try:
        self.websocket = connect(
            OWOCR_WEBSOCKET_URI,
            open_timeout=3,
            ping_interval=20,
            ping_timeout=20
        )
        return True
    except Exception as e:
        logger.error(f"Could not connect to owocr: {e}")
        logger.info("Please ensure owocr is running with:")
        logger.info("owocr -r websocket -w websocket -of json -e glens")
        return False
Communication protocol:
1. Send image: converts the PIL Image to BMP format and sends it as binary.
2. Receive acknowledgment: waits for a "True" confirmation (5-second timeout).
3. Receive results: waits for a JSON response with the OCR results (30-second timeout).
4. Transform data: converts owocr's format into meikipop's Paragraph objects.
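Handling the two-part response can be isolated into a small parser; a sketch, assuming the daemon first sends the literal string "True" and then a JSON payload. The exact JSON shape belongs to owocr, so the field name below is an illustrative placeholder:

```python
import json

def parse_owocr_response(ack: str, payload: str) -> dict:
    """Validate the acknowledgment message, then decode the JSON results message.

    Raises RuntimeError if the daemon did not acknowledge the image.
    """
    if ack != "True":
        raise RuntimeError(f"owocr rejected the image: {ack!r}")
    return json.loads(payload)

# Simulated exchange ("paragraphs" is a placeholder field name):
result = parse_owocr_response("True", '{"paragraphs": []}')
print(result)
```

Keeping the validation separate from the socket code makes the timeout handling in the provider easier to test.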
Retry logic:
for attempt in range(2):
    try:
        if self.websocket is None:
            if not self._connect():
                return None
        # ... perform scan
    except ConnectionClosed:
        logger.warning("Websocket connection lost. Will attempt to reconnect...")
        self.websocket = None
        if attempt == 0:
            continue  # Retry once
Requirements:
  • Running owocr daemon
  • Command: owocr -r websocket -w websocket -of json -e glens
  • WebSocket endpoint at ws://127.0.0.1:7331

Chrome Screen AI (local)

This provider uses Chrome’s Screen AI component for local, offline OCR processing.
src/ocr/providers/screenai/provider.py
class ScreenAiOcr(OcrProvider):
    NAME = "Chrome Screen AI (local)"
    
    # Class-level variables to ensure the native DLL is only initialized ONCE
    _is_initialized = False
    _lib = None
Implementation highlights:
  • Uses Chrome’s native Screen AI library via ctypes
  • Singleton pattern for library initialization (once per app lifetime)
  • Suppresses verbose native library output
  • Returns character-level (symbol) boxes
  • Automatically downsizes large images (>4MP)
Library initialization:
base_dir = Path.home() / ".config" / "screen_ai"
model_dir = base_dir / "resources"
dll_name = 'chrome_screen_ai.dll' if sys.platform == 'win32' else 'libchromescreenai.so'
Image preparation:
if image.width * image.height > 4000000:
    image.thumbnail((2000, 2000), Image.Resampling.LANCZOS)

img_rgba = image.convert('RGBA')
width, height = img_rgba.size
img_bytes = img_rgba.tobytes()

# Create Skia bitmap structure
bitmap.fPixmap.fPixels = ctypes.cast(ctypes.c_char_p(img_bytes), ctypes.c_void_p)
bitmap.fPixmap.fRowBytes = width * 4
bitmap.fPixmap.fInfo.fColorInfo.fColorType = 4  # kRGBA_8888
Output suppression:
@contextmanager
def suppress_output():
    """Redirects C/C++ level stdout and stderr to devnull."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    original_stdout = os.dup(1)
    original_stderr = os.dup(2)
    os.dup2(devnull, 1)
    os.dup2(devnull, 2)
    try:
        yield
    finally:
        # Restore original streams and close the duplicated descriptors
        # so repeated calls don't leak file descriptors
        os.dup2(original_stdout, 1)
        os.dup2(original_stderr, 2)
        os.close(original_stdout)
        os.close(original_stderr)
        os.close(devnull)
Text direction detection:
is_vertical = (line_box.direction == 3)  # DIRECTION_TOP_TO_BOTTOM
Requirements:
  • Download Screen AI components from: https://chrome-infra-packages.appspot.com/p/chromium/third_party/screen-ai
  • Extract to: ~/.config/screen_ai/resources/
  • Platform: Windows (DLL) or Linux (SO)

Common patterns

Postprocessing: Grouping lines into paragraphs

Most providers use the shared group_lines_into_paragraphs() utility:
from src.ocr.providers.postprocessing import group_lines_into_paragraphs

# After converting to line-level Paragraph objects
raw_lines: List[Paragraph] = [...]
final_paragraphs = group_lines_into_paragraphs(raw_lines)
This function:
  • Combines adjacent lines into logical paragraphs
  • Respects text direction (vertical vs. horizontal)
  • Improves text readability and context
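The real implementation lives in src/ocr/providers/postprocessing.py. As an illustration of the idea only (not the actual algorithm), horizontally written lines whose boxes sit close together vertically can be merged greedily:

```python
from dataclasses import dataclass

@dataclass
class Line:
    text: str
    top: float     # normalized y of the line's top edge
    bottom: float  # normalized y of the line's bottom edge

def group_lines(lines: list[Line], max_gap: float = 0.02) -> list[list[Line]]:
    """Greedily merge horizontal lines separated by a small vertical gap.

    Simplified sketch: the real grouping also handles vertical text
    and horizontal alignment between lines.
    """
    groups: list[list[Line]] = []
    for line in sorted(lines, key=lambda l: l.top):
        if groups and line.top - groups[-1][-1].bottom <= max_gap:
            groups[-1].append(line)  # close enough: same paragraph
        else:
            groups.append([line])    # gap too large: start a new paragraph
    return groups

lines = [
    Line("これは", 0.10, 0.13),
    Line("テキストです", 0.135, 0.165),  # 0.005 below the first line
    Line("別の段落", 0.40, 0.43),        # far away
]
print([len(g) for g in group_lines(lines)])  # [2, 1]
```

Grouping matters for dictionary lookups: words split across line breaks only resolve correctly once the lines are joined into one paragraph.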

Japanese text filtering

Several providers filter for Japanese text:
import re

JAPANESE_REGEX = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]')

line_has_japanese = any(JAPANESE_REGEX.search(w.plain_text) for w in line.words)
if not line_has_japanese:
    continue
Character ranges:
  • \u3040-\u309F: Hiragana
  • \u30A0-\u30FF: Katakana
  • \u4E00-\u9FAF: Kanji
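A quick check of those ranges in action:

```python
import re

JAPANESE_REGEX = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]')

print(bool(JAPANESE_REGEX.search("これはテキストです")))  # True  (hiragana + katakana)
print(bool(JAPANESE_REGEX.search("漢字")))                # True  (kanji)
print(bool(JAPANESE_REGEX.search("Game Over 123")))       # False (ASCII only)
```

A single matching character is enough to keep a line, so mixed lines like "HPが100" pass the filter.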

Selecting a provider

Choose based on your requirements:
  • For offline gaming: meikiocr (local) offers the best balance of speed and accuracy without internet.
  • For maximum accuracy: Google Lens v2 (remote) delivers the highest quality but requires internet.
  • For development: Dummy OCR lets you test the UI without actual OCR processing.
  • For custom deployment: owocr (Websocket) can run the OCR service on a different machine.
  • For Chrome integration: Chrome Screen AI (local) leverages existing Chrome components.

Next steps

Create custom provider

Build your own OCR provider using these as examples

OCR provider interface

Understand the interface contract and data models
