Overview

Meikipop includes several built-in OCR providers optimized for Japanese text recognition. Each provider offers different trade-offs between accuracy, speed, cost, and resource requirements.

Provider comparison

| Provider | Type | Speed | Accuracy | Requirements | Best for |
| --- | --- | --- | --- | --- | --- |
| Dummy | Local | Instant | N/A | None | Development and testing |
| meikiocr | Local | Fast | High | GPU recommended | Offline gaming, privacy |
| Google Lens v2 | Remote | Medium | Very High | Internet | Online use, best accuracy |
| owocr | Hybrid | Medium | High | owocr daemon | Flexible deployment |
| Chrome Screen AI | Local | Fast | Medium | Chrome components | Chrome browser integration |

Built-in providers

Dummy OCR

The dummy provider is designed as a template for creating custom providers. It returns fixed mock data for testing.
src/ocr/providers/dummy/provider.py
class DummyProvider(OcrProvider):
    """
    A template for creating new OCR providers.
    
    When this provider is selected, it returns a fixed set of Japanese text
    to allow for testing of the popup window without a real OCR backend.
    """
    NAME = "Dummy OCR (Developer Template)"
Implementation highlights:
  • Returns hardcoded Japanese text with both horizontal and vertical examples
  • Demonstrates proper coordinate normalization
  • Shows character-level and word-level Word objects
  • Fully commented for educational purposes
Use cases:
  • Developing and testing UI without a real OCR backend
  • Template for creating custom providers
  • Understanding the data transformation process
Example output:
# Returns two paragraphs:
# 1. Horizontal: "これは横書きテキストです"
# 2. Vertical: "縦書き"

meikiocr (local)

The meikiocr provider uses a high-performance local model specifically optimized for Japanese video game text.
src/ocr/providers/meikiocr/provider.py
class MeikiOcrProvider(OcrProvider):
    """
    An OCR provider that uses the high-performance meikiocr library.
    This provider is specifically optimized for recognizing Japanese text from video games.
    """
    NAME = "meikiocr (local)"
    
    def __init__(self):
        self.ocr_client = MeikiOCR()
        logger.info(f"Running on: {self.ocr_client.active_provider}")
Implementation highlights:
  • Uses the meikiocr Python library
  • Converts PIL images to NumPy RGB arrays
  • Returns character-level boxes for precise lookups
  • Groups individual lines into paragraphs using postprocessing
  • Filters out non-Japanese text
Configuration:
DET_CONFIDENCE_THRESHOLD = 0.5  # Detection confidence
REC_CONFIDENCE_THRESHOLD = 0.1  # Recognition confidence
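Assuming the two thresholds gate results as their names suggest (detections below the first are discarded, recognized text below the second is dropped), the filtering amounts to something like this sketch. The result tuples here are hypothetical illustrations, not the meikiocr API:

```python
# Hypothetical result tuples: (text, detection_conf, recognition_conf).
# Illustrates threshold filtering only; this is not the meikiocr API.
DET_CONFIDENCE_THRESHOLD = 0.5
REC_CONFIDENCE_THRESHOLD = 0.1

results = [
    ("これは", 0.92, 0.85),    # confident detection and recognition -> kept
    ("ノイズ", 0.31, 0.90),    # weak detection -> dropped
    ("テキスト", 0.77, 0.05),  # detected, but recognition too uncertain -> dropped
]

kept = [
    text for text, det, rec in results
    if det >= DET_CONFIDENCE_THRESHOLD and rec >= REC_CONFIDENCE_THRESHOLD
]
print(kept)  # ['これは']
```

Lowering REC_CONFIDENCE_THRESHOLD keeps more uncertain readings; raising DET_CONFIDENCE_THRESHOLD trades recall for fewer false detections.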
Processing pipeline:
1. Initialize: creates a MeikiOCR client that handles model downloading and session management internally.
2. Convert image: converts the PIL Image to a NumPy RGB array for library compatibility.
3. Run OCR: calls run_ocr() with the confidence thresholds to get character-level results.
4. Transform results: converts [x1, y1, x2, y2] pixel coordinates to normalized BoundingBox objects.
5. Group paragraphs: uses group_lines_into_paragraphs() to combine related lines.
Key methods:
def _to_normalized_bbox(self, bbox_pixels: list, img_width: int, img_height: int) -> BoundingBox:
    """Converts an [x1, y1, x2, y2] pixel bbox to a normalized meikipop BoundingBox."""
    x1, y1, x2, y2 = bbox_pixels
    box_w, box_h = x2 - x1, y2 - y1
    
    center_x = (x1 + box_w / 2) / img_width
    center_y = (y1 + box_h / 2) / img_height
    norm_w = box_w / img_width
    norm_h = box_h / img_height
    
    return BoundingBox(center_x, center_y, norm_w, norm_h)
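To see the conversion concretely, here is a standalone version of the same arithmetic, with a plain tuple standing in for meikipop's BoundingBox (a worked example, not the project's code):

```python
def to_normalized_bbox(bbox_pixels, img_width, img_height):
    """Convert an [x1, y1, x2, y2] pixel bbox to (center_x, center_y, w, h), all in 0..1."""
    x1, y1, x2, y2 = bbox_pixels
    box_w, box_h = x2 - x1, y2 - y1
    return (
        (x1 + box_w / 2) / img_width,   # center x, normalized
        (y1 + box_h / 2) / img_height,  # center y, normalized
        box_w / img_width,              # normalized width
        box_h / img_height,             # normalized height
    )

# A 200x100 px box at (100, 50) inside a 1000x500 px screenshot:
print(to_normalized_bbox([100, 50, 300, 150], 1000, 500))
# (0.2, 0.2, 0.2, 0.2)
```

Normalized coordinates make the boxes resolution-independent, so the popup can be positioned correctly regardless of the capture size.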
Requirements:
  • Install: pip install meikiocr
  • GPU recommended for best performance
  • Models downloaded automatically on first run

Google Lens v2 (remote)

This provider sends screenshots to Google’s servers. Do not use with sensitive or private information.
src/ocr/providers/glensv2/provider.py
class GoogleLensOcrV2(OcrProvider):
    NAME = "Google Lens (remote)"
    
    def __init__(self):
        self._session = requests.Session()
        self._session.headers.update({
            'Content-Type': 'application/x-protobuf',
            'X-Goog-Api-Key': 'AIzaSyDr2UxVnv_U85AbhhY8XSHSIavUW0DC-sY',
            'User-Agent': 'Mozilla/5.0 ...'
        })
Implementation highlights:
  • Uses Google Lens API via protobuf protocol
  • Maintains persistent HTTP session for performance
  • Supports low-bandwidth mode (50% resolution, 16-color quantization)
  • Returns normalized coordinates directly (no conversion needed)
  • Filters for Japanese text using regex
Image processing:
if config.glens_low_bandwidth:
    # Reduce size by ~50%
    scale_factor = math.sqrt(0.5)
    new_width = int(image.width * scale_factor)
    new_height = int(image.height * scale_factor)
    processed_image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)
    # Reduce to 16 colors
    processed_image = processed_image.convert('L').quantize(colors=16)
    processed_image.save(bio, format='PNG')
else:
    # Standard quality
    processed_image.save(bio, format='JPEG', quality=90)
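Scaling both dimensions by sqrt(0.5) is what halves the pixel count (and roughly the payload). The arithmetic, independent of any imaging library:

```python
import math

scale_factor = math.sqrt(0.5)  # ~0.7071 per axis

width, height = 1920, 1080
new_width = int(width * scale_factor)    # 1357
new_height = int(height * scale_factor)  # 763

# The product of the two per-axis factors is 0.5, so pixel count halves.
ratio = (new_width * new_height) / (width * height)
print(f"{new_width}x{new_height}, {ratio:.1%} of original pixels")
```

The 16-color grayscale quantization then shrinks the PNG further, at the cost of some recognition accuracy on low-contrast text.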
Text direction detection:
for para in glens_response.objects_response.text.text_layout.paragraphs:
    is_vertical = para.writing_direction == WritingDirection.TOP_TO_BOTTOM
Requirements:
  • Active internet connection
  • Acceptance of Google's data processing terms
Performance:
  • Network latency: ~200-500ms typical
  • Request timeout: 10 seconds
  • Logs detailed timing information

owocr (WebSocket)

The owocr provider connects to a running owocr daemon via WebSocket, allowing flexible deployment options.
src/ocr/providers/owocr/provider.py
class OwocrWebsocketProvider(OcrProvider):
    """
    An OCR provider that connects to a running owocr instance via websockets.
    This provider uses the synchronous websockets client to maintain a
    persistent connection.
    """
    NAME = "owocr (Websocket)"
Implementation highlights:
  • Maintains persistent WebSocket connection
  • Automatic reconnection on connection loss
  • Uses direct IP (127.0.0.1) to avoid localhost resolution delays
  • Two-part response protocol (acknowledgment + JSON results)
  • Returns normalized coordinates directly
Connection handling:
OWOCR_WEBSOCKET_URI = "ws://127.0.0.1:7331"

def _connect(self) -> bool:
    try:
        self.websocket = connect(
            OWOCR_WEBSOCKET_URI,
            open_timeout=3,
            ping_interval=20,
            ping_timeout=20
        )
        return True
    except Exception as e:
        logger.error(f"Could not connect to owocr: {e}")
        logger.info("Please ensure owocr is running with:")
        logger.info("owocr -r websocket -w websocket -of json -e glens")
        return False
Communication protocol:
1. Send image: converts the PIL Image to BMP format and sends it as binary.
2. Receive acknowledgment: waits for a "True" confirmation (5-second timeout).
3. Receive results: waits for a JSON response with the OCR results (30-second timeout).
4. Transform data: converts owocr's format into meikipop's Paragraph objects.
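Handling the two-part response can be isolated into a small parser; a sketch, assuming the daemon first sends the literal string "True" and then a JSON payload. The exact JSON shape belongs to owocr, so the field name below is an illustrative placeholder:

```python
import json

def parse_owocr_response(ack: str, payload: str) -> dict:
    """Validate the acknowledgment message, then decode the JSON results message.

    Raises RuntimeError if the daemon did not acknowledge the image.
    """
    if ack != "True":
        raise RuntimeError(f"owocr rejected the image: {ack!r}")
    return json.loads(payload)

# Simulated exchange ("paragraphs" is a placeholder field name):
result = parse_owocr_response("True", '{"paragraphs": []}')
print(result)
```

Keeping the validation separate from the socket code makes the timeout handling in the provider easier to test.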
Retry logic:
for attempt in range(2):
    try:
        if self.websocket is None:
            if not self._connect():
                return None
        # ... perform scan
    except ConnectionClosed:
        logger.warning("Websocket connection lost. Will attempt to reconnect...")
        self.websocket = None
        if attempt == 0:
            continue  # Retry once
Requirements:
  • Running owocr daemon
  • Command: owocr -r websocket -w websocket -of json -e glens
  • WebSocket endpoint at ws://127.0.0.1:7331

Chrome Screen AI (local)

This provider uses Chrome’s Screen AI component for local, offline OCR processing.
src/ocr/providers/screenai/provider.py
class ScreenAiOcr(OcrProvider):
    NAME = "Chrome Screen AI (local)"
    
    # Class-level variables to ensure the native DLL is only initialized ONCE
    _is_initialized = False
    _lib = None
Implementation highlights:
  • Uses Chrome’s native Screen AI library via ctypes
  • Singleton pattern for library initialization (once per app lifetime)
  • Suppresses verbose native library output
  • Returns character-level (symbol) boxes
  • Automatically downsizes large images (>4MP)
Library initialization:
base_dir = Path.home() / ".config" / "screen_ai"
model_dir = base_dir / "resources"
dll_name = 'chrome_screen_ai.dll' if sys.platform == 'win32' else 'libchromescreenai.so'
Image preparation:
if image.width * image.height > 4000000:
    image.thumbnail((2000, 2000), Image.Resampling.LANCZOS)

img_rgba = image.convert('RGBA')
width, height = img_rgba.size
img_bytes = img_rgba.tobytes()

# Create Skia bitmap structure
bitmap.fPixmap.fPixels = ctypes.cast(ctypes.c_char_p(img_bytes), ctypes.c_void_p)
bitmap.fPixmap.fRowBytes = width * 4
bitmap.fPixmap.fInfo.fColorInfo.fColorType = 4  # kRGBA_8888
Output suppression:
@contextmanager
def suppress_output():
    """Redirects C/C++ level stdout and stderr to devnull."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    original_stdout = os.dup(1)
    original_stderr = os.dup(2)
    os.dup2(devnull, 1)
    os.dup2(devnull, 2)
    try:
        yield
    finally:
        # Restore original streams and close the duplicated descriptors
        # so repeated calls don't leak file descriptors
        os.dup2(original_stdout, 1)
        os.dup2(original_stderr, 2)
        os.close(original_stdout)
        os.close(original_stderr)
        os.close(devnull)
Text direction detection:
is_vertical = (line_box.direction == 3)  # DIRECTION_TOP_TO_BOTTOM
Requirements:
  • Download Screen AI components from: https://chrome-infra-packages.appspot.com/p/chromium/third_party/screen-ai
  • Extract to: ~/.config/screen_ai/resources/
  • Platform: Windows (DLL) or Linux (SO)

Common patterns

Postprocessing: Grouping lines into paragraphs

Most providers use the shared group_lines_into_paragraphs() utility:
from src.ocr.providers.postprocessing import group_lines_into_paragraphs

# After converting to line-level Paragraph objects
raw_lines: List[Paragraph] = [...]
final_paragraphs = group_lines_into_paragraphs(raw_lines)
This function:
  • Combines adjacent lines into logical paragraphs
  • Respects text direction (vertical vs. horizontal)
  • Improves text readability and context
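The real implementation lives in src/ocr/providers/postprocessing.py. As an illustration of the idea only (not the actual algorithm), horizontally written lines whose boxes sit close together vertically can be merged greedily:

```python
from dataclasses import dataclass

@dataclass
class Line:
    text: str
    top: float     # normalized y of the line's top edge
    bottom: float  # normalized y of the line's bottom edge

def group_lines(lines: list[Line], max_gap: float = 0.02) -> list[list[Line]]:
    """Greedily merge horizontal lines separated by a small vertical gap.

    Simplified sketch: the real grouping also handles vertical text
    and horizontal alignment between lines.
    """
    groups: list[list[Line]] = []
    for line in sorted(lines, key=lambda l: l.top):
        if groups and line.top - groups[-1][-1].bottom <= max_gap:
            groups[-1].append(line)  # close enough: same paragraph
        else:
            groups.append([line])    # gap too large: start a new paragraph
    return groups

lines = [
    Line("これは", 0.10, 0.13),
    Line("テキストです", 0.135, 0.165),  # 0.005 below the first line
    Line("別の段落", 0.40, 0.43),        # far away
]
print([len(g) for g in group_lines(lines)])  # [2, 1]
```

Grouping matters for dictionary lookups: words split across line breaks only resolve correctly once the lines are joined into one paragraph.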

Japanese text filtering

Several providers filter for Japanese text:
import re

JAPANESE_REGEX = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]')

line_has_japanese = any(JAPANESE_REGEX.search(w.plain_text) for w in line.words)
if not line_has_japanese:
    continue
Character ranges:
  • \u3040-\u309F: Hiragana
  • \u30A0-\u30FF: Katakana
  • \u4E00-\u9FAF: Kanji
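A quick check of those ranges in action:

```python
import re

JAPANESE_REGEX = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]')

print(bool(JAPANESE_REGEX.search("これはテキストです")))  # True  (hiragana + katakana)
print(bool(JAPANESE_REGEX.search("漢字")))                # True  (kanji)
print(bool(JAPANESE_REGEX.search("Game Over 123")))       # False (ASCII only)
```

A single matching character is enough to keep a line, so mixed lines like "HPが100" pass the filter.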

Selecting a provider

Choose based on your requirements:
  • For offline gaming: meikiocr (local) offers the best balance of speed and accuracy without internet.
  • For maximum accuracy: Google Lens v2 (remote) delivers the highest quality but requires internet.
  • For development: Dummy OCR lets you test the UI without actual OCR processing.
  • For custom deployment: owocr (Websocket) can run the OCR service on a different machine.
  • For Chrome integration: Chrome Screen AI (local) leverages existing Chrome components.

Next steps

Create custom provider

Build your own OCR provider using these as examples

OCR provider interface

Understand the interface contract and data models
