Overview
TheOcrProvider interface defines the contract that all OCR providers must implement to work with meikipop. This abstraction allows you to swap different OCR backends without modifying the core application logic.
All interface definitions are located in src/ocr/interface.py.
OcrProvider abstract class
Your custom provider must inherit fromOcrProvider and implement its abstract methods:
src/ocr/interface.py
Required properties
A unique, user-friendly string for your provider (e.g.,
"My Cool OCR"). This name appears in the settings and tray icon menus.Required methods
The core method where all OCR processing happens.Parameters:
image(PIL.Image.Image): The screen region to scan
List[Paragraph]: If OCR succeeds (return empty list[]if no text found)None: If a critical error occurred
The
scan method receives a PIL Image object and must return data in meikipop’s standard format. Your main task is converting your OCR engine’s output into this format.Data models
Yourscan method must return data using these three immutable dataclasses:
BoundingBox
Represents the location and size of text with normalized coordinates.src/ocr/interface.py
Horizontal center position, normalized to 0.0-1.0 range (0.0 is left edge)
Vertical center position, normalized to 0.0-1.0 range (0.0 is top edge)
Width of the bounding box, normalized to 0.0-1.0 range
Height of the bounding box, normalized to 0.0-1.0 range
Converting pixel coordinates to normalized format
If your OCR engine returns absolute pixel coordinates, you need to convert them:Word
Represents a single recognized text element.src/ocr/interface.py
The recognized text. Can be a full word (
"日本語") or a single character ("日"). Single-character boxes often lead to more precise lookups.The character that follows the word. Usually an empty string
"" for Japanese text.The bounding box for this specific word or character.
Paragraph
Represents a block of text composed of words.src/ocr/interface.py
The complete, reconstructed text of the paragraph.
A list of
Word objects that form this paragraph.The bounding box encompassing the entire paragraph.
Must be
True if the text is written top-to-bottom (vertical Japanese text). If your OCR engine doesn’t provide this information, you can infer it from the bounding box aspect ratio: height > width.For Japanese text, correctly setting
is_vertical is crucial for proper text rendering and lookups.Example implementation
Here’s how a typicalscan method transforms OCR data:
src/ocr/providers/dummy/provider.py
Next steps
Create a custom provider
Step-by-step guide to building your own OCR provider
Available providers
Explore the built-in OCR providers in meikipop