Overview

The OcrProvider interface defines the contract that all OCR providers must implement to work with meikipop. This abstraction allows you to swap different OCR backends without modifying the core application logic. All interface definitions are located in src/ocr/interface.py.

OcrProvider abstract class

Your custom provider must inherit from OcrProvider and implement its abstract methods:
src/ocr/interface.py
import abc
from typing import List, Optional

from PIL import Image

class OcrProvider(abc.ABC):
    """
    Abstract base class for an OCR provider.

    Any class that implements this interface can be used by the application's
    OcrProcessor. This allows for easily swapping out different OCR backends.
    """

    @property
    @abc.abstractmethod
    def NAME(self) -> str:
        """A user-friendly name for this provider."""
        raise NotImplementedError

    @abc.abstractmethod
    def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
        """
        Performs OCR on the given image.

        Args:
            image: A PIL Image object to perform OCR on.

        Returns:
            A list of Paragraph objects found in the image, or None if an
            error occurred. Returns an empty list if no text is found.
        """
        raise NotImplementedError

Required properties

NAME (str, required): A unique, user-friendly string for your provider (e.g., "My Cool OCR"). This name appears in the settings and tray icon menus.

Required methods

scan (method, required)
The core method where all OCR processing happens.
Parameters:
  • image (PIL.Image.Image): The screen region to scan
Returns:
  • List[Paragraph]: if OCR succeeds (return an empty list [] if no text is found)
  • None: if a critical error occurred
The scan method receives a PIL Image object and must return data in meikipop’s standard format. Your main task is converting your OCR engine’s output into this format.
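Putting the NAME property and the scan method together, a minimal provider might look like the sketch below. The dataclasses and OcrProvider base class are inlined as stand-ins so the example is self-contained; in meikipop you would import them from src.ocr.interface. EchoProvider and its hard-coded output are hypothetical.

```python
import abc
from dataclasses import dataclass
from typing import List, Optional

# Stand-ins mirroring src/ocr/interface.py so this sketch runs on its own.
@dataclass(frozen=True)
class BoundingBox:
    center_x: float
    center_y: float
    width: float
    height: float

@dataclass(frozen=True)
class Word:
    text: str
    separator: str
    box: BoundingBox

@dataclass(frozen=True)
class Paragraph:
    full_text: str
    words: List[Word]
    box: BoundingBox
    is_vertical: bool

class OcrProvider(abc.ABC):
    @property
    @abc.abstractmethod
    def NAME(self) -> str: ...
    @abc.abstractmethod
    def scan(self, image) -> Optional[List[Paragraph]]: ...

class EchoProvider(OcrProvider):
    """A toy provider that 'recognizes' one hard-coded paragraph."""

    @property
    def NAME(self) -> str:
        return "Echo OCR"

    def scan(self, image) -> Optional[List[Paragraph]]:
        # In a real provider, `image` is a PIL Image passed to your engine.
        box = BoundingBox(center_x=0.5, center_y=0.5, width=0.4, height=0.1)
        word = Word(text="日本語", separator="", box=box)
        return [Paragraph(full_text="日本語", words=[word],
                          box=box, is_vertical=False)]
```

A real provider follows the same shape: construct BoundingBox, Word, and Paragraph objects from your engine's raw output and return them from scan.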

Data models

Your scan method must return data using these three immutable dataclasses:

BoundingBox

Represents the location and size of text with normalized coordinates.
src/ocr/interface.py
@dataclass(frozen=True)
class BoundingBox:
    """A normalized bounding box. All coordinates are floats between 0.0 and 1.0."""
    center_x: float
    center_y: float
    width: float
    height: float
center_x (float, required): Horizontal center position, normalized to the 0.0-1.0 range (0.0 is the left edge)
center_y (float, required): Vertical center position, normalized to the 0.0-1.0 range (0.0 is the top edge)
width (float, required): Width of the bounding box, normalized to the 0.0-1.0 range
height (float, required): Height of the bounding box, normalized to the 0.0-1.0 range
All coordinates and dimensions must be normalized to a 0.0-1.0 float range, relative to the input image’s dimensions. (0.0, 0.0) represents the top-left corner.

Converting pixel coordinates to normalized format

If your OCR engine returns absolute pixel coordinates, you need to convert them:
# Raw data from your OCR engine
raw_box = {'x': 50, 'y': 100, 'w': 200, 'h': 40}
img_width, img_height = 1000, 800

# Conversion to normalized center_x, center_y, width, height
center_x = (raw_box['x'] + raw_box['w'] / 2) / img_width  # 0.15
center_y = (raw_box['y'] + raw_box['h'] / 2) / img_height  # 0.15
width = raw_box['w'] / img_width  # 0.2
height = raw_box['h'] / img_height  # 0.05

# Create the meikipop object
meiki_box = BoundingBox(center_x, center_y, width, height)
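The conversion above can be wrapped in a reusable helper; the later example implementation calls one named convert_to_bounding_box. Here is a sketch of such a helper (the BoundingBox stand-in mirrors the dataclass from src/ocr/interface.py):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundingBox:  # stand-in for src.ocr.interface.BoundingBox
    center_x: float
    center_y: float
    width: float
    height: float

def convert_to_bounding_box(raw_box: dict, img_width: int,
                            img_height: int) -> BoundingBox:
    """Convert a pixel-space {'x', 'y', 'w', 'h'} dict (top-left origin)
    into a normalized, center-based BoundingBox."""
    return BoundingBox(
        center_x=(raw_box['x'] + raw_box['w'] / 2) / img_width,
        center_y=(raw_box['y'] + raw_box['h'] / 2) / img_height,
        width=raw_box['w'] / img_width,
        height=raw_box['h'] / img_height,
    )
```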

Word

Represents a single recognized text element.
src/ocr/interface.py
@dataclass(frozen=True)
class Word:
    """Represents a single word recognized by the OCR."""
    text: str  # this can be either a word or a single character
    separator: str  # The separator that follows the word (e.g., a space) - optional
    box: BoundingBox
text (str, required): The recognized text. Can be a full word ("日本語") or a single character ("日").
separator (str, required): The character that follows the word. Usually an empty string "" for Japanese text.
box (BoundingBox, required): The bounding box for this specific word or character.
Meikipop’s hit-scanning works well with both word-level and character-level boxes. Providing single-character boxes often leads to more precise dictionary lookups.
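If your engine only returns line-level boxes, one rough way to get per-character Word objects for horizontal text is to subdivide the line box evenly. This is an approximation that assumes roughly uniform glyph widths (reasonable for CJK text, poor for proportional Latin fonts); split_into_char_words is a hypothetical helper, and the dataclass stand-ins mirror src/ocr/interface.py.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class BoundingBox:  # stand-in for src.ocr.interface.BoundingBox
    center_x: float
    center_y: float
    width: float
    height: float

@dataclass(frozen=True)
class Word:  # stand-in for src.ocr.interface.Word
    text: str
    separator: str
    box: BoundingBox

def split_into_char_words(text: str, line_box: BoundingBox) -> List[Word]:
    """Subdivide a horizontal line's box evenly into one Word per character.
    Assumes roughly equal character widths; a rough approximation only."""
    char_w = line_box.width / len(text)
    left_edge = line_box.center_x - line_box.width / 2
    return [
        Word(text=ch, separator="",
             box=BoundingBox(center_x=left_edge + (i + 0.5) * char_w,
                             center_y=line_box.center_y,
                             width=char_w,
                             height=line_box.height))
        for i, ch in enumerate(text)
    ]
```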

Paragraph

Represents a block of text composed of words.
src/ocr/interface.py
@dataclass(frozen=True)
class Paragraph:
    """Represents a block of text, composed of words."""
    full_text: str
    words: List[Word]
    box: BoundingBox
    is_vertical: bool  # True if text is top-to-bottom - optional
full_text (str, required): The complete, reconstructed text of the paragraph.
words (List[Word], required): A list of Word objects that form this paragraph.
box (BoundingBox, required): The bounding box encompassing the entire paragraph.
is_vertical (bool, required): Set this to True if the text is written top-to-bottom (vertical Japanese text). If your OCR engine doesn’t provide this information, you can infer it from the bounding box aspect ratio (height > width).
For Japanese text, correctly setting is_vertical is crucial for proper text rendering and lookups.
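For engines that don't report text direction, the aspect-ratio heuristic mentioned above can be written as a one-line helper. This is a sketch with a hypothetical name; comparing pixel dimensions (as the example implementation below does) avoids the distortion that normalized coordinates introduce on non-square images.

```python
def infer_is_vertical(pixel_w: float, pixel_h: float) -> bool:
    """Heuristic: a text block taller than it is wide is most likely
    vertical (top-to-bottom) Japanese text."""
    return pixel_h > pixel_w
```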

Example implementation

Here’s how a typical scan method transforms OCR data:
src/ocr/providers/dummy/provider.py
def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
    try:
        # 1. Get OCR data from your engine
        # (your_ocr_engine is a placeholder for your actual OCR library)
        raw_ocr_results = your_ocr_engine.recognize(image)
        
        # 2. Transform to meikipop format
        paragraphs: List[Paragraph] = []
        img_width, img_height = image.size
        
        for ocr_line in raw_ocr_results:
            # Convert pixel bbox to normalized BoundingBox
            line_bbox_data = ocr_line.get("bbox")
            center_x = (line_bbox_data['x'] + line_bbox_data['w'] / 2) / img_width
            center_y = (line_bbox_data['y'] + line_bbox_data['h'] / 2) / img_height
            norm_w = line_bbox_data['w'] / img_width
            norm_h = line_bbox_data['h'] / img_height
            
            line_box = BoundingBox(center_x, center_y, norm_w, norm_h)
            
            # Determine text direction
            is_vertical = line_bbox_data['h'] > line_bbox_data['w']
            
            # Process words
            words_in_para: List[Word] = []
            for word_data in ocr_line.get("words", []):
                # Convert word coordinates
                word_box = convert_to_bounding_box(word_data['bbox'], img_width, img_height)
                words_in_para.append(Word(text=word_data['text'], separator="", box=word_box))
            
            # Create Paragraph object
            paragraph = Paragraph(
                full_text=ocr_line.get("text"),
                words=words_in_para,
                box=line_box,
                is_vertical=is_vertical
            )
            paragraphs.append(paragraph)
        
        # 3. Return the results
        return paragraphs
        
    except Exception as e:
        logger.error(f"Error in OCR: {e}", exc_info=True)
        return None  # Return None on critical errors

Next steps

Create a custom provider

Step-by-step guide to building your own OCR provider

Available providers

Explore the built-in OCR providers in meikipop
