Tesseract OCR Setup

Overview

CoroNet uses Tesseract OCR as a fallback when the primary GPT-4o-mini vision model fails to detect a license plate, so a plate can still be extracted when the primary model returns nothing.

Tesseract is invoked automatically when the OpenAI model returns “NO_DETECTADA” (Spanish for “not detected”; see app.py:106-110).

How CoroNet Uses Tesseract

The application implements a two-tier OCR strategy:

Primary: GPT-4o-mini vision model (app.py:44-80)
Fallback: Tesseract OCR (app.py:106-110)

app.py

# Primary OCR with GPT-4o-mini
matricula = extract_plate_from_image(path)

# Fallback with pytesseract
if matricula == "NO_DETECTADA":
    image = Image.open(path)
    ocr_text = pytesseract.image_to_string(image, lang="eng")
    ocr_text = ocr_text.strip().replace(" ", "").replace("\n", "").upper()
    matricula = "".join([c for c in ocr_text if c.isalnum() or c == "-"])[:10]

Platform-Specific Installation

Instructions follow for Windows, macOS, and Linux.

Windows Installation

Download Tesseract installer

Download the latest Tesseract OCR installer from the UB Mannheim wiki, which hosts the Windows builds: https://github.com/UB-Mannheim/tesseract/wiki

Choose the appropriate version:

64-bit: tesseract-ocr-w64-setup-5.x.x.exe (recommended)
32-bit: tesseract-ocr-w32-setup-5.x.x.exe

Run the installer

Double-click the downloaded .exe file
Accept the license agreement
Choose installation directory (default: C:\Program Files\Tesseract-OCR)
Important: Select “Additional language data” and ensure English is selected
Complete the installation wizard

Add Tesseract to PATH

This step is critical for pytesseract to find the Tesseract executable.

Option 1: During Installation

If your installer version offers an “Add Tesseract to PATH” option, check it. If it does not, use Option 2.

Option 2: Manual Configuration

Open System Properties (Win + Pause/Break)
Click “Advanced system settings”
Click “Environment Variables”
Under “System variables”, find and select “Path”
Click “Edit” → “New”
Add: C:\Program Files\Tesseract-OCR
Click “OK” on all dialogs

Configure pytesseract path (if needed)

If Tesseract is not in your PATH, specify the path in your code:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Verify installation

Open PowerShell or Command Prompt and run:

tesseract --version

Expected output:

tesseract v5.x.x
 leptonica-1.x.x

Windows Troubleshooting

Error: “pytesseract.pytesseract.TesseractNotFoundError”

Add this to the top of app.py:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Error: “Failed loading language ‘eng’”

Re-run the installer and ensure English language data is selected.

macOS Installation

Install Homebrew (if not installed)

If you don’t have Homebrew, install it first:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install Tesseract via Homebrew

brew install tesseract

This will install:

Tesseract OCR engine
English language data (default)
All necessary dependencies

Install additional languages (optional)

To recognize plates that use characters from other languages, install the extra language data:

# Language data for all supported languages, including Spanish (spa)
brew install tesseract-lang

Verify installation

tesseract --version
which tesseract

Expected output:

tesseract 5.x.x
 leptonica-1.x.x
/opt/homebrew/bin/tesseract  # or /usr/local/bin/tesseract

macOS Troubleshooting

Error: “brew: command not found”

Install Homebrew first (see “Install Homebrew” above) or use MacPorts:

sudo port install tesseract

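Error: “Permission denied”

Fix Homebrew permissions:

# Intel Macs (Homebrew prefix /usr/local)
sudo chown -R $(whoami) /usr/local/bin /usr/local/lib
# Apple Silicon Macs use /opt/homebrew instead; check your prefix with: brew --prefix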

Linux Installation

Steps are given for Ubuntu/Debian, Fedora/RHEL/CentOS, and Arch Linux.

Ubuntu/Debian

Update package list

sudo apt update

Install Tesseract

sudo apt install tesseract-ocr

Install language packs

# English (usually included by default)
sudo apt install tesseract-ocr-eng

# Spanish (optional)
sudo apt install tesseract-ocr-spa

# View all available languages
apt-cache search tesseract-ocr

Verify installation

tesseract --version
tesseract --list-langs

Expected output:

tesseract 4.x.x or 5.x.x
 leptonica-1.x.x

List of available languages (2):
eng
osd
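The language check above can also be scripted. The sketch below parses text in the format shown under “Expected output”. parse_list_langs is a hypothetical helper, not part of CoroNet or pytesseract, and the sample string is copied from this guide rather than captured from a real run.

```python
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` style output."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line; everything else is treated as a language code.
    header = "list of available languages"
    return [line for line in lines if not line.lower().startswith(header)]


if __name__ == "__main__":
    sample = "List of available languages (2):\neng\nosd\n"
    langs = parse_list_langs(sample)
    print(langs)
    if "eng" not in langs:
        print("English data is missing; install the 'eng' language pack.")
```

Matching the header by prefix is a deliberate choice: some Tesseract versions also print the tessdata directory in that line, and a prefix match tolerates both forms.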

Fedora/RHEL/CentOS

Install Tesseract

sudo dnf install tesseract
# or for older versions:
sudo yum install tesseract

Install language data

sudo dnf install tesseract-langpack-eng
sudo dnf install tesseract-langpack-spa  # Spanish (optional)

Verify installation

tesseract --version

Arch Linux

Install Tesseract

sudo pacman -S tesseract

Install language data

sudo pacman -S tesseract-data-eng
sudo pacman -S tesseract-data-spa  # Spanish (optional)

Verify installation

tesseract --version

Linux Troubleshooting

Error: “tesseract: command not found”

Ensure the package is installed and in PATH:

which tesseract
echo $PATH

Error: “Failed loading language ‘eng’”

Install the English language pack:

sudo apt install tesseract-ocr-eng  # Ubuntu/Debian
sudo dnf install tesseract-langpack-eng  # Fedora/RHEL

Python Integration

CoroNet uses the pytesseract Python wrapper to interface with Tesseract OCR.

Install pytesseract

The pytesseract package is listed in requirements.txt (line 34). Install it on its own or together with the other dependencies:

pip install pytesseract
# or install all dependencies:
pip install -r requirements.txt

Basic Usage

import pytesseract
from PIL import Image

# Open an image
image = Image.open('license_plate.jpg')

# Extract text
text = pytesseract.image_to_string(image, lang='eng')
print(text)

CoroNet Implementation

The fallback OCR implementation in CoroNet (app.py:106-110):

if matricula == "NO_DETECTADA":
    image = Image.open(path)
    ocr_text = pytesseract.image_to_string(image, lang="eng")
    ocr_text = ocr_text.strip().replace(" ", "").replace("\n", "").upper()
    matricula = "".join([c for c in ocr_text if c.isalnum() or c == "-"])[:10]

This implementation:

Opens the image using PIL (Pillow)
Runs Tesseract OCR with English language data
Cleans the output (removes spaces, newlines, converts to uppercase)
Filters to alphanumeric characters and hyphens
Limits the result to 10 characters
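The cleanup steps listed above can be isolated into a small function, which makes them easy to test without running OCR. This is a sketch only: clean_plate_text and its max_len parameter are hypothetical names, not functions in app.py, although the string operations mirror the snippet shown.

```python
def clean_plate_text(ocr_text: str, max_len: int = 10) -> str:
    """Normalize raw OCR output the same way the fallback snippet does."""
    # Trim the ends, drop spaces and newlines, and uppercase the text.
    text = ocr_text.strip().replace(" ", "").replace("\n", "").upper()
    # Keep only alphanumeric characters and hyphens, then cap the length.
    return "".join(c for c in text if c.isalnum() or c == "-")[:max_len]


if __name__ == "__main__":
    print(clean_plate_text(" abc-1234 \n"))  # prints ABC-1234
```

Two details are worth knowing. str.isalnum() also accepts non-ASCII letters and digits, so characters such as Ñ pass the filter. The result can also be an empty string when Tesseract reads nothing usable, so callers may want to handle that case.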

Configuration Options

Custom Tesseract Path

If Tesseract is installed in a non-standard location, configure the path:

import pytesseract

# Use only the assignment that matches your platform.
# Windows
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# macOS (custom location)
pytesseract.pytesseract.tesseract_cmd = '/opt/homebrew/bin/tesseract'

# Linux (custom location)
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
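Instead of hard-coding one path, the lookup could be done at startup. The following is a minimal sketch, assuming the candidate paths used in this guide. resolve_tesseract_cmd and DEFAULT_CANDIDATES are hypothetical names, and the which and is_file parameters exist only so the function can be tested without Tesseract installed.

```python
import os
import shutil
from typing import Iterable, Optional

# Candidate locations taken from the examples in this guide; adjust as needed.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/opt/homebrew/bin/tesseract",
    "/usr/local/bin/tesseract",
)


def resolve_tesseract_cmd(
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
    which=shutil.which,
    is_file=os.path.isfile,
) -> Optional[str]:
    """Return a usable Tesseract path, or None if nothing is found."""
    # Prefer whatever is already on PATH.
    found = which("tesseract")
    if found:
        return found
    # Otherwise fall back to the first candidate that exists on disk.
    for path in candidates:
        if is_file(path):
            return path
    return None


if __name__ == "__main__":
    cmd = resolve_tesseract_cmd()
    if cmd is None:
        print("Tesseract not found; install it or set the path manually.")
    else:
        print(f"Using Tesseract at {cmd}")
        # With pytesseract installed, the result could be applied like this:
        # import pytesseract
        # pytesseract.pytesseract.tesseract_cmd = cmd
```

Checking PATH first means a standard installation needs no configuration, and the explicit candidates only matter when PATH was not updated.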

Language Configuration

Specify the language for OCR processing:

# Single language
text = pytesseract.image_to_string(image, lang='eng')

# Multiple languages (English + Spanish)
text = pytesseract.image_to_string(image, lang='eng+spa')
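Requesting a language that is not installed produces the “Failed loading language” error, so it can help to compare the requested codes with the installed ones first. In this sketch, missing_languages is a hypothetical helper. pytesseract.get_languages() is the same call used in the test script in this guide, and it is imported lazily so the helper works without pytesseract.

```python
from typing import Iterable, List


def missing_languages(lang_spec: str, available: Iterable[str]) -> List[str]:
    """Return the codes in an 'eng+spa' style spec that are not installed."""
    installed = set(available)
    # Split the spec on '+' and ignore empty parts such as a trailing '+'.
    requested = [code for code in lang_spec.split("+") if code]
    return [code for code in requested if code not in installed]


if __name__ == "__main__":
    try:
        import pytesseract

        available = pytesseract.get_languages()
    except Exception as exc:  # pytesseract or the Tesseract binary is missing
        print(f"Could not query installed languages: {exc}")
    else:
        missing = missing_languages("eng+spa", available)
        print(f"Missing language data: {missing or 'none'}")
```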

Advanced OCR Configuration

For better license plate recognition:

# Custom configuration for license plates
custom_config = r'--oem 3 --psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-'
text = pytesseract.image_to_string(image, lang='eng', config=custom_config)

Configuration options:

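--oem 3: Use the default OCR Engine Mode, chosen from what is available (the LSTM engine in a standard Tesseract 4 or 5 install; --oem 1 selects LSTM explicitly)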
--psm 7: Treat image as a single text line (good for license plates)
-c tessedit_char_whitelist: Restrict characters to alphanumeric + hyphen
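The options above can be assembled programmatically, which avoids typos when experimenting with different page segmentation modes. This is a sketch under stated assumptions: build_plate_config and PLATE_CHARS are hypothetical names, and the defaults simply reproduce the configuration string shown above.

```python
import string

# Uppercase letters, digits, and the hyphen, as in the whitelist above.
PLATE_CHARS = string.ascii_uppercase + string.digits + "-"


def build_plate_config(psm: int = 7, oem: int = 3, whitelist: str = PLATE_CHARS) -> str:
    """Build a Tesseract config string for single-line plate images."""
    # The config is passed to Tesseract as command-line arguments,
    # so whitespace inside the whitelist would split the option.
    if any(ch.isspace() for ch in whitelist):
        raise ValueError("whitelist must not contain whitespace")
    return f"--oem {oem} --psm {psm} -c tessedit_char_whitelist={whitelist}"


if __name__ == "__main__":
    print(build_plate_config())
    # Page segmentation mode 8 treats the image as a single word.
    print(build_plate_config(psm=8))
```

The returned string can be passed as the config argument of pytesseract.image_to_string, exactly like custom_config in the example above.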

Testing Tesseract

Test your Tesseract installation with a sample image:

import pytesseract
from PIL import Image
import os

def test_tesseract():
    # Check if Tesseract is accessible
    try:
        version = pytesseract.get_tesseract_version()
        print(f"✓ Tesseract version: {version}")
    except Exception as e:
        print(f"✗ Tesseract not found: {e}")
        return
    
    # List available languages
    try:
        langs = pytesseract.get_languages()
        print(f"✓ Available languages: {', '.join(langs)}")
    except Exception as e:
        print(f"✗ Error listing languages: {e}")
    
    # Test with a sample image
    if os.path.exists('uploads/sample.jpg'):
        image = Image.open('uploads/sample.jpg')
        text = pytesseract.image_to_string(image, lang='eng')
        print(f"✓ OCR Result: {text}")

if __name__ == "__main__":
    test_tesseract()

Performance Considerations

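Tesseract runs locally and is typically much faster than the OpenAI vision model, but it tends to be less accurate on complex images. Using it only as a fallback keeps the more accurate engine as the default while still producing a candidate plate when the primary model finds nothing.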

Typical Processing Times

OpenAI GPT-4o-mini: 2-5 seconds (network latency + API processing)
Tesseract OCR: 0.1-0.5 seconds (local processing)
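The figures above will vary with hardware, image size, and network conditions, so it may be worth measuring them in your own environment. A minimal sketch of a timing wrapper follows; timed is a hypothetical helper, and the demo times a built-in function rather than a real OCR call.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload; swap in an OCR call to measure real timings.
    result, seconds = timed(sum, range(1_000_000))
    print(f"result={result}, elapsed={seconds:.4f}s")
```

To time the fallback itself, the call would look like timed(pytesseract.image_to_string, image, lang="eng"), assuming image is an already opened PIL image.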

When Tesseract is Used

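In the code shown above, the fallback runs only when extract_plate_from_image returns “NO_DETECTADA”. Depending on how that function handles errors, this can cover cases such as:

The vision model finds no plate in the image
OpenAI API is unavailable or rate-limited
Image quality is too poor for the vision model
Network connectivity issues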
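One way to make sure every case in this list reaches Tesseract is to treat exceptions from the primary engine as a missed detection. The sketch below illustrates the idea with injected callables. detect_plate, primary, and fallback are hypothetical names, and this is not how app.py is confirmed to behave; only the “NO_DETECTADA” sentinel comes from the source.

```python
from typing import Callable

NO_PLATE = "NO_DETECTADA"  # sentinel returned when no plate is detected


def detect_plate(
    path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Try the primary engine, then fall back on a miss or an error."""
    try:
        plate = primary(path)
    except Exception:
        # Treat API, rate-limit, or network errors like a missed detection.
        plate = NO_PLATE
    if plate == NO_PLATE:
        plate = fallback(path)
    return plate


if __name__ == "__main__":
    def failing_primary(path: str) -> str:
        raise ConnectionError("simulated network failure")

    def stub_fallback(path: str) -> str:
        return "ABC-1234"

    print(detect_plate("uploads/sample.jpg", failing_primary, stub_fallback))
```

In CoroNet, primary would correspond to extract_plate_from_image and fallback to the pytesseract block. Catching a narrower exception type than Exception would be preferable once the errors the API client raises are known.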

Next Steps

Environment Variables

Configure API keys and application settings

OpenAI Setup

Set up OpenAI API for primary OCR
