
Overview

AI Math Notes uses OpenAI’s GPT-4o vision model to analyze hand-drawn mathematical equations and return calculated results. The integration involves base64 image encoding, structured prompts, and response parsing.

OpenAI Client Setup

The OpenAI client is initialized in the DrawingApp constructor (main.py:43):
self.client = OpenAI()
This assumes the OPENAI_API_KEY environment variable is set (as per README setup instructions).
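If the variable is missing, the failure only surfaces later as an authentication error from the API. A minimal pre-flight check, sketched here as a hypothetical make_client helper (not part of main.py), fails fast with a clearer message:

```python
import os

def make_client():
    # Hypothetical helper, not part of main.py: fail fast with a clear
    # message instead of a late authentication error from the API.
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; see the README setup instructions")
    from openai import OpenAI  # deferred so the check runs even without a key configured
    return OpenAI(api_key=api_key)
```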

Image Encoding

Before sending the drawn equation to the API, the PIL Image must be converted to base64-encoded PNG.

encode_image_to_base64()

This helper function (main.py:88-91) handles the conversion:
Parameters:
  image (PIL.Image): The PIL Image object containing the drawn equation
import base64
from io import BytesIO

def encode_image_to_base64(image):
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode('utf-8')
Process:
  1. Create an in-memory BytesIO buffer
  2. Save the PIL Image to the buffer in PNG format
  3. Encode the buffer contents to base64
  4. Decode bytes to UTF-8 string for API transmission
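The buffer-and-encode steps can be exercised without Pillow by writing raw bytes instead of a saved image. The sketch below uses a stand-in encode_buffer_to_base64 helper (not part of main.py) and also shows why base64-encoded PNGs always start with the same characters:

```python
import base64
from io import BytesIO

def encode_buffer_to_base64(data: bytes) -> str:
    # Stand-in for encode_image_to_base64: identical buffer/encode/decode
    # steps, but taking raw bytes instead of a PIL Image.
    buffered = BytesIO()
    buffered.write(data)
    return base64.b64encode(buffered.getvalue()).decode("utf-8")

# Every PNG file begins with the same 8-byte signature...
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = encode_buffer_to_base64(png_signature)
# ...so every base64-encoded PNG starts with "iVBORw0KG",
# which is why the example request later in this page begins that way.
```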

API Call Structure

The calculate() method (main.py:87-113) orchestrates the API request:
def calculate(self):
    base64_image = encode_image_to_base64(self.image)

    response = self.client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Give the answer to this math equation. Only respond with the answer. Only respond with numbers. NEVER Words. Only answer unanswered expressions. Look for equal sign with nothing on the right of it. If it has an answer already. DO NOT ANSWER it."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                    },
                ],
            }
        ],
        max_tokens=300,
    )

    answer = response.choices[0].message.content
    self.draw_answer(answer)

API Parameters

model (string, default: "gpt-4o"): The GPT-4o vision model capable of analyzing images
messages (array): Array containing a single user message with multimodal content (text + image)
max_tokens (integer, default: 300): Maximum tokens for the response (answers are typically short numbers)

Prompt Engineering

The prompt (main.py:101) is carefully designed to constrain the model’s output:
Give the answer to this math equation. 
Only respond with the answer. 
Only respond with numbers. 
NEVER Words. 
Only answer unanswered expressions. 
Look for equal sign with nothing on the right of it. 
If it has an answer already. DO NOT ANSWER it.

Prompt Strategy

  1. Output Format: “Only respond with numbers. NEVER Words” ensures numeric-only responses
  2. Task Clarity: “Give the answer to this math equation” defines the objective
  3. Conditional Logic: Only solve equations with incomplete equals signs (e.g., 5 + 3 = not 5 + 3 = 8)
  4. Brevity: “Only respond with the answer” prevents explanations
This design allows the model to distinguish between:
  • 5 + 3 = (needs solving) → Returns 8
  • 5 + 3 = 8 (already solved) → Returns nothing

Multimodal Content Structure

The API accepts multimodal input via the content array:
"content": [
    {"type": "text", "text": "<prompt>"},
    {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
    },
]

Image URL Format

The base64 image is embedded using a data URI:
data:image/png;base64,<base64_encoded_image>
This format allows the image to be sent inline without external hosting.
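In calculate() the URI is built with a single f-string; a hypothetical as_data_uri helper (not in main.py) makes the format explicit:

```python
def as_data_uri(base64_image: str) -> str:
    # Hypothetical helper: inline data URI as used in the image_url field
    return f"data:image/png;base64,{base64_image}"

# base64 encoding of the 8-byte PNG signature
uri = as_data_uri("iVBORw0KGgo=")
```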

Response Parsing

The API response is parsed to extract the answer (main.py:112):
answer = response.choices[0].message.content
The response object structure:
{
    "choices": [
        {
            "message": {
                "content": "8"  # The calculated answer
            }
        }
    ]
}
Since the prompt constrains output to numbers only, content contains the numeric result as a string.
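A slightly defensive variant, sketched as a hypothetical parse_answer helper (not in main.py), guards against a None content field and stray whitespace before the answer is drawn:

```python
def parse_answer(content):
    # Hypothetical helper: the content field can be None (e.g. a refusal
    # or filtered response) and may carry surrounding whitespace.
    if content is None:
        return ""
    return content.strip()
```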

Integration Flow

  1. User Action: User draws equation and presses Enter/Return (main.py:26)
  2. Event Trigger: command_calculate() calls calculate() (main.py:116-117)
  3. Image Encoding: PIL Image converted to base64 PNG
  4. API Request: Multimodal request sent to GPT-4o with prompt + image
  5. Response: Model returns numeric answer
  6. Display: Answer rendered on canvas via draw_answer() (main.py:113)

Error Handling

The current implementation (main.py:87-113) does not include explicit error handling. Potential failure points:
  • Network connectivity issues
  • API authentication errors
  • Rate limiting
  • Invalid responses from the model
Future improvements could add try-except blocks around the API call.
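One possible shape for that improvement, sketched as a hypothetical calculate_safely wrapper where make_request stands in for the client.chat.completions.create call:

```python
def calculate_safely(make_request, on_success, on_error):
    # Hypothetical wrapper: isolate the network call so any of the
    # failure modes above surfaces as a message rather than a crash.
    try:
        response = make_request()
    except Exception as exc:  # the openai client also raises typed errors, e.g. RateLimitError
        on_error(f"API call failed: {exc}")
        return None
    answer = response.choices[0].message.content
    on_success(answer)
    return answer

# Demo with a stub response object (no network required):
from types import SimpleNamespace
stub = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="8"))])
result = calculate_safely(lambda: stub, lambda a: None, lambda e: None)
```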

API Requirements

Environment Setup

From the README:
# Setup OpenAI API as environment variable
export OPENAI_API_KEY="your-api-key-here"

Dependencies

From requirements.txt:
openai==1.14.2
The OpenAI Python client handles authentication, retries, and request formatting.

Performance Considerations

  • Image Size: 1200x800 canvas results in ~50-100KB base64 strings
  • Latency: API calls typically complete in 1-3 seconds
  • Token Usage: Responses use less than 10 tokens (just the numeric answer)
  • Cost: GPT-4o vision pricing applies per API call
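The base64 overhead behind the Image Size estimate follows directly from the encoding itself: standard base64 emits 4 characters per 3 input bytes, padded to a multiple of 4. A quick sketch:

```python
import base64
import math

def base64_length(n_bytes: int) -> int:
    # Standard base64 output: 4 characters per 3 input bytes, padded
    return 4 * math.ceil(n_bytes / 3)

# A ~60 KB PNG therefore becomes an ~80 KB base64 string,
# matching the actual encoder's output length:
assert base64_length(60_000) == len(base64.b64encode(b"\x00" * 60_000))
```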

Example Request/Response

Request

{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Give the answer to this math equation..."},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KG..."}}
            ]
        }
    ],
    "max_tokens": 300
}

Response

{
    "choices": [
        {"message": {"content": "8"}}
    ]
}
For a drawn equation 5 + 3 =, the model returns "8".
