
img2text()

Generate descriptive text from an image using an AI vision model. The prompt steers the kind of description or analysis you get back. `img2text()` is a coroutine, so call it with `await` inside an `async` function.

Method signature

await client.img2text(
    prompt: str,
    image_data: str
) -> str

Parameters

prompt
str
required
The instruction or question to answer about the image. It determines what kind of description or analysis is returned, for example “Describe this image”, “What objects are in this image?”, or “What is the main subject of this photo?”
image_data
str
required
The image encoded as a base64 string. Read the file's raw bytes, encode them with `base64.b64encode()`, and decode the result to `str` before passing it to this method, as the usage examples below do.
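If you encode images in several places, a small helper keeps the conversion consistent. A minimal sketch, assuming your image arrives either as a file path or as raw bytes already in memory. The helper name `to_base64` is hypothetical and not part of `kellyapi`; it produces the same plain base64 string (no `data:` URI prefix) that the usage examples build inline.

```python
import base64
from pathlib import Path
from typing import Union


def to_base64(source: Union[str, Path, bytes]) -> str:
    """Return base64 text for an image given as a file path or raw bytes.

    Hypothetical helper, not part of kellyapi.
    """
    if isinstance(source, (bytes, bytearray)):
        raw = bytes(source)
    else:
        # Treat anything else as a path and read the file in binary mode
        raw = Path(source).read_bytes()
    # b64encode returns bytes; decode to str because image_data expects str
    return base64.b64encode(raw).decode("ascii")
```

Usage: `image_data = to_base64("photo.jpg")`, or `to_base64(uploaded_bytes)` when the image never touches disk.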

Returns

content
str
The generated text description or analysis of the image based on your prompt.

Usage examples

Basic image description

import asyncio
import base64
from kellyapi import KellyAPI

client = KellyAPI(api_key="your-api-key")

async def main():
    # Read and encode the image
    with open("photo.jpg", "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")
    
    # Get image description
    description = await client.img2text(
        prompt="Describe this image in detail",
        image_data=image_data
    )
    print(description)

asyncio.run(main())
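Because `img2text` is awaited inside an event loop, reading a large file synchronously in `main()` briefly blocks that loop. A minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`: the hypothetical `describe_file` helper reads and encodes the file in a worker thread, then awaits `img2text` on whatever client object you pass in. It uses only the `prompt` and `image_data` keyword arguments documented above.

```python
import asyncio
import base64
from pathlib import Path


def _encode_file(path: str) -> str:
    # Runs in a worker thread: read the bytes and return base64 text
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


async def describe_file(client, path: str,
                        prompt: str = "Describe this image in detail") -> str:
    """Hypothetical helper: encode an image file off the event loop, then call img2text."""
    image_data = await asyncio.to_thread(_encode_file, path)
    return await client.img2text(prompt=prompt, image_data=image_data)
```

Inside `main()`, this becomes `description = await describe_file(client, "photo.jpg")`. For small images the difference is negligible; the thread offload matters mainly when a server handles many requests on one loop.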

Identify objects in image

import asyncio
import base64
from kellyapi import KellyAPI

client = KellyAPI(api_key="your-api-key")

async def main():
    # Read and encode the image
    with open("scene.jpg", "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")
    
    # Identify objects
    objects = await client.img2text(
        prompt="List all the objects you can see in this image",
        image_data=image_data
    )
    print(objects)

asyncio.run(main())
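The returned value is a single free-form string, so its layout depends on the model. If you want a Python list, one option is to ask for one object per line and split the result. A minimal sketch; the prompt wording and the `parse_object_list` helper are assumptions, and the parser only strips common bullet and numbering prefixes rather than guaranteeing a clean list.

```python
import re
from typing import List

# Example prompt asking for a line-oriented answer (wording is an assumption)
LIST_PROMPT = "List every object you can see in this image, one per line, with no extra commentary."

# Matches a leading bullet ("-", "*", "\u2022") or number ("1.", "2)") plus following spaces
_PREFIX = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s*")


def parse_object_list(text: str) -> List[str]:
    """Split a line-per-item response into a list of object names."""
    items = []
    for line in text.splitlines():
        cleaned = _PREFIX.sub("", line).strip()
        if cleaned:  # skip blank lines
            items.append(cleaned)
    return items
```

Pass `LIST_PROMPT` as the `prompt` argument, then call `parse_object_list(objects)` on the result. Treat the output as best-effort: the model may still add a preamble line, which this parser keeps as an item.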

Analyze image content

import asyncio
import base64
from kellyapi import KellyAPI

client = KellyAPI(api_key="your-api-key")

async def main():
    # Read and encode the image
    with open("artwork.jpg", "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")
    
    # Analyze the image
    analysis = await client.img2text(
        prompt="What is the mood and style of this artwork?",
        image_data=image_data
    )
    print(analysis)

asyncio.run(main())
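When you have several images to analyze, you can issue the calls concurrently instead of awaiting them one at a time. A minimal sketch using `asyncio.gather` with an `asyncio.Semaphore` to cap the number of requests in flight. The default limit of 4 is an arbitrary assumption, not a documented rate limit, and `analyze_many` is a hypothetical helper.

```python
import asyncio
from typing import Dict, List


async def analyze_many(client, prompt: str, images: Dict[str, str],
                       limit: int = 4) -> Dict[str, str]:
    """Run img2text on several base64 images concurrently.

    `images` maps a label (for example a filename) to its base64 string.
    The result maps each label to its response, in the same order.
    """
    semaphore = asyncio.Semaphore(limit)

    async def one(image_data: str) -> str:
        # Hold a semaphore slot for the duration of each request
        async with semaphore:
            return await client.img2text(prompt=prompt, image_data=image_data)

    labels: List[str] = list(images)
    results = await asyncio.gather(*(one(images[label]) for label in labels))
    return dict(zip(labels, results))
```

By default `gather` raises the first exception it sees; if one failed image should not discard the others, pass `return_exceptions=True` and check each result.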

Extract text from image

import asyncio
import base64
from kellyapi import KellyAPI

client = KellyAPI(api_key="your-api-key")

async def main():
    # Read and encode the image
    with open("document.jpg", "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")
    
    # Extract text
    text = await client.img2text(
        prompt="Extract and transcribe all text from this image",
        image_data=image_data
    )
    print(text)

asyncio.run(main())
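Transcribing a dense document may take longer than a short description. If you need an upper bound on how long you wait, a minimal sketch wraps the call in `asyncio.wait_for`. The 30-second default is an arbitrary assumption, not a documented service limit. The hypothetical `extract_text` helper returns `None` on timeout instead of raising; `kellyapi`'s own exception types are not documented here, so any other error propagates unchanged.

```python
import asyncio
from typing import Optional


async def extract_text(client, image_data: str,
                       timeout: float = 30.0) -> Optional[str]:
    """Transcribe text from a base64 image, giving up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(
            client.img2text(
                prompt="Extract and transcribe all text from this image",
                image_data=image_data,
            ),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # wait_for cancels the pending call before raising TimeoutError
        return None
```

Returning `None` keeps a batch job moving; if a timeout should stop your program instead, drop the `try` block and let `TimeoutError` propagate.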
