Document processing with Gemini enables you to extract structured information, classify document types, answer questions, summarize content, and translate documents using natural language prompts. Gemini’s native PDF processing means you can work directly with documents without complex preprocessing.
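```python
from google import genai
from google.genai.types import GenerateContentConfig, Part

# Assumed setup: a Vertex AI client and a Gemini model that accepts PDF input.
# Replace the project, location, and model ID with your own values.
client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")
MODEL_ID = "gemini-2.0-flash"  # example model ID

# `Invoice` is assumed to be a Pydantic model (not shown) that describes
# the fields to extract.

# System instruction for extraction
extraction_instruction = """You are a document entity extraction specialist.
Extract text values exactly as they appear in the document.
Do not normalize entity values."""

# Load PDF file
with open("invoice.pdf", "rb") as f:
    file_bytes = f.read()

# Extract structured data
response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        "The following document is an invoice.",
        Part.from_bytes(data=file_bytes, mime_type="application/pdf"),
    ],
    config=GenerateContentConfig(
        system_instruction=extraction_instruction,
        response_schema=Invoice,
        response_mime_type="application/json",
    ),
)

# With a Pydantic response_schema, response.parsed holds an Invoice instance
invoice_data = response.parsed
print(invoice_data)
```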
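Because response_mime_type is "application/json", response.text contains the raw JSON string. That is useful if you want to log the output or handle it without the SDK's parsing. The following minimal sketch parses such a string into a dataclass using only the standard library. The InvoiceRecord fields, the parse_invoice helper, and the sample JSON are all hypothetical and exist only for illustration. Your own schema determines the real keys.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class InvoiceRecord:
    # Hypothetical fields for illustration; match them to your own schema.
    invoice_id: str
    supplier_name: str
    total_amount: str  # kept as text, in line with "do not normalize entity values"


def parse_invoice(json_text: str) -> InvoiceRecord:
    """Parse a JSON object into an InvoiceRecord, dropping keys the record does not define."""
    data = json.loads(json_text)
    known = {f.name for f in fields(InvoiceRecord)}
    # A missing required key raises TypeError from the dataclass constructor.
    return InvoiceRecord(**{key: value for key, value in data.items() if key in known})


# Hypothetical response body standing in for response.text
sample_json = (
    '{"invoice_id": "INV-0042", "supplier_name": "Example Corp", '
    '"total_amount": "$1,234.50", "notes": "Net 30"}'
)
invoice = parse_invoice(sample_json)
print(invoice.total_amount)  # $1,234.50
```

Keeping amounts as strings preserves the exact text the extraction instruction asks for. Convert them to numbers only in a later, explicit step.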
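You can also process documents stored in Cloud Storage by passing a gs:// URI to Part.from_uri() instead of loading the file locally. For example, this question-answering request reads a research paper directly from a Cloud Storage bucket: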
```python
qa_instruction = """You are a question answering specialist.
Provide answers based only on the context provided.
Give the answer first, followed by an explanation."""

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        "What is the attention mechanism?",
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/pdf/1706.03762v7.pdf",
            mime_type="application/pdf",
        ),
    ],
    config=GenerateContentConfig(
        system_instruction=qa_instruction,
    ),
)
print(response.text)
```
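In these examples, both Part.from_bytes() and Part.from_uri() receive an explicit mime_type. If you process files of mixed types, a small helper can derive the MIME type from the file name using Python's standard mimetypes module. This is a sketch only. The helper name guess_document_mime_type is hypothetical, and the mapping comes from the local mimetypes tables, not from the Gemini API. You should still confirm that the resulting type is one your model supports.

```python
import mimetypes
import posixpath
from urllib.parse import urlparse


def guess_document_mime_type(uri_or_path: str) -> str:
    """Guess a MIME type from a local file name or a gs:// object name."""
    # For "gs://bucket/dir/file.pdf", urlparse(...).path is "/dir/file.pdf";
    # for a plain relative path such as "invoice.pdf" it is the path itself.
    path = urlparse(uri_or_path).path
    mime_type, _encoding = mimetypes.guess_type(posixpath.basename(path))
    if mime_type is None:
        raise ValueError(f"Cannot infer a MIME type for {uri_or_path!r}")
    return mime_type


print(guess_document_mime_type("gs://cloud-samples-data/generative-ai/pdf/1706.03762v7.pdf"))
print(guess_document_mime_type("invoice.pdf"))
```

You could then pass mime_type=guess_document_mime_type(uri) to Part.from_uri(). Raising an error on unknown extensions is deliberate: sending a guessed or empty MIME type tends to fail later and less clearly.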
Summarization works the same way. Here, the system instruction also tells the model how to handle images, tables, and numbers:

```python
summarization_instruction = """You are a document summarization specialist.
Provide a detailed summary of the content.
If images are present, describe them.
If tables exist, extract key data.
Do not include numbers not mentioned in the document."""

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        "Summarize the following document.",
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/pdf/report.pdf",
            mime_type="application/pdf",
        ),
    ],
    config=GenerateContentConfig(
        system_instruction=summarization_instruction,
    ),
)
print(response.text)
```
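The summarization instruction tells the model not to include numbers that are absent from the document. You can add a cheap post-check by flagging any number in the summary that never appears in the source text. This assumes you also have the document's text, for example from a separate text-extraction step (not covered here). This is a heuristic sketch, and unsupported_numbers is a hypothetical helper. It compares digit strings literally, so "4500" and "4,500" count as different, and it ignores spelled-out numbers. The sample texts are made up.

```python
import re

# Integers and decimals with optional separators, e.g. "42", "3.5", "1,234".
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(summary: str, source_text: str) -> set[str]:
    """Return numbers that appear in the summary but nowhere in the source text."""
    source_numbers = set(NUMBER_PATTERN.findall(source_text))
    return {n for n in NUMBER_PATTERN.findall(summary) if n not in source_numbers}


# Hypothetical source text and model summary
source = "Revenue grew 12% to $4,500 in 2023."
summary = "Revenue rose 12% to $4,500, up from $3,900 in 2022."
print(sorted(unsupported_numbers(summary, source)))  # ['2022', '3,900']
```

A non-empty result does not prove the summary is wrong. It marks values a reviewer should check against the document.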
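To extract a table, ask the model to return it as HTML. Then strip the Markdown code fence that the model typically wraps around the output:

```python
table_prompt = "What is the HTML code of the table in this document?"

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        table_prompt,
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/pdf/salary_table.pdf",
            mime_type="application/pdf",
        ),
    ],
)

# Strip surrounding whitespace first so removesuffix() can see the closing
# fence even when the reply ends with a newline (requires Python 3.9+).
html_table = response.text.strip().removeprefix("```html").removesuffix("```").strip()
print(html_table)
```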
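If you need the table as data rather than markup, you can turn the returned HTML into a list of rows with the standard library's html.parser. This is a minimal sketch. TableParser and html_table_to_rows are hypothetical names, the sample table is made up, and the parser assumes well-formed markup with explicit closing tags. It does not handle rowspan, colspan, or nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_rows(html: str) -> list[list[str]]:
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical model output after the code fence has been stripped
sample = """<table>
<tr><th>Role</th><th>Salary</th></tr>
<tr><td>Engineer</td><td>$120,000</td></tr>
</table>"""
print(html_table_to_rows(sample))  # [['Role', 'Salary'], ['Engineer', '$120,000']]
```

Cell values stay as strings, so any numeric conversion remains an explicit, separate step.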
Translation needs only a prompt that names the part to translate and the target languages:

```python
translation_prompt = """Translate the first paragraph into French and Spanish.
Label each paragraph with the target language."""

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        translation_prompt,
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/pdf/document.pdf",
            mime_type="application/pdf",
        ),
    ],
)
print(response.text)
```
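The translation prompt lets the model choose how to label each paragraph. If you need the translations as separate strings, one option is to request a fixed label format and then parse it. In this sketch, the prompt wording, the "Language:" label format, and both helper names are assumptions. The parser works only if the model follows the requested format, so check the result for missing languages.

```python
import re

LANGUAGES = ["French", "Spanish"]


def build_translation_prompt(languages: list[str]) -> str:
    """Build a translation prompt that asks for one labeled section per language."""
    if len(languages) == 1:
        names = languages[0]
    else:
        names = ", ".join(languages[:-1]) + " and " + languages[-1]
    return (
        f"Translate the first paragraph into {names}.\n"
        "Start each translation with the language name followed by a colon, "
        "for example 'French:'."
    )


def parse_labeled_translations(text: str, languages: list[str]) -> dict[str, str]:
    """Map each language label to the text that follows it, up to the next label."""
    alternatives = "|".join(re.escape(lang) for lang in languages)
    # A label starts a line: "French:" or a bolded "**French:**"
    label = re.compile(rf"^\**[ \t]*({alternatives})[ \t]*:[ \t]*\**", re.MULTILINE)
    # With one capturing group, split() returns [preamble, lang1, body1, lang2, body2, ...]
    parts = label.split(text)
    return {lang: body.strip() for lang, body in zip(parts[1::2], parts[2::2])}


prompt = build_translation_prompt(LANGUAGES)
sample_reply = "**French:**\nBonjour tout le monde.\n\nSpanish: Hola a todos."
print(parse_labeled_translations(sample_reply, LANGUAGES))
# {'French': 'Bonjour tout le monde.', 'Spanish': 'Hola a todos.'}
```

Any text before the first label, such as an introductory sentence from the model, is discarded.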