The summarize command uses Google’s Gemini AI to generate summaries of documents in several formats, including PDF, Word, plain text, and Markdown.
Syntax
python solution.py summarize <input> [options]
Parameters
input - Path to the input document to summarize (required)
--length - Summary length (default: medium). Available choices:
short - Brief 1-paragraph summary (max ~1500 tokens)
medium - Concise 2-3 paragraph summary (max ~2500 tokens)
long - Detailed 3-4 paragraph summary (max ~4000 tokens)
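The syntax and parameters above can be sketched with `argparse`. This is only an illustration of the documented interface, not the actual contents of solution.py; the function name `build_parser` is hypothetical, and the `medium` default is taken from the "Default (Medium) Summary" example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented `summarize` syntax (sketch)."""
    parser = argparse.ArgumentParser(prog="solution.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    summarize = subparsers.add_parser("summarize", help="Summarize a document")
    # Positional argument: path to the document to summarize.
    summarize.add_argument("input", help="Path to the input document to summarize")
    # Optional length flag, restricted to the three documented choices.
    summarize.add_argument(
        "--length",
        choices=["short", "medium", "long"],
        default="medium",  # assumed default, based on the usage examples
        help="Summary length",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Summarizing: {args.input} ({args.length} length)")
```

Restricting `--length` with `choices` means an unsupported value is rejected by the parser before any file is uploaded.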
Requirements
GOOGLE_API_KEY required: this command needs a valid Google API key stored in the GOOGLE_API_KEY environment variable.
Setup
- Sign up at aistudio.google.com
- Create a new project and obtain your API key
- Set the environment variable:
export GOOGLE_API_KEY="your_api_key_here"
On Windows (Command Prompt):
set GOOGLE_API_KEY=your_api_key_here
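To confirm the variable is visible to Python before running the command, a small check like the following may help. It is a minimal sketch: the helper name `require_api_key` and the use of `RuntimeError` are assumptions, and only the error text is taken from the Error Messages section.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the API key, or raise with the documented error text (sketch)."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        # The exception type is an assumption; the message matches the docs.
        raise RuntimeError("GOOGLE_API_KEY environment variable is not set.")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("GOOGLE_API_KEY is set.")
    except RuntimeError as exc:
        print(f"Error: {exc}")
```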
Usage Examples
Default (Medium) Summary
python solution.py summarize document.pdf
Expected Output:
Summarizing: document.pdf (medium length)
Info: File validated (3.2MB) - uploading to AI service...
Info: Processing document and generating summary...
Success: Document processed successfully!
Generated Summary (487 characters):
============================================================
This document presents a comprehensive analysis of climate
change impacts on coastal ecosystems. The research examines
three key areas: rising sea levels, ocean acidification, and
temperature changes. Data collected over 15 years shows
significant correlation between warming trends and species
migration patterns.
The findings suggest immediate policy interventions are
required to mitigate long-term damage to marine biodiversity.
============================================================
Info: Summary saved to: /path/to/document_summary.txt
Short Summary
python solution.py summarize report.docx --length short
Generates a concise 1-paragraph summary focusing on the core essence of the document.
Expected Output:
Summarizing: report.docx (short length)
Info: File validated (1.8MB) - uploading to AI service...
Info: Processing document and generating summary...
Success: Document processed successfully!
Generated Summary (245 characters):
============================================================
This quarterly financial report shows 23% revenue growth
compared to Q1. Key drivers include expanded market share
in Asia-Pacific and successful product launches.
============================================================
Info: Summary saved to: /path/to/report_summary.txt
Long Summary
python solution.py summarize thesis.txt --length long
Generates a detailed 3-4 paragraph summary with comprehensive coverage of the document’s content.
Expected Output:
Summarizing: thesis.txt (long length)
Info: File validated (8.5MB) - uploading to AI service...
Info: Processing document and generating summary...
Success: Document processed successfully!
Generated Summary (1247 characters):
============================================================
This thesis investigates the applications of machine learning
in predictive healthcare analytics, focusing on early disease
detection systems. The research evaluates five different ML
algorithms across three major disease categories: cardiovascular,
diabetic, and oncological conditions.
The methodology employed a dataset of 50,000 patient records
spanning 10 years, with rigorous cross-validation and ethical
approval from institutional review boards. Random forest and
gradient boosting classifiers demonstrated superior performance
with accuracy rates exceeding 87% for early-stage detection.
Key findings indicate that integrating multi-modal data sources
(genetic, lifestyle, and clinical markers) significantly improves
predictive accuracy. The thesis also addresses critical concerns
around data privacy, algorithmic bias, and clinical deployment
challenges.
Recommendations include establishing standardized protocols for
ML model validation in healthcare settings and developing
transparent explainability frameworks for clinical decision
support systems.
============================================================
Info: Summary saved to: /path/to/thesis_summary.txt
Book Summary
python solution.py summarize book.pdf --length long
Ideal for generating comprehensive overviews of lengthy documents like books, research papers, or technical manuals.
AI Summarization Features
Intelligent Content Analysis
The AI model:
- Extracts key themes and main arguments
- Identifies important data points and statistics
- Recognizes document structure and hierarchy
- Maintains context across long documents
- Preserves technical accuracy for specialized content
Supported File Types
Works with various document formats:
- PDF documents
- Word documents (.docx, .doc)
- Text files (.txt)
- Markdown files (.md)
- And other text-based formats
Length Configurations
| Length | Description | Tokens | Temperature | Use Case |
|---|---|---|---|---|
| short | Brief essence | ~1500 | 0.5 | Quick overviews, abstracts |
| medium | Balanced detail | ~2500 | 0.7 | Standard summaries, reports |
| long | Comprehensive | ~4000 | 0.8 | Detailed analysis, academic work |
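One way to picture the table is as a lookup from the `--length` value to generation settings. The sketch below only restates the documented numbers; the names `LengthConfig`, `LENGTH_CONFIGS`, and `get_length_config` are hypothetical and may not match how solution.py stores them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengthConfig:
    description: str
    max_tokens: int      # approximate token limit from the table
    temperature: float   # sampling temperature from the table


# Values copied from the Length Configurations table.
LENGTH_CONFIGS = {
    "short": LengthConfig("Brief 1-paragraph summary", 1500, 0.5),
    "medium": LengthConfig("Concise 2-3 paragraph summary", 2500, 0.7),
    "long": LengthConfig("Detailed 3-4 paragraph summary", 4000, 0.8),
}


def get_length_config(length: str = "medium") -> LengthConfig:
    """Look up settings for a length option, rejecting unknown values."""
    try:
        return LENGTH_CONFIGS[length]
    except KeyError:
        valid = ", ".join(sorted(LENGTH_CONFIGS))
        raise ValueError(f"Unknown length {length!r}; choose from: {valid}") from None
```

Note that the temperature rises with length, so longer summaries are generated with somewhat more variation than short ones.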
Output
The summary is:
- Displayed in the terminal with clear formatting
- Saved to a file named {original_filename}_summary.txt in the same directory as the input file
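The naming rule can be sketched with `pathlib`. The helper name `summary_path` is hypothetical, and the sketch assumes `{original_filename}` means the name without its extension, as the usage examples suggest (document.pdf becomes document_summary.txt).

```python
from pathlib import Path


def summary_path(input_path: str) -> Path:
    """Return the summary file path next to the input file (sketch)."""
    source = Path(input_path)
    # Drop the extension, append "_summary.txt", keep the same directory.
    return source.with_name(f"{source.stem}_summary.txt")


if __name__ == "__main__":
    print(summary_path("reports/thesis.txt"))
```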
Custom System Prompt
You can customize the AI behavior by editing the summarize_prompt.txt file in the project root. The prompt template uses these placeholders:
{{FILE_DETAILS}} - JSON representation of uploaded file metadata
{{SUMMARY_REQUIREMENTS}} - Description of desired summary length and format
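As an illustration of how such placeholders could be filled, the sketch below substitutes both in a single pass. The function name `render_prompt` and the sample template text are hypothetical; the actual substitution logic in the project may differ.

```python
import json
import re


def render_prompt(template: str, file_details: dict, summary_requirements: str) -> str:
    """Fill the two documented placeholders in a prompt template (sketch)."""
    values = {
        "FILE_DETAILS": json.dumps(file_details, indent=2),
        "SUMMARY_REQUIREMENTS": summary_requirements,
    }
    # A single regex pass, so placeholder-like text inside an inserted value
    # is not substituted a second time.
    pattern = r"\{\{(FILE_DETAILS|SUMMARY_REQUIREMENTS)\}\}"
    return re.sub(pattern, lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    template = "File:\n{{FILE_DETAILS}}\n\nRequirements: {{SUMMARY_REQUIREMENTS}}"
    print(render_prompt(template, {"name": "document.pdf"}, "Concise 2-3 paragraph summary"))
```

When editing summarize_prompt.txt, keeping both placeholders in the template is the safest choice, since the command presumably relies on them to pass file metadata and length requirements to the model.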
Limitations
- File Size: Maximum 100MB per file
- Internet Required: Active internet connection for AI API access
- Processing Time: May take 30-120 seconds depending on file size
- API Quota: Subject to Google API usage limits
- Timeout: Processing times out after 5 minutes
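The file size limit can be checked locally before uploading. This is a minimal sketch: the names `check_size` and `validate_file` are hypothetical, the use of `ValueError` is an assumption, and 1 MB is assumed to mean 1024 * 1024 bytes, which the documentation does not specify. The error text follows the "File Too Large" message.

```python
import os

MAX_FILE_SIZE_MB = 100  # documented limit


def check_size(size_bytes: int, max_mb: int = MAX_FILE_SIZE_MB) -> float:
    """Return the size in MB, or raise if it exceeds the limit (sketch)."""
    size_mb = size_bytes / (1024 * 1024)  # assumption: 1 MB = 1024 * 1024 bytes
    if size_mb > max_mb:
        raise ValueError(
            f"File size ({size_mb:.1f}MB) exceeds maximum limit of {max_mb}MB"
        )
    return size_mb


def validate_file(path: str) -> float:
    """Check an on-disk file against the documented size limit."""
    return check_size(os.path.getsize(path))
```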
Error Messages
Missing API Key
Error: GOOGLE_API_KEY environment variable is not set.
File Too Large
Error: File size (125.3MB) exceeds maximum limit of 100MB
Processing Failure
Error: RuntimeError: File processing failed
Notes
- The AI model used is Gemini 2.5 Flash (high-speed, efficient processing)
- Uploaded files are automatically deleted from Google’s servers after processing
- Summary quality improves with well-structured, clearly written source documents
- Processing time scales with document length and complexity