Overview
The AI Data Analysis Agent is a powerful tool that makes data analysis accessible to everyone, regardless of SQL expertise. Built with the Agno Agent framework and OpenAI’s GPT-4o, it converts natural language questions into SQL queries, executes them against your data using DuckDB, and returns clear, actionable insights.Features
File Upload Support
- Upload CSV and Excel files
- Automatic data type detection
- Schema inference
- Multiple format support (.csv, .xlsx)
Natural Language Queries
- Convert plain English to SQL
- No SQL knowledge required
- Complex aggregations and filters
- Instant answers about your data
Advanced Analysis
- Complex data aggregations
- Filtering and sorting
- Statistical summaries
- Data visualizations
Interactive UI
- User-friendly Streamlit interface
- Real-time query processing
- Clear result presentation
- Data preview tables
How It Works
Preprocessing
The agent automatically:
- Detects data types (dates, numbers, strings)
- Handles missing values
- Infers schema
- Loads data into DuckDB
Natural Language Query
Users ask questions in plain English:
- “What are the top 5 products by revenue?”
- “Show me average sales by region”
- “Which customers made purchases over $1000?”
Setup
Install Dependencies
streamlit==1.41.1- Web interfaceopenai==1.58.1- OpenAI API clientduckdb>=1.4.1- In-memory SQL databasepandas- Data manipulationnumpy==1.26.4- Numerical computingagno>=2.2.10- Agent framework
Configure API Keys
Get your OpenAI API key from OpenAI PlatformYou’ll enter it in the Streamlit sidebar when running the app (no environment variables needed)
Usage
Quick Start
Example Queries
Sales Analysis
Sales Analysis
Customer Insights
Customer Insights
Product Analytics
Product Analytics
Time-based Analysis
Time-based Analysis
Code Example
Data Preprocessing
Agent Configuration
Complete Streamlit App Structure
Advanced Features
DuckDB Integration
DuckDB provides fast, in-memory SQL analysis:- Benefits
- Supported Operations
- Fast: OLAP-optimized for analytical queries
- In-Memory: No external database needed
- SQL-Compatible: Standard SQL syntax
- Pandas Integration: Seamless data exchange
Automatic Type Detection
The agent intelligently detects and converts:- Date formats (YYYY-MM-DD, MM/DD/YYYY, etc.)
- Numeric types (integers, floats)
- Currency values ($1,234.56)
- Missing values (NA, N/A, null)
- String data with proper quoting
Use Cases
Sales Analytics
Analyze sales trends, top products, regional performance, and revenue metrics
Customer Analysis
Segment customers, analyze behavior, calculate lifetime value, identify churn
Financial Reporting
Generate P&L summaries, expense analysis, budget tracking, and forecasts
Operations Data
Inventory analysis, supply chain metrics, production efficiency, logistics
Marketing Analytics
Campaign performance, conversion rates, ROI analysis, channel attribution
HR Analytics
Headcount analysis, turnover rates, compensation benchmarks, performance metrics
Best Practices
Terminal Output: The app displays cleaner output in the terminal. Check your terminal for detailed agent reasoning and SQL queries.
Optimizing Queries
Reference Column Names
If you know column names, include them: “sum the revenue_usd column by region”
Troubleshooting
File Upload Errors
File Upload Errors
Issue: File won’t upload or shows errorsSolutions:
- Ensure file is .csv or .xlsx format
- Check for special characters in column names
- Verify file isn’t corrupted
- Try saving as CSV if Excel file fails
Query Returns Empty Results
Query Returns Empty Results
Issue: Agent returns no data or “0 rows”Solutions:
- Verify data is loaded (check data preview)
- Check column names match your query
- Try simpler query first: “show me first 5 rows”
- Ensure filters aren’t too restrictive
Incorrect SQL Generated
Incorrect SQL Generated
Issue: Agent misunderstands querySolutions:
- Rephrase query more explicitly
- Reference specific column names
- Break complex queries into simpler parts
- Check terminal for generated SQL
API Rate Limits
API Rate Limits
Issue: OpenAI API errorsSolutions:
- Wait a moment between queries
- Check API key has credits
- Verify API key is valid
- Monitor OpenAI dashboard for usage
Performance Tips
File Size
Works best with files under 100MB. For larger datasets, consider sampling or filtering data first.
Query Complexity
Simple aggregations are fast. Complex multi-step queries may take 5-10 seconds.
Result Size
Limit large result sets: “show me top 100” instead of returning millions of rows.
API Costs
Each query uses OpenAI API tokens. Simple queries use ~500 tokens, complex ones may use 2000+.
Next Steps
Tutorial
Follow the complete step-by-step tutorial
More Examples
Explore other AI agent examples
Agno Framework
Learn more about the Agno agent framework
GitHub
View source code and contribute
