Overview
Sure includes an AI assistant that can help users understand their financial data by answering questions about accounts, transactions, income, expenses, net worth, and more. The assistant uses LLMs to process natural language queries and provide insights based on the user’s financial data.

Quickstart: OpenAI Token
The easiest way to get started with AI features in Sure is to use OpenAI:

Get an API key
Get an API key from OpenAI
Set your key as OPENAI_ACCESS_TOKEN; Sure then uses the default model (gpt-4.1) for all AI operations.
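A minimal environment configuration for the quickstart might look like this (the token value is a placeholder; OPENAI_MODEL is optional since gpt-4.1 is the default):

```shell
# Required: your OpenAI API key (placeholder value shown)
OPENAI_ACCESS_TOKEN=sk-...

# Optional: override the default model (gpt-4.1)
OPENAI_MODEL=gpt-4.1
```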
Local vs. Cloud Inference
Cloud Inference (Recommended for Most Users)
What it means: The LLM runs on remote servers (like OpenAI’s infrastructure), and your app sends requests over the internet.

| Pros | Cons |
|---|---|
| Zero setup - works immediately | Requires internet connection |
| Always uses the latest models | Data leaves your infrastructure (though transmitted securely) |
| No hardware requirements | Per-request costs |
| Scales automatically | Dependent on provider availability |
| Regular updates and improvements | |
Choose cloud inference if:

- You’re new to LLMs
- You want the best performance without setup
- You don’t have powerful hardware (GPU with large VRAM)
- You’re okay with cloud-based processing
- You’re running a managed instance
Local Inference (Self-Hosted)
What it means: The LLM runs on your own hardware using tools like Ollama, LM Studio, or LocalAI.

| Pros | Cons |
|---|---|
| Complete data privacy - nothing leaves your network | Requires significant hardware (see below) |
| No per-request costs after initial setup | Setup and maintenance overhead |
| Works offline | Models may be less capable than latest cloud offerings |
| Full control over models and updates | You manage updates and improvements |
| Can be more cost-effective at scale | Performance depends on your hardware |
Hardware requirements:

- Minimum (8GB VRAM): Can run 7B parameter models like llama3.2:7b or gemma2:7b
  - Works for basic chat functionality
  - May struggle with complex financial analysis
- Recommended (16GB+ VRAM): Can run 13B-14B parameter models like llama3.1:13b or qwen2.5:14b
  - Good balance of performance and hardware requirements
  - Handles most financial queries well
- Ideal (24GB+ VRAM): Can run 30B+ parameter models or run smaller models with higher precision
  - Best quality responses
  - Complex reasoning about financial data
Choose local inference if:

- Privacy is critical (regulated industries, sensitive financial data)
- You have the required hardware
- You’re comfortable with technical setup
- You want to minimize ongoing costs
- You need offline functionality
Cloud Providers
Sure supports any OpenAI-compatible API endpoint. Here are tested providers:

OpenAI (Primary Support)
- gpt-4.1 - Default, best balance of speed and quality
- gpt-5 - Latest model, highest quality (more expensive)
- gpt-4o-mini - Cheaper, good quality
Google Gemini (via OpenRouter)
OpenRouter provides access to many models including Gemini:

- Single API for multiple providers
- Competitive pricing
- Automatic fallbacks
- Usage tracking
- google/gemini-2.5-flash - Fast and capable
- google/gemini-2.5-pro - High quality, good for complex queries
Anthropic Claude (via OpenRouter)
- anthropic/claude-sonnet-4.5 - Excellent reasoning, good with financial data
- anthropic/claude-haiku-4.5 - Fast and cost-effective
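Pointing Sure at OpenRouter is a matter of overriding the endpoint and model via the environment variables described later in this guide. A sketch (the key value is a placeholder):

```shell
# OpenRouter exposes an OpenAI-compatible API
OPENAI_ACCESS_TOKEN=sk-or-...                 # your OpenRouter key (placeholder)
OPENAI_URI_BASE=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-sonnet-4.5      # any model ID OpenRouter serves
```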
Other Providers
Any service offering an OpenAI-compatible API should work:

- Groq - Fast inference, free tier available
- Together AI - Various open models
- Anyscale - Llama models
- Replicate - Various models
Local LLM Setup (Ollama)
Ollama is the recommended tool for running LLMs locally.

Installation
- macOS
- Linux
- Windows
Start Ollama and pull a model
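Assuming a standard Ollama install, starting the server and pulling a model looks like this (the model name is just an example; pick one that fits your VRAM):

```shell
# Start the Ollama server (listens on localhost:11434 by default)
ollama serve

# In another terminal: download a model
ollama pull llama3.1:13b

# Confirm the model is available
ollama list
```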
Configuration
Configure Sure to use Ollama:

Docker Compose Example
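A sketch of what such a compose file could look like, assuming Ollama’s OpenAI-compatible /v1 endpoint. The Sure image name and model are illustrative; adjust them to your deployment:

```yaml
services:
  sure:
    image: ghcr.io/we-promise/sure:latest      # adjust to your actual image
    environment:
      OPENAI_ACCESS_TOKEN: "ollama"            # placeholder; Ollama does not check it
      OPENAI_URI_BASE: "http://ollama:11434/v1"
      OPENAI_MODEL: "llama3.1:13b"             # must match a pulled model
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama              # persist downloaded models

volumes:
  ollama-data:
```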
Model Recommendations
For Chat Assistant
The AI assistant needs to understand financial context and perform function/tool calling:

Cloud:
- Best: gpt-4.1 or gpt-5 - Most reliable, best function calling
- Good: anthropic/claude-sonnet-4.5 - Excellent reasoning
- Budget: google/gemini-2.5-flash - Fast and affordable

Local: see the hardware requirements above for model sizing.
For Auto-Categorization
Transaction categorization doesn’t require function calling:

Cloud:
- Best: Same as chat - gpt-4.1 or gpt-5
- Budget: gpt-4o-mini - Much cheaper, still very accurate

Local:
- Any model that works for chat will work for categorization
- This is less demanding than chat, so smaller models may suffice
- Some models don’t support structured outputs; validate before relying on them
For Merchant Detection
Similar requirements to categorization:

Cloud:
- Same recommendations as auto-categorization

Local:
- Same recommendations as auto-categorization
Configuration via Settings UI
For self-hosted deployments, you can configure AI settings through the web interface. Settings changed in the UI take precedence over environment variables.
External AI Assistant
Instead of using the built-in LLM (which calls OpenAI or a local model directly), you can delegate chat to an external AI agent. The agent receives the conversation, can call back to Sure’s financial data via MCP, and streams a response. This is useful when:

- You have a custom AI agent with domain knowledge, memory, or personality
- You want to use a non-OpenAI-compatible model (the agent translates)
- You want to keep LLM credentials and logic outside Sure entirely
How It Works
Agent calls Sure's MCP endpoint
Your agent can call Sure’s /mcp endpoint for financial data (accounts, transactions, balance sheet). Sure sends a POST with a messages array; the agent returns SSE with delta.content chunks.
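The streaming contract described above can be sketched in a few lines of Python. This is illustrative only — the exact request shape and event framing Sure expects may differ, and the echo logic and helper names are made up for the example:

```python
import json

def sse_chunk(text: str) -> str:
    """Format one streamed token as a server-sent event with a delta.content payload."""
    payload = {"delta": {"content": text}}
    return f"data: {json.dumps(payload)}\n\n"

def stream_reply(messages: list) -> list:
    """Produce an SSE event stream for a trivial echo 'agent'."""
    # Find the most recent user message in the conversation.
    last_user = next(m["content"] for m in reversed(messages) if m["role"] == "user")
    reply = f"You asked: {last_user}"
    # Stream the reply word by word, then a terminator event.
    events = [sse_chunk(word + " ") for word in reply.split()]
    events.append("data: [DONE]\n\n")
    return events

events = stream_reply([{"role": "user", "content": "What is my net worth?"}])
```

A real agent would replace the echo logic with an LLM call and write each event to the HTTP response as it is produced.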
Configuration
Configure via the UI or environment variables:

- Settings UI
- Environment Variables
Security with Pipelock
When Pipelock is enabled (pipelock.enabled=true in Helm, or the pipelock service in Docker Compose), all traffic between Sure and the external agent is scanned:

- Outbound (Sure → agent): routed through Pipelock’s forward proxy via HTTPS_PROXY
- Inbound (agent → Sure /mcp): routed through Pipelock’s MCP reverse proxy (port 8889)
Access Control
Use EXTERNAL_ASSISTANT_ALLOWED_EMAILS to restrict which users can use the external assistant. When set, only users whose email matches the comma-separated list will see the AI chat. When blank, all users can access it.
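For example (addresses are placeholders):

```shell
# Only these users will see the AI chat
EXTERNAL_ASSISTANT_ALLOWED_EMAILS=alice@example.com,bob@example.com
```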
Docker Compose Example
AI Cache Management
Sure caches AI-generated results (like auto-categorization and merchant detection) to avoid redundant API calls and costs. However, there are situations where you may want to clear this cache.

What is the AI Cache?
When AI rules process transactions, Sure stores:

- Enrichment records: Which attributes were set by AI (category, merchant, etc.)
- Attribute locks: Prevents rules from re-processing already-handled transactions
This caching ensures:

- Transactions won’t be sent to the LLM repeatedly
- Your API costs are minimized
- Processing is faster on subsequent rule runs
When to Reset the AI Cache
You might want to reset the cache when:

- Switching LLM models: Different models may produce better categorizations
- Improving prompts: After system updates with better prompts
- Fixing miscategorizations: When AI made systematic errors
- Testing: During development or evaluation of AI features
How to Reset the AI Cache
- Via UI (Recommended)
- Automatic Reset
What Happens When Cache is Reset
- AI-locked attributes are unlocked: Transactions can be re-enriched
- AI enrichment records are deleted: The history of AI changes is cleared
- User edits are preserved: If you manually changed a category after AI set it, your change is kept
Cost Implications
Before resetting the cache, consider:

| Scenario | Approximate Cost |
|---|---|
| 100 transactions | $0.05-0.20 |
| 1,000 transactions | $0.50-2.00 |
| 10,000 transactions | $5.00-20.00 |
Tip: use gpt-4o-mini for lower costs.
Tips to minimize costs:
- Use narrow rule filters before running AI actions
- Reset cache only when necessary
- Consider using local LLMs for bulk re-processing
Observability with Langfuse
Sure includes built-in support for Langfuse, an open-source LLM observability platform.

What is Langfuse?
Langfuse helps you:

- Track all LLM requests and responses
- Monitor costs per request
- Measure response latency
- Debug failed requests
- Analyze usage patterns
- Optimize prompts based on real data
Setup
Create a Langfuse account
Create a free account at Langfuse Cloud or self-host Langfuse
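Setup also requires Langfuse credentials. The variable names below follow Langfuse’s SDK conventions and the values are placeholders — check your deployment’s configuration reference for the exact names Sure expects:

```shell
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com   # or your self-hosted Langfuse URL
```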
Once configured, the following is traced:

- Chat messages and responses
- Auto-categorization requests
- Merchant detection
- Token usage and costs
- Response times
Langfuse Features in Sure
- Automatic tracing: Every LLM call is automatically traced
- Session tracking: Chat sessions are tracked with a unique session ID
- User anonymization: User IDs are hashed before sending to Langfuse
- Cost tracking: Token usage is logged for cost analysis
- Error tracking: Failed requests are logged with error details
Viewing Traces
Privacy Considerations
What’s sent to Langfuse:

- Prompts and responses
- Model names
- Token counts
- Timestamps
- Session IDs
- Hashed user IDs (not actual user data)
What’s never sent to Langfuse:

- User email addresses
- User names
- Unhashed user IDs
- Account credentials
Vector Store (Document Search)
Sure’s AI assistant can search documents that have been uploaded to a family’s vault. Under the hood, documents are indexed in a vector store so the assistant can retrieve relevant passages when answering questions (Retrieval-Augmented Generation).

How It Works
Document upload
When a user uploads a document to their family vault, it is automatically pushed to the configured vector store.
Assistant searches
When the assistant needs financial context from uploaded files, it calls the search_family_files function.

Supported Backends
| Backend | Status | Best For | Requirements |
|---|---|---|---|
| OpenAI (default) | ready | Cloud deployments, zero setup | OPENAI_ACCESS_TOKEN |
| Pgvector | scaffolded | Self-hosted, full data privacy | PostgreSQL with pgvector extension |
| Qdrant | scaffolded | Self-hosted, dedicated vector DB | Running Qdrant instance |
Configuration
- OpenAI (Default)
- Pgvector (Self-Hosted)
- Qdrant (Self-Hosted)
No extra configuration is needed. If you already have OPENAI_ACCESS_TOKEN set for the AI assistant, document search works automatically. OpenAI manages chunking, embedding, and retrieval.

Verifying the Configuration
You can check whether a vector store is properly configured from the Rails console.

Supported File Types
The following file extensions are supported for document upload and search:.pdf, .txt, .md, .csv, .json, .xml, .html, .css, .js, .rb, .py, .docx, .pptx, .xlsx, .yaml, .yml, .log, .sh
Privacy Notes
- OpenAI backend: Document content is sent to OpenAI’s API for indexing and search. The same privacy considerations as the AI chat apply.
- Pgvector / Qdrant backends: All data stays on your infrastructure. No external API calls are made for document search.
Cost Considerations
Cloud Costs
Typical costs for OpenAI (as of early 2025):

- gpt-4.1: ~$15-60 per 1M output tokens
- gpt-5: ~2-3x more expensive than gpt-4.1
- gpt-4o-mini: ~$0.15 per 1M input tokens (very cheap)
Typical usage:

- Chat message: 500-2000 tokens (input) + 100-500 tokens (output)
- Auto-categorization: 1000-3000 tokens per 25 transactions
- Cost per chat message: $0.01-0.05 for gpt-4.1
To reduce costs:

- Use gpt-4o-mini for categorization
- Use Langfuse to identify expensive prompts
- Cache results when possible
- Consider local LLMs for high-volume operations
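The per-message figures above follow from a simple back-of-envelope calculation. The function below is illustrative, and the per-million-token prices are hypothetical placeholders, not quoted rates:

```python
def chat_cost(input_tokens: int, output_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in USD given token counts and prices per 1M tokens."""
    return input_tokens / 1e6 * in_price_per_m + output_tokens / 1e6 * out_price_per_m

# Example: a 1000-token prompt with a 300-token reply at hypothetical
# rates of $15/M input and $60/M output -> about $0.033 per message,
# consistent with the $0.01-0.05 range quoted above.
cost = chat_cost(1000, 300, 15.0, 60.0)
```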
Local Costs
One-time costs:

- GPU hardware: $500-2000+ depending on VRAM needs
- Setup time: 2-8 hours
Ongoing costs:

- Electricity: ~$0.10-0.50 per hour of GPU usage
- Maintenance: Occasional updates and monitoring
For comparison:

- Cloud (gpt-4.1): ~$200-500/month
- Local (amortized): ~$50-100/month after hardware cost
- Break-even: 6-12 months depending on hardware cost
Hybrid Approach
Sure currently uses a single provider for all operations, but with customization you could mix providers (for example, a cheap model for categorization and a stronger one for chat).
Troubleshooting
"Messages is invalid" Error
Symptom: Cannot start a chat; a validation error appears

Cause: Using a custom provider (like Ollama) without setting OPENAI_MODEL

Fix: Set OPENAI_MODEL to the model name your provider serves
Model Not Found
Symptom: Error about model not being available

Cloud: Check that you’re using a valid model name for your provider

Local: Make sure you’ve pulled the model:
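Assuming Ollama, pulling and verifying the model looks like this (the model name is an example; it must match your OPENAI_MODEL setting):

```shell
ollama pull llama3.1:13b   # download the model your OPENAI_MODEL refers to
ollama list                # confirm it appears in the local model list
```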
Slow Responses
Symptom: Long wait times for AI responses

Cloud:
- Switch to a faster model (e.g., gpt-4o-mini or gemini-2.0-flash-exp)
- Check your internet connection
- Verify provider status page
Local:

- Check GPU utilization (should be near 100% during inference)
- Try a smaller model
- Ensure you’re using GPU, not CPU
- Check for thermal throttling
No Provider Available
Symptom: “Provider not found” or similar error

Fix:
- Check OPENAI_ACCESS_TOKEN is set
- For custom providers, verify OPENAI_URI_BASE and OPENAI_MODEL
- Restart Sure after changing environment variables
- Check logs for specific error messages
High Costs
Symptom: Unexpected bills from cloud provider

Analysis:
- Check Langfuse for usage patterns
- Look for unusually long conversations
- Check if you’re using an expensive model
Fixes:

- Switch to cheaper model for categorization
- Consider local LLM for high-volume tasks
- Implement rate limiting if needed
- Review and optimize system prompts
Resources
- OpenAI Documentation
- Ollama Documentation
- OpenRouter Documentation
- Langfuse Documentation
- Sure GitHub Repository
Support
For issues with AI features:

- Check this documentation first
- Search existing GitHub issues
- Open a new issue with:
- Your configuration (redact API keys!)
- Error messages
- Steps to reproduce
- Expected vs. actual behavior
Last Updated: March 2026