Architecture Overview
Macro is a Rust-based cloud storage platform built as a set of microservices in a Cargo workspace of 80+ crates. The system handles document storage, processing, search, communication, and email, with a SolidJS frontend and Pulumi-managed AWS infrastructure.
Core Services
The backend is organized into several categories of microservices:
Document Storage
- document-storage-service: Main document storage API
- document-cognition-service: Document analysis and processing
- search_service: Search functionality across documents
- static_file_service: Static file serving
Processing Services
- convert_service: Document format conversion
- document-text-extractor: Text extraction from documents
- search_processing_service: Search indexing and processing
Communication
- comms_service: Internal communication handling
- email_service: Email processing and management
- notification_service: User notifications
Infrastructure
- authentication_service: User authentication
- connection_gateway: WebSocket gateway
- contacts_service: Contact management
Data Storage
Macro uses a polyglot persistence approach with multiple specialized databases:
PostgreSQL Databases
MacroDB - Main database containing:
- Documents, users, and projects
- Communication data (messages, channels, participants)
- Email threads, messages, and metadata
- Notification preferences and history
- User connections and contacts
- Contact management
External Storage
Amazon S3
File storage for documents, email attachments, and static files across multiple buckets:
- doc-storage: Document files
- macro-email-attachments: Email attachments
- static-file-storage: Static files
- docx-upload: DOCX processing
- bulk-upload-staging: Bulk uploads
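Keeping the bucket names in one typed mapping avoids scattering string literals across services. The sketch below is hypothetical: the StoredObject enum and s3_uri helper are illustrative names, and it assumes the bucket names above are used verbatim (deployed buckets may carry environment-specific prefixes or suffixes).

```rust
/// Hypothetical classification of stored objects. Only the bucket names
/// come from the architecture overview; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoredObject {
    Document,
    EmailAttachment,
    StaticFile,
    DocxUpload,
    BulkUpload,
}

impl StoredObject {
    /// S3 bucket that holds this kind of object (assumed to be the literal name).
    pub fn bucket(self) -> &'static str {
        match self {
            StoredObject::Document => "doc-storage",
            StoredObject::EmailAttachment => "macro-email-attachments",
            StoredObject::StaticFile => "static-file-storage",
            StoredObject::DocxUpload => "docx-upload",
            StoredObject::BulkUpload => "bulk-upload-staging",
        }
    }
}

/// Builds an s3://bucket/key URI, dropping any leading '/' from the key.
pub fn s3_uri(kind: StoredObject, key: &str) -> String {
    format!("s3://{}/{}", kind.bucket(), key.trim_start_matches('/'))
}
```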
Redis
- Session management
- Caching layer
- Real-time data
OpenSearch
- Full-text search indexing
- Document content search
- Search query processing
DynamoDB
- Connection tracking (connection-gateway-table)
- Bulk upload metadata (bulk-upload)
- Static file metadata (static-file-metadata)
Service Communication
Services communicate through multiple patterns:
HTTP APIs
- Internal service-to-service calls
- RESTful APIs for client communication
- Service client crates for type-safe communication
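A rough sketch of what type-safe service-to-service calls can look like, assuming a client crate exposes a trait over typed requests and responses. UserId, Contact, ContactsClient, FakeContactsClient, and contact_emails are hypothetical names, and the HTTP transport a real client would use is left out so the example stays self-contained.

```rust
use std::collections::HashMap;

/// Newtype ID: passing some other kind of ID where a user ID is expected
/// becomes a compile error instead of a runtime bug.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct UserId(pub String);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Contact {
    pub user_id: UserId,
    pub email: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ClientError {
    NotFound,
    Transport(String),
}

/// Hypothetical interface a contacts client crate might expose. A real
/// implementation would issue HTTP requests to contacts_service; callers
/// only ever see typed values.
pub trait ContactsClient {
    fn contacts_for(&self, user: &UserId) -> Result<Vec<Contact>, ClientError>;
}

/// In-memory stand-in for tests of services that depend on the client.
pub struct FakeContactsClient {
    pub data: HashMap<UserId, Vec<Contact>>,
}

impl ContactsClient for FakeContactsClient {
    fn contacts_for(&self, user: &UserId) -> Result<Vec<Contact>, ClientError> {
        self.data.get(user).cloned().ok_or(ClientError::NotFound)
    }
}

/// Business logic depends on the trait, not on HTTP details.
pub fn contact_emails(
    client: &impl ContactsClient,
    user: &UserId,
) -> Result<Vec<String>, ClientError> {
    Ok(client
        .contacts_for(user)?
        .into_iter()
        .map(|c| c.email)
        .collect())
}
```

Because callers depend on the trait, a service can be tested against the in-memory fake and wired to a real HTTP-backed implementation in production.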
Asynchronous Processing
- SQS queues for async task processing:
  - notification-queue: User notifications
  - search-event-queue: Search indexing events
  - email-service-gmail-inbox-sync-queue: Gmail synchronization
  - convert-service-queue: Document conversion
  - document-text-extractor-lambda-queue: Text extraction
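One way to keep the queue names above consistent between producers and consumers is a single enum. This is an illustrative sketch, not Macro's code: the Queue and Task types and the task payload fields are assumptions; only the queue names come from the list above.

```rust
/// The SQS queues listed above. The enum is a hypothetical way to keep
/// queue names in one place instead of repeating string literals.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Queue {
    Notification,
    SearchEvent,
    GmailInboxSync,
    Convert,
    TextExtractor,
}

impl Queue {
    pub const ALL: [Queue; 5] = [
        Queue::Notification,
        Queue::SearchEvent,
        Queue::GmailInboxSync,
        Queue::Convert,
        Queue::TextExtractor,
    ];

    /// Queue name as listed in the architecture overview.
    pub fn name(self) -> &'static str {
        match self {
            Queue::Notification => "notification-queue",
            Queue::SearchEvent => "search-event-queue",
            Queue::GmailInboxSync => "email-service-gmail-inbox-sync-queue",
            Queue::Convert => "convert-service-queue",
            Queue::TextExtractor => "document-text-extractor-lambda-queue",
        }
    }

    /// Reverse lookup, e.g. for a consumer configured by queue name.
    pub fn from_name(name: &str) -> Option<Queue> {
        Queue::ALL.iter().copied().find(|q| q.name() == name)
    }
}

/// Hypothetical task payloads; the field names are illustrative only.
#[derive(Debug)]
pub enum Task {
    Notify { user_id: String },
    IndexDocument { document_id: String },
    SyncGmailInbox { account_id: String },
    ConvertDocument { document_id: String },
    ExtractText { document_id: String },
}

impl Task {
    /// The queue a producer would send this task to.
    pub fn queue(&self) -> Queue {
        match self {
            Task::Notify { .. } => Queue::Notification,
            Task::IndexDocument { .. } => Queue::SearchEvent,
            Task::SyncGmailInbox { .. } => Queue::GmailInboxSync,
            Task::ConvertDocument { .. } => Queue::Convert,
            Task::ExtractText { .. } => Queue::TextExtractor,
        }
    }
}
```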
Event-Driven Processing
- AWS Lambda triggers for serverless processing
- S3 event notifications for file uploads
- Scheduled Lambda functions for background tasks
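An S3 event notification carries the bucket and object key of the uploaded file, and the handler has to decide which processing step applies. The sketch below routes on file extension to the DOCX and PDF text extraction steps described under Document Processing Pipeline; the Action enum and route_upload function are hypothetical.

```rust
/// Hypothetical processing steps an upload can trigger.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// DOCX: unzip the archive, then extract text (done in a Lambda).
    UnzipDocx,
    /// PDF: extract text with pdfium.
    ExtractPdfText,
    /// Anything else is not text-extracted in this sketch.
    Ignore,
}

/// Chooses a step from the extension of the uploaded object's key.
pub fn route_upload(key: &str) -> Action {
    // Look only at the last path segment so dots in "folder" names are ignored.
    let file_name = key.rsplit('/').next().unwrap_or(key);
    let extension = file_name
        .rsplit_once('.')
        .map(|(_, ext)| ext.to_ascii_lowercase());
    match extension.as_deref() {
        Some("docx") => Action::UnzipDocx,
        Some("pdf") => Action::ExtractPdfText,
        _ => Action::Ignore,
    }
}
```

Object keys in real S3 event payloads are URL-encoded (for example, spaces arrive as +), so a production handler would decode the key before routing it.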
WebSocket Connections
- Real-time updates via connection_gateway
- Connection state tracked in DynamoDB
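Tracking WebSocket connections mostly means maintaining two lookups: which user owns a connection, and which connections a user currently has open. The in-memory model below illustrates that bookkeeping; the ConnectionRegistry type and its two-map layout are assumptions, not the actual schema of connection-gateway-table in DynamoDB.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory model of connection tracking state.
#[derive(Debug, Default)]
pub struct ConnectionRegistry {
    /// connection_id -> user_id
    by_connection: HashMap<String, String>,
    /// user_id -> open connection_ids
    by_user: HashMap<String, HashSet<String>>,
}

impl ConnectionRegistry {
    /// Records a new connection; re-registering an ID moves it to the new user.
    pub fn connect(&mut self, connection_id: &str, user_id: &str) {
        self.disconnect(connection_id);
        self.by_connection
            .insert(connection_id.to_string(), user_id.to_string());
        self.by_user
            .entry(user_id.to_string())
            .or_default()
            .insert(connection_id.to_string());
    }

    /// Removes a connection and drops the user entry once it has none left.
    pub fn disconnect(&mut self, connection_id: &str) {
        if let Some(user_id) = self.by_connection.remove(connection_id) {
            let now_empty = match self.by_user.get_mut(&user_id) {
                Some(conns) => {
                    conns.remove(connection_id);
                    conns.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.by_user.remove(&user_id);
            }
        }
    }

    /// Connections that should receive a real-time update for user_id,
    /// sorted so the result is deterministic.
    pub fn connections_for(&self, user_id: &str) -> Vec<String> {
        let mut ids: Vec<String> = self
            .by_user
            .get(user_id)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        ids.sort();
        ids
    }
}
```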
Document Processing Pipeline
Documents flow through a processing pipeline:
Text Extraction
- DOCX files unzipped via Lambda
- PDFs processed with pdfium library
- Text content extracted for indexing
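A DOCX file is a ZIP archive whose main body lives in word/document.xml, where visible text sits inside w:t elements grouped into w:p paragraphs. Assuming the archive has already been unzipped, a deliberately naive sketch of pulling text out of that XML might look like this; docx_body_text is a hypothetical name, and a real extractor would use a proper XML parser and handle numeric character references, comments, and other edge cases.

```rust
/// Decodes the five predefined XML entities. Numeric character references
/// (e.g. "&#169;") are not handled in this sketch.
fn decode_entities(text: &str) -> String {
    text.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&") // last, so "&amp;lt;" becomes "&lt;" rather than "<"
}

/// Naively extracts text from the contents of word/document.xml: text inside
/// <w:t> elements is kept, and each paragraph end (</w:p> or <w:p/>) becomes a newline.
pub fn docx_body_text(document_xml: &str) -> String {
    let mut out = String::new();
    let mut in_text = false;
    let mut rest = document_xml;

    while let Some(open) = rest.find('<') {
        // Character data between the previous tag and this one.
        if in_text {
            out.push_str(&decode_entities(&rest[..open]));
        }
        let Some(len) = rest[open..].find('>') else { break };
        let tag = &rest[open + 1..open + len];
        rest = &rest[open + len + 1..];

        let closing = tag.starts_with('/');
        let self_closing = tag.ends_with('/');
        // Element name without the leading '/', attributes, or a trailing '/'.
        let name = tag
            .trim_start_matches('/')
            .split(|c: char| c.is_whitespace() || c == '/')
            .next()
            .unwrap_or("");

        match name {
            // Exact match, so <w:tab/>, <w:tbl>, <w:tc>, <w:tr> are not mistaken for <w:t>.
            "w:t" => in_text = !closing && !self_closing,
            "w:p" if closing || self_closing => out.push('\n'),
            _ => {}
        }
    }
    out
}
```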
Database Architecture
Each service maintains clean database boundaries:
- Dedicated client crates: macro_db_client, comms_db_client, contacts_db_client
- SQLx for database interactions: compile-time query validation with offline mode
- Prisma schema: located at macro-api/database/schema.prisma
- Migration management: per-service migration handling
Schema Management
Database columns in the Prisma schema use camelCase, but queries must alias them to snake_case when reading.
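Unquoted identifiers in PostgreSQL are folded to lower case, so camelCase columns have to be double-quoted, and aliasing them to snake_case lets their names line up with Rust struct fields (SQLx's query_as! maps columns to fields by name). The helpers below only illustrate the mapping; camel_to_snake, select_list, and the column names in the test are hypothetical, and acronyms such as userID would need extra handling.

```rust
/// Converts a camelCase identifier to snake_case, e.g. "createdAt" -> "created_at".
/// Each ASCII uppercase letter after the first character starts a new word.
pub fn camel_to_snake(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, c) in name.chars().enumerate() {
        if c.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}

/// Builds a SELECT list that double-quotes each camelCase column (so
/// PostgreSQL keeps its case) and aliases it to snake_case.
pub fn select_list(columns: &[&str]) -> String {
    columns
        .iter()
        .map(|c| format!("\"{}\" AS {}", c, camel_to_snake(c)))
        .collect::<Vec<_>>()
        .join(", ")
}
```

SQLx's compile-time query macros take the query as a string literal, so in practice the aliases are written directly into the query text (for example, SELECT "ownerId" AS owner_id ...); the functions above only show how each name maps.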
AWS Integration
Heavy integration with AWS services:
- S3: File and document storage
- Lambda: Serverless processing (text extraction, document conversion)
- SQS: Message queuing for async processing
- DynamoDB: Connection and metadata tracking
- OpenSearch: Search capabilities
Infrastructure as Code
All infrastructure is managed using Pulumi with TypeScript:
- Deployment stacks in infra/stacks/
- Lambda function configurations in infra/lambda/
- Reusable AWS resource definitions in infra/resources/
Development Architecture
Offline Development
The project uses SQLX_OFFLINE=true to build without a live database connection:
- Queries are still checked at compile time, against pre-generated metadata rather than a live database
- Query metadata is cached in the .sqlx/ directory
- CI/CD can build without database dependencies