
Architecture Overview

Macro is a Rust-based cloud storage platform built as a set of microservices in a Cargo workspace of 80+ crates. The system handles document storage, processing, search, communication, and email, with a SolidJS frontend and Pulumi-managed AWS infrastructure.

Core Services

The backend is organized into several categories of microservices:

Document Storage

  • document-storage-service: Main document storage API
  • document-cognition-service: Document analysis and processing
  • search_service: Search functionality across documents
  • static_file_service: Static file serving

Processing Services

  • convert_service: Document format conversion
  • document-text-extractor: Text extraction from documents
  • search_processing_service: Search indexing and processing

Communication

  • comms_service: Internal communication handling
  • email_service: Email processing and management
  • notification_service: User notifications

Infrastructure

  • authentication_service: User authentication
  • connection_gateway: WebSocket gateway
  • contacts_service: Contact management

Data Storage

Macro uses a polyglot persistence approach with multiple specialized databases:

PostgreSQL Databases

MacroDB - Main database containing:
  • Documents, users, and projects
  • Communication data (messages, channels, participants)
  • Email threads, messages, and metadata
  • Notification preferences and history
ContactsDB - Dedicated database for:
  • User connections and contacts
  • Contact management

External Storage

Amazon S3

File storage for documents, email attachments, and static files across multiple buckets:
  • doc-storage: Document files
  • macro-email-attachments: Email attachments
  • static-file-storage: Static files
  • docx-upload: DOCX processing
  • bulk-upload-staging: Bulk uploads

Redis

  • Session management
  • Caching layer
  • Real-time data

OpenSearch

  • Full-text search indexing
  • Document content search
  • Search query processing

DynamoDB

  • Connection tracking (connection-gateway-table)
  • Bulk upload metadata (bulk-upload)
  • Static file metadata (static-file-metadata)

Service Communication

Services communicate through multiple patterns:

HTTP APIs

  • Internal service-to-service calls
  • RESTful APIs for client communication
  • Service client crates for type-safe communication
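The service-client-crate pattern can be sketched as a trait per service, so callers depend on typed methods rather than hand-built HTTP requests. The names below (`Document`, `DocumentStorageClient`, `StubClient`) are illustrative, not actual Macro crates or types:

```rust
// Hypothetical sketch of the "service client crate" pattern.
// The real client would issue an HTTP call to the service; here a
// stub implementation stands in for it.

#[derive(Debug, PartialEq)]
struct Document {
    id: u64,
    title: String,
}

trait DocumentStorageClient {
    fn get_document(&self, id: u64) -> Result<Document, String>;
}

struct StubClient;

impl DocumentStorageClient for StubClient {
    fn get_document(&self, id: u64) -> Result<Document, String> {
        // Stub: fabricate a document instead of calling the service.
        Ok(Document { id, title: format!("doc-{}", id) })
    }
}

fn main() {
    let client = StubClient;
    let doc = client.get_document(42).unwrap();
    println!("{}", doc.title);
}
```

Because callers code against the trait, a stub like this can also replace the real client in tests.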

Asynchronous Processing

  • SQS Queues for async task processing:
    • notification-queue: User notifications
    • search-event-queue: Search indexing events
    • email-service-gmail-inbox-sync-queue: Gmail synchronization
    • convert-service-queue: Document conversion
    • document-text-extractor-lambda-queue: Text extraction

Event-Driven Processing

  • AWS Lambda triggers for serverless processing
  • S3 event notifications for file uploads
  • Scheduled Lambda functions for background tasks

WebSocket Connections

  • Real-time updates via connection_gateway
  • Connection state tracked in DynamoDB
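A minimal in-memory sketch of the connection tracking that connection_gateway persists in DynamoDB; `ConnectionRegistry` and its fields are illustrative, and the real store is the connection-gateway-table, not a HashMap:

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for connection-state tracking.
struct ConnectionRegistry {
    // connection_id -> user_id
    connections: HashMap<String, String>,
}

impl ConnectionRegistry {
    fn new() -> Self {
        Self { connections: HashMap::new() }
    }

    /// Record a new WebSocket connection for a user.
    fn connect(&mut self, connection_id: &str, user_id: &str) {
        self.connections
            .insert(connection_id.to_string(), user_id.to_string());
    }

    /// Drop a connection when the socket closes.
    fn disconnect(&mut self, connection_id: &str) {
        self.connections.remove(connection_id);
    }

    /// How many live connections a user currently has (e.g. multiple tabs).
    fn connections_for(&self, user_id: &str) -> usize {
        self.connections
            .values()
            .filter(|u| u.as_str() == user_id)
            .count()
    }
}

fn main() {
    let mut reg = ConnectionRegistry::new();
    reg.connect("conn-1", "user-a");
    reg.connect("conn-2", "user-a");
    reg.disconnect("conn-1");
    println!("{}", reg.connections_for("user-a"));
}
```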

Document Processing Pipeline

Documents flow through a multi-stage processing pipeline:

1. Upload: files are uploaded to S3 buckets, which emit event notifications.
2. Text Extraction: DOCX files are unzipped via Lambda, PDFs are processed with the pdfium library, and text content is extracted for indexing.
3. Search Indexing: extracted text is indexed in OpenSearch via search_processing_service.
4. Storage: file content is kept in S3, metadata in the MacroDB PostgreSQL database, and the search index in OpenSearch.
5. Retrieval: documents are served via document-storage-service and static_file_service.
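The stages above can be sketched as a small state machine; the `Stage` enum is illustrative, not an actual Macro type:

```rust
// Hypothetical state machine mirroring the five pipeline steps above.

#[derive(Debug, PartialEq, Clone, Copy)]
enum Stage {
    Uploaded,
    TextExtracted,
    Indexed,
    Stored,
    Retrievable,
}

impl Stage {
    /// The next stage in the pipeline, or None once the document is servable.
    fn next(self) -> Option<Stage> {
        use Stage::*;
        match self {
            Uploaded => Some(TextExtracted),
            TextExtracted => Some(Indexed),
            Indexed => Some(Stored),
            Stored => Some(Retrievable),
            Retrievable => None,
        }
    }
}

fn main() {
    // Walk a document from upload to the terminal stage.
    let mut stage = Stage::Uploaded;
    while let Some(next) = stage.next() {
        stage = next;
    }
    println!("{:?}", stage);
}
```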

Database Architecture

Each service maintains clean database boundaries:
  • Dedicated client crates: macro_db_client, comms_db_client, contacts_db_client
  • SQLx for interactions: Compile-time query validation with offline mode
  • Prisma schema: Located at macro-api/database/schema.prisma
  • Migration management: Per-service migration handling

Schema Management

Columns in the Prisma schema use camelCase, so SQL queries must alias them to snake_case when reading rows into Rust structs:
SELECT "userId" as "user_id" FROM "UserInsights"
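The aliasing convention can be expressed as a small helper; `camel_to_snake` and `select_alias` are hypothetical functions shown only to illustrate the rule:

```rust
// Illustrative helper for deriving the snake_case alias used when
// reading camelCase Prisma columns; not part of the actual codebase.

/// Convert a camelCase identifier to snake_case, e.g. "userId" -> "user_id".
fn camel_to_snake(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for ch in name.chars() {
        if ch.is_ascii_uppercase() {
            out.push('_');
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// Build the quoted SELECT-list fragment for one column.
fn select_alias(column: &str) -> String {
    format!("\"{}\" as \"{}\"", column, camel_to_snake(column))
}

fn main() {
    println!("{}", select_alias("userId"));
}
```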

AWS Integration

Heavy integration with AWS services:
  • S3: File and document storage
  • Lambda: Serverless processing (text extraction, document conversion)
  • SQS: Message queuing for async processing
  • DynamoDB: Connection and metadata tracking
  • OpenSearch: Search capabilities

Infrastructure as Code

All infrastructure is managed using Pulumi with TypeScript:
  • Deployment stacks in infra/stacks/
  • Lambda function configurations in infra/lambda/
  • Reusable AWS resource definitions in infra/resources/

Development Architecture

Offline Development

The project uses SQLX_OFFLINE=true for building without live database connections:
  • Database queries are pre-validated
  • Query metadata cached in .sqlx/ directory
  • Enables CI/CD without database dependencies

Testing Architecture

Test each service independently:
cargo test -p {service_name}
Tests use fixtures and require database setup for integration testing.
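A unit test inside a service crate might look like the sketch below, run via cargo test -p for that crate; `normalize_title` is a hypothetical function, and real integration tests additionally need fixtures and a running database:

```rust
// Illustrative per-crate unit test; not actual Macro code.

/// Hypothetical helper under test.
fn normalize_title(raw: &str) -> String {
    raw.trim().to_lowercase()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn trims_and_lowercases() {
        assert_eq!(normalize_title("  Quarterly Report "), "quarterly report");
    }
}

fn main() {
    println!("{}", normalize_title("  Quarterly Report "));
}
```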
