
Architecture Overview

Macro is a Rust-based cloud storage platform built as a set of microservices in a Cargo workspace of 80+ crates. The system handles document storage, processing, search, communication, and email, with a SolidJS frontend and Pulumi-managed AWS infrastructure.

Core Services

The backend is organized into several categories of microservices:

Document Storage

  • document-storage-service: Main document storage API
  • document-cognition-service: Document analysis and processing
  • search_service: Search functionality across documents
  • static_file_service: Static file serving

Processing Services

  • convert_service: Document format conversion
  • document-text-extractor: Text extraction from documents
  • search_processing_service: Search indexing and processing

Communication

  • comms_service: Internal communication handling
  • email_service: Email processing and management
  • notification_service: User notifications

Infrastructure

  • authentication_service: User authentication
  • connection_gateway: WebSocket gateway
  • contacts_service: Contact management

Data Storage

Macro uses a polyglot persistence approach with multiple specialized databases:

PostgreSQL Databases

MacroDB - Main database containing:
  • Documents, users, and projects
  • Communication data (messages, channels, participants)
  • Email threads, messages, and metadata
  • Notification preferences and history
ContactsDB - Dedicated database for:
  • User connections and contacts
  • Contact management

External Storage

Amazon S3

File storage for documents, email attachments, and static files across multiple buckets:
  • doc-storage: Document files
  • macro-email-attachments: Email attachments
  • static-file-storage: Static files
  • docx-upload: DOCX processing
  • bulk-upload-staging: Bulk uploads

Redis

  • Session management
  • Caching layer
  • Real-time data

OpenSearch

  • Full-text search indexing
  • Document content search
  • Search query processing

DynamoDB

  • Connection tracking (connection-gateway-table)
  • Bulk upload metadata (bulk-upload)
  • Static file metadata (static-file-metadata)

Service Communication

Services communicate through multiple patterns:

HTTP APIs

  • Internal service-to-service calls
  • RESTful APIs for client communication
  • Service client crates for type-safe communication
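The service-client-crate pattern can be sketched as a trait per service, so callers depend on typed methods rather than hand-built HTTP requests. The names below (`Document`, `DocumentStorageClient`, `StubClient`) are illustrative, not actual Macro crates or types:

```rust
// Hypothetical sketch of the "service client crate" pattern.
// The real client would issue an HTTP call to the service; here a
// stub implementation stands in for it.

#[derive(Debug, PartialEq)]
struct Document {
    id: u64,
    title: String,
}

trait DocumentStorageClient {
    fn get_document(&self, id: u64) -> Result<Document, String>;
}

struct StubClient;

impl DocumentStorageClient for StubClient {
    fn get_document(&self, id: u64) -> Result<Document, String> {
        // Stub: fabricate a document instead of calling the service.
        Ok(Document { id, title: format!("doc-{}", id) })
    }
}

fn main() {
    let client = StubClient;
    let doc = client.get_document(42).unwrap();
    println!("{}", doc.title);
}
```

Because callers code against the trait, a stub like this can also replace the real client in tests.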

Asynchronous Processing

  • SQS Queues for async task processing:
    • notification-queue: User notifications
    • search-event-queue: Search indexing events
    • email-service-gmail-inbox-sync-queue: Gmail synchronization
    • convert-service-queue: Document conversion
    • document-text-extractor-lambda-queue: Text extraction

Event-Driven Processing

  • AWS Lambda triggers for serverless processing
  • S3 event notifications for file uploads
  • Scheduled Lambda functions for background tasks

WebSocket Connections

  • Real-time updates via connection_gateway
  • Connection state tracked in DynamoDB
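A minimal in-memory sketch of the connection tracking that connection_gateway persists in DynamoDB; `ConnectionRegistry` and its fields are illustrative, and the real store is the connection-gateway-table, not a HashMap:

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for connection-state tracking.
struct ConnectionRegistry {
    // connection_id -> user_id
    connections: HashMap<String, String>,
}

impl ConnectionRegistry {
    fn new() -> Self {
        Self { connections: HashMap::new() }
    }

    /// Record a new WebSocket connection for a user.
    fn connect(&mut self, connection_id: &str, user_id: &str) {
        self.connections
            .insert(connection_id.to_string(), user_id.to_string());
    }

    /// Drop a connection when the socket closes.
    fn disconnect(&mut self, connection_id: &str) {
        self.connections.remove(connection_id);
    }

    /// How many live connections a user currently has (e.g. multiple tabs).
    fn connections_for(&self, user_id: &str) -> usize {
        self.connections
            .values()
            .filter(|u| u.as_str() == user_id)
            .count()
    }
}

fn main() {
    let mut reg = ConnectionRegistry::new();
    reg.connect("conn-1", "user-a");
    reg.connect("conn-2", "user-a");
    reg.disconnect("conn-1");
    println!("{}", reg.connections_for("user-a"));
}
```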

Document Processing Pipeline

Documents flow through a multi-stage processing pipeline:

1. Upload: files are uploaded to S3 buckets, which emit event notifications.
2. Text Extraction: DOCX files are unzipped via Lambda, PDFs are processed with the pdfium library, and text content is extracted for indexing.
3. Search Indexing: extracted text is indexed in OpenSearch via search_processing_service.
4. Storage: file content is kept in S3, metadata in the MacroDB PostgreSQL database, and the search index in OpenSearch.
5. Retrieval: documents are served via document-storage-service and static_file_service.
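The stages above can be sketched as a small state machine; the `Stage` enum is illustrative, not an actual Macro type:

```rust
// Hypothetical state machine mirroring the five pipeline steps above.

#[derive(Debug, PartialEq, Clone, Copy)]
enum Stage {
    Uploaded,
    TextExtracted,
    Indexed,
    Stored,
    Retrievable,
}

impl Stage {
    /// The next stage in the pipeline, or None once the document is servable.
    fn next(self) -> Option<Stage> {
        use Stage::*;
        match self {
            Uploaded => Some(TextExtracted),
            TextExtracted => Some(Indexed),
            Indexed => Some(Stored),
            Stored => Some(Retrievable),
            Retrievable => None,
        }
    }
}

fn main() {
    // Walk a document from upload to the terminal stage.
    let mut stage = Stage::Uploaded;
    while let Some(next) = stage.next() {
        stage = next;
    }
    println!("{:?}", stage);
}
```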

Database Architecture

Each service maintains clean database boundaries:
  • Dedicated client crates: macro_db_client, comms_db_client, contacts_db_client
  • SQLx for interactions: Compile-time query validation with offline mode
  • Prisma schema: Located at macro-api/database/schema.prisma
  • Migration management: Per-service migration handling

Schema Management

Columns in the Prisma schema use camelCase, so SQL queries must alias them to snake_case when reading rows into Rust structs:
SELECT "userId" as "user_id" FROM "UserInsights"
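The aliasing convention can be expressed as a small helper; `camel_to_snake` and `select_alias` are hypothetical functions shown only to illustrate the rule:

```rust
// Illustrative helper for deriving the snake_case alias used when
// reading camelCase Prisma columns; not part of the actual codebase.

/// Convert a camelCase identifier to snake_case, e.g. "userId" -> "user_id".
fn camel_to_snake(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for ch in name.chars() {
        if ch.is_ascii_uppercase() {
            out.push('_');
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// Build the quoted SELECT-list fragment for one column.
fn select_alias(column: &str) -> String {
    format!("\"{}\" as \"{}\"", column, camel_to_snake(column))
}

fn main() {
    println!("{}", select_alias("userId"));
}
```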

AWS Integration

Heavy integration with AWS services:
  • S3: File and document storage
  • Lambda: Serverless processing (text extraction, document conversion)
  • SQS: Message queuing for async processing
  • DynamoDB: Connection and metadata tracking
  • OpenSearch: Search capabilities

Infrastructure as Code

All infrastructure is managed using Pulumi with TypeScript:
  • Deployment stacks in infra/stacks/
  • Lambda function configurations in infra/lambda/
  • Reusable AWS resource definitions in infra/resources/

Development Architecture

Offline Development

The project uses SQLX_OFFLINE=true for building without live database connections:
  • Database queries are pre-validated
  • Query metadata cached in .sqlx/ directory
  • Enables CI/CD without database dependencies

Testing Architecture

Test each service independently:
cargo test -p {service_name}
Tests use fixtures and require database setup for integration testing.
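A unit test inside a service crate might look like the sketch below, run via cargo test -p for that crate; `normalize_title` is a hypothetical function, and real integration tests additionally need fixtures and a running database:

```rust
// Illustrative per-crate unit test; not actual Macro code.

/// Hypothetical helper under test.
fn normalize_title(raw: &str) -> String {
    raw.trim().to_lowercase()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn trims_and_lowercases() {
        assert_eq!(normalize_title("  Quarterly Report "), "quarterly report");
    }
}

fn main() {
    println!("{}", normalize_title("  Quarterly Report "));
}
```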
