Architecture

Syft Space is a full-stack application: a FastAPI backend organized along domain-driven design lines, with a Vue 3 single-page frontend.

System overview

Core components

Frontend (Vue 3)

The frontend is a single-page application built with:

Vue 3 with Composition API and TypeScript
Tailwind CSS for styling
Pinia for state management
Vue Router for navigation
Axios for API communication

Structure:

frontend/
├── src/
│   ├── pages/          # Page components
│   ├── components/     # Reusable UI components
│   ├── composables/    # Shared business logic
│   ├── stores/         # Pinia state stores
│   └── api/            # API client and types

Backend (FastAPI)

The backend uses FastAPI with a domain-driven design pattern.

Component structure:

backend/syft_space/components/
├── datasets/
│   ├── entities.py         # Database models
│   ├── schemas.py          # Request/response models
│   ├── repository.py       # Data access layer
│   ├── handlers.py         # Business logic
│   └── routes.py           # API endpoints
├── models/
├── endpoints/
├── policies/
└── shared/                 # Common utilities
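
The layering in the tree above can be sketched in plain Python. This is a minimal illustration, assuming an in-memory store in place of SQLModel; the class names `DatasetEntity`, `CreateDatasetRequest`, `DatasetRepository`, and `DatasetHandler` are hypothetical and only mirror the file names shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DatasetEntity:            # entities.py: what is stored
    name: str
    tenant_name: str


@dataclass
class CreateDatasetRequest:     # schemas.py: what the API accepts
    name: str


class DatasetRepository:        # repository.py: data access only
    def __init__(self) -> None:
        self._rows: Dict[str, DatasetEntity] = {}  # stand-in for a database table

    def get(self, name: str) -> Optional[DatasetEntity]:
        return self._rows.get(name)

    def add(self, entity: DatasetEntity) -> DatasetEntity:
        self._rows[entity.name] = entity
        return entity


class DatasetHandler:           # handlers.py: business rules
    def __init__(self, repository: DatasetRepository) -> None:
        self.repository = repository

    def create(self, request: CreateDatasetRequest, tenant_name: str) -> DatasetEntity:
        # The uniqueness rule lives here, not in the repository or the route.
        if self.repository.get(request.name) is not None:
            raise ValueError(f"Dataset '{request.name}' already exists")
        return self.repository.add(DatasetEntity(request.name, tenant_name))
```

In this arrangement a route in routes.py would only parse the request, call the handler, and serialize the result, so business rules stay testable without an HTTP server.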

Key technologies:

FastAPI - Web framework
SQLModel - ORM with Pydantic integration
Alembic - Database migrations
Loguru - Structured logging
Pydantic - Data validation

Type registry pattern

Syft Space uses a plugin-style registry pattern for extensibility:

# Register built-in types at startup
register_dataset_types()
register_model_types()
register_policy_types()

# Types are registered in global registries
DATASET_TYPE_REGISTRY
MODEL_TYPE_REGISTRY
POLICY_TYPE_REGISTRY

This allows:

Adding new dataset types (e.g., Pinecone, Milvus)
Adding new model providers (e.g., Cohere, Gemini)
Adding new policy types (e.g., quota, throttling)
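
A minimal sketch of how such a registry could work, assuming a plain dictionary keyed by type name and a decorator for registration. Only `DATASET_TYPE_REGISTRY` appears in the text above; `register_dataset_type`, `PineconeDataset`, and `create_dataset` are hypothetical names used for illustration.

```python
from typing import Callable, Dict, Type

# Global registry mapping a type name to its implementing class.
DATASET_TYPE_REGISTRY: Dict[str, Type] = {}


def register_dataset_type(name: str) -> Callable[[Type], Type]:
    """Hypothetical decorator that adds a class to the registry under `name`."""
    def decorator(cls: Type) -> Type:
        if name in DATASET_TYPE_REGISTRY:
            raise ValueError(f"Dataset type '{name}' is already registered")
        DATASET_TYPE_REGISTRY[name] = cls
        return cls
    return decorator


@register_dataset_type("pinecone")
class PineconeDataset:
    """Illustrative plugin; a real one would wrap the vector database client."""
    def __init__(self, index: str) -> None:
        self.index = index


def create_dataset(type_name: str, **config):
    """Look up the registered class and instantiate it from its config."""
    try:
        cls = DATASET_TYPE_REGISTRY[type_name]
    except KeyError:
        raise ValueError(f"Unknown dataset type '{type_name}'") from None
    return cls(**config)
```

With this shape, adding a new type means defining one decorated class; the code that resolves a type name to an implementation does not change.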

Multi-tenancy

Syft Space implements tenant isolation.

Tenant middleware (tenants/middleware.py):

Extracts tenant from JWT token or X-Tenant-Name header
Injects tenant context into all requests
Ensures data isolation between tenants

Data isolation:

class Dataset(BaseEntity):
    tenant_name: str  # Every entity belongs to a tenant

All queries are automatically scoped to the current tenant.
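
One way to get automatic scoping is to keep the current tenant in a context variable that the middleware sets, and have the data access layer read it on every query. The sketch below assumes that approach and uses an in-memory list instead of a database; `current_tenant` and `TenantScopedRepository` are hypothetical names, not the project's actual API.

```python
from contextvars import ContextVar
from dataclasses import dataclass
from typing import List

# Holds the tenant for the request currently being handled.
# A middleware would call current_tenant.set(...) before the route runs.
current_tenant: ContextVar[str] = ContextVar("current_tenant")


@dataclass
class Dataset:
    name: str
    tenant_name: str  # Every entity belongs to a tenant


class TenantScopedRepository:
    """Hypothetical repository that filters every read by the current tenant."""

    def __init__(self, rows: List[Dataset]) -> None:
        self._rows = rows

    def list_all(self) -> List[Dataset]:
        # .get() raises LookupError when no tenant is set, so an unscoped
        # query fails loudly instead of returning every tenant's rows.
        tenant = current_tenant.get()
        return [row for row in self._rows if row.tenant_name == tenant]
```

Because the filter is applied inside the repository, individual handlers cannot forget it.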

Provisioning system

Automatic Docker provisioning for vector databases.

The provisioner manager handles:

Container lifecycle (start/stop/cleanup)
Port allocation
Volume management
Health monitoring
State persistence
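
Of the responsibilities above, port allocation is the easiest to show in isolation. The following is a sketch under assumptions: `PortAllocator`, the port range, and the container names are hypothetical, and a real provisioner would also verify that the port is actually free on the host and persist the assignments.

```python
from typing import Dict, Set


class PortAllocator:
    """Hands out host ports from a fixed range, one per container name."""

    def __init__(self, start: int, end: int) -> None:
        self._free: Set[int] = set(range(start, end + 1))
        self._assigned: Dict[str, int] = {}

    def allocate(self, container: str) -> int:
        if container in self._assigned:      # idempotent across restarts
            return self._assigned[container]
        if not self._free:
            raise RuntimeError("No free ports left in the configured range")
        port = min(self._free)               # lowest free port, for determinism
        self._free.remove(port)
        self._assigned[container] = port
        return port

    def release(self, container: str) -> None:
        """Return a container's port to the pool; unknown names are ignored."""
        port = self._assigned.pop(container, None)
        if port is not None:
            self._free.add(port)
```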

Authentication & authorization

Authentication

Two auth modes:

Local auth - Bearer token from login
SyftHub auth - Satellite token from marketplace

Auth middleware (auth/middleware.py):

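from fastapi.security import HTTPBearer

# Reads the "Authorization: Bearer <token>" header from incoming requests
bearer_scheme = HTTPBearer()

async def get_verified_user_email(token: str) -> str:
    # Verify JWT and extract user email
    ...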

Authorization

Policy-based authorization:

Access policies control who can query endpoints
Rate limit policies control query frequency
Accounting policies track usage and costs
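
The policy kinds listed above can be pictured as objects with a common interface that are evaluated together before a query runs. The sketch below is an assumption about shape, not the project's implementation: `Policy`, `AccessPolicy`, `RateLimitPolicy`, and `authorize` are hypothetical names, and the rate limiter uses a simple in-memory sliding window.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterable, Protocol, Set


class Policy(Protocol):
    def allows(self, user_email: str) -> bool: ...


class AccessPolicy:
    """Allow only users on an explicit list."""

    def __init__(self, allowed: Set[str]) -> None:
        self.allowed = allowed

    def allows(self, user_email: str) -> bool:
        return user_email in self.allowed


class RateLimitPolicy:
    """Allow at most `limit` queries per user within a sliding `window` (seconds)."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit, self.window, self.clock = limit, window, clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allows(self, user_email: str) -> bool:
        now = self.clock()
        hits = self._hits[user_email]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # drop requests that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def authorize(user_email: str, policies: Iterable[Policy]) -> bool:
    """A query is permitted only if every attached policy allows it."""
    return all(policy.allows(user_email) for policy in policies)
```

Because `all` stops at the first refusal, placing the access policy first means a denied user does not consume rate-limit budget. The injectable `clock` exists only to make the limiter testable without sleeping.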

Database architecture

SQLite with async support:

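from sqlalchemy.ext.asyncio import create_async_engine

class AsyncDatabase:
    def __init__(self, config: SQLiteConfig):
        # One shared async engine for the whole application
        self.engine = create_async_engine(
            config.database_url,
            # Let the SQLite connection be used outside its creating thread
            connect_args={"check_same_thread": False}
        )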

Migrations managed by Alembic:

alembic upgrade head  # Apply migrations
alembic revision --autogenerate -m "description"

Lifecycle management

Components implement the LifecycleService protocol:

from typing import Protocol

class LifecycleService(Protocol):
    async def startup(self) -> None: ...
    async def shutdown(self) -> None: ...

Startup sequence:

Initialize database
Register type registries
Start provisioner manager
Start ingestion manager
Start heartbeat manager
Sync marketplace state

Shutdown sequence:

Stop background tasks
Cleanup provisioned resources
Close database connections
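
The two sequences above can be driven by a small orchestrator. This sketch assumes services are started in list order and stopped in reverse, which matches the database being initialized first and closed last; `RecordingService` and `run_lifecycle` are hypothetical helpers for illustration.

```python
import asyncio
from typing import List, Protocol


class LifecycleService(Protocol):
    async def startup(self) -> None: ...
    async def shutdown(self) -> None: ...


class RecordingService:
    """Stand-in service that records lifecycle calls in a shared log."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name, self.log = name, log

    async def startup(self) -> None:
        self.log.append(f"start {self.name}")

    async def shutdown(self) -> None:
        self.log.append(f"stop {self.name}")


async def run_lifecycle(services: List[LifecycleService]) -> None:
    started: List[LifecycleService] = []
    try:
        for service in services:           # start in declared order
            await service.startup()
            started.append(service)
        # ... the application would serve requests here ...
    finally:
        for service in reversed(started):  # stop in reverse order
            await service.shutdown()
```

Tracking `started` separately means that if one service fails during startup, only the services that actually started are shut down.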

Background services

Ingestion manager

Processes file uploads asynchronously:

Queue-based task processing
File watching for auto-ingestion
Chunking and embedding generation
Progress tracking
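
A queue-based worker with chunking and progress tracking might look like the sketch below. It is illustrative only: `chunk_text`, `ingestion_worker`, and `ingest` are hypothetical names, the fixed chunk size of 4 characters is chosen to keep the example small, and embedding generation is left as a comment.

```python
import asyncio
from typing import Dict, List


def chunk_text(text: str, size: int, overlap: int = 0) -> List[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


async def ingestion_worker(queue: asyncio.Queue, progress: Dict[str, int]) -> None:
    """Consume (file name, contents) tasks until cancelled."""
    while True:
        name, contents = await queue.get()
        try:
            chunks = chunk_text(contents, size=4)
            # A real worker would compute embeddings for each chunk here.
            progress[name] = len(chunks)
        finally:
            queue.task_done()


async def ingest(files: Dict[str, str]) -> Dict[str, int]:
    """Queue every file, wait for the worker to finish, return chunk counts."""
    queue: asyncio.Queue = asyncio.Queue()
    progress: Dict[str, int] = {}
    worker = asyncio.create_task(ingestion_worker(queue, progress))
    for item in files.items():
        queue.put_nowait(item)
    await queue.join()   # wait until every queued file is processed
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return progress
```

The upload request only has to enqueue a task and return; the `progress` mapping is what a status endpoint could read while ingestion continues in the background.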

Heartbeat manager

Monitors system health:

Periodic health checks
Marketplace status sync
Endpoint availability monitoring

Proxy service

Manages ngrok tunnels:

Automatic tunnel creation
Public URL management
Connection monitoring

API versioning

All endpoints are versioned:

from fastapi import APIRouter

api_router = APIRouter(prefix="/api/v1")

Future versions can coexist:

/api/v1/endpoints/
/api/v2/endpoints/ (future)

Error handling

Consistent error responses:

class AppException(Exception):
    status_code: int
    detail: str

raise DatasetNotFoundError(name="my-dataset")
# Returns: {"detail": "Dataset 'my-dataset' not found"}
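
To make the pattern concrete, here is a self-contained sketch of an exception hierarchy that produces the response shown. The 404 status for `DatasetNotFoundError`, the default of 500, and the `to_response` helper are assumptions made for illustration; the source only shows the `status_code` and `detail` attributes.

```python
from typing import Dict, Tuple


class AppException(Exception):
    """Base class: each subclass carries an HTTP status and a message."""
    status_code: int = 500  # assumed default for unclassified errors

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.detail = detail


class DatasetNotFoundError(AppException):
    status_code = 404  # assumed; "not found" conventionally maps to 404

    def __init__(self, name: str) -> None:
        super().__init__(f"Dataset '{name}' not found")


def to_response(exc: AppException) -> Tuple[int, Dict[str, str]]:
    """Translate an exception into (status code, JSON body)."""
    return exc.status_code, {"detail": exc.detail}
```

In a FastAPI application, a single handler registered for `AppException` could perform this translation, so individual routes raise domain errors and never build error responses themselves.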

Performance considerations

Async/await throughout for concurrency
Connection pooling for database
Caching for type registries and schemas
Batch operations for policy evaluation
Streaming responses for large queries

Security architecture

JWT tokens for authentication
Tenant isolation at data layer
Input validation with Pydantic
SQL injection protection via SQLModel
CORS configured for trusted origins
Rate limiting via policies

Extensibility points

Dataset types - Add new data sources
Model types - Add new AI providers
Policy types - Add new access controls
Middlewares - Add custom request processing
Background services - Add scheduled tasks

See Custom integrations for implementation guides.
