State repository

The state repository is the metadata layer of Arius. It persists information about every file that has been archived: its hash, its properties, and the pointer file entries that represent it in the local file system. The state database is stored locally during an operation and, if the operation changed it, uploaded to Azure Blob Storage alongside the archived blobs.

The four types

The repository is implemented as four collaborating types, each with a single clearly bounded responsibility.

Type	Kind	Responsibility
`IStateRepository`	Interface	Domain-specific repository contract
`StateRepository`	Class	Business logic and domain operations
`StateRepositoryDbContextFactory`	Class	Database infrastructure and lifecycle management
`StateRepositoryDbContext`	Class	Entity configuration and schema definition

Relationships

IStateRepository
  └── StateRepository
          └── uses ──▶ StateRepositoryDbContextFactory
                              └── creates ──▶ StateRepositoryDbContext

IStateRepository

IStateRepository defines the domain-specific contract for the repository. It exposes operations in terms of Arius domain concepts — hashes, binary properties, and pointer file entries — hiding all persistence details from callers. Handlers depend on IStateRepository, not on any concrete class, which allows the repository to be replaced with a test double in unit tests.
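As a minimal sketch of why handlers depend on the interface, the example below swaps in an in-memory test double. The member signatures, the `Hash`, `BinaryProperties` and `PointerFileEntry` shapes, and the `UploadDecider` handler are assumptions made for illustration, based only on the operation names used on this page; the actual Arius contract may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the Arius domain types; the real shapes may differ.
public sealed record Hash(string Value);
public sealed record BinaryProperties(Hash Hash, long OriginalSize);
public sealed record PointerFileEntry(Hash Hash, string RelativeName);

// Assumed subset of the contract, named after the key operations on this page.
public interface IStateRepository
{
    BinaryProperties? GetBinaryProperty(Hash hash);
    void UpsertPointerFileEntries(IEnumerable<PointerFileEntry> entries);
}

// In-memory test double: no SQLite and no EF Core, only dictionaries.
public sealed class InMemoryStateRepository : IStateRepository
{
    private readonly Dictionary<Hash, BinaryProperties> binaries = new();
    private readonly Dictionary<string, PointerFileEntry> pointerEntries = new();

    // Test helper, not part of the interface.
    public void AddBinaryProperty(BinaryProperties properties) =>
        binaries[properties.Hash] = properties;

    public BinaryProperties? GetBinaryProperty(Hash hash) =>
        binaries.TryGetValue(hash, out var properties) ? properties : null;

    public void UpsertPointerFileEntries(IEnumerable<PointerFileEntry> entries)
    {
        foreach (var entry in entries)
            pointerEntries[entry.RelativeName] = entry; // insert, or overwrite by path
    }

    public int PointerFileEntryCount => pointerEntries.Count;
}

// Hypothetical handler that knows only the interface.
public sealed class UploadDecider
{
    private readonly IStateRepository repository;

    public UploadDecider(IStateRepository repository) => this.repository = repository;

    // A binary needs uploading when the repository holds no properties for its hash.
    public bool NeedsUpload(Hash hash) => repository.GetBinaryProperty(hash) is null;
}
```

A unit test can construct `UploadDecider` with an `InMemoryStateRepository`, seed it through `AddBinaryProperty`, and assert on `NeedsUpload` without touching a database file.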

StateRepository

new StateRepository(StateRepositoryDbContextFactory factory)

StateRepository is the concrete implementation of IStateRepository. It focuses entirely on business logic and domain-specific data access patterns. All database infrastructure concerns — connection management, migrations, lifecycle — are delegated to the factory.

Key operations

Operation	Description
`GetBinaryProperty(hash)`	Retrieve stored binary properties for a given hash. Returns null if not yet uploaded.
`UpsertPointerFileEntries(entries)`	Insert or update pointer file entry records for a set of files.
`Vacuum`	Delegate to the factory to run a SQLite `VACUUM`, compacting the database file.
`Delete`	Delegate to the factory to delete the database file entirely.

StateRepositoryDbContextFactory

new StateRepositoryDbContextFactory(stateDatabaseFile, ensureCreated, logger)

The factory centralises all EF Core and SQLite infrastructure concerns:

DbContext creation — constructs StateRepositoryDbContext instances with correct options.
Database lifecycle — manages Vacuum and Delete operations on the SQLite file.
Connection pool management — controls how SQLite connections are opened and closed.
Change tracking — maintains a flag (surfaced through the OnChanges callback) that records whether any write operation has occurred since the factory was created.

StateRepositoryDbContext

new StateRepositoryDbContext(DbContextOptions options, Action onChanges)

StateRepositoryDbContext is the EF Core DbContext. It owns:

Entity configuration — defines how domain entities map to SQLite tables and columns.
Value converters — converts domain value objects (like Hash) to and from their database representations.
Schema definition — applies column constraints, indexes, and relationships.
Change notification — calls the onChanges callback whenever SaveChanges or SaveChangesAsync is invoked with actual modifications, propagating the signal up to the factory.
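The factory and context responsibilities described above could be sketched roughly as follows. This is not the Arius source: it assumes the `Microsoft.EntityFrameworkCore.Sqlite` package, a hypothetical `BinaryPropertiesEntity` with an integer key, `EnsureCreated` for schema creation, and it omits the `logger` parameter for brevity.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

// Hypothetical value object and entity; the real Arius model may differ.
public sealed record Hash(string Value);

public sealed class BinaryPropertiesEntity
{
    public int Id { get; set; }
    public Hash Hash { get; set; } = new("");
    public long OriginalSize { get; set; }
}

public sealed class StateRepositoryDbContext : DbContext
{
    private readonly Action onChanges;

    public StateRepositoryDbContext(DbContextOptions options, Action onChanges)
        : base(options) => this.onChanges = onChanges;

    public DbSet<BinaryPropertiesEntity> BinaryProperties => Set<BinaryPropertiesEntity>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Value converter: persist the Hash value object as its string value.
        modelBuilder.Entity<BinaryPropertiesEntity>()
            .Property(e => e.Hash)
            .HasConversion(h => h.Value, s => new Hash(s));
    }

    // The parameterless SaveChanges() forwards to this overload.
    public override int SaveChanges(bool acceptAllChangesOnSuccess)
    {
        var affected = base.SaveChanges(acceptAllChangesOnSuccess);
        if (affected > 0) onChanges(); // notify only when rows were written
        return affected;
    }

    public override async Task<int> SaveChangesAsync(
        bool acceptAllChangesOnSuccess, CancellationToken cancellationToken = default)
    {
        var affected = await base.SaveChangesAsync(acceptAllChangesOnSuccess, cancellationToken);
        if (affected > 0) onChanges();
        return affected;
    }
}

public sealed class StateRepositoryDbContextFactory
{
    private readonly string stateDatabaseFile;
    private readonly DbContextOptions<StateRepositoryDbContext> options;

    public StateRepositoryDbContextFactory(string stateDatabaseFile, bool ensureCreated)
    {
        this.stateDatabaseFile = stateDatabaseFile;
        options = new DbContextOptionsBuilder<StateRepositoryDbContext>()
            .UseSqlite($"Data Source={stateDatabaseFile}")
            .Options;

        if (ensureCreated)
        {
            using var context = CreateContext();
            context.Database.EnsureCreated();
        }
    }

    public bool HasChanges { get; private set; }

    // Every context receives the same callback, which flips the flag.
    public StateRepositoryDbContext CreateContext() =>
        new StateRepositoryDbContext(options, () => HasChanges = true);

    public void Vacuum()
    {
        using var context = CreateContext();
        context.Database.ExecuteSqlRaw("VACUUM;");
    }

    public void Delete()
    {
        SqliteConnection.ClearAllPools(); // release pooled handles before deleting
        File.Delete(stateDatabaseFile);
    }
}
```

Overriding the `bool` overloads covers both the parameterless and the explicit calls to `SaveChanges` and `SaveChangesAsync`, so the notification logic lives in one place per sync/async pair.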

Separation of concerns

The three-class split is intentional:

Why not put everything in one class?

Mixing database infrastructure (connection strings, migrations, vacuuming) with business logic (what data to read and write) makes both harder to test and harder to change. By splitting responsibilities across StateRepository, StateRepositoryDbContextFactory, and StateRepositoryDbContext, each class can evolve independently. A change to the EF Core configuration does not touch the business logic, and a change to a repository query does not touch connection management.

Concern	Owner
Domain queries and commands	`StateRepository`
EF Core and SQLite infrastructure	`StateRepositoryDbContextFactory`
Table schema and entity mapping	`StateRepositoryDbContext`

Change tracking via OnChanges

The OnChanges callback threads through all three types:

Factory registers the callback

When StateRepositoryDbContextFactory is constructed it records an onChanges delegate and initialises a hasChanges flag to false.

Factory passes callback to DbContext

Every time the factory creates a StateRepositoryDbContext it passes the same onChanges delegate to the context constructor.

DbContext fires the callback on save

After SaveChanges or SaveChangesAsync completes with one or more affected rows, the context calls onChanges().

Factory records the change

The factory’s onChanges implementation sets hasChanges = true.

Orchestrator checks HasChanges

After all pipeline tasks complete, the archive command handler checks whether the state repository has changes. If it does, the database is vacuumed and re-uploaded to blob storage.

This mechanism ensures the state file in blob storage is only overwritten when necessary, avoiding spurious writes when an archive run finds nothing new to upload.
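The steps above can be modelled without a database to make the flow concrete. In this sketch `FakeContext`, `FakeFactory`, and `Orchestrator` are hypothetical stand-ins, not Arius types; they only mimic "a save reports affected rows" and the final decision made by the handler.

```csharp
using System;

// Stand-in for the EF Core context: it only models pending rows and a save.
public sealed class FakeContext
{
    private readonly Action onChanges;
    private int pending;

    public FakeContext(Action onChanges) => this.onChanges = onChanges;

    // Pretend one row was added or modified.
    public void Track() => pending++;

    public int SaveChanges()
    {
        var affected = pending;
        pending = 0;
        if (affected > 0) onChanges(); // the context fires the callback only on real writes
        return affected;
    }
}

// Stand-in for the factory: owns the flag and hands the same delegate to every context.
public sealed class FakeFactory
{
    public bool HasChanges { get; private set; } // starts as false

    private void OnChanges() => HasChanges = true;

    public FakeContext CreateContext() => new FakeContext(OnChanges);
}

// Stand-in for the archive command handler's final check.
public static class Orchestrator
{
    public static string Finish(FakeFactory factory) =>
        factory.HasChanges ? "vacuum and upload" : "delete local file";
}
```

A run that saves nothing leaves `HasChanges` false, so `Orchestrator.Finish` chooses "delete local file"; a single tracked row followed by `SaveChanges` flips the flag for the lifetime of the factory, regardless of which context performed the write.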

State file lifecycle

Archive command start
  │
  ▼
HandlerContextBuilder downloads state DB from blob storage (if it exists)
  │
  ▼
Pipeline runs — StateRepository records hashes, binary properties, pointer entries
  │
  ▼
Orchestrator checks HasChanges
  ├── Changes exist  ──▶ Vacuum DB  ──▶ Upload DB to blob storage
  └── No changes    ──▶ Delete local DB file

The state database file name is derived from the container name, so each Azure Blob Storage container has its own independent state repository.
