The state repository is the metadata layer of Arius. It persists information about every file that has been archived: its hash, its properties, and the pointer file entries that represent it in the local file system. The state database is stored locally during an operation and, if anything changed, is then uploaded to Azure Blob Storage alongside the archived blobs.

The four types

The repository is implemented as four collaborating types, each with a single clearly bounded responsibility.
Type                              Kind       Responsibility
IStateRepository                  Interface  Domain-specific repository contract
StateRepository                   Class      Business logic and domain operations
StateRepositoryDbContextFactory   Class      Database infrastructure and lifecycle management
StateRepositoryDbContext          Class      Entity configuration and schema definition

Relationships

IStateRepository
  └── StateRepository
          └── uses ──▶ StateRepositoryDbContextFactory
                              └── creates ──▶ StateRepositoryDbContext

IStateRepository

IStateRepository defines the domain-specific contract for the repository. It exposes operations in terms of Arius domain concepts — hashes, binary properties, and pointer file entries — hiding all persistence details from callers. Handlers depend on IStateRepository, not on any concrete class, which allows the repository to be replaced with a test double in unit tests.
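
As a rough illustration of that substitution, the sketch below pairs a cut-down contract with an in-memory test double. The interface name IStateRepositorySketch, the member signatures, the BinaryProperties and PointerFileEntry records, the AddBinaryProperty method, and the use of plain strings for hashes are all assumptions made for this example; the real IStateRepository surface may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical shapes for illustration only; not the actual Arius types.
public sealed record BinaryProperties(string Hash, long OriginalSize);
public sealed record PointerFileEntry(string Hash, string RelativeName);

// Assumed, simplified contract expressed purely in domain terms.
public interface IStateRepositorySketch
{
    BinaryProperties? GetBinaryProperty(string hash);
    void AddBinaryProperty(BinaryProperties properties); // hypothetical member
    void UpsertPointerFileEntries(IEnumerable<PointerFileEntry> entries);
}

// Test double: same contract, no database behind it.
public sealed class InMemoryStateRepository : IStateRepositorySketch
{
    private readonly Dictionary<string, BinaryProperties> binaries = new();
    private readonly Dictionary<string, PointerFileEntry> pointers = new();

    public int PointerFileEntryCount => pointers.Count;

    // Returns null when nothing has been recorded for the hash.
    public BinaryProperties? GetBinaryProperty(string hash) =>
        binaries.TryGetValue(hash, out var found) ? found : null;

    public void AddBinaryProperty(BinaryProperties properties) =>
        binaries[properties.Hash] = properties;

    // Upsert: an entry for an already-known path replaces the old one.
    public void UpsertPointerFileEntries(IEnumerable<PointerFileEntry> entries)
    {
        foreach (var entry in entries)
            pointers[entry.RelativeName] = entry;
    }
}
```

A handler written against the interface can be exercised with InMemoryStateRepository in a unit test without creating a SQLite file.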

StateRepository

new StateRepository(StateRepositoryDbContextFactory factory)
StateRepository is the concrete implementation of IStateRepository. It focuses entirely on business logic and domain-specific data access patterns. All database infrastructure concerns — connection management, migrations, lifecycle — are delegated to the factory.

Key operations

Operation                           Description
GetBinaryProperty(hash)             Retrieve stored binary properties for a given hash. Returns null if not yet uploaded.
UpsertPointerFileEntries(entries)   Insert or update pointer file entry records for a set of files.
Vacuum                              Delegate to the factory to run a SQLite VACUUM, compacting the database file.
Delete                              Delegate to the factory to delete the database file entirely.

StateRepositoryDbContextFactory

new StateRepositoryDbContextFactory(stateDatabaseFile, ensureCreated, logger)
The factory centralises all EF Core and SQLite infrastructure concerns:
  • DbContext creation — constructs StateRepositoryDbContext instances with correct options.
  • Database lifecycle — manages Vacuum and Delete operations on the SQLite file.
  • Connection pool management — controls how SQLite connections are opened and closed.
  • Change tracking — maintains a flag (surfaced through the OnChanges callback) that records whether any write operation has occurred since the factory was created.

StateRepositoryDbContext

new StateRepositoryDbContext(DbContextOptions options, Action onChanges)
StateRepositoryDbContext is the EF Core DbContext. It owns:
  • Entity configuration — defines how domain entities map to SQLite tables and columns.
  • Value converters — converts domain value objects (like Hash) to and from their database representations.
  • Schema definition — applies column constraints, indexes, and relationships.
  • Change notification — calls the onChanges callback whenever SaveChanges or SaveChangesAsync is invoked with actual modifications, propagating the signal up to the factory.

Separation of concerns

The three-class split is intentional:
Why not put everything in one class?
Mixing database infrastructure (connection strings, migrations, vacuuming) with business logic (what data to read and write) makes both harder to test and harder to change. By splitting responsibilities across StateRepository, StateRepositoryDbContextFactory, and StateRepositoryDbContext, each class can evolve independently. A change to the EF Core configuration does not touch the business logic, and a change to a repository query does not touch connection management.
Concern                             Owner
Domain queries and commands         StateRepository
EF Core and SQLite infrastructure   StateRepositoryDbContextFactory
Table schema and entity mapping     StateRepositoryDbContext

Change tracking via OnChanges

The OnChanges callback threads through all three types:
1. Factory registers the callback
   When StateRepositoryDbContextFactory is constructed, it records an onChanges delegate and initialises a hasChanges flag to false.

2. Factory passes callback to DbContext
   Every time the factory creates a StateRepositoryDbContext, it passes the same onChanges delegate to the context constructor.

3. DbContext fires the callback on save
   After SaveChanges or SaveChangesAsync completes with one or more affected rows, the context calls onChanges().

4. Factory records the change
   The factory’s onChanges implementation sets hasChanges = true.

5. Orchestrator checks HasChanges
   After all pipeline tasks complete, the archive command handler checks whether the state repository has changes. If it does, the database is vacuumed and re-uploaded to blob storage.

This mechanism ensures the state file in blob storage is only overwritten when necessary, avoiding spurious writes when an archive run finds nothing new to upload.
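
The callback chain can be sketched without EF Core at all. The two classes below are hypothetical stand-ins (SketchDbContext, SketchDbContextFactory, and the Add and CreateContext methods are invented for this example); they only model the rule that a save with at least one affected row flips the factory's flag, and are not the actual Arius implementation.

```csharp
using System;

// Stand-in for StateRepositoryDbContext: no EF Core, only the notification rule.
public sealed class SketchDbContext
{
    private readonly Action onChanges;
    private int pendingWrites;

    public SketchDbContext(Action onChanges) => this.onChanges = onChanges;

    // Hypothetical helper that stages one modification.
    public void Add() => pendingWrites++;

    // Mirrors the documented behaviour: notify only when rows were affected.
    public int SaveChanges()
    {
        int affected = pendingWrites;
        pendingWrites = 0;
        if (affected > 0)
            onChanges();
        return affected;
    }
}

// Stand-in for StateRepositoryDbContextFactory: owns the flag and the delegate.
public sealed class SketchDbContextFactory
{
    public bool HasChanges { get; private set; }

    // Every context receives the same callback.
    public SketchDbContext CreateContext() => new SketchDbContext(OnChanges);

    private void OnChanges() => HasChanges = true;
}
```

Because the flag lives on the factory rather than on a context, it survives any number of short-lived contexts created during a run.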

State file lifecycle

Archive command start
        │
        ▼
HandlerContextBuilder downloads state DB from blob storage (if it exists)
        │
        ▼
Pipeline runs: StateRepository records hashes, binary properties, pointer entries
        │
        ▼
Orchestrator checks HasChanges
  ├── Changes exist  ──▶ Vacuum DB  ──▶ Upload DB to blob storage
  └── No changes    ──▶ Delete local DB file
The state database file name is derived from the container name, so each Azure Blob Storage container has its own independent state repository.
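
The final branch of the lifecycle can be expressed as a small decision function. This is a minimal sketch: ArchiveFinalizerSketch and its delegate parameters are hypothetical names, and the actual handler may perform additional steps that are not described here.

```csharp
using System;

public static class ArchiveFinalizerSketch
{
    // hasChanges: the flag surfaced by the state repository.
    // vacuum, upload, deleteLocal: hypothetical hooks for the three actions.
    public static void Finish(bool hasChanges, Action vacuum, Action upload, Action deleteLocal)
    {
        if (hasChanges)
        {
            vacuum();  // compact the SQLite file before it is uploaded
            upload();  // overwrite the state file in blob storage
        }
        else
        {
            deleteLocal(); // nothing new was recorded, so discard the local copy
        }
    }
}
```

Passing the actions in as delegates keeps the branch testable without touching SQLite or Azure Blob Storage.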
