Service Health Endpoints
Each service exposes a /health endpoint for liveness checks and uptime monitoring.
Health Check Reference
Automated Health Checks
Use the CLI doctor command to check all services at once:
- Checks gateway reachability
- Validates all service health endpoints
- Reports connectivity issues
- Returns non-zero exit code on failure (suitable for CI/CD)
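The checks listed above can be approximated with a small script when the CLI is not available. This is a minimal sketch, not the doctor command itself: the function names are hypothetical, and it assumes a healthy service answers its /health endpoint with HTTP 200, which the source does not state.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth reporting.
        return err.code


def check_services(
    health_urls: Dict[str, str],
    fetch_status: Callable[[str], int] = http_status,
) -> int:
    """Probe every health URL; return 0 if all pass, 1 otherwise."""
    failures = []
    for name, url in health_urls.items():
        try:
            status = fetch_status(url)
        except OSError as exc:
            # Connection refused, DNS failure, and timeouts are all OSError subclasses.
            failures.append(f"{name}: unreachable ({exc})")
            continue
        if status != 200:  # assumption: 200 means healthy
            failures.append(f"{name}: HTTP {status}")
    for line in failures:
        print(line)
    return 1 if failures else 0
```

In a CI job, passing the return value to `sys.exit` gives the same non-zero exit code on failure. The `fetch_status` parameter exists so the logic can be exercised without a running service.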
Protocol Adapter Health
Protocol adapters expose health check endpoints to verify configuration and on-chain program deployment.
Check All Protocols
Check Specific Protocol
The escrow adapter health check verifies:
- ESCROW_PROGRAM_ID is set in the environment
- Program account exists on-chain
- Program is executable
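The three escrow checks run naturally in order, since each depends on the one before it. The sketch below illustrates that ordering only. `AccountInfo` and the `get_account` lookup are hypothetical stand-ins for whatever RPC client the adapter uses, and the error messages are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional


@dataclass
class AccountInfo:
    """Hypothetical minimal view of an on-chain account."""
    executable: bool


def check_escrow_adapter(
    env: Mapping[str, str],
    get_account: Callable[[str], Optional[AccountInfo]],
) -> List[str]:
    """Return a list of problems; an empty list means the adapter is healthy."""
    # 1. ESCROW_PROGRAM_ID must be set in the environment.
    program_id = env.get("ESCROW_PROGRAM_ID", "").strip()
    if not program_id:
        return ["ESCROW_PROGRAM_ID is not set"]

    # 2. The program account must exist on-chain.
    account = get_account(program_id)
    if account is None:
        return [f"program account {program_id} not found on-chain"]

    # 3. The account must be marked executable.
    if not account.executable:
        return [f"program account {program_id} is not executable"]

    return []
```

Passing `os.environ` as `env` checks the live process environment; a plain dict works for tests.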
Audit Events
The audit-observability service collects and stores structured audit events for all critical operations.
Event Types
- transaction.created
- transaction.simulated
- transaction.policy_evaluated
- transaction.signed
- transaction.submitted
- transaction.confirmed
- transaction.failed
- wallet.created
- agent.created
- agent.started
- agent.stopped
- agent.paused
- policy.evaluated
Query Audit Events
Create Audit Event
Post custom audit events:
Transaction Proofs
Every transaction generates a cryptographic proof artifact for auditability and replay.
Proof Structure
Retrieve Transaction Proof
Replay Transaction
Retrieve the complete execution context for replay:
Use the replay endpoint to debug failures, verify policy decisions, and reconstruct transaction execution flow.
Metrics
The metrics store tracks counters for system events and operations.
Increment Metric
Get All Metrics
Built-in Metrics
The system automatically tracks:
- audit_event.<eventType> - Count of each audit event type
- Transaction stage transitions
- Policy evaluation results
- RPC pool failover events
- Outbox retry attempts
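The audit_event.<eventType> naming can be illustrated with an in-memory counter store. This is a sketch of the counting scheme only: the `MetricsStore` class and `record_audit_event` helper are hypothetical names, and the real store may persist counters or expose them differently.

```python
from collections import Counter
from typing import Dict


class MetricsStore:
    """Hypothetical in-memory store of named, monotonically increasing counters."""

    def __init__(self) -> None:
        self._counters: Counter = Counter()

    def increment(self, name: str, by: int = 1) -> int:
        """Add `by` to the named counter and return its new value."""
        self._counters[name] += by
        return self._counters[name]

    def all(self) -> Dict[str, int]:
        """Return a snapshot copy of every counter."""
        return dict(self._counters)


def record_audit_event(store: MetricsStore, event_type: str) -> int:
    """Bump the built-in counter for one audit event type."""
    return store.increment(f"audit_event.{event_type}")
```

With this scheme, recording a transaction.failed event increments the counter named audit_event.transaction.failed.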
RPC Pool Status
Monitor RPC endpoint health and failover behavior:
Health Score Calculation
- Initial score: 1.0
- Success: score += 0.06 - latencyPenalty
- Failure: score *= 0.7
- Minimum score: 0.05
Latency penalty (latencyPenalty), by request latency:
- ≤ 200ms: no penalty
- 201-500ms: -0.04
- 501-1000ms: -0.08
- > 1000ms: -0.12
Endpoints are automatically sorted by score (then by latency) for failover selection.
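The scoring rules above are easier to follow as code. The following is a minimal sketch under stated assumptions, not the pool's actual implementation: the `Endpoint` class and function names are hypothetical, and the upper cap of 1.0 is an assumption, since the source gives only a minimum score.

```python
from dataclasses import dataclass
from typing import List

INITIAL_SCORE = 1.0
SUCCESS_BONUS = 0.06
FAILURE_FACTOR = 0.7
MIN_SCORE = 0.05
MAX_SCORE = 1.0  # assumption: the source states no upper bound


def latency_penalty(latency_ms: float) -> float:
    """Penalty subtracted from the success bonus, by latency bucket."""
    if latency_ms <= 200:
        return 0.0
    if latency_ms <= 500:
        return 0.04
    if latency_ms <= 1000:
        return 0.08
    return 0.12


def _clamp(score: float) -> float:
    return max(MIN_SCORE, min(MAX_SCORE, score))


@dataclass
class Endpoint:
    url: str
    score: float = INITIAL_SCORE
    latency_ms: float = 0.0  # latency of the most recent successful call

    def record_success(self, latency_ms: float) -> None:
        self.latency_ms = latency_ms
        self.score = _clamp(self.score + SUCCESS_BONUS - latency_penalty(latency_ms))

    def record_failure(self) -> None:
        self.score = _clamp(self.score * FAILURE_FACTOR)


def failover_order(endpoints: List[Endpoint]) -> List[Endpoint]:
    """Highest score first; ties broken by lowest latency."""
    return sorted(endpoints, key=lambda e: (-e.score, e.latency_ms))
```

Note that a successful call slower than 1000ms nets 0.06 - 0.12 = -0.06, so a consistently slow endpoint loses score even without failing. A failure costs 30% of the current score, so recovery from a single failure takes several fast successes.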
Outbox Queue Status
Monitor the durable outbox queue for pending and failed transactions:
Outbox Job Lifecycle
- pending - Job queued, waiting for worker
- processing - Worker has claimed lease
- done - Successfully completed
- failed - Exceeded max retry attempts
Lease and Retry Semantics
Jobs in processing state have an active lease. If a worker crashes, the lease expires and the job returns to pending.
Configuration:
- TX_OUTBOX_LEASE_MS - Lease duration (default: 30000ms)
- TX_OUTBOX_MAX_ATTEMPTS - Max retries (default: 6)
- TX_OUTBOX_POLL_MS - Worker poll interval (default: 2000ms)
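The lease and retry behavior can be sketched as a small state machine. This is an illustration under assumptions, not the worker's actual code: the `Job`, `claim`, and `reap_expired` names are hypothetical. It also assumes each claim counts as one attempt and that a job moves to failed once its attempts reach the maximum; the source does not say exactly when the counter advances.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class OutboxConfig:
    lease_ms: int
    max_attempts: int
    poll_ms: int


def load_config(env: Mapping[str, str]) -> OutboxConfig:
    """Read the three settings, falling back to the documented defaults."""
    return OutboxConfig(
        lease_ms=int(env.get("TX_OUTBOX_LEASE_MS", "30000")),
        max_attempts=int(env.get("TX_OUTBOX_MAX_ATTEMPTS", "6")),
        poll_ms=int(env.get("TX_OUTBOX_POLL_MS", "2000")),
    )


@dataclass
class Job:
    id: str
    state: str = "pending"  # pending | processing | done | failed
    attempts: int = 0
    lease_expires_at: Optional[int] = None  # epoch milliseconds


def claim(job: Job, now_ms: int, cfg: OutboxConfig) -> bool:
    """A worker takes a pending job and starts a lease on it."""
    if job.state != "pending":
        return False
    job.state = "processing"
    job.attempts += 1  # assumption: every claim counts as an attempt
    job.lease_expires_at = now_ms + cfg.lease_ms
    return True


def complete(job: Job) -> None:
    """The worker finished successfully before the lease ran out."""
    job.state = "done"
    job.lease_expires_at = None


def reap_expired(job: Job, now_ms: int, cfg: OutboxConfig) -> bool:
    """Release a job whose worker stopped renewing; fail it if out of attempts."""
    expired = (
        job.state == "processing"
        and job.lease_expires_at is not None
        and now_ms >= job.lease_expires_at
    )
    if not expired:
        return False
    job.lease_expires_at = None
    job.state = "failed" if job.attempts >= cfg.max_attempts else "pending"
    return True
```

A worker loop would call `reap_expired` and `claim` once per poll interval. With the defaults, a job whose worker crashes becomes claimable again after 30 seconds and is marked failed after its sixth unsuccessful attempt.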