This guide provides solutions to common operational issues encountered when running Lamassu IoT in production environments.
Database Issues
Connection Pool Exhausted
Symptoms:
HTTP 500 errors from services
Logs showing “connection pool exhausted” or “too many clients”
Slow API response times
Diagnosis:
# Check PostgreSQL connection count
psql -h localhost -U postgres -c \
"SELECT count(*) FROM pg_stat_activity;"
# Check max connections
psql -h localhost -U postgres -c \
"SHOW max_connections;"
# Identify connections by application
psql -h localhost -U postgres -c \
"SELECT application_name, count(*) FROM pg_stat_activity
GROUP BY application_name;"
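The two counts above can be turned into a quick health signal. The following is a minimal sketch; the function names and the 80% warning threshold are illustrative assumptions, not part of Lamassu or PostgreSQL.

```shell
# pool_usage_pct CURRENT MAX
# Prints the share of max_connections currently in use, as a whole percentage.
pool_usage_pct() {
  local current="$1" max="$2"
  if [ "$max" -le 0 ]; then
    echo "max must be positive" >&2
    return 1
  fi
  echo $(( current * 100 / max ))
}

# pool_status CURRENT MAX [WARN_PCT]
# Prints "WARN <pct>%" at or above the threshold (assumed default: 80), else "OK <pct>%".
pool_status() {
  local pct warn="${3:-80}"
  pct=$(pool_usage_pct "$1" "$2") || return 1
  if [ "$pct" -ge "$warn" ]; then
    echo "WARN ${pct}%"
  else
    echo "OK ${pct}%"
  fi
}
```

One way to feed it is with psql's unaligned, tuples-only output, for example `pool_status "$(psql -h localhost -U postgres -Atc 'SELECT count(*) FROM pg_stat_activity;')" "$(psql -h localhost -U postgres -Atc 'SHOW max_connections;')"`.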
Solutions:
Increase PostgreSQL max_connections:
# postgresql.conf
max_connections = 200 # Default is often 100
# Restart PostgreSQL
systemctl restart postgresql
Configure connection pooling in services:
postgres:
  max_open_connections: 25
  max_idle_connections: 5
  connection_max_lifetime_minutes: 10
Use PgBouncer for connection pooling:
# /etc/pgbouncer/pgbouncer.ini
[databases]
lamassu = host=localhost port=5432 dbname=lamassu
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
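When choosing max_open_connections, it helps to check that the combined pools of all service instances fit under max_connections. The sketch below assumes every instance uses the same pool size and that PostgreSQL keeps 3 connections back for superusers (the stock superuser_reserved_connections default); the function name is hypothetical.

```shell
# pool_budget MAX_OPEN INSTANCES MAX_CONNECTIONS [RESERVED]
# Compares worst-case demand (pool size x instances) with the connections
# left after the superuser reserve.
pool_budget() {
  local max_open="$1" instances="$2" max_conn="$3" reserved="${4:-3}"
  local demand=$(( max_open * instances ))
  local available=$(( max_conn - reserved ))
  if [ "$demand" -le "$available" ]; then
    echo "fits: ${demand}/${available}"
  else
    echo "over: ${demand}/${available}"
  fi
}
```

For example, six instances with a pool of 25 fit within max_connections = 200 but not within the default of 100. If the result is "over", either lower the per-service pool or put PgBouncer in front of PostgreSQL.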
Slow Queries
Symptoms:
High API latency
Database CPU usage at 100%
Long-running queries in pg_stat_activity
Diagnosis:
-- Find long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
-- Find index candidates: high-cardinality columns with low physical correlation
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
AND n_distinct > 100
ORDER BY abs(correlation) ASC;
-- Analyze query plan
EXPLAIN ANALYZE
SELECT * FROM certificates WHERE status = 'ACTIVE' AND expiration < NOW();
Solutions:
Add missing indexes:
-- Index on frequently queried columns
CREATE INDEX idx_certificates_status ON certificates(status);
CREATE INDEX idx_certificates_expiration ON certificates(expiration);
CREATE INDEX idx_devices_dms_id ON devices(dms_id);
Update table statistics:
ANALYZE certificates;
ANALYZE devices;
ANALYZE cas;
Optimize configuration for workload:
# postgresql.conf
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
work_mem = 64MB # Per-operation memory
maintenance_work_mem = 1GB # For VACUUM, indexes
random_page_cost = 1.1 # For SSD storage
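The shared_buffers and effective_cache_size values above correspond to a host with 16 GB of RAM. For other sizes, the same 25% and 75% rules of thumb can be applied with a small helper. This is a sketch with a hypothetical function name; treat the output as a starting point, not a tuned configuration.

```shell
# pg_memory_hints RAM_MB
# Prints postgresql.conf lines using the 25% / 75% rules of thumb.
# Working in megabytes avoids rounding small hosts down to 0GB.
pg_memory_hints() {
  local ram_mb="$1"
  echo "shared_buffers = $(( ram_mb * 25 / 100 ))MB"
  echo "effective_cache_size = $(( ram_mb * 75 / 100 ))MB"
}
```

Running `pg_memory_hints 16384` reproduces the 4 GB and 12 GB figures used in this guide, expressed in MB.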
Database Migration Failures
Symptoms:
Service fails to start
Logs showing “migration failed” or “schema version mismatch”
Diagnosis:
-- Check current schema version
SELECT * FROM goose_db_version ORDER BY version_id DESC LIMIT 5;
-- Check for failed migrations
SELECT * FROM goose_db_version WHERE is_applied = false;
Solutions:
Manually run migration:
# Using goose-lamassu tool
goose-lamassu -dir ./engines/storage/postgres/migrations/ca \
postgres "host=localhost user=postgres dbname=ca sslmode=disable" up
Fix failed migration and retry:
-- Mark migration as not applied to retry
DELETE FROM goose_db_version WHERE version_id = 20250309120000;
Restore from backup if corruption occurred:
pg_restore -d lamassu /backup/lamassu_latest.dump
Always back up your database before attempting manual migration fixes.
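A backup taken before a manual fix should be in a format that pg_restore can read, which means pg_dump's custom format (-Fc) rather than plain SQL. The sketch below shows one way to do that; the function names, the /backup directory default, and the file naming scheme are assumptions for illustration.

```shell
# backup_path DIR DB [STAMP]
# Builds a timestamped dump file name such as /backup/ca_20250309T120000.dump.
backup_path() {
  local dir="$1" db="$2" stamp="${3:-$(date +%Y%m%dT%H%M%S)}"
  echo "${dir%/}/${db}_${stamp}.dump"
}

# backup_db DB [DIR]
# Dumps DB in custom format and prints the resulting path on success.
# Host and user mirror the examples in this guide; adjust for your deployment.
backup_db() {
  local db="$1" out
  out=$(backup_path "${2:-/backup}" "$db")
  pg_dump -h localhost -U postgres -Fc -d "$db" -f "$out" && echo "$out"
}
```

For example, `backup_db ca` before running goose-lamassu leaves a dump that can be restored later with `pg_restore -d ca <path>`.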
Certificate Issuance Issues
CA Not Found
Symptoms:
HTTP 404 when signing certificates
Error: “CA with id ‘xxx’ not found”
Diagnosis:
# List all CAs
curl -H "Authorization: Bearer $TOKEN" \
https://lamassu.example.com/api/ca/v1/cas | jq
# Get specific CA
curl -H "Authorization: Bearer $TOKEN" \
https://lamassu.example.com/api/ca/v1/cas/{ca-id} | jq
Solutions:
Verify CA exists in database:
SELECT id, subject_common_name, status FROM cas WHERE id = 'your-ca-id';
Check CA status:
# Ensure CA is in ACTIVE status, not EXPIRED or REVOKED
curl -H "Authorization: Bearer $TOKEN" \
https://lamassu.example.com/api/ca/v1/cas/{ca-id} | jq '.status'
Recreate CA if missing:
curl -X POST https://lamassu.example.com/api/ca/v1/cas \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "replacement-ca",
"type": "MANAGED",
"subject": {
"common_name": "Replacement CA",
"organization": "YourOrg"
},
"engine_id": "vault-engine",
"key_metadata": {"type": "RSA", "bits": 4096}
}'
Crypto Engine Failures
Symptoms:
Certificate signing fails with crypto errors
Timeouts during CA operations
Errors mentioning PKCS#11, Vault, or AWS KMS
PKCS#11 HSM Issues:
# Test PKCS#11 module
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so --list-slots
# Check HSM connectivity
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so \
--slot 0 --login --pin 1234 --list-objects
# Verify token PIN
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so \
--slot 0 --login --pin 1234 --test
Common PKCS#11 fixes:
Incorrect PIN: Update crypto engine configuration
Token not initialized: Initialize token with pkcs11-tool
HSM disconnected: Check network/USB connection
Session limit reached: Restart HSM or service
HashiCorp Vault Issues:
# Check Vault status
vault status
# Test authentication
vault write auth/approle/login role_id="$ROLE_ID" secret_id="$SECRET_ID"
# List secrets
vault kv list lamassu-pki/
# Check Vault logs
journalctl -u vault -n 100
Common Vault fixes:
Vault sealed:
vault operator unseal
# Or enable auto-unseal with cloud KMS
Token expired:
# Generate new AppRole credentials
vault write -f auth/approle/role/lamassu/secret-id
# Update service configuration
Permission denied:
# Verify policy allows CA operations
vault policy read lamassu-ca
# Update policy if needed
vault policy write lamassu-ca - << EOF
path "lamassu-pki/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
EOF
AWS KMS Issues:
# Test KMS access
aws kms describe-key --key-id alias/lamassu-ca
# Test encryption (requires a key whose usage is ENCRYPT_DECRYPT)
echo "test" | base64 > /tmp/plaintext.txt
aws kms encrypt \
--key-id alias/lamassu-ca \
--plaintext fileb:///tmp/plaintext.txt \
--query CiphertextBlob \
--output text | base64 -d > /tmp/encrypted.bin
# Check IAM permissions
aws iam get-user
aws iam list-attached-user-policies --user-name lamassu-service
Common AWS KMS fixes:
Insufficient permissions:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "kms:Decrypt",
      "kms:Encrypt",
      "kms:GenerateDataKey",
      "kms:DescribeKey",
      "kms:CreateAlias",
      "kms:Sign",
      "kms:Verify"
    ],
    "Resource": "arn:aws:kms:us-east-1:123456789012:key/*"
  }]
}
Region mismatch:
# Ensure crypto engine config matches KMS key region
crypto_engines:
  aws_kms:
    - id: "aws-kms"
      region: "us-east-1" # Must match key region
EST Enrollment Issues
400 Bad Request
Symptoms:
EST enrollment fails with HTTP 400
Error: “Invalid request body” or “Malformed CSR”
Diagnosis:
# Verify CSR format
base64 -d device.b64 | openssl req -inform DER -text -noout
# Check for newlines in base64 (common issue)
wc -l < device.b64
# Should output 0 (no newline at all, as produced by base64 -w 0) or 1 (a single trailing newline)
Solutions:
Ensure base64 has no newlines:
# Correct: single-line base64
openssl req -in device.csr -outform DER | base64 -w 0 > device.b64
# Wrong: multi-line base64
openssl req -in device.csr -outform DER | base64 > device.b64
Verify Content-Type header:
curl -v -H "Content-Type: application/pkcs10" \
--data-binary "@device.b64" \
"https://est.example.com/.well-known/est/dms-01/simpleenroll"
Validate CSR before sending:
# Check CSR is valid DER format
base64 -d device.b64 > device.der
openssl req -inform DER -in device.der -text -noout
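Counting lines does not catch stray characters, so the single-line check can be made stricter by rejecting anything outside the base64 alphabet. The following is a sketch with a hypothetical function name; it ignores trailing newlines and does not verify that the payload decodes to a valid CSR.

```shell
# is_single_line_b64 FILE
# Returns 0 if FILE holds one run of base64 characters (trailing newlines are
# ignored), 1 if it is empty or contains any other byte such as an embedded
# newline or space, and 2 if the file cannot be read.
is_single_line_b64() {
  local LC_ALL=C content
  content=$(cat "$1") || return 2
  case $content in
    '') return 1 ;;
    *[!A-Za-z0-9+/=]*) return 1 ;;
  esac
  return 0
}
```

A typical use is a guard before the enrollment request: `is_single_line_b64 device.b64 || echo "re-encode with base64 -w 0"`.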
401 Unauthorized
Symptoms:
EST enrollment rejected
Error: “Client certificate not trusted” or “Authentication failed”
Diagnosis:
# Test TLS handshake
openssl s_client -connect est.example.com:443 \
-cert bootstrap.crt -key bootstrap.key -showcerts
# Verify client certificate chain
openssl verify -CAfile ca-bundle.pem bootstrap.crt
# Check certificate issuer
openssl x509 -in bootstrap.crt -noout -issuer
Solutions:
Verify DMS validation CA list:
curl -H "Authorization: Bearer $TOKEN" \
https://lamassu.example.com/api/dmsmanager/v1/dms/{dms-id} | \
jq '.settings.enrollment_settings.est_rfc7030_settings.authentication.client_certificate.validation_cas'
Add bootstrap CA to validation list:
curl -X PATCH https://lamassu.example.com/api/dmsmanager/v1/dms/{dms-id} \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json-patch+json" \
-d '[{
"op": "add",
"path": "/settings/enrollment_settings/est_rfc7030_settings/authentication/client_certificate/validation_cas/-",
"value": "bootstrap-ca-id"
}]'
Check certificate expiration:
openssl x509 -in bootstrap.crt -noout -dates
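The expiration check can be reduced to a number of days remaining, which is easier to alert on than raw dates. This sketch assumes GNU date (for parsing the notAfter string with -d); the function names are hypothetical.

```shell
# days_left NOT_AFTER_EPOCH [NOW_EPOCH]
# Whole days between now and the expiry time. Negative means already expired;
# the division truncates toward zero, so less than a day either way prints 0.
days_left() {
  local not_after="$1" now="${2:-$(date +%s)}"
  echo $(( (not_after - now) / 86400 ))
}

# cert_days_left CERT_FILE
# Reads notAfter from a PEM certificate and converts it with GNU date.
cert_days_left() {
  local end
  end=$(openssl x509 -in "$1" -noout -enddate) || return 1
  end=${end#notAfter=}
  days_left "$(date -d "$end" +%s)"
}
```

For a plain pass/fail check without date parsing, `openssl x509 -in bootstrap.crt -noout -checkend 0` exits non-zero once the certificate has expired.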
404 Not Found
Symptoms:
EST endpoint returns 404
Error: “DMS not found”
Diagnosis:
# Verify DMS exists
curl -H "Authorization: Bearer $TOKEN" \
https://lamassu.example.com/api/dmsmanager/v1/dms | jq '.dms[].id'
# Check DMS status
curl -H "Authorization: Bearer $TOKEN" \
https://lamassu.example.com/api/dmsmanager/v1/dms/{dms-id}
Solutions:
Verify correct DMS ID in URL:
# Correct format
https://est.example.com/.well-known/est/{dms-id}/simpleenroll
# Check available DMS instances
curl -H "Authorization: Bearer $TOKEN" \
https://lamassu.example.com/api/dmsmanager/v1/dms
Create missing DMS:
curl -X POST https://lamassu.example.com/api/dmsmanager/v1/dms \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "production-dms",
"name": "Production DMS",
"settings": {
"enrollment_settings": {
"protocol": "EST",
"device_provisioning_profile_id": "iot-profile"
}
}
}'
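Many 404s come from a DMS ID with stray whitespace or a typo in the path. Building the URL through a helper makes such mistakes visible before the request is sent. This is a sketch; the function name and the set of characters accepted in a DMS ID are assumptions, not rules taken from Lamassu.

```shell
# est_url HOST DMS_ID [OPERATION]
# Prints the EST endpoint URL in the format shown above.
# Rejects empty IDs and IDs with characters outside [A-Za-z0-9._-] (assumed set).
est_url() {
  local LC_ALL=C host="$1" dms_id="$2" op="${3:-simpleenroll}"
  case $dms_id in
    ''|*[!A-Za-z0-9._-]*)
      echo "invalid DMS id: '$dms_id'" >&2
      return 1
      ;;
  esac
  echo "https://${host}/.well-known/est/${dms_id}/${op}"
}
```

For example, `curl -v "$(est_url est.example.com dms-01)" ...` targets the same simpleenroll endpoint used earlier in this section.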
Service Startup Failures
Service Won’t Start
Symptoms:
Systemd service fails to start
Service crashes immediately after launch
Diagnosis:
# Check service status
systemctl status lamassu-ca
# View recent logs
journalctl -u lamassu-ca -n 100 --no-pager
# Check for port conflicts
sudo netstat -tlnp | grep :8080
# Verify configuration file syntax
yq eval '.' /etc/lamassu/ca-config.yaml
Common issues:
Port already in use:
# Find process using port
sudo lsof -i :8080
# Change port in configuration
# /etc/lamassu/ca-config.yaml
http:
  port: 8081
Database connection failure:
# Test database connectivity
psql -h localhost -U postgres -d lamassu -c "SELECT 1;"
# Check database credentials in config
cat /etc/lamassu/ca-config.yaml | grep -A 5 postgres
Missing environment variables:
# Check service environment
systemctl show lamassu-ca | grep Environment
# Set required variables in systemd unit
# /etc/systemd/system/lamassu-ca.service
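[Service]
Environment="VAULT_TOKEN=s.xxxxx"
Environment="DB_PASSWORD=secret"
# Reload systemd and restart the service after editing the unit:
# systemctl daemon-reload && systemctl restart lamassu-ca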
File permissions:
# Check config file ownership
ls -l /etc/lamassu/ca-config.yaml
# Fix permissions
sudo chown lamassu:lamassu /etc/lamassu/ca-config.yaml
sudo chmod 640 /etc/lamassu/ca-config.yaml
Memory Issues
Symptoms:
Service OOM (out of memory) killed
Logs showing “cannot allocate memory”
Diagnosis:
# Check memory usage
free -h
# Monitor process memory
top -p "$(pgrep -d, lamassu-ca)"
# Check OOM killer logs
dmesg | grep -i oom
journalctl -k | grep -i oom
Solutions:
Increase container/VM memory:
# Kubernetes
resources:
  limits:
    memory: 2Gi
  requests:
    memory: 1Gi
Tune Go garbage collector:
# Increase GC target percentage (default 100)
export GOGC=200
# Set memory limit
export GOMEMLIMIT=1800MiB # Leave headroom
Reduce database connection pool:
postgres:
  max_open_connections: 10 # Reduce from default 25
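The GOMEMLIMIT value should sit below the container limit so the Go runtime collects garbage before the kernel's OOM killer acts. A helper can derive it from the limit. This is a sketch; the function name and the 90% default are assumptions, and the right headroom depends on non-heap memory use.

```shell
# gomemlimit_mib LIMIT_MIB [PERCENT]
# Prints a GOMEMLIMIT value as a percentage (assumed default: 90) of the
# container memory limit, using the MiB suffix that the Go runtime accepts.
gomemlimit_mib() {
  local limit_mib="$1" pct="${2:-90}"
  echo "$(( limit_mib * pct / 100 ))MiB"
}
```

With the 2Gi limit shown above, `export GOMEMLIMIT="$(gomemlimit_mib 2048)"` yields a value close to the 1800MiB used in this guide.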
Monitoring and Observability Issues
Metrics Not Appearing
Symptoms:
Grafana shows no data
OTLP exporter errors in logs
Diagnosis:
# Test OTLP collector connectivity
curl http://otel-collector:4318/v1/metrics
# Check service OTEL configuration
cat /etc/lamassu/ca-config.yaml | grep -A 10 otel
# Verify collector is receiving data
curl http://otel-collector:8888/metrics | grep lamassu
Solutions:
Enable OTEL in service config:
otel:
  metrics:
    enabled: true
    hostname: "otel-collector"
    port: 4318
    scheme: "http"
Check OTLP collector configuration:
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
exporters:
  prometheus:
    endpoint: 0.0.0.0:9090
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
Verify network connectivity:
# From Lamassu service container
telnet otel-collector 4318
# Check DNS resolution
nslookup otel-collector
Traces Missing Context
Symptoms:
Distributed traces show disconnected spans
No parent-child relationships in traces
Solutions:
Enable trace propagation:
otel:
  traces:
    enabled: true
    hostname: "otel-collector"
    port: 4318
Verify HTTP instrumentation:
// Services use otelhttp for automatic propagation
import "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
Check propagation headers:
curl -v https://lamassu.example.com/api/ca/v1/cas \
-H "traceparent: 00-<trace-id>-<span-id>-01"
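The <trace-id> and <span-id> placeholders in the header above must be 32 and 16 lowercase hex characters respectively, as defined by the W3C Trace Context format. A test header can be generated as sketched below. The function names are hypothetical, and the validator is simplified to accept only the flag values 00 and 01.

```shell
# rand_hex N
# Prints N random bytes from /dev/urandom as 2N lowercase hex characters.
rand_hex() {
  od -An -N "$1" -tx1 /dev/urandom | tr -d ' \n'
}

# make_traceparent
# Prints a version-00 traceparent value with the sampled flag (01) set.
make_traceparent() {
  echo "00-$(rand_hex 16)-$(rand_hex 8)-01"
}

# is_traceparent VALUE
# Succeeds if VALUE has the version-00 layout (simplified flag check).
is_traceparent() {
  printf '%s\n' "$1" | grep -Eq '^00-[0-9a-f]{32}-[0-9a-f]{16}-0[01]$'
}
```

Sending `-H "traceparent: $(make_traceparent)"` and then searching the tracing backend for the generated trace ID shows whether the services continue the incoming trace or start a new one.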
Slow API Responses
Diagnosis checklist:
Check HTTP Metrics
histogram_quantile(0.95,
sum by (le, http_route) (rate(http_server_duration_bucket[5m]))
)
Analyze Distributed Traces
Find slow spans in Tempo/Jaeger to identify bottleneck (DB, crypto, network)
Check Database Performance
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;
Monitor Crypto Engine Latency
histogram_quantile(0.95,
sum by (le, engine_id) (rate(crypto_operation_duration_seconds_bucket[5m]))
)
Common fixes:
Add database indexes for frequently queried fields
Increase database shared_buffers and work_mem
Scale HSM/Vault infrastructure if crypto operations are slow
Add caching layer for frequently accessed CAs
Scale Lamassu services horizontally
Getting Help
If you’re unable to resolve an issue:
Check Logs: Review service logs with journalctl or your log aggregation system. Temporarily set the log level to debug.
When reporting issues, include:
Lamassu version (git describe --tags)
Deployment method (Docker, Kubernetes, monolithic)
Relevant configuration (redact secrets)
Complete error messages and stack traces
Steps to reproduce the issue
Logs from affected services
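Redacting secrets by hand is error-prone, so a filter can be applied to configuration files before attaching them to a report. The sketch below is a best-effort helper with a hypothetical name. It only handles `key: value` and `KEY=value` lines whose key contains password, secret, or token (all-lowercase or all-uppercase), so review the output before sharing.

```shell
# redact_secrets
# Reads config text on stdin and replaces the value of any key that contains
# password, secret, or token with REDACTED. Other lines pass through unchanged.
redact_secrets() {
  sed -E 's/^([[:space:]]*[A-Za-z0-9_]*(password|secret|token|PASSWORD|SECRET|TOKEN)[A-Za-z0-9_]*[[:space:]]*[:=][[:space:]]*).*/\1REDACTED/'
}
```

For example, `redact_secrets < /etc/lamassu/ca-config.yaml > ca-config.redacted.yaml` produces a copy that keeps the structure of the file while hiding matching values.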