Overview
Fluxer uses SigNoz as its observability platform, providing unified traces, metrics, and logs in a single interface. Built on OpenTelemetry standards, it offers:
Distributed tracing - Track requests across microservices
Metrics collection - Monitor performance and resource usage
Log aggregation - Centralized logging with structured search
Custom dashboards - Build visualizations for key metrics
Alerting - Proactive notifications for anomalies
SigNoz is self-hosted, giving you complete control over telemetry data without third-party dependencies.
Architecture
SigNoz Stack Components
OpenTelemetry Collector
Receives telemetry data via OTLP (gRPC and HTTP) and forwards to ClickHouse.
Ports: 4317 (gRPC), 4318 (HTTP)
Replicas: 3 for high availability
Batch processing: 10k events per batch
ClickHouse
High-performance columnar database for storing traces, metrics, and logs.
Version: 25.5.6
Retention: Configurable per data type
Compression: Optimized for telemetry data
SigNoz UI
Web interface for querying and visualizing telemetry data.
Port: 8080
URL: signoz.fluxer.app (behind Caddy)
Authentication: Built-in user management
Zookeeper
Coordination service for ClickHouse cluster management.
Version: 3.7.1
Replicas: 1 (increase for production)
Deployment
Docker Swarm Stack
Deploy SigNoz
cd fluxer_devops/signoz
# Deploy with default settings
./deploy.sh
# Or specify version
export SIGNOZ_IMAGE_TAG=v0.108.0
export OTELCOL_TAG=v0.129.12
docker stack deploy -c compose.yaml fluxer-signoz
Environment Variables
SIGNOZ_IMAGE_TAG=v0.108.0
OTELCOL_TAG=v0.129.12
LOW_CARDINAL_EXCEPTION_GROUPING=false
OpenTelemetry Collector Configuration
The OTel Collector processes and exports telemetry data:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        - job_name: otel-collector
          static_configs:
            - targets:
                - localhost:8888

processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  resourcedetection:
    detectors: [env, system]
    timeout: 2s
  signozspanmetrics/delta:
    metrics_exporter: signozclickhousemetrics
    metrics_flush_interval: 60s
    latency_histogram_buckets:
      [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms,
       500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]

exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/signoz_traces
    use_new_schema: true
  signozclickhousemetrics:
    dsn: tcp://clickhouse:9000/signoz_metrics
  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/signoz_logs
    timeout: 10s
    use_new_schema: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [signozspanmetrics/delta, batch]
      exporters: [clickhousetraces, signozmeter]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [signozclickhousemetrics, signozmeter]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhouselogsexporter, signozmeter]
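The `latency_histogram_buckets` boundaries above determine the granularity of every span-duration percentile SigNoz can report: a span duration is counted in the first bucket whose upper bound is at or above it. The helper below is a hypothetical illustration of that mapping (in milliseconds), not part of the collector:

```typescript
// Bucket upper bounds from the collector config above, converted to ms
// (100us = 0.1ms; 5s..60s expressed in ms).
const bucketsMs = [
  0.1, 1, 2, 6, 10, 50, 100, 250,
  500, 1000, 1400, 2000, 5000, 10000, 20000, 40000, 60000,
];

// Hypothetical helper: upper bound of the bucket a span duration lands in,
// or Infinity for the implicit +Inf overflow bucket.
function bucketFor(durationMs: number): number {
  for (const bound of bucketsMs) {
    if (durationMs <= bound) return bound;
  }
  return Infinity;
}
```

A 120 ms span lands in the 250 ms bucket, so percentile estimates near that latency are only as precise as the bucket width; worth keeping in mind if you tune the boundaries.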
Instrumenting Fluxer Services
Node.js (fluxer_server, fluxer_api, fluxer_gateway)
Install OpenTelemetry packages:
pnpm add @opentelemetry/sdk-node \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-grpc \
@opentelemetry/exporter-metrics-otlp-grpc
Create instrumentation file:
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'fluxer-server',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.VERSION || 'dev',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'production',
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'http://otel-collector:4317',
    }),
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
Load before application:
{
  "scripts": {
    "start": "node --require ./instrumentation.js dist/index.js"
  }
}
Custom Spans
Add manual instrumentation for critical operations:
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('fluxer-server');

async function sendMessage(channelId: string, content: string) {
  const span = tracer.startSpan('sendMessage');
  span.setAttributes({
    'channel.id': channelId,
    'message.length': content.length,
  });

  try {
    // Validate content
    await validateMessage(content);

    // Save to Cassandra
    const message = await db.messages.insert({
      channelId,
      content,
      timestamp: Date.now(),
    });

    // Publish to NATS
    await nats.publish(`channel.${channelId}.message`, message);

    span.setStatus({ code: SpanStatusCode.OK });
    return message;
  } catch (error) {
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    span.end();
  }
}
Custom Metrics
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('fluxer-server');

const messageCounter = meter.createCounter('fluxer.messages.sent', {
  description: 'Total messages sent',
  unit: '1',
});

const messageHistogram = meter.createHistogram('fluxer.message.size', {
  description: 'Message size distribution',
  unit: 'bytes',
});

// Record metrics
messageCounter.add(1, {
  'channel.type': 'text',
  'guild.id': guildId,
});

messageHistogram.record(content.length, {
  'channel.type': 'text',
});
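One caveat with the attributes above: `guild.id` is unbounded, and every distinct value creates a new time series in ClickHouse. If a per-guild breakdown is needed, one common workaround is to hash the id into a fixed number of shard labels. The helper below is a hypothetical sketch of that idea, not an existing Fluxer utility:

```typescript
// Hypothetical helper: map an unbounded id to one of N shard labels so
// metric cardinality stays fixed no matter how many guilds exist.
function shardLabel(id: string, shards = 16): string {
  let hash = 0;
  for (const ch of id) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple rolling hash
  }
  return `shard-${hash % shards}`;
}

// messageCounter.add(1, { 'guild.shard': shardLabel(guildId) });
```

You lose per-guild resolution in dashboards but keep a bounded series count; exact per-guild questions are better answered from traces or logs.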
Prometheus Integration
SigNoz can scrape Prometheus metrics from instrumented services:
Service Discovery
Configure Docker Swarm labels for automatic scraping:
services:
  fluxer_api:
    deploy:
      labels:
        signoz.io/scrape: 'true'
        signoz.io/port: '9464'
        signoz.io/path: '/metrics'
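Conceptually, each service carrying `signoz.io/scrape: 'true'` becomes a Prometheus scrape target assembled from its host, port, and path labels. A hypothetical sketch of that mapping (the defaults shown are assumptions, not documented SigNoz behavior):

```typescript
// Hypothetical sketch: turn Swarm service labels into a scrape target URL.
function scrapeTarget(host: string, labels: Record<string, string>): string | null {
  if (labels['signoz.io/scrape'] !== 'true') return null; // opted out
  const port = labels['signoz.io/port'] ?? '9464';
  const path = labels['signoz.io/path'] ?? '/metrics';
  return `http://${host}:${port}${path}`;
}
```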
Metrics Endpoint
Expose Prometheus metrics in your service:
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
import { MeterProvider } from '@opentelemetry/sdk-metrics';
import { metrics } from '@opentelemetry/api';
import express from 'express';

// Serves /metrics on its own port (9464)
const prometheusExporter = new PrometheusExporter({
  port: 9464,
  endpoint: '/metrics',
});

// Register the exporter so recorded metrics are actually exposed
const meterProvider = new MeterProvider({ readers: [prometheusExporter] });
metrics.setGlobalMeterProvider(meterProvider);

const app = express();
// ... app routes
app.listen(8080);
Dashboards
Built-in Dashboards
SigNoz includes pre-built dashboards for:
APM - Application performance overview
Infrastructure - CPU, memory, disk, network
Database - Query performance and connection pools
Errors - Exception tracking and error rates
Custom Dashboards
Create custom dashboards for Fluxer-specific metrics:
Navigate to Dashboards
Open SigNoz UI → Dashboards → New Dashboard
Add Panels
Select visualization type:
Time series (line/area charts)
Bar charts
Pie charts
Value (single number)
Table
Configure Query
Use PromQL-like queries:

# Message send rate
rate(fluxer_messages_sent_total[5m])

# P95 message send latency
histogram_quantile(0.95, fluxer_message_send_duration_bucket)

# Active WebSocket connections
sum(fluxer_websocket_connections)
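`histogram_quantile` estimates a percentile by finding the bucket where the target rank falls and interpolating linearly inside it. A self-contained, simplified sketch of that estimation, taking cumulative bucket counts as input:

```typescript
// Simplified sketch of histogram_quantile: given cumulative bucket counts
// (le = bucket upper bound, count = observations <= le), estimate a quantile
// by linear interpolation inside the bucket containing the target rank.
interface Bucket { le: number; count: number }

function histogramQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count;
  const rank = q * total;
  let prevLe = 0;
  let prevCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      const inBucket = b.count - prevCount;
      // Interpolate the position of `rank` within this bucket's range.
      return prevLe + ((rank - prevCount) / inBucket) * (b.le - prevLe);
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return prevLe;
}
```

This is why the collector's histogram bucket boundaries matter: the interpolation can never be more precise than the bucket the percentile falls into.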
Save Dashboard
Export as JSON for version control:
cp dashboard.json fluxer_devops/signoz/dashboards/
Alerting
Alert Rules
Create alerts for critical conditions:
High Error Rate

name: High Error Rate
query: rate(fluxer_errors_total[5m]) > 10
severity: critical
annotations:
  description: 'Error rate is {{ $value }}/s (threshold: 10/s)'
labels:
  service: fluxer-server
  team: backend

Database Latency

name: High Cassandra Latency
query: histogram_quantile(0.95, cassandra_query_duration_bucket) > 100
severity: warning
annotations:
  description: 'P95 Cassandra latency is {{ $value }}ms'

Memory Usage

name: High Memory Usage
query: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
severity: warning
annotations:
  description: 'Container memory usage is {{ $value | humanizePercentage }}'
Notification Channels
Configure alerts to send notifications:
Slack - Post to #alerts channel
Email - Send to on-call team
Webhook - Integrate with PagerDuty, Opsgenie
Discord - Fluxer meta dogfooding!
Log Aggregation
Centralize logs from all Fluxer services:
Structured Logging
import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  transport: {
    target: 'pino-opentelemetry-transport',
    options: {
      exporterUrl: 'http://otel-collector:4318/v1/logs',
    },
  },
});

logger.info({ userId, channelId }, 'User sent message');
logger.error({ error: err, userId }, 'Failed to send message');
Log Querying
Search logs in SigNoz UI:
-- Find all errors for user
service_name = 'fluxer-server' AND level = 'error' AND userId = '123456'

-- Find slow database queries
service_name = 'fluxer-server' AND attributes.db.duration > 1000

-- Find specific error messages
service_name = 'fluxer-api' AND body CONTAINS 'rate limit exceeded'
Data Retention
Configure retention policies to manage storage:
-- Set 30-day retention for traces
ALTER TABLE signoz_traces.signoz_index_v2
MODIFY TTL timestamp + INTERVAL 30 DAY;

-- Set 90-day retention for metrics
ALTER TABLE signoz_metrics.samples_v2
MODIFY TTL timestamp + INTERVAL 90 DAY;

-- Set 7-day retention for logs
ALTER TABLE signoz_logs.logs
MODIFY TTL timestamp + INTERVAL 7 DAY;
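The TTLs above directly bound steady-state storage: each table holds roughly (daily ingest × retention days). A back-of-the-envelope sketch, where the daily volumes are made-up placeholders, not measured Fluxer numbers:

```typescript
// Rough steady-state storage estimate: bytes ingested per day * TTL days.
// The daily volumes below are hypothetical placeholders, not measurements.
function retainedBytes(bytesPerDay: number, ttlDays: number): number {
  return bytesPerDay * ttlDays;
}

const GiB = 1024 ** 3;
const totalGiB =
  (retainedBytes(5 * GiB, 30) +  // traces, 30-day TTL
   retainedBytes(1 * GiB, 90) +  // metrics, 90-day TTL
   retainedBytes(10 * GiB, 7)) / // logs, 7-day TTL
  GiB;
```

Under these placeholder rates that is 310 GiB before ClickHouse compression; the actual on-disk footprint is typically much smaller thanks to columnar compression.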
Sampling
Reduce trace volume with tail-based sampling:
processors:
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-requests
        type: latency
        latency:
          threshold_ms: 1000
      - name: sample-others
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
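These policies combine with OR semantics: a trace is kept if it contains an error, or exceeds 1 s, or wins the 10% probabilistic draw. A quick sketch of the resulting keep rate under an assumed traffic mix (the percentages are illustrative, and errors and slow traces are assumed disjoint):

```typescript
// Fraction of traces kept under the tail-sampling policies above, given
// assumed fractions of error and slow traffic (illustrative numbers only).
// Assumes the error and slow sets do not overlap.
function keepRate(errorFrac: number, slowFrac: number, probabilistic: number): number {
  const alwaysKept = errorFrac + slowFrac; // error + latency policies
  return alwaysKept + (1 - alwaysKept) * probabilistic; // plus 10% of the rest
}

// e.g. 1% errors, 2% slow, 10% probabilistic on the remainder:
const kept = keepRate(0.01, 0.02, 0.10); // ≈ 0.127, i.e. ~12.7% of traces stored
```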
Troubleshooting
No telemetry data arriving
Check OTel Collector logs:
docker service logs fluxer-signoz_otel-collector -f
Verify connectivity:
# From application container
curl -v http://otel-collector:4318/v1/traces
Check firewall rules:
# Ensure ports 4317 and 4318 are accessible
High ClickHouse CPU usage
Check query performance:
SELECT query, elapsed
FROM system.processes
ORDER BY elapsed DESC;

Optimize with materialized views:
CREATE MATERIALIZED VIEW signoz_traces.top_endpoints
ENGINE = SummingMergeTree()
ORDER BY (service_name, http_route, timestamp)
AS SELECT
  service_name,
  http_route,
  toStartOfMinute(timestamp) AS timestamp,
  count() AS count
FROM signoz_traces.signoz_index_v2
GROUP BY service_name, http_route, timestamp;
Traces missing or disconnected
Clock skew - Ensure NTP is configured on all hosts.
Context propagation - Verify trace context headers:
import { context, propagation } from '@opentelemetry/api';

const headers = {};
propagation.inject(context.active(), headers);
// Include headers in downstream requests
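What `propagation.inject` writes is a W3C `traceparent` header of the form `00-<trace-id>-<span-id>-<flags>`. The SDK handles this for you; the minimal parser below is shown only to make the format concrete when you inspect headers by hand:

```typescript
// Minimal W3C traceparent parser: "00-<32 hex>-<16 hex>-<2 hex>".
interface TraceParent { version: string; traceId: string; spanId: string; sampled: boolean }

function parseTraceparent(header: string): TraceParent | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return {
    version: m[1],
    traceId: m[2],
    spanId: m[3],
    sampled: (parseInt(m[4], 16) & 0x01) === 1, // low bit is the sampled flag
  };
}
```

If a downstream span shows up with a fresh trace id, this header was dropped somewhere (often by a proxy or a manual HTTP client that copies only an allowlist of headers).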
Best Practices
Use Consistent Attributes - Define standard attribute names across services:
user.id
guild.id
channel.id
message.id
Sample High-Volume Traces - Use tail-based sampling to keep errors and slow requests while sampling normal traffic.
Set Alert Thresholds - Base thresholds on historical data and percentiles, not absolute values.
Monitor the Monitor - Set up alerts for SigNoz/ClickHouse health and resource usage.
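For percentile-based thresholds, a simple starting point is a high percentile of recent history plus some headroom, revisited as traffic changes. The helper below is a hypothetical sketch of that calculation, not a SigNoz feature:

```typescript
// Hypothetical helper: derive an alert threshold from historical samples as
// a high percentile plus a headroom multiplier, rather than an absolute value.
function suggestThreshold(samples: number[], percentile = 0.99, headroom = 1.2): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(percentile * sorted.length));
  return sorted[idx] * headroom;
}
```

Feeding it, say, a week of P95 latency readings gives a threshold that tracks real behavior instead of a guess, which cuts both false alarms and missed regressions.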
See Also