Monitoring

Maintaining service availability and monitoring resource metrics are vital for service operations. OroCloud monitoring processes ensure service continuity, efficient troubleshooting, and proactive resource management.

Monitoring tools

Oro internal monitoring

Oro uses industry-standard and in-house monitoring tools for all OroCloud environments. These tools power a comprehensive monitoring system that covers all vital aspects of the infrastructure and the application. The Oro support team uses an alert management system, a defined escalation procedure, and an incident response plan to manage detected incidents.

Oro does not provide access to its internal monitoring system, nor does it subscribe customers to internal alerts.

Google Cloud’s operations suite

Oro customers and partners can configure additional monitoring metrics using Google Cloud’s Operations Suite.

Uptime monitoring

Google Cloud’s Operations Suite allows monitoring application availability using uptime checks. An uptime check:

Tries to open the application URL and measures response time
Can connect from multiple locations in North America, South America, Europe, and Asia-Pacific
Can check the main page or any other page, including authenticated pages (use a dedicated application user)

Results are available in the GCP web GUI. See the GCP uptime checks documentation for more information.

Keep the number of uptime checks reasonable to avoid adding unnecessary workload to the application.
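To make the behavior of an uptime check concrete, here is a minimal Python sketch of the same idea: open a URL once, record the HTTP status, and measure the response time. It is an illustration only, not how Google Cloud implements its checks. The names `check_uptime` and `UptimeResult`, the 10-second timeout, and the rule that any status below 400 counts as available are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UptimeResult:
    url: str
    ok: bool
    status: Optional[int]      # None when no HTTP answer was received
    response_seconds: float


def check_uptime(url: str, timeout: float = 10.0,
                 opener: Callable = urllib.request.urlopen) -> UptimeResult:
    """Open the URL once and record the status code and elapsed time."""
    started = time.monotonic()
    status: Optional[int] = None
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code      # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        status = None          # DNS failure, timeout, refused connection, etc.
    elapsed = time.monotonic() - started
    # Assumption for this sketch: any status below 400 counts as "available".
    ok = status is not None and 200 <= status < 400
    return UptimeResult(url, ok, status, elapsed)
```

Calling `check_uptime("https://shop.example.com/")` (a hypothetical address) returns one result per probe. Running such a probe too often has the same cost as too many uptime checks, so keep the interval modest.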

OS metrics monitoring

Google Cloud’s Operations Suite Metrics Explorer provides collection, visualization, and alerting on OS metrics such as CPU load, disk load, load balancer metrics, and more. The Oro support team monitors all key OS metrics and responds to alerts triggered by threshold violations. See the GCP Metrics Explorer documentation for more information.

NewRelic and Blackfire

Customers can enable NewRelic and Blackfire monitoring solutions for their OroCloud environment. You must obtain your own license for any such tool.

Other proprietary monitoring suites require additional examination before Oro commits to implementation and support.

Metrics monitored by OroCloud support

This section describes the metrics monitored for every OroCloud environment. Use this as a reference for creating your own monitoring system.

Customers have no access to or visibility into the metrics described here. The exact set of metrics, alerting, and escalation rules depends on the environment type (e.g., staging vs. production) and evolves as the OroCloud team improves monitoring.

OS metrics

CPU usage and load average
Disk space utilization
Disk IO metrics
RAM utilization
SWAP usage
Network bandwidth utilization and statistics
Process count
Zombie process count
Logged-in user count
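If you build your own monitoring from this list, a few of these metrics can be read with the Python standard library alone. The sketch below collects CPU count, load average, and disk space utilization. The function name `collect_os_metrics` and the shape of the returned dictionary are assumptions for this example and do not reflect Oro's internal tooling. The remaining metrics (RAM, swap, network, process counts) usually need a dedicated agent or library and are left out.

```python
import os
import shutil


def collect_os_metrics(path: str = "/") -> dict:
    """Return a small snapshot of CPU load and disk space utilization."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    metrics = {
        "cpu_count": os.cpu_count() or 1,
        "disk_used_percent": round(used_percent, 1),
    }
    try:
        # 1-, 5-, and 15-minute load averages; available on Unix-like systems.
        metrics["load_average"] = os.getloadavg()
    except (AttributeError, OSError):
        metrics["load_average"] = None  # not available on this platform
    return metrics
```

Comparing the 1-minute load average with `cpu_count` gives a rough saturation signal: a load that stays above the number of CPUs means processes are waiting for CPU time.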

Component server metrics

Component	Monitored metrics
Nginx	Internal server statistics, connection count, request rate, PHP-FPM process count
PostgreSQL	Connection count, index usage, internal memory allocation, request rate, slow requests, replication, backup status, locks
Redis	Collection size, allocated memory, request rate, cluster status
RabbitMQ	Queue count and sizes, memory consumption, connection count, cluster state
Elasticsearch	JVM metrics, cluster state, request rate, backup status

Application metrics

Web check — The main page is opened every few minutes; the primary availability indicator.
SSL checks — Verifies SSL certificate validity and renewal date.
DNS check — Verifies DNS record correctness.
HTTP status statistics — Tracks the ratio of non-OK responses (4xx and 5xx).
Application error statistics — Detects abnormalities and faults in application errors.
RabbitMQ application queues — Verifies that all application-specific message queues are present and processing.
Oro consumers — Checks that consumers are processing messages from RabbitMQ.
Application orders, users, and SKU statistics
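As an example of how one of these application metrics could be computed, the sketch below calculates the ratio of non-OK responses (4xx and 5xx) from a sample of HTTP status codes. The function name `non_ok_ratio` and the choice to return 0.0 for an empty sample are assumptions for this illustration. How the status codes are gathered, for instance from web server access logs, is left open.

```python
from collections import Counter
from typing import Iterable


def non_ok_ratio(status_codes: Iterable[int]) -> float:
    """Share of responses with a 4xx or 5xx status; 0.0 for an empty sample."""
    counts = Counter(
        "non_ok" if 400 <= code <= 599 else "ok" for code in status_codes
    )
    total = counts["ok"] + counts["non_ok"]
    return counts["non_ok"] / total if total else 0.0
```

For the sample `[200, 200, 404, 500]` the function returns `0.5`, because two of the four responses are in the 4xx or 5xx range.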

Incident response

Alert thresholds

Oro monitoring defines two alert levels:

Warning

A warning threshold violation indicates the application may experience issues if the metric does not recover. Warnings allow proactive prevention before an incident occurs (e.g., disk usage warning). Warnings do not initiate an incident response and are processed routinely during business hours.

Critical

A critical threshold violation indicates an application incident is imminent or already in progress. Once triggered, these alerts initiate the incident response process.
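The two alert levels can be modeled with a small classifier, sketched below in Python. The sketch assumes a metric where higher values are worse, such as disk usage in percent. The names `AlertLevel`, `classify`, and `starts_incident_response`, and any threshold values passed in, are hypothetical. Oro's actual thresholds are not published.

```python
from enum import Enum


class AlertLevel(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value: float, warning: float, critical: float) -> AlertLevel:
    """Map a metric value to an alert level; higher values are worse."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value >= critical:
        return AlertLevel.CRITICAL
    if value >= warning:
        return AlertLevel.WARNING
    return AlertLevel.OK


def starts_incident_response(level: AlertLevel) -> bool:
    """Only critical alerts start the incident response process."""
    return level is AlertLevel.CRITICAL
```

With hypothetical thresholds of 80 and 90 percent, a disk at 85 percent yields `AlertLevel.WARNING`, which is handled routinely, while 95 percent yields `AlertLevel.CRITICAL` and starts incident response.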

Incident management

The OroCloud team uses an Incident Response Plan that covers:

SWAT team members and roles — Contact details, office and emergency numbers for the incident resolution team.
Incident triggers — Conditions that trigger service recovery actions.
Notification flow — Who should be informed and when during incident response.
Escalation process — How and why an incident may be escalated; may involve additional resources.
Incident closing steps — Actions to take after the incident is resolved.
Post-mortem analysis — Root cause identification and preventive measures (product fixes, infrastructure changes, process improvements, training, etc.).

When an incident occurs, affected OroCloud customers receive an email notification. The support team may request cooperative actions from the customer’s IT team. Customers are also notified when service is restored.

Planned maintenance windows

Maintenance windows for production OroCloud environments are planned and scheduled in advance. If the OroCloud service team initiates maintenance that involves only infrastructure changes, alerts are muted during the window.
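If you run your own alerting alongside OroCloud, a similar muting rule can be sketched as a simple time-window check. The example below is a hypothetical illustration in Python. The names `MaintenanceWindow` and `is_muted` are assumptions, and the sketch does not describe how Oro's alert management system implements muting.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class MaintenanceWindow:
    start: datetime  # inclusive
    end: datetime    # exclusive

    def contains(self, moment: datetime) -> bool:
        return self.start <= moment < self.end


def is_muted(alert_time: datetime, windows: Iterable[MaintenanceWindow]) -> bool:
    """Return True if the alert fired inside any scheduled maintenance window."""
    return any(window.contains(alert_time) for window in windows)
```

Use timezone-aware `datetime` values (for example, in UTC) for both the windows and the alert time. Python raises a `TypeError` when aware and naive values are compared.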
