Sentinel AI monitors and manages services defined in services.json. The configuration supports multiple service types including web servers, databases, and system services.

Overview

Services are configured in data/services.json with check commands and running indicators. Sentinel AI periodically executes these commands to verify service health.
Default Monitor Interval: 30 seconds (configured via MONITOR_INTERVAL in config.py)

Service Configuration File

The services file is located at data/services.json and is automatically loaded on startup:
import json
import os

DATA_DIR = "data"
SERVICES_FILE = os.path.join(DATA_DIR, "services.json")

def load_services(self):
    # Return the parsed file; fall back to a copy of the defaults
    # if the file is missing or cannot be read or parsed.
    if os.path.exists(self.SERVICES_FILE):
        try:
            with open(self.SERVICES_FILE, "r") as f:
                return json.load(f)
        except Exception:
            return self.DEFAULT_SERVICES.copy()
    return self.DEFAULT_SERVICES.copy()
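The loader above falls back to the defaults without saying why, which can make a malformed services.json hard to spot. As a minimal sketch (not Sentinel AI's actual code), the standalone variant below logs the reason for the fallback. The function name load_services_verbose and the logger name are hypothetical, introduced only for illustration.

```python
import json
import logging
import os

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("services_loader")


def load_services_verbose(path, defaults):
    """Load services from `path`, logging why the defaults are used if loading fails."""
    if not os.path.exists(path):
        logger.warning("%s not found; using default services", path)
        return dict(defaults)
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        # Malformed JSON and unreadable files both end up here.
        logger.warning("could not load %s (%s); using default services", path, exc)
        return dict(defaults)
```

Calling load_services_verbose("data/services.json", DEFAULT_SERVICES) would behave like the original loader for a missing, unreadable, or malformed file, but leaves a warning in the log in each fallback case.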

Service Schema

Each service requires three properties:
check_command (string, required)
Shell command to check service status. Executed via SSH on the target server. Examples:
  • service nginx status
  • systemctl status postgresql
  • docker ps | grep my-container
  • /usr/local/bin/check_custom_service.sh
running_indicator (string, required)
String pattern to search for in command output that indicates the service is running. Examples:
  • is running
  • active (running)
  • online
  • Status: healthy
The check is case-sensitive and uses substring matching. If this string is found in the command output, the service is considered healthy.
type (string, required)
Service category for organization and reporting. Common types:
  • web_server - Nginx, Apache, Caddy
  • database - PostgreSQL, MySQL, MongoDB
  • system - SSH, cron, systemd services
  • application - Custom applications
  • container - Docker containers
  • cache - Redis, Memcached
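Since every entry needs the same three string properties, a small validator can catch incomplete entries before they reach the monitor. The following is a sketch under the assumption that services.json has already been parsed into a dict. validate_services is a hypothetical helper, not part of Sentinel AI's documented API.

```python
REQUIRED_KEYS = ("check_command", "running_indicator", "type")


def validate_services(services):
    """Return a list of human-readable problems; an empty list means every entry is complete."""
    problems = []
    for name, entry in services.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            # Each property must be present and be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{name}: '{key}' must be a non-empty string")
    return problems


services = {
    "nginx": {
        "check_command": "service nginx status",
        "running_indicator": "is running",
        "type": "web_server",
    },
    "redis": {"check_command": "service redis-server status"},
}
for problem in validate_services(services):
    print(problem)
```

In this example the nginx entry passes and the redis entry is reported twice, once for the missing running_indicator and once for the missing type.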

Default Services

Sentinel AI includes three default services defined in src/core/config.py:
DEFAULT_SERVICES = {
    "nginx": {
        "check_command": "service nginx status",
        "running_indicator": "is running",
        "type": "web_server"
    },
    "postgresql": {
        "check_command": "service postgresql status",
        "running_indicator": "online",
        "type": "database"
    },
    "ssh": {
        "check_command": "service ssh status",
        "running_indicator": "is running",
        "type": "system"
    }
}
These defaults are used if data/services.json doesn’t exist or fails to load. They serve as a template for adding your own services.

Managing Services

Adding Services

Use the add_service() method to add new services:
from src.core.config import config

# Add a Redis service
config.add_service(
    name="redis",
    check_cmd="service redis-server status",
    indicator="is running",
    service_type="cache"
)

# Add a Docker container
config.add_service(
    name="api-container",
    check_cmd="docker inspect -f '{{.State.Status}}' api-container",
    indicator="running",
    service_type="container"
)

# Add a custom application
config.add_service(
    name="myapp",
    check_cmd="systemctl status myapp.service",
    indicator="active (running)",
    service_type="application"
)
The add_service() method automatically saves changes to data/services.json.

Removing Services

Remove services using the remove_service() method:
from src.core.config import config

# Remove a service
config.remove_service("redis")

# Verify removal
print("redis" in config.SERVICES)  # False

Updating Services

To update a service, remove and re-add it:
from src.core.config import config

# Update nginx configuration
config.remove_service("nginx")
config.add_service(
    name="nginx",
    check_cmd="systemctl status nginx",  # Changed to systemctl
    indicator="active (running)",        # Updated indicator
    service_type="web_server"
)

Service Examples

Web Servers

{
  "nginx": {
    "check_command": "service nginx status",
    "running_indicator": "is running",
    "type": "web_server"
  }
}

Databases

{
  "postgresql": {
    "check_command": "service postgresql status",
    "running_indicator": "online",
    "type": "database"
  }
}

Docker Containers

{
  "api-container": {
    "check_command": "docker inspect -f '{{.State.Status}}' api-container",
    "running_indicator": "running",
    "type": "container"
  }
}

Custom Applications

{
  "myapp": {
    "check_command": "systemctl status myapp.service",
    "running_indicator": "active (running)",
    "type": "application"
  }
}

Service Types

Organize services by type for better reporting and management:

Web Servers

Nginx, Apache, Caddy, Traefik

Databases

PostgreSQL, MySQL, MongoDB, Redis

System Services

SSH, cron, systemd, networking

Containers

Docker containers and services

Applications

Custom applications and APIs

Cache Services

Redis, Memcached, Varnish

Health Check Patterns

Service Command

Traditional service status check:
service nginx status
service postgresql status
service redis-server status
Look for: is running, running, active

Systemctl Command

Systemd service manager:
systemctl status nginx
systemctl is-active postgresql
systemctl show -p SubState nginx
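Look for: active (running), active, SubState=running. Because matching is by substring, a bare active also matches inactive, so prefer the more specific indicators.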

Docker Commands

Container status checks:
docker ps | grep container-name
docker inspect -f '{{.State.Status}}' container-name
docker inspect -f '{{.State.Health.Status}}' container-name
Look for: Up, running, healthy

Custom Scripts

Use custom health check scripts:
/usr/local/bin/check_app_health.sh
/opt/monitoring/service_status.py
curl -sf http://localhost:8080/health || exit 1
Look for: the success string your script prints, for example Status: healthy

Best Practices

Use Specific Indicators

Choose unique running indicators that won’t appear in error messages. Use active (running) instead of just active: because the check is a substring match, a bare active also matches inactive.
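To make the substring behavior concrete, here is a small sketch of the case-sensitive substring test described in the schema. The function name indicator_found and the two sample output lines are illustrative assumptions modeled on typical systemctl status output, not captured from a real host.

```python
def indicator_found(output: str, running_indicator: str) -> bool:
    """Case-sensitive substring test, as described for running_indicator."""
    return running_indicator in output


# Illustrative sample lines in the style of `systemctl status` output.
stopped = "Active: inactive (dead)"
running = "Active: active (running) since Mon 2024-01-01 00:00:00 UTC"

print(indicator_found(stopped, "active"))            # True: false positive, "inactive" contains "active"
print(indicator_found(stopped, "active (running)"))  # False
print(indicator_found(running, "active (running)"))  # True
print(indicator_found(running, "Active (Running)"))  # False: matching is case-sensitive
```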

Test Commands Manually

Test each check command via SSH before adding to services.json to ensure it works correctly.

Keep Commands Fast

Avoid slow commands that could delay monitoring. Aim for sub-second execution time.
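One way to try a command and time it before adding it to services.json is a short script like the sketch below. It runs the command through the local shell with a timeout, so to exercise the real path you would pass the full ssh invocation as the command. try_check is a hypothetical helper, and the 5-second default timeout is an assumption.

```python
import subprocess
import time


def try_check(command, running_indicator, timeout=5.0):
    """Run `command` in a shell and return (healthy, elapsed_seconds)."""
    started = time.monotonic()
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        output = result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        # A command that exceeds the timeout is treated as unhealthy.
        output = ""
    elapsed = time.monotonic() - started
    return running_indicator in output, elapsed
```

For example, try_check("ssh user@host 'service nginx status'", "is running") returns whether the indicator was found and how long the round trip took, which makes it easy to see whether a check stays under a second.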

Handle Edge Cases

Consider services that might be stopped intentionally. Use service types to group related checks.

Document Dependencies

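Note service dependencies in separate docs. Avoid comments in services.json itself: the loader parses the file with json.load, which rejects JSONC-style comments, so a commented file fails to load and the default services are used instead.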

Version Control

Keep services.json in version control to track configuration changes over time.

Advanced Configuration

Multi-Host Services

Monitor the same service across multiple hosts:
# Define services per host
HOST_SERVICES = {
    "prod-server-1": {
        "nginx": {...},
        "postgresql": {...}
    },
    "prod-server-2": {
        "nginx": {...},
        "api-app": {...}
    }
}
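A per-host mapping like this can be flattened into a list of individual checks to run. The sketch below assumes each placeholder is filled with a full service definition in the schema described earlier. iter_host_checks is a hypothetical helper, the api-app command is illustrative, and whether Sentinel AI reads a structure like HOST_SERVICES directly is not confirmed by this page.

```python
HOST_SERVICES = {
    "prod-server-1": {
        "nginx": {
            "check_command": "service nginx status",
            "running_indicator": "is running",
            "type": "web_server",
        },
    },
    "prod-server-2": {
        "api-app": {
            "check_command": "systemctl status api-app",  # illustrative command
            "running_indicator": "active (running)",
            "type": "application",
        },
    },
}


def iter_host_checks(host_services):
    """Yield (host, service_name, check_command, running_indicator) for every service."""
    for host, services in host_services.items():
        for name, spec in services.items():
            yield host, name, spec["check_command"], spec["running_indicator"]


for host, name, command, indicator in iter_host_checks(HOST_SERVICES):
    print(f"{host}: {name} -> {command!r} (expects {indicator!r})")
```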

Service Dependencies

Track service dependencies for intelligent recovery:
SERVICE_DEPENDENCIES = {
    "api-app": ["postgresql", "redis"],
    "nginx": ["api-app"],
}

# Start dependencies first during recovery
def recover_service(service_name):
    deps = SERVICE_DEPENDENCIES.get(service_name, [])
    for dep in deps:
        ensure_service_running(dep)
    start_service(service_name)
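The snippet above starts only the direct dependencies of a service. If dependencies can themselves have dependencies (nginx needs api-app, which needs postgresql and redis), a depth-first walk yields a full start order. This is a sketch of one possible approach, with recovery_order as a hypothetical helper. It only computes the order and leaves starting services to whatever ensure_service_running and start_service do in your setup.

```python
SERVICE_DEPENDENCIES = {
    "api-app": ["postgresql", "redis"],
    "nginx": ["api-app"],
}


def recovery_order(service_name, dependencies):
    """Return services in the order they should be started, dependencies first."""
    order = []
    visiting = set()  # services on the current path, used for cycle detection
    done = set()      # services already placed in `order`

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(service_name)
    return order


print(recovery_order("nginx", SERVICE_DEPENDENCIES))
# ['postgresql', 'redis', 'api-app', 'nginx']
```

Raising on a cycle is a deliberate choice here: a circular dependency would otherwise recurse forever, and it usually indicates a configuration mistake worth surfacing.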

Custom Health Checks

Implement application-specific health checks:
import requests

def check_api_health():
    """Custom health check for API service."""
    try:
        response = requests.get("http://localhost:8080/health", timeout=5)
        if response.status_code == 200:
            data = response.json()
            return data.get("status") == "healthy"
    except Exception:
        return False
    return False

# Register custom check
CUSTOM_CHECKS = {
    "api-service": check_api_health
}

Monitoring and Alerts

Sentinel AI monitors services at regular intervals (default: 30 seconds):
MONITOR_INTERVAL = 30  # seconds
MAX_RETRIES = 5

Monitoring Behavior

  1. Status Check: Execute check_command via SSH
  2. Pattern Match: Search output for running_indicator
  3. Health Decision: Service is healthy if indicator is found
  4. Recovery: If unhealthy, attempt automated remediation
  5. Retry Logic: Up to MAX_RETRIES attempts with exponential backoff
Failed health checks trigger the agent’s diagnostic and recovery workflow. Ensure your check commands are reliable to avoid false positives.
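The steps above can be sketched as a small retry loop. This is an illustration under stated assumptions, not Sentinel AI's implementation: the page does not give the base delay, so base_delay=1.0 with doubling is assumed, and backoff_delays, is_healthy, and recover_with_retries are hypothetical names. The check and remediate callables stand in for the SSH status check and the agent's recovery action.

```python
import time

MAX_RETRIES = 5


def backoff_delays(max_retries=MAX_RETRIES, base_delay=1.0):
    """Delays that double on each attempt: base, 2*base, 4*base, ..."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]


def is_healthy(output, running_indicator):
    """Steps 2-3: the service is healthy if the indicator appears in the output."""
    return running_indicator in output


def recover_with_retries(check, remediate, max_retries=MAX_RETRIES,
                         base_delay=1.0, sleep=time.sleep):
    """Steps 4-5: remediate, wait, re-check; give up after max_retries attempts."""
    for delay in backoff_delays(max_retries, base_delay):
        remediate()
        sleep(delay)
        if check():
            return True
    return False


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Passing sleep as a parameter keeps the loop testable without real waiting. In production the default time.sleep would apply.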

Troubleshooting

Problem: The service health check always fails even when the service is running. Solutions:
  • Test the check command manually via SSH: ssh user@host 'service nginx status'
  • Verify the running_indicator string appears in the output
  • Check that the SSH user has permission to run the check command
  • Look for case sensitivity issues in the indicator string
Problem: Services from services.json are not being monitored. Solutions:
  • Verify data/services.json exists and is valid JSON
  • Check file permissions: ls -la data/services.json
  • Review startup logs for JSON parsing errors
  • Test JSON validity: python -m json.tool data/services.json
Problem: Health checks fail with “Permission denied” errors. Solutions:
  • Configure sudoers to allow check commands without password
  • Add SSH user to appropriate groups (docker, systemd-journal)
  • See SSH Setup for details
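Problem: Monitoring interval is delayed due to slow check commands. Solutions:
  • Use faster status check methods (e.g., systemctl is-active or systemctl show -p SubState instead of systemctl status); with is-active, remember that the output inactive still contains the substring active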
  • Avoid commands that require DNS lookups or network calls
  • Consider increasing MONITOR_INTERVAL if checks can’t be optimized
  • Use timeout flags: timeout 5 service nginx status

API Reference

Configuration Methods

from src.core.config import config

# Load services from file
services = config.load_services()

# Save services to file
config.save_services()

# Add a new service
config.add_service(
    name="myservice",
    check_cmd="systemctl status myservice",
    indicator="active (running)",
    service_type="application"
)

# Remove a service
config.remove_service("myservice")

# Access services dictionary
all_services = config.SERVICES
if "nginx" in config.SERVICES:
    nginx_config = config.SERVICES["nginx"]
    print(f"Check: {nginx_config['check_command']}")

Next Steps

SSH Setup

Configure SSH authentication and permissions

AI Models

Customize AI model settings and behavior

Deployment

Deploy Sentinel AI with Docker

Monitoring

Learn about service monitoring features
