Monitoring and alerting
Monitoring
Stellio services export telemetry data using the OpenTelemetry protocol (OTLP). The recommended backend is an OpenTelemetry Collector feeding into the Grafana LGTM stack (Loki for logs, Tempo for traces, Mimir or Prometheus for metrics), with Grafana for visualisation.
It is also recommended to monitor the VMs with node_exporter and the Docker containers with cAdvisor.
Metrics
Export of metrics is disabled by default, it can be enabled by setting
the management.otlp.metrics.export.enabled property to true (env var name in parentheses):
management.otlp.metrics.export.enabled(MANAGEMENT_OTLP_METRICS_EXPORT_ENABLED): enable metrics export (default:false)
The following property configures where metrics are sent (env var name in parentheses):
management.otlp.metrics.export.url(MANAGEMENT_OTLP_METRICS_EXPORT_URL): URL of the OTLP metrics endpoint (default:http://localhost:4318/v1/metrics)
Health endpoint
The Stellio services expose a health endpoint
at /actuator/health. It is enabled by default and does not require the otel profile.
An example Prometheus configuration to probe the health of Stellio services (using Blackbox exporter):
- job_name: 'Stellio services - health'
metrics_path: /probe
scrape_interval: 1m
params:
module: [http_200_stellio]
static_configs:
- targets: ['http://stellio-host:8080/actuator/health']
labels:
name: 'API Gateway'
- targets: ['http://stellio-host:8083/actuator/health']
labels:
name: 'Search Service'
- targets: ['http://stellio-host:8084/actuator/health']
labels:
name: 'Subscription Service'
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox_exporter:9115
Where http_200_stellio is configured in this way:
http_200_stellio:
prober: http
timeout: 5s
http:
preferred_ip_protocol: "ip4"
valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
valid_status_codes: [200]
method: GET
fail_if_body_not_matches_regexp:
- "UP"
Health information can, for instance, be monitored with the following community Grafana dashboard: https://grafana.com/grafana/dashboards/12275.
It is then easy to create alerts based on the health of the services.
Logging
The logs produced by the Stellio services are exported via OpenTelemetry (OTLP) and can be ingested by any compatible backend, such as Grafana Loki.
Export of logs is controlled by the otel Spring profile. It must be included in SPRING_PROFILES_ACTIVE for
the services to emit telemetry data:
SPRING_PROFILES_ACTIVE=otel
Without this profile, the services start normally but do not send any logs to the OTel backend.
When using the provided docker-compose setup, this is controlled via the ENVIRONMENT variable in the .env file:
ENVIRONMENT=docker,otel
The following property configures where logs are sent (env var name in parentheses):
management.opentelemetry.logging.export.otlp.endpoint(MANAGEMENT_OPENTELEMETRY_LOGGING_EXPORT_OTLP_ENDPOINT): URL of the OTLP logs endpoint (default:http://localhost:4318/v1/logs)
Service identification
Each service sends resource attributes that identify it in the OTel backend. These are pre-configured per service:
management.opentelemetry.resource-attributes.service.name: identifies the service (named as the service, e.g.,search-service)management.opentelemetry.resource-attributes.service.namespace: identifies the service namespace (alwaysstellio)management.opentelemetry.resource-attributes.deployment.environment.name: identifies the deployment environment; defaults tolocal, override via theDEPLOYMENT_ENVIRONMENT_NAMEenvironment variable
These labels align with the attributes defined by OpenTemetry, making it straightforward to filter logs per service in a Grafana dashboard.
Docker Compose configuration
In a docker-compose or Docker Swarm based deployment, the environment variables can be declared by adding the following
in the environment section of each service:
search-service:
container_name: search-service
image: stellio/stellio-search-service:latest-dev
environment:
- SPRING_PROFILES_ACTIVE=${ENVIRONMENT}
- MANAGEMENT_OPENTELEMETRY_LOGGING_EXPORT_OTLP_ENDPOINT=${MANAGEMENT_OPENTELEMETRY_LOGGING_EXPORT_OTLP_ENDPOINT}
- MANAGEMENT_OPENTELEMETRY_RESOURCE_ATTRIBUTES_DEPLOYMENT_ENVIRONMENT_NAME=${DEPLOYMENT_ENVIRONMENT_NAME}
- MANAGEMENT_OTLP_METRICS_EXPORT_ENABLED=${MANAGEMENT_OTLP_METRICS_EXPORT_ENABLED}
- MANAGEMENT_OTLP_METRICS_EXPORT_URL=${MANAGEMENT_OTLP_METRICS_EXPORT_URL}
And in the .env file:
ENVIRONMENT=docker,otel
DEPLOYMENT_ENVIRONMENT_NAME=production
MANAGEMENT_OPENTELEMETRY_LOGGING_EXPORT_OTLP_ENDPOINT=http://otel-collector:4318/v1/logs
MANAGEMENT_OTLP_METRICS_EXPORT_ENABLED=true
MANAGEMENT_OTLP_METRICS_EXPORT_URL=http://otel-collector:4318/v1/metrics