Prometheus
Prometheus scrapes metrics from all Orion services at 15-second intervals.
Configuration
The configuration lives at deploy/prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    project: orion

rule_files:
  - "/etc/prometheus/alert_rules.yml"
Scrape Targets
| Job | Target | Metrics Path |
|-----|--------|--------------|
| prometheus | localhost:9090 | /metrics |
| gateway | gateway:8000 | /metrics |
| scout | scout:8001 | /metrics |
| director | director:8002 | /metrics |
| media | media:8003 | /metrics |
| editor | editor:8004 | /metrics |
| pulse | pulse:8005 | /metrics |
| milvus | milvus:9091 | /metrics |
| ollama | ollama:11434 | /api/metrics |
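
The table above maps onto the `scrape_configs` section of prometheus.yml. The following is a minimal sketch rather than a copy of the actual file: the job names, targets, and paths come from the table, while the use of `static_configs` is an assumption (the real file may use a different discovery mechanism). Only three jobs are shown; the remaining services would follow the same pattern as `gateway`.

```yaml
# Sketch only: assumes static targets; job names and addresses are taken from the table above.
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: gateway
    # metrics_path defaults to /metrics, so it is omitted here
    static_configs:
      - targets: ["gateway:8000"]

  - job_name: ollama
    # Ollama is the only target with a non-default path
    metrics_path: /api/metrics
    static_configs:
      - targets: ["ollama:11434"]
```

Because `metrics_path` defaults to `/metrics`, only the `ollama` job needs to set it explicitly.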
Optional Exporters
Uncomment the corresponding jobs in prometheus.yml to enable them:
| Job | Target | Image |
|-----|--------|-------|
| postgres | postgres-exporter:9187 | prometheuscommunity/postgres-exporter |
| redis | redis-exporter:9121 | oliver006/redis_exporter |
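
Once uncommented, the two exporter jobs might look roughly like the sketch below. The job names and targets come from the table; the `static_configs` layout is an assumption about how the file is written. The exporter containers themselves must also be running and reachable under these hostnames for the scrapes to succeed.

```yaml
# Sketch only: these entries would sit under the existing scrape_configs key.
scrape_configs:
  - job_name: postgres
    static_configs:
      - targets: ["postgres-exporter:9187"]

  - job_name: redis
    static_configs:
      - targets: ["redis-exporter:9121"]
```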
Key Metrics
Gateway (Go)
| Metric | Type | Description |
|--------|------|-------------|
| http_requests_total | Counter | Total HTTP requests by method, path, status |
| http_request_duration_seconds | Histogram | Request latency distribution |
| http_requests_in_flight | Gauge | Currently active requests |
| websocket_connections_active | Gauge | Active WebSocket connections |
Python Services
| Metric | Type | Description |
|--------|------|-------------|
| http_requests_total | Counter | FastAPI request count |
| http_request_duration_seconds | Histogram | Request duration |
| event_bus_messages_published_total | Counter | Redis events published |
| event_bus_messages_received_total | Counter | Redis events consumed |
Service-Specific
| Service | Metric | Description |
|---------|--------|-------------|
| Scout | trends_detected_total | Total trends detected |
| Director | pipeline_runs_total | Pipeline executions by status |
| Director | pipeline_duration_seconds | Pipeline execution time |
| Media | images_generated_total | Images generated by provider |
| Editor | videos_rendered_total | Videos rendered |
| Pulse | events_aggregated_total | Events processed |
Querying
Open the Prometheus UI at http://localhost:9090 and run PromQL queries such as:
# Request rate by service (the job label identifies the scraped service)
sum by (job) (rate(http_requests_total[5m]))
# 95th percentile latency per service
histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))
# Error rate
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m]))
# Pipeline success rate
sum(rate(pipeline_runs_total{status="completed"}[1h]))
/ sum(rate(pipeline_runs_total[1h]))
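
The error-rate query above could also back an alerting rule in the file referenced by `rule_files`. The sketch below is illustrative and not the contents of the actual alert_rules.yml: the group name, alert name, 5% threshold, `for` duration, and `severity` label are all hypothetical choices.

```yaml
# Hypothetical alert rule; names and thresholds are assumptions, not taken from the project.
groups:
  - name: orion-http
    rules:
      - alert: HighHttpErrorRate
        # Fires when more than 5% of requests return a 5xx status
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        # Condition must hold for 10 minutes before the alert fires
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of HTTP requests are failing with 5xx responses"
```

With `evaluation_interval: 15s`, Prometheus would evaluate this expression every 15 seconds and fire the alert once the condition has held for the full `for` window.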