System Administration

How to monitor service health, GPU status, and configure AI providers.

System Health Page

Navigate to System Health from the sidebar. The page is divided into four panels: System Overview, Service Status, Infrastructure, and GPU.

System Overview

The System Overview panel displays host-level information and resource usage:

Host Information -- Hostname, OS/platform, architecture, CPU cores, Go version, and system uptime
Resource Usage -- CPU, Memory, and Disk usage shown as progress bars with percentages

A Live indicator in the panel header shows "Updated Xs ago", confirming that the data is current. The panel polls every 5 seconds.

Service Status

The Service Status panel monitors all seven backend services:

Service	Role
Gateway	HTTP gateway that routes requests to backend services
Scout (Trends)	Detects trending topics from external sources
Director (Scripts)	Generates content scripts from trends
Media (Assets)	Produces images, video, and audio assets
Editor (Publish)	Assembles and renders final video content
Publisher	Publishes content to external platforms
Pulse (Analytics)	Collects and aggregates analytics data

Each service row shows:

Status indicator -- Healthy (green dot), Unhealthy (red dot), or Checking (gray dot)
Uptime -- How long the service has been running
Queue size -- Number of items currently queued for that service

The panel auto-refreshes every 5 seconds, and a Live indicator confirms that polling is active.

Infrastructure

The Infrastructure panel shows connectivity status for shared dependencies:

Redis -- Connection status to the Redis pub/sub and cache layer
PostgreSQL -- Connection status to the primary database

Per-service dependency checks are displayed with badges: R (Redis) and P (Postgres) next to each service, indicating whether that service can reach each dependency.

Status Indicators

Healthy (green dot) -- Service is running and responding to health checks
Unhealthy (red dot) -- Service is down or not responding
Checking (gray dot) -- Health check in progress
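
The three states above can be sketched in Python as follows. This is a minimal illustration, not the dashboard's implementation: the `HealthCheck` type and its `responded` field are hypothetical names, with `None` standing for a check that is still in progress.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthCheck:
    """Hypothetical result of one health check for one service."""
    service: str
    # None = check still in progress, True = responded, False = no response
    responded: Optional[bool] = None


def indicator(check: HealthCheck) -> str:
    """Return the status label and dot color described in the list above."""
    if check.responded is None:
        return "Checking (gray dot)"
    if check.responded:
        return "Healthy (green dot)"
    return "Unhealthy (red dot)"


if __name__ == "__main__":
    for check in (
        HealthCheck("Gateway", True),
        HealthCheck("Publisher", False),
        HealthCheck("Pulse (Analytics)"),
    ):
        print(f"{check.service}: {indicator(check)}")
```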

Service Restart Procedure

If a service shows as unhealthy, restart it with Docker Compose:

    docker compose -f deploy/docker-compose.yml restart <service-name>

Wait 10 to 15 seconds for the health check to update. If the service remains unhealthy after the restart, check its logs:

    docker compose -f deploy/docker-compose.yml logs --tail 50 <service-name>
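
For repeated use, the restart procedure can be wrapped in a small script. The sketch below is one possible way to do it, assuming Docker Compose is installed and the script runs from the repository root; the function names `compose_cmd` and `restart_and_tail` are hypothetical helpers, not part of any shipped tooling.

```python
import subprocess
import time

COMPOSE_FILE = "deploy/docker-compose.yml"  # path used in the commands above


def compose_cmd(*args: str) -> list[str]:
    """Build a `docker compose` command line for the deployment file."""
    return ["docker", "compose", "-f", COMPOSE_FILE, *args]


def restart_and_tail(service: str, wait_seconds: int = 15, tail: int = 50) -> str:
    """Restart a service, wait for the health check window, return recent logs."""
    subprocess.run(compose_cmd("restart", service), check=True)
    time.sleep(wait_seconds)  # the health check takes 10 to 15 seconds to update
    result = subprocess.run(
        compose_cmd("logs", "--tail", str(tail), service),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Pass the service name exactly as it is defined in the Compose file. The script always returns the last log lines after the wait; it does not query the health status itself, so confirm the result on the System Health page.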

GPU Status Monitoring

The GPU section displays a card for each GPU detected on the machine running local AI models. Multi-GPU setups show a separate card per device. This panel polls every second for near-real-time updates.

Metrics shown per GPU card:

Metric	Description
VRAM gauge	Visual indicator of VRAM usage (percentage)
VRAM Used	Actual VRAM consumption (e.g., 8743 / 24576 MB)
GPU Utilization	Current GPU compute utilization percentage
Temperature	GPU temperature in Celsius
Power Draw	Current power consumption in watts
GPU Clock	Current GPU core clock speed
Memory Clock	Current memory clock speed
Fan Speed	Fan speed percentage
Driver Version	Installed NVIDIA driver version
CUDA Version	CUDA toolkit version
Device	GPU model name (e.g., NVIDIA RTX 4090)
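
Several of these metrics can also be read directly on the host with `nvidia-smi`, which is useful for cross-checking the dashboard. The sketch below assumes NVIDIA GPUs and that `nvidia-smi` is on the PATH; whether the dashboard itself collects its data this way is not stated here. `GpuSample`, `parse_samples`, and `query_gpus` are hypothetical names, and only a subset of the metrics in the table is queried.

```python
import csv
import subprocess
from dataclasses import dataclass

# Query fields passed to `nvidia-smi --query-gpu`.
FIELDS = ["name", "memory.used", "memory.total", "utilization.gpu",
          "temperature.gpu", "power.draw", "fan.speed", "driver_version"]


@dataclass
class GpuSample:
    name: str
    vram_used_mb: float
    vram_total_mb: float
    utilization_pct: float
    temperature_c: float
    power_w: float
    fan_pct: float
    driver: str

    @property
    def vram_pct(self) -> float:
        return 100.0 * self.vram_used_mb / self.vram_total_mb


def _num(cell: str) -> float:
    """Convert a CSV cell to float; unsupported metrics ("[N/A]") become NaN."""
    try:
        return float(cell)
    except ValueError:
        return float("nan")


def parse_samples(csv_text: str) -> list[GpuSample]:
    """Parse `--format=csv,noheader,nounits` output, one row per GPU."""
    samples = []
    for row in csv.reader(csv_text.strip().splitlines(), skipinitialspace=True):
        name, used, total, util, temp, power, fan, driver = (c.strip() for c in row)
        samples.append(GpuSample(name, _num(used), _num(total), _num(util),
                                 _num(temp), _num(power), _num(fan), driver))
    return samples


def query_gpus() -> list[GpuSample]:
    """Run nvidia-smi once and return one sample per detected GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_samples(out)
```

Keeping the parsing separate from the subprocess call means `parse_samples` can be exercised on saved output from a machine without a GPU.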

GPU Monitoring Thresholds

Watch these critical thresholds closely:

VRAM > 90% -- Risk of out-of-memory errors. Reduce concurrent generation jobs or switch some workloads to cloud providers.
Temperature > 85 °C -- Sustained high temperatures can cause thermal throttling and reduce performance. Check cooling and airflow.
GPU Utilization > 95% for extended periods -- The GPU is saturated. Queue times will increase. Consider offloading workloads to cloud.
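
The thresholds above can be expressed as a small check, for example in an external alerting script. This is a sketch under stated assumptions: "extended periods" is not defined above, so the `SUSTAINED_SAMPLES` window is an arbitrary illustrative choice, and `gpu_warnings` is a hypothetical function name.

```python
from typing import Sequence

VRAM_LIMIT_PCT = 90.0   # limits taken from the thresholds above
TEMP_LIMIT_C = 85.0
UTIL_LIMIT_PCT = 95.0
# Assumption: 60 consecutive one-second samples (about a minute) count as
# an "extended period"; the text above does not specify a duration.
SUSTAINED_SAMPLES = 60


def gpu_warnings(vram_pct: float, temperature_c: float,
                 utilization_history: Sequence[float]) -> list[str]:
    """Return one warning for each threshold the GPU currently exceeds."""
    warnings = []
    if vram_pct > VRAM_LIMIT_PCT:
        warnings.append("VRAM above 90%: risk of out-of-memory errors")
    if temperature_c > TEMP_LIMIT_C:
        warnings.append("Temperature above 85 C: check cooling and airflow")
    # Utilization only counts as saturated when every recent sample is high.
    recent = utilization_history[-SUSTAINED_SAMPLES:]
    if len(recent) == SUSTAINED_SAMPLES and min(recent) > UTIL_LIMIT_PCT:
        warnings.append("GPU saturated: consider offloading workloads to cloud")
    return warnings
```

VRAM and temperature are checked against the latest reading only. A stricter version could apply the same sustained-window test to temperature, since brief spikes matter less than sustained heat.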

Provider Configuration

Navigate to Settings from the sidebar to configure AI providers for each generation service.

Provider Cards

Four cards are available, one for each generation service:

Card	Controls
LLM (Text Generation)	Provider and model for script writing
Image Generation	Provider and model for visual assets
Video Generation	Provider and model for video content
Text-to-Speech	Provider and model for audio narration

Changing Providers

1. Select the Provider dropdown and choose between:
   Local (Ollama / ComfyUI) -- Uses local GPU-accelerated models
   Cloud (OpenAI / Replicate) -- Uses cloud API services
2. Select the Model from the available options for that provider.
3. Click Save Configuration.

A green dot next to a card title indicates that the selected provider is currently connected and available.

Test Connection

Each provider card includes a Test Connection button. Click it to verify that the selected provider is reachable and properly configured before saving. This is especially useful when switching to cloud providers to confirm API keys are set.

Model Parameters

Expand the Model Parameters accordion on any provider card to fine-tune generation settings (e.g., temperature, max tokens). These parameters are applied when the provider processes requests.

When to Use Cloud vs Local

Scenario	Recommendation
GPU utilization consistently above 80%	Switch some workloads to Cloud
Cost budget is tight	Use Local providers
Need highest quality output	Use Cloud providers
Running without a GPU	Must use Cloud providers
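
The table can be read as a simple decision function. The sketch below is illustrative only: the table does not rank the scenarios, so the order in which conflicting conditions are resolved (for example, a tight budget combined with a need for the highest quality) is an assumption, as are the function and parameter names.

```python
def recommend_provider(has_gpu: bool,
                       sustained_gpu_utilization_pct: float = 0.0,
                       budget_tight: bool = False,
                       need_highest_quality: bool = False) -> str:
    """Map the scenarios in the table above to a recommendation."""
    if not has_gpu:
        return "Cloud"  # no local option without a GPU
    if need_highest_quality:
        return "Cloud"  # assumption: quality outranks cost when both apply
    if budget_tight:
        return "Local"
    if sustained_gpu_utilization_pct > 80.0:
        return "Switch some workloads to Cloud"
    # Assumption: when no scenario in the table applies, change nothing.
    return "Keep current provider"
```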

Next Steps

Analytics Guide -- Track costs and provider usage
Dashboard Overview -- Tour of all dashboard pages
CLI Quickstart -- Manage the system from the command line
Monitoring Guide -- Prometheus and Grafana observability
Provider Setup -- Detailed provider configuration