Troubleshooting¶
Common issues and solutions when running Orion. Each section covers a specific layer of the stack with diagnostic commands and fixes.
Docker Issues¶
Services fail health checks¶
# Check which containers are unhealthy
docker compose -f deploy/docker-compose.yml ps
# View logs for a specific service
docker compose -f deploy/docker-compose.yml logs scout
# View logs with timestamps
docker compose -f deploy/docker-compose.yml logs -t scout
# Follow logs in real time across multiple services
docker compose -f deploy/docker-compose.yml logs -f scout director media
# Verify network connectivity between containers
docker compose -f deploy/docker-compose.yml exec scout ping postgres
# Check if a service can reach Redis
docker compose -f deploy/docker-compose.yml exec scout redis-cli -u redis://redis:6379 ping
Health check timing
All services have a 30-second interval with 3 retries. Allow up to 90 seconds for services to become healthy after startup. PostgreSQL and Milvus take longer on first run while initializing data directories.
Port conflicts¶
If a port is already in use, find the conflicting process and either stop it or remap the port:
# Find what's using a port (Linux)
sudo lsof -i :8000
# or
sudo ss -tlnp | grep 8000
# Find what's using a port (macOS)
lsof -i :8000
Default port assignments:
| Service | Default Port |
|---|---|
| Gateway | 8000 |
| Scout | 8001 |
| Director | 8002 |
| Media | 8003 |
| Editor | 8004 |
| Pulse | 8005 |
| Publisher | 8006 |
| Dashboard | 3001 |
| PostgreSQL | 5432 |
| Redis | 6379 |
| Milvus | 19530, 9091 |
| Ollama | 11434 |
| ComfyUI | 8188 |
Container won't start¶
# View the exit code and error
docker compose -f deploy/docker-compose.yml ps -a
# Inspect a specific container
docker inspect $(docker compose -f deploy/docker-compose.yml ps -q scout)
# Rebuild a specific service image
docker compose -f deploy/docker-compose.yml build scout
# Rebuild all images from scratch (no cache)
docker compose -f deploy/docker-compose.yml build --no-cache
Reset all data¶
# Stop services and remove all volumes
docker compose -f deploy/docker-compose.yml down -v
# Full cleanup: remove volumes, orphan containers, and images
docker compose -f deploy/docker-compose.yml down -v --remove-orphans --rmi local
Warning
The -v flag removes all named volumes (pgdata, redisdata, milvusdata, ollamadata). All data will be permanently lost. Re-run ./scripts/seed-db.sh after restarting to restore initial data.
Disk space issues¶
# Check Docker disk usage
docker system df
# Clean up unused images, containers, and volumes
docker system prune -a --volumes
Connectivity Issues¶
Gateway cannot reach services¶
Verify service URLs in .env match your runtime environment:
Use container hostnames (service names from docker-compose.yml):
Redis connection refused¶
# Verify Redis is running
redis-cli -u redis://localhost:6379 ping
# Expected: PONG
# Check Redis info
redis-cli -u redis://localhost:6379 info server
# List active pub/sub channels (should show orion.* channels)
redis-cli -u redis://localhost:6379 pubsub channels "orion.*"
PostgreSQL authentication failed¶
Ensure credentials match between the database container and service environment:
# Test connection directly
psql -h localhost -U orion -d orion -c "SELECT 1;"
# Check which env vars are set
grep POSTGRES .env
# Verify the database exists
psql -h localhost -U orion -l
Milvus connection issues¶
# Check Milvus health
curl http://localhost:9091/healthz
# Check Milvus metrics
curl http://localhost:9091/metrics | head -20
Authentication Issues¶
JWT token expired¶
Tokens expire after 24 hours by default. Re-authenticate:
# Refresh an existing token
curl -X POST http://localhost:8000/api/v1/auth/refresh \
-H "Authorization: Bearer $OLD_TOKEN"
# Or get a new token
TOKEN=$(curl -s http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "admin@orion.local", "password": "orion_dev"}' \
| jq -r '.access_token')
401 Unauthorized on all requests¶
Check that the Authorization header uses the correct Bearer prefix:
# Correct format
curl -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
http://localhost:8000/api/v1/scout/api/v1/trends
Check current auth status¶
Python Service Issues¶
Module not found errors¶
Ensure you installed the service and the shared library in editable mode:
cd services/scout
uv pip install -e ".[dev]"
# Also install the shared library if not already
cd ../../libs/orion-common
uv pip install -e ".[dev]"
Database migration errors¶
# Check current migration state
cd services/scout
alembic current
# Run all pending migrations
alembic upgrade head
# View migration history
alembic history --verbose
# Downgrade one step (if needed)
alembic downgrade -1
Type checking or linting failures¶
cd services/scout
# Run type checker
mypy src/
# Run linter with auto-fix
ruff check src/ --fix
# Run formatter
black src/ tests/
Service won't start locally¶
# Check if the port is already in use
lsof -i :8001
# Run with verbose output
uvicorn src.main:app --reload --port 8001 --log-level debug
# Verify dependencies are installed
uv pip list | grep -i orion
Go Build Issues¶
make build fails¶
# Verify Go version (requires 1.24+)
go version
# Download dependencies
go mod download
# Tidy modules
go mod tidy
# Clean and rebuild
make clean && make build
Test failures¶
# Run tests with verbose output
go test -v ./...
# Run a specific test
go test -v -run TestHealthHandler ./internal/gateway/...
# Run tests with race detection
go test -race ./...
Linter errors¶
# Run golangci-lint
make lint
# Run with verbose output for debugging
golangci-lint run ./... -v
# Auto-fix what's possible
golangci-lint run ./... --fix
Dashboard Issues¶
Build fails¶
cd dashboard
# Clean install (removes node_modules and reinstalls)
rm -rf node_modules .next
npm ci
# Check for TypeScript errors
npx tsc --noEmit
# Run linter
npm run lint
Dashboard can't connect to gateway¶
Verify the gateway URL environment variable:
# Check what URL is configured
echo $NEXT_PUBLIC_GATEWAY_URL
# Default: http://localhost:8000
# Set it explicitly if needed
NEXT_PUBLIC_GATEWAY_URL=http://localhost:8000 npm run dev