Production Deployment

A checklist and deep dive for running Lelu reliably in production, covering HTTPS, secrets management, Engine scaling, and observability.

Pre-Launch Checklist

Security & Infrastructure

TLS terminated at load balancer or ingress for all services
LELU_API_KEY rotated from default and stored in a secret manager
DATABASE_URL uses sslmode=require in production
Redis uses TLS (rediss://) or a private network
Engine replicas ≥ 2 for high availability
Health checks configured on /healthz for all services
Prompt injection detection enabled (on by default; confirm PROMPT_INJECTION_DETECTION_ENABLED=true)
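As a rough illustration of the database and Redis items in the list above, a Compose environment block might look like the sketch below. The hostnames, the DB_PASSWORD variable, and the REDIS_URL variable name are assumptions for illustration; only DATABASE_URL, sslmode=require, and the rediss:// scheme come from this guide.

```yaml
services:
  engine:
    environment:
      # sslmode=require forces an encrypted PostgreSQL connection.
      # Host, user, and database name are placeholders.
      DATABASE_URL: postgres://lelu:${DB_PASSWORD}@db.internal:5432/lelu?sslmode=require
      # rediss:// (double "s") selects TLS. REDIS_URL is an assumed variable name.
      REDIS_URL: rediss://redis.internal:6379/0
```

Note that sslmode=require encrypts the connection but does not verify the server certificate. If your PostgreSQL client library supports it, sslmode=verify-full is the stricter choice.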

Observability & Monitoring

OpenTelemetry tracing configured with Jaeger/Zipkin
Prometheus metrics endpoint exposed and scraped
Behavioral analytics enabled for agent monitoring
Predictive analytics models trained with sufficient data (100+ samples)
Alert channels configured (Slack, PagerDuty, email)
Structured logs exported to your log platform

Policies & Compliance

OPA/Rego policies are version-controlled before deploy
Risk assessment thresholds tuned for your use case
Confidence gates configured appropriately
Audit retention configured (S3/object-store lifecycle, 1+ year)
Human review workflows tested and documented

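In Docker Compose healthchecks, prefer 127.0.0.1 over localhost. Inside some containers, localhost resolves to the IPv6 address ::1 first, so a check against a service that listens only on IPv4 can fail even though the service is healthy.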
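A minimal sketch of such a healthcheck follows. Port 8080, the timing values, and the presence of wget in the image are assumptions; only the /healthz path and the 127.0.0.1 recommendation come from this guide.

```yaml
services:
  engine:
    healthcheck:
      # 127.0.0.1 rather than localhost, per the tip above.
      # Port 8080 and wget being installed in the image are assumptions.
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:8080/healthz"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 10s
```

If the image ships curl instead of wget, the test command would need to change accordingly.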

Scaling the Engine

The Engine is stateless — scale horizontally by running multiple replicas behind a load balancer. All state lives in Redis.

docker-compose.override.yml
services:
  engine:
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "1"
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s

Secrets Management

Never store secrets in environment files committed to source control. Use one of these patterns in production:

AWS Secrets Manager

Store secrets in AWS Secrets Manager or SSM Parameter Store, and let the workload read them at runtime through its IAM role rather than through static credentials.

Kubernetes Secrets

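Expose values as environment variables from a Secret object. Kubernetes Secrets are only base64-encoded by default, so enable encryption at rest and manage them with Sealed Secrets or External Secrets Operator instead of committing plain Secret manifests.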
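One way this could look in a Deployment is sketched below. The resource names, labels, image reference, and Secret keys are hypothetical; the environment variable names LELU_API_KEY and DATABASE_URL are the ones used in the checklist.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lelu-engine                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lelu-engine
  template:
    metadata:
      labels:
        app: lelu-engine
    spec:
      containers:
        - name: engine
          image: registry.example.com/lelu/engine:TAG   # placeholder image
          env:
            - name: LELU_API_KEY
              valueFrom:
                secretKeyRef:
                  name: lelu-secrets   # hypothetical Secret name
                  key: api-key         # hypothetical key
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: lelu-secrets
                  key: database-url
```

The referenced Secret would be created by Sealed Secrets or External Secrets Operator, so the plaintext values never appear in the repository.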

HashiCorp Vault

Use Vault Agent Injector to automatically inject secrets into pods at startup.
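As a sketch, the injector is driven by annotations on the pod template. The role name and secret path below are hypothetical. Note that the injector renders secrets as files under /vault/secrets/ rather than as environment variables, so whether the Engine can consume them directly is something to verify for your setup.

```yaml
# Pod template fragment (goes under spec.template in a Deployment)
metadata:
  annotations:
    # Ask the Vault Agent Injector to add its sidecar to this pod
    vault.hashicorp.com/agent-inject: "true"
    # Vault Kubernetes auth role (hypothetical name)
    vault.hashicorp.com/role: "lelu-engine"
    # Renders the secret at this path to /vault/secrets/lelu (hypothetical path)
    vault.hashicorp.com/agent-inject-secret-lelu: "secret/data/lelu/engine"
```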

Observability

Lelu provides comprehensive metrics for monitoring authorization decisions, agent behavior, and system performance. Configure Prometheus scraping and alerting for production deployments.
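A minimal Prometheus scrape job might look like the following. The target address, port, and the /metrics path are assumptions, since this guide does not state where the Engine exposes its metrics endpoint.

```yaml
scrape_configs:
  - job_name: lelu-engine
    # /metrics is the Prometheus default path; assumed here
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      # Hypothetical host:port of one Engine replica
      - targets: ["engine:8080"]
```

With several replicas, a service-discovery mechanism (for example kubernetes_sd_configs) is usually a better fit than a static target list.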

Core Metrics

Authorization & HTTP metrics
lelu_http_requests_total{method="POST",path="/v1/agent/authorize",status="200"}
  # Request volume and status-code anomalies

lelu_http_request_duration_seconds{method="POST",path="/v1/agent/authorize"}
  # Latency SLO / p95 / p99

lelu_auth_decisions_total{type="agent",allowed="false"}
  # Deny-rate spikes and confidence policy pressure

lelu_agent_requests_total{agent_id,action,outcome}
  # Per-agent authorization outcomes

lelu_agent_confidence_score{agent_id,action}
  # Confidence score distribution

lelu_agent_risk_score{agent_id,action}
  # Risk score distribution

Behavioral Analytics Metrics

Reputation, anomalies, and alerts
lelu_agent_reputation_score{agent_id}
  # Current reputation score (0-1)

lelu_agent_anomaly_score{agent_id}
  # Anomaly detection score (0-1, higher = more anomalous)

lelu_agent_human_review_total{agent_id,reason}
  # Human review requirements by reason

lelu_policy_effectiveness_rate{policy_name,policy_version}
  # Policy success rate

Predictive Analytics Metrics

ML model performance
lelu_agent_prediction_accuracy{model_type,agent_id}
  # Model accuracy (0-1)

lelu_agent_prediction_latency_seconds{model_type}
  # Prediction latency

lelu_agent_predictions_total{model_type,outcome}
  # Prediction counts

lelu_agent_model_sample_count{model_type}
  # Training sample count

Multi-Agent Coordination Metrics

Delegation and swarm operations
lelu_agent_delegation_total{delegator,delegatee,outcome}
  # Agent delegation counts

lelu_swarm_operations_total{swarm_id,operation_type,outcome}
  # Swarm orchestration operations

lelu_swarm_agent_count{swarm_id}
  # Active agents per swarm

Recommended Alerts

lelu_agent_reputation_score < 0.5
  # critical: agent reputation dropped below 50%

lelu_agent_anomaly_score > 0.9
  # critical: severe anomaly detected

lelu_http_request_duration_seconds{quantile="0.95"} > 0.5
  # warning: P95 latency exceeds 500ms

lelu_agent_prediction_accuracy < 0.7
  # warning: ML model accuracy below 70%

lelu_policy_effectiveness_rate < 0.6
  # info: policy effectiveness below 60%
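The thresholds above could be expressed as a Prometheus alerting rules file along these lines. The group name, alert names, and `for` durations are assumptions chosen for illustration; the expressions and severities are taken from the list above. If lelu_http_request_duration_seconds is exported as a histogram rather than a summary, the latency expression would need histogram_quantile over the _bucket series instead.

```yaml
groups:
  - name: lelu-recommended-alerts      # hypothetical group name
    rules:
      - alert: LeluAgentReputationLow
        expr: lelu_agent_reputation_score < 0.5
        for: 5m                         # assumed hold time
        labels:
          severity: critical
        annotations:
          summary: "Agent {{ $labels.agent_id }} reputation dropped below 50%"
      - alert: LeluAgentAnomalySevere
        expr: lelu_agent_anomaly_score > 0.9
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Severe anomaly detected for agent {{ $labels.agent_id }}"
      - alert: LeluLatencyP95High
        expr: lelu_http_request_duration_seconds{quantile="0.95"} > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency exceeds 500ms"
      - alert: LeluPredictionAccuracyLow
        expr: lelu_agent_prediction_accuracy < 0.7
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "ML model accuracy below 70%"
      - alert: LeluPolicyEffectivenessLow
        expr: lelu_policy_effectiveness_rate < 0.6
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "Policy {{ $labels.policy_name }} effectiveness below 60%"
```

Alertmanager can then route on the severity label to the Slack, PagerDuty, or email channels from the checklist.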

Advanced Features Configuration

Enable and configure advanced features for production deployments.

OpenTelemetry Tracing

environment variables
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
OTEL_SERVICE_NAME=lelu-engine
OTEL_TRACES_EXPORTER=otlp
OTEL_TRACES_SAMPLER=always_on
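The always_on sampler records every trace, which can become expensive at production volume. As a hedged alternative, the standard OpenTelemetry SDK variables allow ratio-based sampling. The sketch below assumes the Engine honors these standard variables and uses a 10% ratio purely as an example.

```yaml
services:
  engine:
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4318
      OTEL_SERVICE_NAME: lelu-engine
      OTEL_TRACES_EXPORTER: otlp
      # Sample about 10% of new traces; child spans follow the parent's decision
      OTEL_TRACES_SAMPLER: parentbased_traceidratio
      OTEL_TRACES_SAMPLER_ARG: "0.1"
```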

Behavioral Analytics

environment variables
# Reputation thresholds
REPUTATION_LOW_THRESHOLD=0.5
REPUTATION_MIN_DECISIONS=10

# Anomaly detection
ANOMALY_DETECTION_ENABLED=true
ANOMALY_SEVERITY_THRESHOLD=0.7
ANOMALY_WINDOW_SIZE=100

# Baseline management
BASELINE_SAMPLE_SIZE=100
BASELINE_REFRESH_INTERVAL=24h

Predictive Analytics

environment variables
# Model training
MIN_SAMPLES_FOR_MODEL=100
MODEL_UPDATE_INTERVAL=6h
CONFIDENCE_MODEL_WINDOW=30d
REVIEW_MODEL_WINDOW=14d

# Prediction thresholds
CONFIDENCE_PREDICTION_THRESHOLD=0.7
REVIEW_PREDICTION_THRESHOLD=0.6
POLICY_OPTIMIZATION_THRESHOLD=0.5

Prompt Injection Detection

environment variables
# Enabled by default
PROMPT_INJECTION_DETECTION_ENABLED=true
PROMPT_INJECTION_SEVERITY_THRESHOLD=0.8

# Alert on high-severity detections
PROMPT_INJECTION_ALERT_ENABLED=true

Multi-Agent Deployment Considerations

When deploying systems with multiple coordinating agents, consider these additional factors.

Delegation Chain Limits

Set maximum delegation depth to prevent infinite loops and excessive latency.

MAX_DELEGATION_DEPTH=5

Swarm Coordination

Configure swarm size limits and timeout values for coordinated operations.

MAX_SWARM_SIZE=10
SWARM_OPERATION_TIMEOUT=30s

Trace Context Propagation

Ensure OpenTelemetry context is propagated across agent boundaries for complete trace visibility.
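In OpenTelemetry SDKs, the propagation format is selected with the standard OTEL_PROPAGATORS variable. The fragment below assumes the Engine and your agents respect it. The value shown is also the SDK default, so setting it explicitly mainly documents intent.

```yaml
services:
  engine:
    environment:
      # W3C Trace Context (traceparent/tracestate headers) plus baggage
      OTEL_PROPAGATORS: tracecontext,baggage
```

Each agent still has to forward the incoming traceparent header on its outbound calls. Otherwise the trace is split at that hop.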