Observability
OpenTelemetry-native

Logs, metrics, and traces. Plus platform intelligence.

Celeris is OpenTelemetry-native, exports to your existing providers, and adds an AI agent that understands the full stack to explain and act—fast.

OpenTelemetry-first (OTLP)
Export to any backend
AI agent with full-stack context

Celeris AI sees deploys, configs, flags—not just telemetry

Service Graph: checkout

API Gateway
p95: 42ms
checkout-api
p95: 128ms
payments
p95: 89ms
orders-db
cache
Stripe
deploy v42
flag: new-checkout
index change

Celeris AI

Spike correlates with rollout of v42 + increased DB read amplification on orders table.

RED Metrics

Rate: 1.2k/s • Errors: 0.3% • Duration: 142ms
12:34:56.789 INFO checkout-api: Processing order #4521
12:34:56.812 DEBUG payments: Stripe charge initiated
12:34:57.023 WARN orders-db: Slow query detected (234ms)
12:34:57.156 INFO checkout-api: Order #4521 completed
Logs
Metrics
Traces
App: checkout • prod
How It Works

OpenTelemetry in. Any provider out.

Standards-first instrumentation. Route signals to your existing providers. Keep your stack, add platform intelligence.

Your Workloads

Auto-instrumented

Node.js
Go
Python
Java
.NET
Logs OTLP
Metrics OTLP
Traces OTLP

Celeris OTel Gateway

Collector + Context

OTLP ingestion
Platform context enrichment
Sampling + PII redaction
Multi-destination routing
AI suggests sampling rules

Destinations

Your existing stack

Your APM
Your Logs
Your Metrics
Your SIEM
Warehouse
Object Store
No lock-in
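
To make the routing concrete, the gateway follows the standard OpenTelemetry Collector pattern of receivers, processors, and exporters. The sketch below is illustrative only, not the literal Celeris gateway configuration: the exporter names and endpoints are placeholders for backends you already run, and PII redaction would sit in the same processor tier.

collector-gateway.yaml (illustrative sketch)
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  # Sampling: keep a fraction of traces; AI-suggested rules would adjust this
  probabilistic_sampler:
    sampling_percentage: 25
  # Platform context enrichment: stamp deploy metadata onto spans
  attributes/platform-context:
    actions:
      - key: deploy.version
        value: "v42"
        action: upsert
  batch:

exporters:
  # Multi-destination routing: placeholders for destinations you already run
  otlp/your-apm:
    endpoint: "apm.example.com:4317"
  otlphttp/your-logs:
    endpoint: "https://logs.example.com/otlp"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, attributes/platform-context, batch]
      exporters: [otlp/your-apm]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/your-logs]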

Auto-instrumentation Setup

Standards-first. No proprietary agents required.

celeris.yaml
observability:
  otel:
    endpoint: "${CELERIS_OTEL_ENDPOINT}"
    protocol: "grpc"
  auto_instrument: true
  service_name: "checkout-api"

Celeris auto-injects OpenTelemetry SDKs at deploy time. Just follow standard service.name conventions.
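
Under the hood, auto-injection resolves to the standard OpenTelemetry SDK environment variables, so nothing proprietary ends up in your images. A workload might receive something like the block below; the exact variables Celeris sets are an assumption here, but these are the ones the OTel SDKs read.

# Illustrative container env block (standard OTel SDK variables)
env:
  - name: OTEL_SERVICE_NAME
    value: "checkout-api"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "${CELERIS_OTEL_ENDPOINT}"    # same placeholder as celeris.yaml above
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "deployment.environment=prod,service.version=v42"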

OTLP-native
No vendor lock-in
Platform context added
Service Graph

See the system, not just charts.

Interactive topology that matches your actual application model. Understand dependencies, health, and changes at a glance.

App: checkout
gateway

API Gateway

Owner: Team Platform

SLO: 99.95%

checkout-api
cart-svc
payments
inventory
orders-db
cache
Timeline
v42 deployed
Healthy
Degraded
Failing
AI Incident Response

Alerts are the start. Context and action are the finish.

When something breaks, Celeris AI explains the full picture and suggests safe actions—with approval workflows built in.

Alert Timeline

2 active
P95 Latency Spike • 2m ago

checkout-api p95 exceeded 500ms threshold

SLO burn: 4.2x • checkout-api
Error Rate Elevated • 8m ago

payments service error rate at 2.1%

SLO burn: 1.8x • payments
DB Connection Saturation • 1h ago

Resolved • Duration: 12m

Impacted Services

checkout
payments
orders-db

Celeris AI Analysis

What Changed

Deploy v42 → checkout-api (8m ago)
Flag enabled: new-checkout-flow (12m ago)

What Broke

Slow queries on orders_db.orders table causing N+1 pattern. New query path introduced in v42 lacks index for user_id filter.

Evidence

3 slow traces • 42 warn logs • +340ms p95

Recommendation

Add an index on orders(user_id, created_at) or roll back v42 to restore performance.

Actions

Rollback deployment
Approval required

Revert checkout-api to v41

Disable feature flag
Auto-approved

Turn off new-checkout-flow flag

Scale service
Approval required

Add 2 replicas to checkout-api

Increase sampling
Auto-approved

100% trace sampling for 1 hour

Why Celeris AI produces better answers

Because Celeris AI sees the full stack graph, not just raw telemetry:

Application graph + ownership
Deployments & configs
Feature flags & experiments
Gateways & edge routing
Identity & policies
Data stores & relationships

AI responses are grounded in platform truth, not just telemetry.

Signals Explorer

Everything you need for day-to-day debugging.

Logs, metrics, and traces with correlation baked in. Jump between signals with context preserved.

12:34:56.789 INFO checkout-api Processing order #4521 for user_id=8472
12:34:56.812 DEBUG payments Stripe charge initiated amount=42.99 currency=USD
12:34:57.023 WARN orders-db Slow query detected: SELECT * FROM orders WHERE user_id=8472 (234ms)
12:34:57.156 INFO checkout-api Order #4521 completed successfully total_time=892ms
12:34:58.001 ERROR payments Stripe webhook validation failed: signature mismatch
deploy: v42 • flag: new-checkout
Showing logs from last 15 minutes

Already using a provider? Celeris exports everything via OpenTelemetry.

SLOs & Alerting

SLOs that connect to owners and releases.

Define service level objectives that map to your application graph. Know who to alert and what changed.

SLO Builder

% of requests under 500ms
Fast burn (14.4x) → page immediately
Slow burn (3x) → alert after 1h

Team Commerce Platform

On-call: @commerce-oncall

Burn Rate Monitor

checkout-api latency SLO 99.93%
Target: 99.9% • 15% budget consumed
payments error rate 99.7%
inventory availability 99.99%
orders-db latency 99.1%

⚠ Fast burn detected (4.2x)

Alert Routing

Slack: #checkout-alerts
PagerDuty: @commerce-oncall
AI summary in alerts ✓
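
Expressed as configuration, the SLO above might look roughly like the sketch below. The schema is hypothetical (field names are illustrative, not a documented Celeris format); the target, burn rates, and routing mirror the builder shown.

slo.yaml (hypothetical sketch)
slo:
  name: checkout-api-latency
  service: checkout-api
  objective: "% of requests under 500ms"
  target: 99.9
  window: 30d              # assumed rolling window
  burn_rate_alerts:
    - rate: 14.4           # fast burn: page immediately
      notify: page
    - rate: 3              # slow burn: alert after 1h
      for: 1h
      notify: alert
  routing:
    team: commerce-platform
    slack: "#checkout-alerts"
    pagerduty: "@commerce-oncall"
    ai_summary: true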
Experiment Insights

Measure impact like an experiment—using real signals.

Run experiments and analyze outcomes using the same telemetry you already collect. No separate analytics SDK required.

Experiment Setup

new-checkout-flow

2 variants • 50/50 allocation

US region • Pro plan • Web
Checkout latency (p95) • Primary
Error rate
Conversion rate
Stop if error rate > 2%
Stop if SLO burn > 2x

Results

Checkout latency (p95): -8.2% (Control 152ms → Treatment 139ms)
Error rate: no change (Control 0.31% vs Treatment 0.34%)
All guardrails passing
AI Interpretation

Treatment improves p95 latency by 8% but shows 12% higher DB cost due to new query pattern. Recommend: enable for EU region only where latency improvement has highest impact.

Started 3 days ago • Statistical significance: 95%

This is experiment insights, not full product analytics. For deeper analysis, integrate with your analytics provider.
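
For reference, the setup above could be captured in configuration along these lines. The schema is hypothetical (field names are illustrative, not a documented Celeris format); the variants, targeting, metrics, and guardrails mirror the experiment shown.

experiment.yaml (hypothetical sketch)
experiment:
  name: new-checkout-flow
  variants: [control, treatment]
  allocation: "50/50"
  targeting:
    region: US
    plan: pro
    platform: web
  metrics:
    primary: checkout_latency_p95
    secondary: [error_rate, conversion_rate]
  guardrails:
    - stop_if: "error_rate > 2%"
    - stop_if: "slo_burn > 2x"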

Connect performance to cost.

See cost signals overlaid on your service graph. Know which endpoints drive spend.

Cost per request • Forecast chips • Spend drivers

This week forecast

+6% projected spend

Driver: egress from checkout-api

Ask Celeris AI:

Go deeper in FinOps

Works with your stack.

Export via OpenTelemetry to your existing providers. No lock-in, no migration required.

APM Providers

Datadog, New Relic, etc.

Log Aggregators

Splunk, Elastic, etc.

SIEM

Security logs export

Data Warehouses

Long-term analytics

OpenTelemetry Protocol (OTLP) — industry standard, no proprietary agents

Bring your tools. Add platform intelligence.

OTel-native signals, export anywhere, AI-guided context and action.

OpenTelemetry-native
Export to any provider
AI-first incident response

Works with your existing observability stack. No migration required.