Observability
OpenTelemetry-native

Logs, metrics, and traces. Plus platform intelligence.

Celeris is OpenTelemetry-native, exports to your existing providers, and adds an AI agent that understands the full stack to explain and act—fast.

OpenTelemetry-first (OTLP)
Export to any backend
AI agent with full-stack context

Celeris AI sees deploys, configs, flags—not just telemetry

Service Graph: checkout

API Gateway
p95: 42ms
checkout-api
p95: 128ms
payments
p95: 89ms
orders-db
cache
Stripe
deploy v42
flag: new-checkout
index change

Celeris AI

Spike correlates with rollout of v42 + increased DB read amplification on orders table.

RED Metrics

Rate: 1.2k/s • Errors: 0.3% • Duration: 142ms
12:34:56.789 INFO checkout-api: Processing order #4521
12:34:56.812 DEBUG payments: Stripe charge initiated
12:34:57.023 WARN orders-db: Slow query detected (234ms)
12:34:57.156 INFO checkout-api: Order #4521 completed
Logs
Metrics
Traces
App: checkout • prod
How It Works

OpenTelemetry in. Any provider out.

Standards-first instrumentation. Route signals to your existing providers. Keep your stack, add platform intelligence.

Your Workloads

Auto-instrumented

Node.js
Go
Python
Java
.NET
Logs OTLP
Metrics OTLP
Traces OTLP

Celeris OTel Gateway

Collector + Context

OTLP ingestion
Platform context enrichment
Sampling + PII redaction
Multi-destination routing
AI suggests sampling rules

Destinations

Your existing stack

Your APM
Your Logs
Your Metrics
Your SIEM
Warehouse
Object Store
No lock-in
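
To make the routing concrete, the gateway follows the standard OpenTelemetry Collector pattern of receivers, processors, and exporters. The sketch below is illustrative only, not the literal Celeris gateway configuration: the exporter names and endpoints are placeholders for backends you already run, and PII redaction would sit in the same processor tier.

collector-gateway.yaml (illustrative sketch)
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  # Sampling: keep a fraction of traces; AI-suggested rules would adjust this
  probabilistic_sampler:
    sampling_percentage: 25
  # Platform context enrichment: stamp deploy metadata onto spans
  attributes/platform-context:
    actions:
      - key: deploy.version
        value: "v42"
        action: upsert
  batch:

exporters:
  # Multi-destination routing: placeholders for destinations you already run
  otlp/your-apm:
    endpoint: "apm.example.com:4317"
  otlphttp/your-logs:
    endpoint: "https://logs.example.com/otlp"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, attributes/platform-context, batch]
      exporters: [otlp/your-apm]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/your-logs]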

Auto-instrumentation Setup

Standards-first. No proprietary agents required.

celeris.yaml
observability:
  otel:
    endpoint: "${CELERIS_OTEL_ENDPOINT}"
    protocol: "grpc"
  auto_instrument: true
  service_name: "checkout-api"

Celeris auto-injects OpenTelemetry SDKs at deploy time. Just follow standard service.name conventions.
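
Under the hood, auto-injection resolves to the standard OpenTelemetry SDK environment variables, so nothing proprietary ends up in your images. A workload might receive something like the block below; the exact variables Celeris sets are an assumption here, but these are the ones the OTel SDKs read.

# Illustrative container env block (standard OTel SDK variables)
env:
  - name: OTEL_SERVICE_NAME
    value: "checkout-api"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "${CELERIS_OTEL_ENDPOINT}"    # same placeholder as celeris.yaml above
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "deployment.environment=prod,service.version=v42"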

OTLP-native
No vendor lock-in
Platform context added
Service Graph

See the system, not just charts.

Interactive topology that matches your actual application model. Understand dependencies, health, and changes at a glance.

App: checkout
gateway

API Gateway

Owner: Team Platform

SLO: 99.95%

checkout-api
cart-svc
payments
inventory
orders-db
cache
Timeline
v42 deployed
Healthy
Degraded
Failing
AI Incident Response

Alerts are the start. Context and action are the finish.

When something breaks, Celeris AI explains the full picture and suggests safe actions—with approval workflows built in.

Alert Timeline

2 active
P95 Latency Spike • 2m ago

checkout-api p95 exceeded 500ms threshold

SLO burn: 4.2x • checkout-api
Error Rate Elevated • 8m ago

payments service error rate at 2.1%

SLO burn: 1.8x • payments
DB Connection Saturation • 1h ago

Resolved • Duration: 12m

Impacted Services

checkout
payments
orders-db

Celeris AI Analysis

What Changed

Deploy v42 → checkout-api (8m ago)
Flag enabled: new-checkout-flow (12m ago)

What Broke

Slow queries on orders_db.orders table causing N+1 pattern. New query path introduced in v42 lacks index for user_id filter.

Evidence

3 slow traces • 42 warn logs • +340ms p95

Recommendation

Add an index on orders(user_id, created_at) or roll back v42 to restore performance.

Actions

Rollback deployment
Approval required

Revert checkout-api to v41

Disable feature flag
Auto-approved

Turn off new-checkout-flow flag

Scale service
Approval required

Add 2 replicas to checkout-api

Increase sampling
Auto-approved

100% trace sampling for 1 hour

Why Celeris AI produces better answers

Because Celeris AI sees the full stack graph, not just raw telemetry:

Application graph + ownership
Deployments & configs
Feature flags & experiments
Gateways & edge routing
Identity & policies
Data stores & relationships

AI responses are grounded in platform truth, not just telemetry.

Signals Explorer

Everything you need for day-to-day debugging.

Logs, metrics, and traces with correlation baked in. Jump between signals with context preserved.

12:34:56.789 INFO checkout-api Processing order #4521 for user_id=8472
12:34:56.812 DEBUG payments Stripe charge initiated amount=42.99 currency=USD
12:34:57.023 WARN orders-db Slow query detected: SELECT * FROM orders WHERE user_id=8472 (234ms)
12:34:57.156 INFO checkout-api Order #4521 completed successfully total_time=892ms
12:34:58.001 ERROR payments Stripe webhook validation failed: signature mismatch
deploy: v42 • flag: new-checkout
Showing logs from last 15 minutes

Already using a provider? Celeris exports everything via OpenTelemetry.

SLOs & Alerting

SLOs that connect to owners and releases.

Define service level objectives that map to your application graph. Know who to alert and what changed.

SLO Builder

% of requests under 500ms
Fast burn (14.4x) → page immediately
Slow burn (3x) → alert after 1h

Team Commerce Platform

On-call: @commerce-oncall

Burn Rate Monitor

checkout-api latency SLO 99.93%
Target: 99.9% • 15% budget consumed
payments error rate 99.7%
inventory availability 99.99%
orders-db latency 99.1%

⚠ Fast burn detected (4.2x)

Alert Routing

Slack: #checkout-alerts
PagerDuty: @commerce-oncall
AI summary in alerts ✓
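
Expressed as configuration, the SLO above might look roughly like the sketch below. The schema is hypothetical (field names are illustrative, not a documented Celeris format); the target, burn rates, and routing mirror the builder shown.

slo.yaml (hypothetical sketch)
slo:
  name: checkout-api-latency
  service: checkout-api
  objective: "% of requests under 500ms"
  target: 99.9
  window: 30d              # assumed rolling window
  burn_rate_alerts:
    - rate: 14.4           # fast burn: page immediately
      notify: page
    - rate: 3              # slow burn: alert after 1h
      for: 1h
      notify: alert
  routing:
    team: commerce-platform
    slack: "#checkout-alerts"
    pagerduty: "@commerce-oncall"
    ai_summary: true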
Experiment Insights

Measure impact like an experiment—using real signals.

Run experiments and analyze outcomes using the same telemetry you already collect. No separate analytics SDK required.

Experiment Setup

new-checkout-flow

2 variants • 50/50 allocation

US region • Pro plan • Web
Checkout latency (p95) • Primary
Error rate
Conversion rate
Stop if error rate > 2%
Stop if SLO burn > 2x

Results

Checkout latency (p95): -8.2% (Control 152ms → Treatment 139ms)
Error rate: no change (Control 0.31% vs Treatment 0.34%)
All guardrails passing
AI Interpretation

Treatment improves p95 latency by 8% but shows 12% higher DB cost due to new query pattern. Recommend: enable for EU region only where latency improvement has highest impact.

Started 3 days ago • Statistical significance: 95%

This is experiment insights, not full product analytics. For deeper analysis, integrate with your analytics provider.
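
For reference, the setup above could be captured in configuration along these lines. The schema is hypothetical (field names are illustrative, not a documented Celeris format); the variants, targeting, metrics, and guardrails mirror the experiment shown.

experiment.yaml (hypothetical sketch)
experiment:
  name: new-checkout-flow
  variants: [control, treatment]
  allocation: "50/50"
  targeting:
    region: US
    plan: pro
    platform: web
  metrics:
    primary: checkout_latency_p95
    secondary: [error_rate, conversion_rate]
  guardrails:
    - stop_if: "error_rate > 2%"
    - stop_if: "slo_burn > 2x"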

Connect performance to cost.

See cost signals overlaid on your service graph. Know which endpoints drive spend.

Cost per request • Forecast chips • Spend drivers

This week forecast

+6% projected spend

Driver: egress from checkout-api

Ask Celeris AI:

Go deeper in FinOps

Works with your stack.

Export via OpenTelemetry to your existing providers. No lock-in, no migration required.

APM Providers

Datadog, New Relic, etc.

Log Aggregators

Splunk, Elastic, etc.

SIEM

Security logs export

Data Warehouses

Long-term analytics

OpenTelemetry Protocol (OTLP) — industry standard, no proprietary agents

Bring your tools. Add platform intelligence.

OTel-native signals, export anywhere, AI-guided context and action.

OpenTelemetry-native
Export to any provider
AI-first incident response

Works with your existing observability stack. No migration required.