Zespan is an AI agent observability and engineering platform. It traces every agent decision, tool call, handoff, and delegation in production. It also provides prompt versioning, built-in LLM-as-judge evaluations, guardrails, cost optimization, and an AI ops assistant called ZespanPilot.

How do I instrument my AI agent with Zespan?

Instrumentation takes two lines of code: import zespan and call zespan.init({ apiKey: process.env.ZESPAN_API_KEY }). This auto-patches the OpenAI, Anthropic, Gemini, Bedrock, and Mistral SDKs. For framework-level tracing, add one handler: ZespanCallbackHandler for LangChain, ZespanCrewAIListener for CrewAI, or ZespanADKHandler for Google ADK.

Does Zespan support prompt versioning?

Yes. Zespan includes prompt management with versioning, a playground for iteration, and A/B testing to compare prompt versions against each other in production.

What evaluations does Zespan support?

Zespan ships 12 built-in LLM-as-judge evaluation templates including faithfulness, relevance, toxicity, groundedness, and more. Evaluations run automatically on every trace with no custom scoring functions required.

How does Zespan compare to Langfuse?

Zespan is agent-native: every span carries agent identity, delegations are first-class trace events, and an agent map is built automatically. Langfuse was built for LLM pipelines and extended to agents later. Zespan also ships 12 built-in eval templates (Langfuse has none), includes an AI cost optimizer, and ZespanPilot for AI ops. Langfuse has open-source self-hosting; Zespan does not.

What is the free tier for Zespan?

The free tier includes 10,000 traces per month, 14-day retention, 2 projects, and 1 seat. No credit card required.

Feature — Alerts & Incidents

Get paged before your users notice.

Alert rules on error rate, latency, cost, and eval quality. Multi-channel notifications. Full incident lifecycle with AI-generated postmortems.

Email, Slack, PagerDuty, webhook. Incident state machine. AI postmortem drafts.


Works with: Email, Slack, PagerDuty, Webhook, Eval-based alerts, AI postmortems


Alert Rules

Alert rules fire when a metric crosses a threshold in a configurable window. Target error_rate, avg_latency, or total_cost. An optional comparison window enables week-over-week spike detection. Link alerts to evaluation metric keys — get paged when quality drops, not just when errors spike.

Metrics: error_rate, avg_latency, total_cost
Conditions: >, <, >=, <=, ==, != with configurable windowMin
Eval metric alerts: link to any evaluation metric key for quality-based alerting
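Below is a minimal sketch of how rules like these could be evaluated, assuming per-trace samples are available in memory. The field names metric, condition, threshold, and windowMin come from this page. comparisonWindowMin, the TraceSample shape, and the ratio-based spike semantics are hypothetical illustrations, not Zespan's implementation.

```typescript
// Sketch of threshold alert evaluation. metric, condition, threshold, and
// windowMin mirror the page; comparisonWindowMin and TraceSample are hypothetical.
type Metric = 'error_rate' | 'avg_latency' | 'total_cost';
type Condition = '>' | '<' | '>=' | '<=' | '==' | '!=';

interface AlertRule {
  metric: Metric;
  condition: Condition;
  threshold: number;
  windowMin: number;
  comparisonWindowMin?: number; // hypothetical: baseline offset, e.g. 10080 = 1 week
}

interface TraceSample {
  timestampMs: number;
  isError: boolean;
  latencyMs: number;
  costUsd: number;
}

const MS_PER_MIN = 60_000;

function compare(value: number, cond: Condition, threshold: number): boolean {
  switch (cond) {
    case '>': return value > threshold;
    case '<': return value < threshold;
    case '>=': return value >= threshold;
    case '<=': return value <= threshold;
    case '==': return value === threshold;
    case '!=': return value !== threshold;
  }
}

// Aggregates the metric over traces in [endMs - windowMin, endMs).
// Returns undefined when the window holds no traces.
function metricInWindow(traces: TraceSample[], metric: Metric, endMs: number, windowMin: number): number | undefined {
  const startMs = endMs - windowMin * MS_PER_MIN;
  const inWindow = traces.filter((t) => t.timestampMs >= startMs && t.timestampMs < endMs);
  if (inWindow.length === 0) return undefined;
  switch (metric) {
    case 'error_rate': return inWindow.filter((t) => t.isError).length / inWindow.length;
    case 'avg_latency': return inWindow.reduce((sum, t) => sum + t.latencyMs, 0) / inWindow.length;
    case 'total_cost': return inWindow.reduce((sum, t) => sum + t.costUsd, 0);
  }
}

function shouldFire(rule: AlertRule, traces: TraceSample[], nowMs: number): boolean {
  const current = metricInWindow(traces, rule.metric, nowMs, rule.windowMin);
  if (current === undefined) return false;
  if (rule.comparisonWindowMin === undefined) {
    // Plain threshold check on the current window.
    return compare(current, rule.condition, rule.threshold);
  }
  // ASSUMED spike semantics: the threshold applies to current / baseline, where
  // baseline is the same-length window comparisonWindowMin minutes earlier.
  const baselineEnd = nowMs - rule.comparisonWindowMin * MS_PER_MIN;
  const baseline = metricInWindow(traces, rule.metric, baselineEnd, rule.windowMin);
  if (baseline === undefined || baseline === 0) return false;
  return compare(current / baseline, rule.condition, rule.threshold);
}
```

Under these assumptions, setting comparisonWindowMin to 10080 (one week) with condition '>' and threshold 1.5 fires when the current window's error rate exceeds 1.5 times the rate in the same window a week earlier.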


Multi-Channel Notifications

When an alert fires, notify via email (list of addresses), webhook (POST with payload), Slack, or PagerDuty. Mix channels per alert rule. Full alert history with sensitive fields (email addresses, webhook URLs) redacted.

Email, webhook, Slack, PagerDuty — combine channels per rule
Alert history: triggered, resolved, acknowledged, config changes
Sensitive field redaction: email addresses and webhook URLs redacted in history
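Redaction of sensitive fields in history could look roughly like the sketch below. The AlertHistoryEntry shape is hypothetical. Regex matching is a stand-in technique, and Zespan may instead redact by field name.

```typescript
// Hypothetical history entry; event names follow the page's list, field names are assumptions.
interface AlertHistoryEntry {
  event: 'triggered' | 'resolved' | 'acknowledged' | 'config_changed';
  atIso: string;
  details: Record<string, unknown>;
}

// Deliberately loose patterns: mask anything that looks like an email or an http(s) URL.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const URL_RE = /https?:\/\/[^\s"']+/g;

function redactString(s: string): string {
  return s.replace(EMAIL_RE, '[redacted-email]').replace(URL_RE, '[redacted-url]');
}

// Walks strings, arrays, and plain objects, returning a redacted copy (input is not mutated).
function redactValue(v: unknown): unknown {
  if (typeof v === 'string') return redactString(v);
  if (Array.isArray(v)) return v.map((x) => redactValue(x));
  if (v !== null && typeof v === 'object') {
    return Object.fromEntries(Object.entries(v).map(([k, x]) => [k, redactValue(x)]));
  }
  return v;
}

function redactEntry(entry: AlertHistoryEntry): AlertHistoryEntry {
  return { ...entry, details: redactValue(entry.details) as Record<string, unknown> };
}
```

Matching by pattern catches addresses and URLs wherever they appear in a payload. The trade-off is that it also masks URLs that are not sensitive.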

Incident Lifecycle

Incidents progress through a formal state machine: open → investigating → mitigating → mitigated → resolved. Severity levels (critical/high/medium/low) for triage. A background worker correlates related alerts and traces into incident candidates automatically.

States: open, investigating, mitigating, mitigated, resolved
Transitions: ACKNOWLEDGE, MITIGATE, CONFIRM_MITIGATION, REVERT, RESOLVE, REOPEN, ESCALATE
AI correlation: background worker clusters related alerts and traces automatically
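A state machine like this one can be expressed as a transition table. The states, transition names, and severity levels come from this page. The wiring between them is assumed for illustration: which transition connects which states, and the choice to have ESCALATE raise severity without changing state.

```typescript
type IncidentState = 'open' | 'investigating' | 'mitigating' | 'mitigated' | 'resolved';
type Transition = 'ACKNOWLEDGE' | 'MITIGATE' | 'CONFIRM_MITIGATION' | 'REVERT' | 'RESOLVE' | 'REOPEN' | 'ESCALATE';
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface Incident {
  state: IncidentState;
  severity: Severity;
}

// ASSUMED edges: the page lists states and transition names, not the wiring.
const TRANSITIONS: Record<Transition, Partial<Record<IncidentState, IncidentState>>> = {
  ACKNOWLEDGE: { open: 'investigating' },
  MITIGATE: { investigating: 'mitigating' },
  CONFIRM_MITIGATION: { mitigating: 'mitigated' },
  REVERT: { mitigating: 'investigating', mitigated: 'investigating' },
  RESOLVE: { investigating: 'resolved', mitigated: 'resolved' },
  REOPEN: { resolved: 'open' },
  // ASSUMED: ESCALATE keeps the current state and raises severity one level.
  ESCALATE: { open: 'open', investigating: 'investigating', mitigating: 'mitigating', mitigated: 'mitigated' },
};

// Ordered lowest to highest.
const SEVERITY_ORDER: Severity[] = ['low', 'medium', 'high', 'critical'];

function applyTransition(incident: Incident, t: Transition): Incident {
  const next = TRANSITIONS[t][incident.state];
  if (next === undefined) {
    throw new Error(`Transition ${t} is not allowed from state '${incident.state}'`);
  }
  let severity = incident.severity;
  if (t === 'ESCALATE') {
    // Step up one level, capped at 'critical'.
    const i = SEVERITY_ORDER.indexOf(severity);
    severity = SEVERITY_ORDER[Math.min(i + 1, SEVERITY_ORDER.length - 1)] ?? severity;
  }
  return { state: next, severity };
}
```

Keeping the edges in data rather than in branching code makes invalid moves (for example MITIGATE on an open incident) fail loudly. It also keeps the allowed lifecycle in one reviewable place.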

Figure: Zespan incident management view showing the state machine, severity, and correlated traces.

AI Postmortem Generation

Every resolved incident can have a postmortem document. Zespan generates an AI-assisted draft from the incident timeline and related traces — what happened, when, which agents were involved, and how it was resolved. Editable and persistent at /incidents/[id]/postmortem.

AI draft from incident timeline and related traces
Resolution documentation: type, notes, and ticket URL
Active count badge: overview dashboard shows open + investigating incidents
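The deterministic part of a postmortem draft could look like the sketch below: ordering the timeline, listing the agents involved, and formatting the resolution. The AI-written narrative would be layered on top. The TimelineEvent and Resolution shapes are hypothetical, and only the resolution fields (type, notes, ticket URL) come from the page. activeIncidentCount mirrors the badge rule of counting open plus investigating incidents.

```typescript
// Hypothetical shapes for illustration; not Zespan's schema.
interface TimelineEvent {
  atMs: number;        // epoch milliseconds
  description: string;
  agent?: string;      // agent identity from the related span, if any
}

interface Resolution {
  type: string;        // resolution type, e.g. 'rollback' (illustrative value)
  notes: string;
  ticketUrl?: string;
}

interface PostmortemOutline {
  title: string;
  timeline: string[];
  agentsInvolved: string[];
  resolution: string;
}

// Builds the factual skeleton that a drafting model could expand into prose.
function buildPostmortemOutline(title: string, events: TimelineEvent[], resolution: Resolution): PostmortemOutline {
  // Copy before sorting so the caller's array is not reordered.
  const ordered = [...events].sort((a, b) => a.atMs - b.atMs);
  const agents = Array.from(
    new Set(ordered.map((e) => e.agent).filter((a): a is string => a !== undefined)),
  ).sort();
  const ticket = resolution.ticketUrl ? ` (${resolution.ticketUrl})` : '';
  return {
    title,
    timeline: ordered.map((e) => `${new Date(e.atMs).toISOString()} ${e.description}`),
    agentsInvolved: agents,
    resolution: `${resolution.type}: ${resolution.notes}${ticket}`,
  };
}

// Badge rule from the page: count incidents that are open or investigating.
function activeIncidentCount(states: string[]): number {
  return states.filter((s) => s === 'open' || s === 'investigating').length;
}
```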

Get started

Set up in under 5 minutes


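// Alert rules can be configured in the dashboard with no SDK code.
// To create an alert rule from your own code, POST it to the alerts API:

const apiKey = process.env.ZESPAN_API_KEY;
if (!apiKey) throw new Error('ZESPAN_API_KEY is not set');

const res = await fetch('https://zespan.com/api/alerts', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': apiKey,
  },
  body: JSON.stringify({
    metric: 'error_rate',   // error_rate | avg_latency | total_cost
    condition: '>',         // >, <, >=, <=, ==, !=
    threshold: 0.05,        // fire when error_rate > 0.05
    windowMin: 15,          // evaluation window in minutes
    channels: ['slack', 'pagerduty'],
  }),
});
if (!res.ok) throw new Error(`Failed to create alert rule: HTTP ${res.status}`);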


Frequently asked

Can I alert on output quality — not just error rate?

Yes. Link an alert rule to any evaluation metric key, e.g. 'faithfulness'. When the average faithfulness score for a time window drops below your threshold, Zespan fires the alert exactly like an error_rate alert. Zespan is the only LLM monitoring platform that supports eval-based alerting natively.
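Eval-based alerting reduces to averaging one metric key's scores over the window and comparing the result against the threshold. Here is a minimal sketch, assuming scores are normalized to [0, 1] and available as in-memory records. EvalScore and EvalAlertRule are hypothetical names.

```typescript
// Hypothetical record: one evaluation score attached to a trace.
interface EvalScore {
  timestampMs: number;
  metricKey: string; // e.g. 'faithfulness'
  value: number;     // assumed normalized to [0, 1]
}

// Hypothetical rule shape: fire when the window average drops below threshold.
interface EvalAlertRule {
  metricKey: string;
  threshold: number;
  windowMin: number;
}

// Mean score for one metric key over [nowMs - windowMin, nowMs); undefined if no scores.
function averageScore(scores: EvalScore[], metricKey: string, nowMs: number, windowMin: number): number | undefined {
  const startMs = nowMs - windowMin * 60_000;
  const values = scores
    .filter((s) => s.metricKey === metricKey && s.timestampMs >= startMs && s.timestampMs < nowMs)
    .map((s) => s.value);
  if (values.length === 0) return undefined;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function evalAlertFires(rule: EvalAlertRule, scores: EvalScore[], nowMs: number): boolean {
  const avg = averageScore(scores, rule.metricKey, nowMs, rule.windowMin);
  // In this sketch an empty window does not fire; missing data could be alerted on separately.
  return avg !== undefined && avg < rule.threshold;
}
```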

What's the minimum alert window I can configure?

The windowMin parameter accepts any positive integer (minutes). There's no enforced minimum — you can configure a 1-minute window for very short-cycle checks. In practice, 5–15 minutes balances sensitivity with noise reduction.

How is AI correlation different from manual incident creation?

Manual incidents require someone to notice a problem and create the incident. AI correlation runs a background worker continuously that clusters related alerts and trace anomalies into incident candidates automatically — so the incident exists before you've even looked at dashboards.
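The page does not say how the correlation worker clusters signals. The sketch below is a plain, non-AI stand-in for that step. It groups alerts and trace anomalies by agent and starts a new candidate whenever the gap between consecutive signals exceeds gapMs. Signal, IncidentCandidate, and gapMs are hypothetical.

```typescript
// Hypothetical signal: an alert firing or a trace anomaly, tagged with the agent involved.
interface Signal {
  id: string;
  atMs: number;
  agent: string;
}

interface IncidentCandidate {
  agent: string;
  signalIds: string[];
  startMs: number;
  endMs: number;
}

// Heuristic stand-in for AI correlation: group signals per agent and start a
// new candidate whenever the gap since that agent's last signal exceeds gapMs.
function correlate(signals: Signal[], gapMs: number): IncidentCandidate[] {
  const ordered = [...signals].sort((a, b) => a.atMs - b.atMs);
  const latestByAgent = new Map<string, IncidentCandidate>();
  const candidates: IncidentCandidate[] = [];
  for (const s of ordered) {
    const current = latestByAgent.get(s.agent);
    if (current !== undefined && s.atMs - current.endMs <= gapMs) {
      // Close enough in time: extend the existing candidate.
      current.signalIds.push(s.id);
      current.endMs = s.atMs;
    } else {
      const fresh: IncidentCandidate = { agent: s.agent, signalIds: [s.id], startMs: s.atMs, endMs: s.atMs };
      latestByAgent.set(s.agent, fresh);
      candidates.push(fresh);
    }
  }
  return candidates;
}
```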

Can I integrate Zespan alerts with my existing on-call rotation?

Yes. PagerDuty integration dispatches to your existing services and schedules. Webhook integration lets you push to any system — Opsgenie, VictorOps, a custom Slack app, or your own incident management tooling.

Explore more features

Setup takes under 5 minutes. Works with OpenAI, Anthropic, LangChain, and more.


Tracing

Agent Monitoring

Evaluations

Guardrails