Feature — Alerts & Incidents
Get paged before your users notice.
Alert rules on error rate, latency, cost, and eval quality. Multi-channel notifications. Full incident lifecycle with AI-generated postmortems.
Email, Slack, PagerDuty, webhook. Incident state machine. AI postmortem drafts.

5 incident states
3 alert metric targets
4 channels
Alert Rules
Alert rules fire when a metric crosses a threshold in a configurable window. Target error_rate, avg_latency, or total_cost. An optional comparison window enables week-over-week spike detection. Link alerts to evaluation metric keys — get paged when quality drops, not just when errors spike.
- Metrics: error_rate, avg_latency, total_cost
- Conditions: >, <, >=, <=, ==, != with configurable windowMin
- Eval metric alerts: link to any evaluation metric key for quality-based alerting
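As a rough illustration of how a threshold rule over a time window might be evaluated, here is a minimal sketch. The helper names (`windowAverage`, `shouldFire`), the sample shape (`ts`, `value`), and the choice to average samples are assumptions made for illustration, not Zespan's actual implementation. Only the metric names, condition operators, and `windowMin` come from the rule description above.

```javascript
// Hypothetical sketch: function names and data shapes are assumptions.

// Map each condition symbol from a rule to a comparison function.
const COMPARATORS = {
  '>': (a, b) => a > b,
  '<': (a, b) => a < b,
  '>=': (a, b) => a >= b,
  '<=': (a, b) => a <= b,
  '==': (a, b) => a === b,
  '!=': (a, b) => a !== b,
};

// Average the samples whose timestamp falls inside the last `windowMin` minutes.
function windowAverage(samples, windowMin, nowMs) {
  const cutoff = nowMs - windowMin * 60 * 1000;
  const inWindow = samples.filter((s) => s.ts > cutoff && s.ts <= nowMs);
  if (inWindow.length === 0) return null; // no data in the window
  const sum = inWindow.reduce((acc, s) => acc + s.value, 0);
  return sum / inWindow.length;
}

// Return true when the windowed value satisfies the rule's condition.
function shouldFire(rule, samples, nowMs) {
  const compare = COMPARATORS[rule.condition];
  if (!compare) throw new Error(`Unknown condition: ${rule.condition}`);
  const value = windowAverage(samples, rule.windowMin, nowMs);
  return value !== null && compare(value, rule.threshold);
}

const now = Date.now();
const samples = [
  { ts: now - 20 * 60 * 1000, value: 0.01 }, // outside the 15-minute window
  { ts: now - 10 * 60 * 1000, value: 0.04 },
  { ts: now - 2 * 60 * 1000, value: 0.10 },
];
const rule = { metric: 'error_rate', condition: '>', threshold: 0.05, windowMin: 15 };
console.log(shouldFire(rule, samples, now)); // true: the window average is about 0.07
```

Under the same assumptions, a quality-based rule would only swap the inputs: for example `{ metric: 'faithfulness', condition: '<', threshold: 0.8, windowMin: 15 }` evaluated against evaluation scores instead of error rates.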

Multi-Channel Notifications
When an alert fires, notify via email (list of addresses), webhook (POST with payload), Slack, or PagerDuty. Mix channels per alert rule. Full alert history with sensitive fields (email addresses, webhook URLs) redacted.
- Email, webhook, Slack, PagerDuty — combine channels per rule
- Alert history: triggered, resolved, acknowledged, config changes
- Sensitive field redaction: email addresses and webhook URLs redacted in history
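The redaction behavior described above could look something like the following sketch. The entry shape and the field names (`emails`, `webhookUrl`) are hypothetical, as is the `[REDACTED]` placeholder. The source only states that email addresses and webhook URLs are redacted in history.

```javascript
// Hypothetical sketch: field names and the placeholder string are assumptions.
const SENSITIVE_FIELDS = new Set(['emails', 'webhookUrl']);

// Return a copy of a history entry with sensitive channel fields masked.
// The original entry is left unmodified.
function redactHistoryEntry(entry) {
  const redacted = {};
  for (const [key, value] of Object.entries(entry)) {
    if (SENSITIVE_FIELDS.has(key)) {
      // Keep the number of recipients visible, but hide each value.
      redacted[key] = Array.isArray(value)
        ? value.map(() => '[REDACTED]')
        : '[REDACTED]';
    } else {
      redacted[key] = value;
    }
  }
  return redacted;
}

const entry = {
  event: 'triggered',
  metric: 'error_rate',
  channels: ['email', 'webhook'],
  emails: ['oncall@example.com', 'lead@example.com'],
  webhookUrl: 'https://hooks.example.com/abc123',
};
console.log(redactHistoryEntry(entry));
```

Masking at read time, as sketched here, keeps the stored configuration usable for delivery while preventing the history view from exposing addresses or URLs.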
Incident Lifecycle
Incidents progress through a formal state machine: open → investigating → mitigating → mitigated → resolved. Severity levels (critical/high/medium/low) for triage. A background worker correlates related alerts and traces into incident candidates automatically.
- States: open, investigating, mitigating, mitigated, resolved
- Transitions: ACKNOWLEDGE, MITIGATE, CONFIRM_MITIGATION, REVERT, RESOLVE, REOPEN, ESCALATE
- AI correlation: background worker clusters related alerts and traces automatically
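To make the state machine concrete, here is a minimal sketch of a transition table. The source lists the states and the transition names, but not which transition connects which states, so the mapping below is an assumption. Treating ESCALATE as a severity bump rather than a state change is also an assumption.

```javascript
// Hypothetical mapping: which transition moves between which states is assumed.
const TRANSITIONS = {
  open: { ACKNOWLEDGE: 'investigating' },
  investigating: { MITIGATE: 'mitigating' },
  mitigating: { CONFIRM_MITIGATION: 'mitigated', REVERT: 'investigating' },
  mitigated: { RESOLVE: 'resolved', REVERT: 'mitigating' },
  resolved: { REOPEN: 'open' },
};

// Severity levels from the source, ordered lowest to highest.
const SEVERITIES = ['low', 'medium', 'high', 'critical'];

// Return a new incident object after applying `action`, or throw if the
// action is not allowed from the incident's current state.
function applyTransition(incident, action) {
  if (action === 'ESCALATE') {
    // Assumption: escalation raises severity one level and keeps the state.
    if (incident.state === 'resolved') {
      throw new Error('Cannot escalate a resolved incident');
    }
    const idx = SEVERITIES.indexOf(incident.severity);
    const next = SEVERITIES[Math.min(idx + 1, SEVERITIES.length - 1)];
    return { ...incident, severity: next };
  }
  const nextState = (TRANSITIONS[incident.state] || {})[action];
  if (!nextState) {
    throw new Error(`Invalid transition ${action} from ${incident.state}`);
  }
  return { ...incident, state: nextState };
}

let incident = { id: 'inc_1', state: 'open', severity: 'medium' };
for (const action of ['ACKNOWLEDGE', 'ESCALATE', 'MITIGATE', 'CONFIRM_MITIGATION', 'RESOLVE']) {
  incident = applyTransition(incident, action);
}
console.log(incident.state, incident.severity); // resolved high
```

Keeping the allowed moves in a lookup table means an invalid request, such as RESOLVE on an open incident, fails loudly instead of silently skipping steps.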

AI Postmortem Generation
Every resolved incident can have a postmortem document. Zespan generates an AI-assisted draft from the incident timeline and related traces — what happened, when, which agents were involved, and how it was resolved. Editable and persistent at /incidents/[id]/postmortem.
- AI draft from incident timeline and related traces
- Resolution documentation: type, notes, and ticket URL
- Active count badge: overview dashboard shows open + investigating incidents
Get started
Set up in under 5 minutes
// Alert rules are configured in the dashboard, so no SDK code is required.
// To trigger alerts from your own code, use the API:
await fetch('https://zespan.com/api/alerts', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.ZESPAN_API_KEY,
    // Declare the JSON body so the server parses it correctly.
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    metric: 'error_rate',
    condition: '>',
    threshold: 0.05,
    windowMin: 15,
    channels: ['slack', 'pagerduty'],
  }),
});
Frequently asked
Can I alert on output quality — not just error rate?
Yes. Link an alert rule to any evaluation metric key — e.g., 'faithfulness'. When the average faithfulness score for a time window drops below your threshold, Zespan fires the alert exactly like an error_rate alert. This is the only LLM monitoring platform that supports eval-based alerting natively.
What's the minimum alert window I can configure?
The windowMin parameter accepts any positive integer (minutes). There's no enforced minimum — you can configure a 1-minute window for very short-cycle checks. In practice, 5–15 minutes balances sensitivity with noise reduction.
How is AI correlation different from manual incident creation?
Manual incidents require someone to notice a problem and create the incident. AI correlation runs a background worker continuously that clusters related alerts and trace anomalies into incident candidates automatically — so the incident exists before you've even looked at dashboards.
Can I integrate Zespan alerts with my existing on-call rotation?
Yes. PagerDuty integration dispatches to your existing services and schedules. Webhook integration lets you push to any system — Opsgenie, VictorOps, a custom Slack app, or your own incident management tooling.
Start free — 10K traces/month, no card needed
Setup takes under 5 minutes. Works with OpenAI, Anthropic, LangChain, and more.