Zespan is an AI agent observability and engineering platform. It traces every agent decision, tool call, handoff, and delegation in production. It also provides prompt versioning, built-in LLM-as-judge evaluations, guardrails, cost optimization, and an AI ops assistant called ZespanPilot.

How do I instrument my AI agent with Zespan?

Instrumenting an agent takes two lines of code: import zespan and call zespan.init({ apiKey: process.env.ZESPAN_API_KEY }). This auto-patches the OpenAI, Anthropic, Gemini, Bedrock, and Mistral clients. For framework-level tracing, add one handler: ZespanCallbackHandler for LangChain, ZespanCrewAIListener for CrewAI, or ZespanADKHandler for Google ADK.
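In general terms, auto-patching means replacing a client method with a wrapper that records a span around every call, so application code does not have to change. The sketch below illustrates that general technique on a toy client. `Span`, `FakeLLMClient`, and `patchMethod` are hypothetical names chosen for illustration. They do not describe Zespan's SDK or how it actually patches provider clients.

```typescript
// Hypothetical span record; not Zespan's actual span schema.
interface Span {
  name: string;
  durationMs: number;
  error?: string;
}

// Toy stand-in for a provider SDK client (OpenAI, Anthropic, ...).
class FakeLLMClient {
  async complete(prompt: string): Promise<string> {
    return `echo: ${prompt}`;
  }
}

// Replace obj[method] with a wrapper that times each call and records a span,
// including calls that throw. Callers keep using the client unchanged.
function patchMethod<T extends object>(obj: T, method: keyof T & string, spans: Span[]): void {
  const original = obj[method];
  if (typeof original !== "function") {
    throw new Error(`${method} is not a method`);
  }
  const fn = original as unknown as (...args: unknown[]) => Promise<unknown>;
  (obj as unknown as Record<string, unknown>)[method] = async (...args: unknown[]) => {
    const start = Date.now();
    let error: string | undefined;
    try {
      return await fn.apply(obj, args);
    } catch (err) {
      error = String(err);
      throw err;
    } finally {
      // Runs for both successful and failed calls.
      spans.push({ name: method, durationMs: Date.now() - start, error });
    }
  };
}

// Usage: patch once at startup, then call the client as usual.
const recorded: Span[] = [];
const client = new FakeLLMClient();
patchMethod(client, "complete", recorded);
client.complete("hello").then((out) => console.log(out, recorded.length)); // echo: hello 1
```

Recording the span in `finally` means failed calls show up in the trace as well, with the error attached.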

Does Zespan support prompt versioning?

Yes. Zespan includes prompt management with versioning, a playground for iteration, and A/B testing to compare prompt versions against each other in production.
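A common way to A/B test prompt versions in production is deterministic bucketing: hash a stable key such as a user ID so each user always sees the same version, and split traffic by weight. The sketch below shows that general technique only. `PromptVersion`, `fnv1a`, and `pickVersion` are hypothetical and are not Zespan's prompt management API.

```typescript
// Hypothetical prompt version record; not Zespan's prompt schema.
interface PromptVersion {
  version: number;
  template: string;
  weight: number; // relative share of traffic, e.g. 0.5 for a 50/50 split
}

// 32-bit FNV-1a hash over UTF-16 code units: cheap and deterministic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map the key's hash to a point in [0, total weight) and walk the cumulative weights.
function pickVersion(versions: PromptVersion[], key: string): PromptVersion {
  const total = versions.reduce((sum, v) => sum + v.weight, 0);
  const point = (fnv1a(key) / 2 ** 32) * total;
  let cumulative = 0;
  for (const v of versions) {
    cumulative += v.weight;
    if (point < cumulative) return v;
  }
  return versions[versions.length - 1]; // guard against floating-point rounding at the top end
}

const versions: PromptVersion[] = [
  { version: 1, template: "Summarize: {{text}}", weight: 0.5 },
  { version: 2, template: "Summarize in three bullet points: {{text}}", weight: 0.5 },
];
// The same user ID always maps to the same version.
console.log(pickVersion(versions, "user-42").version);
```

Hashing instead of random sampling keeps each user's experience stable across requests, which makes per-version comparisons cleaner.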

What evaluations does Zespan support?

Zespan ships 12 built-in LLM-as-judge evaluation templates including faithfulness, relevance, toxicity, groundedness, and more. Evaluations run automatically on every trace with no custom scoring functions required.
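An LLM-as-judge evaluation generally fills a template with a trace's input, output, and context, asks a judge model for a score, and compares that score to a pass threshold. The sketch below shows that general pattern with the judge model stubbed out. `EvalTemplate`, `runEval`, the `SCORE:` reply format, and the 0.7 threshold are all hypothetical and do not reproduce Zespan's built-in templates.

```typescript
// Hypothetical template shape; placeholders look like {{name}}.
interface EvalTemplate {
  name: string;
  prompt: string;
  passThreshold: number; // minimum score in [0, 1] that counts as a pass
}

interface EvalResult {
  name: string;
  score: number;
  passed: boolean;
}

// The judge is any function that sends a prompt to a model and returns its text reply.
type Judge = (prompt: string) => Promise<string>;

const faithfulness: EvalTemplate = {
  name: "faithfulness",
  prompt:
    "Context:\n{{context}}\n\nAnswer:\n{{output}}\n\n" +
    "How fully is the answer supported by the context? Reply exactly as SCORE: <number between 0 and 1>.",
  passThreshold: 0.7,
};

// Substitute {{key}} placeholders; unknown keys become empty strings.
function fill(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => vars[key] ?? "");
}

async function runEval(t: EvalTemplate, vars: Record<string, string>, judge: Judge): Promise<EvalResult> {
  const reply = await judge(fill(t.prompt, vars));
  const match = /SCORE:\s*(\d+(?:\.\d+)?)/.exec(reply);
  // Unparseable replies score 0; parsed scores are clamped into [0, 1].
  const score = match ? Math.min(1, Math.max(0, Number(match[1]))) : 0;
  return { name: t.name, score, passed: score >= t.passThreshold };
}

// Usage with a stub judge; in practice this would call a real model.
const stubJudge: Judge = async () => "SCORE: 0.9";
runEval(faithfulness, { context: "Paris is in France.", output: "Paris is in France." }, stubJudge)
  .then((r) => console.log(r)); // { name: 'faithfulness', score: 0.9, passed: true }
```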

How does Zespan compare to Langfuse?

Zespan is agent-native: every span carries agent identity, delegations are first-class trace events, and an agent map is built automatically. Langfuse was built for LLM pipelines and extended to agents later. Zespan also ships 12 built-in eval templates (Langfuse has none), an AI cost optimizer, and ZespanPilot for AI ops. Langfuse offers open-source self-hosting; Zespan does not.

What is the free tier for Zespan?

The free tier includes 10,000 traces per month, 14-day retention, 2 projects, and 1 seat. No credit card required.

Feature — Agent Monitoring

Know which agents are healthy — and which aren't.

Composite health scores, delegation graphs, and per-agent cost attribution — built for systems with many cooperating AI agents.

Works with LangChain, CrewAI, AutoGen, Google ADK, and any framework using the SDK.


Works with: LangChain, CrewAI, AutoGen, Google ADK, LangGraph, PydanticAI, OpenTelemetry

A–F health grades
0–100 composite score
3 signals: error, cost, eval

Agent Registry — zero config

Every agent whose calls pass through the SDK appears in the Agent Registry on its first run. Name, role, and framework are detected automatically from span attributes, with no registration step and no YAML config.

Auto-discovery: agent_name, agent_role, agent_framework detected from span attributes
Full inventory: all agents active in a project with framework and call counts
Supports LangChain, CrewAI, AutoGen, Google ADK, and custom agents
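The auto-discovery described above amounts to grouping spans by their agent attributes and registering an agent the first time its name appears. A minimal sketch of that idea follows. The `SpanRecord` shape, the `"unknown"` and `"custom"` defaults, and `buildRegistry` are assumptions made for illustration, not Zespan's implementation.

```typescript
// Hypothetical span shape: only the attribute keys named above are assumed.
interface SpanRecord {
  attributes: Record<string, string | undefined>;
  costUsd: number;
}

interface AgentEntry {
  name: string;
  role: string;
  framework: string;
  calls: number;
  costUsd: number;
}

// Register each agent on first sight of its agent_name, then keep counting calls and cost.
function buildRegistry(spans: SpanRecord[]): Map<string, AgentEntry> {
  const registry = new Map<string, AgentEntry>();
  for (const span of spans) {
    const name = span.attributes["agent_name"];
    if (!name) continue; // spans without an agent identity are not attributed to any agent
    let entry = registry.get(name);
    if (!entry) {
      // Role and framework come from the first span seen; the defaults are assumptions.
      entry = {
        name,
        role: span.attributes["agent_role"] ?? "unknown",
        framework: span.attributes["agent_framework"] ?? "custom",
        calls: 0,
        costUsd: 0,
      };
      registry.set(name, entry);
    }
    entry.calls += 1;
    entry.costUsd += span.costUsd;
  }
  return registry;
}

const inventory = buildRegistry([
  { attributes: { agent_name: "researcher", agent_role: "retrieval", agent_framework: "langchain" }, costUsd: 0.5 },
  { attributes: { agent_name: "researcher" }, costUsd: 0.25 },
  { attributes: { agent_name: "writer", agent_framework: "crewai" }, costUsd: 0.125 },
]);
// Logs "researcher langchain 2", then "writer crewai 1"
inventory.forEach((agent) => console.log(agent.name, agent.framework, agent.calls));
```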

Screenshot: Zespan Agent Registry showing auto-discovered agents with framework and cost stats

Composite Health Score

Every agent gets a 0–100 score graded A–F, weighted across three signals. If an agent starts degrading — error rate creeping up, cost trending higher, eval scores slipping — the health score reflects it before users notice.

Error rate: last 24 hours — 40% weight
Cost trend: week-over-week change — 30% weight
Eval pass rate: last 7 days — 30% weight
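The weights above determine how the three signals combine, but the page does not say how each raw signal becomes a 0–100 sub-score or where the letter-grade cutoffs sit. The sketch below fills those gaps with clearly labeled assumptions: linear mappings, a cost penalty that reaches zero at a week-over-week doubling, and 90/80/70/60 grade cutoffs. Its only purpose is to show the weighted-composite arithmetic. `healthScore` and `grade` are hypothetical, not Zespan's formula.

```typescript
// Raw signals for one agent, matching the three windows listed above.
interface HealthSignals {
  errorRate24h: number;   // failed spans / total spans over the last 24 hours, in [0, 1]
  costChangeWoW: number;  // week-over-week cost change, e.g. 0.25 means +25%
  evalPassRate7d: number; // passed evals / total evals over the last 7 days, in [0, 1]
}

type Grade = "A" | "B" | "C" | "D" | "F";

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

function healthScore(s: HealthSignals): number {
  // Assumed mapping: 0% errors scores 100, 100% errors scores 0.
  const errorScore = 100 * (1 - clamp(s.errorRate24h, 0, 1));
  // Assumed mapping: flat or falling cost scores 100; +100% (doubling) or more scores 0.
  const costScore = 100 * (1 - clamp(s.costChangeWoW, 0, 1));
  // Assumed mapping: pass rate scales linearly to 0-100.
  const evalScore = 100 * clamp(s.evalPassRate7d, 0, 1);
  // Weights from the documented split: 40% error, 30% cost, 30% eval.
  return Math.round(0.4 * errorScore + 0.3 * costScore + 0.3 * evalScore);
}

// Assumed cutoffs in the usual 90/80/70/60 style.
function grade(score: number): Grade {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

// 5% errors, +10% cost, 90% evals passing: 0.4*95 + 0.3*90 + 0.3*90 = 92
const score = healthScore({ errorRate24h: 0.05, costChangeWoW: 0.1, evalPassRate7d: 0.9 });
console.log(score, grade(score)); // 92 A
```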

Screenshot: Zespan agent health dashboard with composite scores, grades, and signal breakdown

Delegation Graph

The delegation graph shows which agents delegate to which other agents, with cost and latency on each handoff edge. Use it to understand the coordination topology of your multi-agent system and pinpoint where time and money accumulate.

Per-hop attribution: cost, latency, and token counts per delegation edge
Derived from delegated_to / delegated_from span attributes automatically
Planning step sequences captured per agent for reasoning analysis
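Per-edge attribution can be derived by keying each delegation span on its (delegated_from, delegated_to) pair and summing cost, latency, and tokens. A minimal sketch follows, assuming a hypothetical flat `DelegationSpan` record whose field names mirror the attributes above. Zespan's actual span format and aggregation logic are not documented here.

```typescript
// Hypothetical delegation span; field names mirror the delegated_from / delegated_to attributes.
interface DelegationSpan {
  delegated_from: string;
  delegated_to: string;
  costUsd: number;
  latencyMs: number;
  tokens: number;
}

// Aggregated totals for one directed handoff edge.
interface EdgeStats {
  from: string;
  to: string;
  hops: number;
  costUsd: number;
  latencyMs: number; // total latency across all hops on this edge
  tokens: number;
}

function buildDelegationGraph(spans: DelegationSpan[]): EdgeStats[] {
  const edges = new Map<string, EdgeStats>();
  for (const s of spans) {
    const key = `${s.delegated_from}->${s.delegated_to}`;
    const edge = edges.get(key) ?? {
      from: s.delegated_from, to: s.delegated_to, hops: 0, costUsd: 0, latencyMs: 0, tokens: 0,
    };
    edge.hops += 1;
    edge.costUsd += s.costUsd;
    edge.latencyMs += s.latencyMs;
    edge.tokens += s.tokens;
    edges.set(key, edge);
  }
  // Most expensive edges first, i.e. where money accumulates.
  return Array.from(edges.values()).sort((a, b) => b.costUsd - a.costUsd);
}

const graph = buildDelegationGraph([
  { delegated_from: "planner", delegated_to: "researcher", costUsd: 0.5, latencyMs: 1200, tokens: 900 },
  { delegated_from: "planner", delegated_to: "researcher", costUsd: 0.25, latencyMs: 800, tokens: 400 },
  { delegated_from: "researcher", delegated_to: "writer", costUsd: 0.125, latencyMs: 600, tokens: 300 },
]);
console.log(graph[0]); // the planner -> researcher edge, with hops 2 and costUsd 0.75
```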


Per-Agent Analytics

Drill into any agent to see all of its metrics as time series: LLM calls, tool calls, planning steps, delegations, guardrail checks, total cost, latency, token counts, and error count.

Tools per agent: which tools were called, how often, and with what success rate
Models per agent: which models were used across the time window
Time-series charts: all metrics over configurable time ranges
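The tools-per-agent view reduces to counting calls and successes for each (agent, tool) pair. A minimal sketch follows, assuming a hypothetical flat `ToolCall` record extracted from tool spans. `toolStatsByAgent` is illustrative only and does not describe how Zespan stores or computes these metrics.

```typescript
// Hypothetical tool-call record extracted from an agent's tool spans.
interface ToolCall {
  agent: string;
  tool: string;
  ok: boolean; // whether the tool call succeeded
}

interface ToolStats {
  calls: number;
  successes: number;
  successRate: number;
}

// Group tool calls by agent, then by tool, keeping a running success rate.
function toolStatsByAgent(calls: ToolCall[]): Map<string, Map<string, ToolStats>> {
  const byAgent = new Map<string, Map<string, ToolStats>>();
  for (const c of calls) {
    let tools = byAgent.get(c.agent);
    if (!tools) {
      tools = new Map<string, ToolStats>();
      byAgent.set(c.agent, tools);
    }
    const s = tools.get(c.tool) ?? { calls: 0, successes: 0, successRate: 0 };
    s.calls += 1;
    if (c.ok) s.successes += 1;
    s.successRate = s.successes / s.calls;
    tools.set(c.tool, s);
  }
  return byAgent;
}

const stats = toolStatsByAgent([
  { agent: "researcher", tool: "web_search", ok: true },
  { agent: "researcher", tool: "web_search", ok: false },
  { agent: "writer", tool: "spell_check", ok: true },
]);
console.log(stats.get("researcher")?.get("web_search")); // { calls: 2, successes: 1, successRate: 0.5 }
```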

Get started

Set up in under 5 minutes


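import OpenAI from 'openai';
import { Zespan, wrapOpenAI } from '@zespan/sdk';

// Create the Zespan client and wrap OpenAI so every call is traced
const zespan = new Zespan({ apiKey: process.env.ZESPAN_API_KEY });
const openai = wrapOpenAI(new OpenAI(), zespan);

const messages = [{ role: 'user' as const, content: 'Find recent papers on agent observability.' }];

// Tag each agent's calls; the Agent Registry is built automatically from these tags
const res = await openai.chat.completions.create({ model: 'gpt-4o', messages }, {
  metadata: { agent_name: 'researcher', agent_role: 'retrieval' },
});
console.log(res.choices[0].message.content);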


Frequently asked

How does Zespan know which agent made a call?

You pass agent_name and agent_role in the metadata when calling the SDK wrapper. Zespan reads these from span attributes and groups all calls under that agent. For framework integrations like LangChain or CrewAI, the callback handler injects these automatically.

What frameworks does agent monitoring support?

LangChain, LangGraph, CrewAI, AutoGen, Google ADK, PydanticAI, LlamaIndex, and any custom agent using the SDK or OpenTelemetry. If your agent makes LLM calls through any of the supported providers, it's monitored.

How is the health score calculated?

It's a weighted composite of three signals: error rate over the last 24 hours (40%), cost trend week-over-week (30%), and eval pass rate over the last 7 days (30%). The score updates in real time as new traces arrive.

Can I set alerts based on agent health?

Yes. You can set alert rules on error_rate for a specific agent's spans, or link an alert to an evaluation metric key. When the metric crosses your threshold, Zespan notifies you via email, Slack, PagerDuty, or webhook.
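An error_rate alert rule boils down to computing the metric over one agent's spans and notifying the configured channels when it crosses the threshold. The sketch below shows that logic in isolation. `AlertRule`, `AgentSpan`, `evaluateRule`, and the message format are hypothetical, and actual delivery to email, Slack, PagerDuty, or a webhook is left out; the function only returns the notifications that would be sent.

```typescript
type Channel = "email" | "slack" | "pagerduty" | "webhook";

// Hypothetical rule shape; only the error_rate metric from the answer above is modeled.
interface AlertRule {
  agentName: string;
  metric: "error_rate";
  threshold: number; // fire when the metric is strictly above this value
  channels: Channel[];
}

interface AgentSpan {
  agentName: string;
  error: boolean;
}

interface AlertNotification {
  channel: Channel;
  message: string;
}

// Share of this agent's spans that errored; 0 when the agent has no spans.
function errorRate(spans: AgentSpan[], agentName: string): number {
  const mine = spans.filter((s) => s.agentName === agentName);
  if (mine.length === 0) return 0;
  return mine.filter((s) => s.error).length / mine.length;
}

// Returns the notifications to send; actual delivery is out of scope for this sketch.
function evaluateRule(rule: AlertRule, spans: AgentSpan[]): AlertNotification[] {
  const rate = errorRate(spans, rule.agentName);
  if (rate <= rule.threshold) return [];
  const message =
    `${rule.agentName} ${rule.metric} ${(rate * 100).toFixed(1)}% exceeds ${(rule.threshold * 100).toFixed(1)}%`;
  return rule.channels.map((channel) => ({ channel, message }));
}

const rule: AlertRule = { agentName: "researcher", metric: "error_rate", threshold: 0.1, channels: ["slack", "pagerduty"] };
const recent: AgentSpan[] = [
  { agentName: "researcher", error: true },
  { agentName: "researcher", error: false },
  { agentName: "researcher", error: false },
  { agentName: "researcher", error: false },
  { agentName: "writer", error: true },
];
// Two notifications (slack, pagerduty): "researcher error_rate 25.0% exceeds 10.0%"
console.log(evaluateRule(rule, recent));
```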
