Zespan is an AI agent observability and engineering platform. It traces every agent decision, tool call, handoff, and delegation in production. It also provides prompt versioning, built-in LLM-as-judge evaluations, guardrails, cost optimization, and an AI ops assistant called ZespanPilot.

How do I instrument my AI agent with Zespan?

Instrumenting an agent takes two lines of code: import zespan, then call zespan.init({ apiKey: process.env.LT_KEY }). This auto-patches OpenAI, Anthropic, Gemini, Bedrock, and Mistral. For framework-level tracing, add one handler: ZespanCallbackHandler for LangChain, ZespanCrewAIListener for CrewAI, or ZespanADKHandler for Google ADK.

Does Zespan support prompt versioning?

Yes. Zespan includes prompt management with versioning, a playground for iteration, and A/B testing to compare prompt versions against each other in production.

What evaluations does Zespan support?

Zespan ships 12 built-in LLM-as-judge evaluation templates, including faithfulness, relevance, toxicity, and groundedness. Evaluations run automatically on every trace with no custom scoring functions required.

How does Zespan compare to Langfuse?

Zespan is agent-native: every span carries agent identity, delegations are first-class trace events, and an agent map is built automatically. Langfuse was built for LLM pipelines and extended to agents later. Zespan also ships 12 built-in eval templates (Langfuse has none), includes an AI cost optimizer, and provides ZespanPilot for AI ops. Langfuse offers open-source self-hosting; Zespan does not.

What is the free tier for Zespan?

The free tier includes 10,000 traces per month, 14-day retention, 2 projects, and 1 seat. No credit card required.

Feature — Prompt Management

Ship prompt changes without breaking production.

Version history, production promotion, automatic regression detection after every deploy, and AI-powered optimization suggestions.

Every trace linked to its prompt version. Regression caught before users do.



Works with: Version history, Production labels, Regression detection, AI optimization, SDK fetch, A/B comparison

Regression lookback: 14 days
Regression threshold: 10%
Content hash: SHA-256

Version History & Trace Linkage

Every prompt change creates an immutable version with a SHA-256 content hash. The hash is stored as promptHash on every trace generated by that prompt — creating a bidirectional link between traces and the exact prompt version that produced them.

Immutable versions: changes never overwrite, full history always preserved
promptHash on every trace: click through from any trace to its prompt version
Labels: tag versions with production, staging, experiment, or custom labels
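
The hashing and linkage described above can be sketched as follows. This is a minimal illustration, not Zespan's implementation: the PromptVersion shape and the addVersion and findByHash helpers are hypothetical names, and the sketch assumes the hash is computed over the UTF-8 bytes of the prompt text and hex-encoded. Only the use of SHA-256 and the promptHash field name come from the description above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a stored prompt version. Only promptHash is a name
// taken from the text above; the other fields are assumptions.
interface PromptVersion {
  readonly version: number;
  readonly content: string;
  readonly promptHash: string;
  readonly labels: readonly string[];
}

// SHA-256 over the UTF-8 bytes of the prompt text, hex-encoded (64 characters).
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Append-only history: a change adds a new frozen version and returns a new
// array, so earlier versions are never overwritten.
function addVersion(
  history: readonly PromptVersion[],
  content: string,
  labels: readonly string[] = [],
): PromptVersion[] {
  const next: PromptVersion = Object.freeze({
    version: history.length + 1,
    content,
    promptHash: contentHash(content),
    labels,
  });
  return [...history, next];
}

// Resolve the promptHash stored on a trace back to the version that produced it.
function findByHash(
  history: readonly PromptVersion[],
  promptHash: string,
): PromptVersion | undefined {
  return history.find((v) => v.promptHash === promptHash);
}
```

Because the hash depends only on the content, two traces carrying the same promptHash were produced by byte-identical prompt text, which is what makes the link from trace to version unambiguous.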

Screenshot: Zespan prompt management showing version history with labels and content hash

Automatic Regression Detection

When you promote a version to production, a background job compares eval scores for the new version against the previous 14 days. If any evaluator drops more than 10 percentage points, ZespanPilot notifies you with before/after scores per evaluator.

Triggers automatically on every production promotion — no manual step
14-day lookback window for comparison
quality_regression notification with per-evaluator before/after scores
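
The comparison step can be sketched like this. It is an illustration under stated assumptions rather than the actual background job: scores are taken to be per-evaluator means expressed in percent (0 to 100), and the Scores type, the Regression shape, and detectRegressions are hypothetical names. The 10-point threshold and the "more than" comparison come from the description above.

```typescript
// Mean eval score per evaluator, in percent (0-100). Assumed representation.
type Scores = Record<string, number>;

// Hypothetical payload for one regressed evaluator (before/after scores).
interface Regression {
  evaluator: string;
  before: number;
  after: number;
  drop: number;
}

// Threshold from the text: a drop of more than 10 percentage points.
const REGRESSION_THRESHOLD_POINTS = 10;

// Compare the new version's scores against the baseline window and return
// every evaluator whose score fell by more than the threshold.
function detectRegressions(
  baseline: Scores,
  candidate: Scores,
  thresholdPoints: number = REGRESSION_THRESHOLD_POINTS,
): Regression[] {
  const regressions: Regression[] = [];
  for (const [evaluator, before] of Object.entries(baseline)) {
    const after = candidate[evaluator];
    if (after === undefined) continue; // no scores for the new version yet
    const drop = before - after;
    if (drop > thresholdPoints) {
      regressions.push({ evaluator, before, after, drop });
    }
  }
  return regressions;
}
```

Note that a drop of exactly 10 points is not flagged in this sketch, since the text says "more than 10 percentage points".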

Screenshot: Zespan prompt detail showing eval score comparison across versions

AI Prompt Optimization

The optimizer analyzes your traces and suggests specific improvements: model switch, prompt compression, or rewrite. Each suggestion includes projected cost savings and can run as a background job for large prompt sets.

Model switch suggestions: e.g., GPT-4o → GPT-4o-mini for simple tasks
Prompt compression: identifies boilerplate that can be removed
buildDirectPromptSuggestion: immediate improvement for a specific prompt text
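
To make "projected cost savings" concrete, here is a rough sketch of how a model-switch projection could be computed. Everything in it is an assumption for illustration: projectModelSwitchSavings is a hypothetical helper, prices are supplied by the caller per million tokens, and the sketch ignores any difference between input and output token pricing. It does not describe how the Zespan optimizer actually arrives at its numbers.

```typescript
// Hypothetical result shape for a model-switch projection.
interface SavingsProjection {
  currentCost: number;
  candidateCost: number;
  savings: number;
  savingsPercent: number;
}

// Project the cost of the same token volume on two models.
// Prices are per one million tokens and must be supplied by the caller;
// no real price list is assumed here.
function projectModelSwitchSavings(
  tokens: number,
  currentPricePerMTok: number,
  candidatePricePerMTok: number,
): SavingsProjection {
  const currentCost = (tokens * currentPricePerMTok) / 1_000_000;
  const candidateCost = (tokens * candidatePricePerMTok) / 1_000_000;
  const savings = currentCost - candidateCost;
  // Guard against division by zero when there is no current spend.
  const savingsPercent = currentCost === 0 ? 0 : (savings * 100) / currentCost;
  return { currentCost, candidateCost, savings, savingsPercent };
}
```

For example, with placeholder prices of 5 and 1 per million tokens, 2,000,000 tokens would cost 10 on the current model and 2 on the candidate, a projected saving of 80%.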

Runtime SDK Fetch

PromptClient fetches prompt content at runtime by name + label. Content is cached in Redis and invalidated automatically when a new version is promoted — your app always runs the current production prompt without a redeployment.

Fetch by name + label: prompts.get('support-reply', 'production')
Redis cache: invalidated on configVersion bump from every promotion
No redeployment needed: SDK picks up new prompts automatically
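
The invalidation rule above can be sketched with a small version-aware cache. This is a simplified stand-in, not the SDK's code: it uses an in-memory Map instead of Redis, a synchronous loader instead of a network call, and the PromptCache class and Loader type are hypothetical names. The only behavior taken from the text is that a cached entry stops being served once configVersion has been bumped.

```typescript
// Cached prompt content together with the configVersion it was fetched under.
interface CachedPrompt {
  content: string;
  configVersion: number;
}

// Stand-in for the real fetch; a production client would be asynchronous.
type Loader = (name: string, label: string) => string;

class PromptCache {
  private readonly entries = new Map<string, CachedPrompt>();
  private readonly load: Loader;

  constructor(load: Loader) {
    this.load = load;
  }

  // Serve from cache while configVersion is unchanged; refetch after a bump.
  get(name: string, label: string, currentConfigVersion: number): string {
    const key = `${name}:${label}`;
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.configVersion === currentConfigVersion) {
      return hit.content;
    }
    const content = this.load(name, label);
    this.entries.set(key, { content, configVersion: currentConfigVersion });
    return content;
  }
}
```

Keying the entry on name plus label and comparing a single version counter keeps the check cheap: the app never needs to know which prompt changed, only that a promotion happened since the entry was stored.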

Get started

Set up in under 5 minutes

TypeScript: Prompt Management

import { PromptClient } from '@zespan/sdk';

// Fetch the current production prompt at runtime — always fresh
const prompts = new PromptClient({ apiKey: process.env.ZESPAN_API_KEY });
const prompt = await prompts.get('support-reply', 'production');

// prompt.content — cached in Redis, invalidated on next version promotion


Frequently asked

How does regression detection work exactly?

When you promote a prompt version to the production label, Zespan runs a background job that queries eval scores for traces using the new version vs. traces from the previous 14 days. If any evaluator score drops more than 10 percentage points, a quality_regression ZespanPilot notification fires with the before/after breakdown.

Can I compare two prompt versions side by side?

Yes. The prompt version view shows per-version metrics (eval scores, usage count, latency, cost). You can compare any two versions. The Simulations feature lets you run both versions against the same dataset and compare results head-to-head.

What's the protected label feature?

Certain labels — like production — require admin permission to assign. This prevents developers from accidentally promoting experimental prompts to production without review. You configure which labels are protected in project settings.
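
A permission check of this kind might look like the sketch below. The Role type, the canAssignLabel function, and the idea of passing the protected set as a parameter are all assumptions for illustration; the text above only states that protected labels require admin permission and that the set is configured per project.

```typescript
// Hypothetical roles; the text only distinguishes admins from other users.
type Role = "admin" | "developer";

// A label may be assigned if it is not protected, or if the user is an admin.
function canAssignLabel(
  role: Role,
  label: string,
  protectedLabels: ReadonlySet<string>,
): boolean {
  return role === "admin" || !protectedLabels.has(label);
}

// Example project configuration (assumed): only "production" is protected.
const exampleProtectedLabels: ReadonlySet<string> = new Set(["production"]);
```

With this configuration, a developer can still tag a version as staging or experiment, while assigning production is limited to admins.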

Does the SDK cache prompts locally?

Yes. PromptClient caches prompt content in Redis. When you promote a new version, the configVersion bumps and is returned in the next SDK ingest response — triggering SDK clients to refetch. Cache invalidation is automatic and doesn't require your app to restart.

