Zespan is an AI agent observability and engineering platform. It traces every agent decision, tool call, handoff, and delegation in production. It also provides prompt versioning, built-in LLM-as-judge evaluations, guardrails, cost optimization, and an AI ops assistant called ZespanPilot.

How do I instrument my AI agent with Zespan?

Instrumenting an agent takes two lines of code: import zespan, then call zespan.init({ apiKey: process.env.ZESPAN_API_KEY }). This call auto-patches the OpenAI, Anthropic, Gemini, Bedrock, and Mistral clients. For framework-level tracing, add one handler: ZespanCallbackHandler for LangChain, ZespanCrewAIListener for CrewAI, or ZespanADKHandler for Google ADK.
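"Auto-patching" generally means the SDK swaps a provider client's method for a wrapper that records a span and then calls the original. The sketch below shows that general technique on a stand-in class. It is not Zespan's implementation. FakeChatClient, SpanRecord, and autoPatch are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch of SDK auto-patching: replace a client class's
// prototype method with a wrapper that records a span, then delegates.
// FakeChatClient, SpanRecord, and autoPatch are illustrative stand-ins.

interface SpanRecord {
  provider: string;
  method: string;
  durationMs: number;
  ok: boolean;
}

const recordedSpans: SpanRecord[] = [];

class FakeChatClient {
  async create(prompt: string): Promise<string> {
    return `echo: ${prompt}`;
  }
}

// Patching the prototype means every instance, including ones created
// before the patch, goes through the tracing wrapper.
function autoPatch(
  proto: { create: (prompt: string) => Promise<string> },
  provider: string,
): void {
  const original = proto.create;
  proto.create = async function (this: unknown, prompt: string): Promise<string> {
    const start = performance.now();
    let ok = false;
    try {
      const result = await original.call(this, prompt);
      ok = true;
      return result;
    } finally {
      // Record the span whether the call succeeded or threw.
      recordedSpans.push({
        provider,
        method: "create",
        durationMs: performance.now() - start,
        ok,
      });
    }
  };
}

autoPatch(FakeChatClient.prototype, "fake-provider");
```

Because the wrapper returns the original result unchanged, application code does not need to change. That is what makes a one-line init call possible.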

Does Zespan support prompt versioning?

Yes. Zespan includes prompt management with versioning, a playground for iteration, and A/B testing to compare prompt versions against each other in production.
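A/B testing prompt versions in production usually depends on a deterministic split, so that each user consistently sees the same version. The sketch below shows one common approach, which hashes an experiment ID plus a user ID into 100 buckets. The function names, the PromptVersion shape, and the choice of FNV-1a are illustrative assumptions. They are not Zespan's API.

```typescript
// Hypothetical sketch: deterministic A/B assignment of prompt versions.
// The same user always gets the same version, and roughly `percentB`% of
// users get version B. Names here are illustrative, not Zespan's API.

// 32-bit FNV-1a hash over the UTF-16 code units of a string.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by FNV prime, mod 2^32
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}

type PromptVersion = { id: string; template: string };

function pickPromptVersion(
  experimentId: string,
  userId: string,
  a: PromptVersion,
  b: PromptVersion,
  percentB: number, // 0..100
): PromptVersion {
  // Including the experiment ID keeps bucket assignments independent across experiments.
  const bucket = fnv1a(`${experimentId}:${userId}`) % 100; // 0..99
  return bucket < percentB ? b : a;
}
```

Recording the chosen version ID on each trace lets you compare eval scores, cost, and latency between the two versions.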

What evaluations does Zespan support?

Zespan ships 12 built-in LLM-as-judge evaluation templates, including faithfulness, relevance, toxicity, and groundedness. Evaluations run automatically on every trace, so you do not need to write custom scoring functions.
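In general, an LLM-as-judge template is a prompt with placeholders that are filled from the trace, paired with a parser that extracts a score from the judge model's reply. The sketch below illustrates that pattern. The template wording, the 1 to 5 scale, and the stub judge are assumptions made for illustration. They are not Zespan's built-in templates.

```typescript
// Hypothetical sketch of an LLM-as-judge evaluation template.
// Template text, 1-5 scale, and the stub judge are illustrative assumptions.

interface EvalTemplate {
  name: string;
  prompt: string; // uses {{placeholders}} filled from the trace
}

const faithfulness: EvalTemplate = {
  name: "faithfulness",
  prompt:
    "Context:\n{{context}}\n\nAnswer:\n{{answer}}\n\n" +
    "Rate from 1 to 5 how well the answer is supported by the context. " +
    "Reply with 'Score: <n>'.",
};

// Fill {{name}} placeholders; fail loudly if the trace lacks a variable.
function render(template: EvalTemplate, vars: Record<string, string>): string {
  return template.prompt.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    const value = vars[key];
    if (value === undefined) throw new Error(`missing variable: ${key}`);
    return value;
  });
}

// Extract the numeric score from the judge's reply; null if none is found.
function parseScore(reply: string): number | null {
  const match = /Score:\s*([1-5])\b/.exec(reply);
  return match ? Number(match[1]) : null;
}

// A judge is any function from prompt to reply; in practice it would call an LLM.
type Judge = (prompt: string) => Promise<string>;

async function runEval(
  template: EvalTemplate,
  vars: Record<string, string>,
  judge: Judge,
): Promise<{ eval: string; score: number | null }> {
  const reply = await judge(render(template, vars));
  return { eval: template.name, score: parseScore(reply) };
}
```

Returning null instead of a default score keeps malformed judge replies visible in the results rather than silently counting them as passing or failing.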

How does Zespan compare to Langfuse?

Zespan is agent-native: every span carries agent identity, delegations are first-class trace events, and an agent map is built automatically. Langfuse was built for LLM pipelines and extended to agents later. Zespan also ships 12 built-in eval templates (Langfuse has none), an AI cost optimizer, and ZespanPilot for AI ops. Langfuse offers open-source self-hosting; Zespan does not.
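To make "agent identity on every span" and "delegations as first-class events" concrete, the sketch below models a span that carries an agent ID and derives an agent map from delegation and handoff spans. The field names (agentId, kind, targetAgentId) are hypothetical. They are not Zespan's actual trace schema.

```typescript
// Hypothetical sketch of agent-aware spans and a derived agent map.
// Field names are illustrative assumptions, not Zespan's trace schema.

type SpanKind = "llm" | "tool" | "delegation" | "handoff";

interface AgentSpan {
  spanId: string;
  parentSpanId?: string;
  agentId: string;        // identity of the agent that produced this span
  kind: SpanKind;
  targetAgentId?: string; // set on delegation/handoff spans
}

// Map each agent to the set of agents it delegated or handed off to.
function buildAgentMap(spans: AgentSpan[]): Map<string, Set<string>> {
  const map = new Map<string, Set<string>>();
  for (const span of spans) {
    if (!map.has(span.agentId)) map.set(span.agentId, new Set());
    const isEdge = span.kind === "delegation" || span.kind === "handoff";
    if (isEdge && span.targetAgentId) {
      map.get(span.agentId)!.add(span.targetAgentId);
      // Ensure the target appears as a node even if it emitted no spans yet.
      if (!map.has(span.targetAgentId)) map.set(span.targetAgentId, new Set());
    }
  }
  return map;
}
```

Since delegation is an explicit span kind rather than something inferred from parent/child nesting, the map can be built in a single pass over the spans.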

What is the free tier for Zespan?

The free tier includes 10,000 traces per month, 14-day retention, 2 projects, and 1 seat. No credit card required.

Groq Integration

Drop-in wrapper for the Groq SDK. Every Llama, Mixtral, and Gemma inference call is traced with accurate sub-second latency measurement.

Groq provides ultra-low-latency inference for open models (Llama 3, Mixtral, Gemma, Whisper) via a dedicated LPU inference engine.

Figure: Zespan Performance view showing the Groq sub-second latency distribution with P50/P95/P99 percentiles.

Getting Started

Install the SDK

Install @zespan/sdk alongside groq-sdk.

```bash
npm install @zespan/sdk groq-sdk
```

Wrap the Groq client

Pass your Groq client and your Zespan instance to wrapGroq(). Every chat completion call made through the returned client is traced with accurate latency measurement.

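```typescript
import Groq from 'groq-sdk';
import { Zespan, wrapGroq } from '@zespan/sdk';

// Zespan client; expects ZESPAN_API_KEY to be set in the environment.
const zespan = new Zespan({ apiKey: process.env.ZESPAN_API_KEY });

// new Groq() reads GROQ_API_KEY from the environment by default.
// Calls made through the wrapped client are traced.
const groq = wrapGroq(new Groq(), zespan);

// Top-level await requires an ES module context.
const completion = await groq.chat.completions.create({
  model: 'llama-3.3-70b-versatile',
  messages: [{ role: 'user', content: 'Hello' }],
});

console.log(completion.choices[0]?.message?.content);
```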
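One way a wrapper like wrapGroq() can trace calls without modifying the original client is a JavaScript Proxy that intercepts only chat.completions.create and passes everything else through unchanged. The sketch below shows that technique with simplified stand-in types. It is an assumption about how such a wrapper could work, not Zespan's code. ChatClient, LatencySpan, and wrapClient are hypothetical names.

```typescript
// Hypothetical sketch of a Proxy-based client wrapper in the spirit of wrapGroq().
// ChatClient, CompletionParams, and LatencySpan are simplified stand-ins,
// not the Groq SDK's or Zespan's real types.

interface CompletionParams {
  model: string;
  messages: { role: string; content: string }[];
}
interface CompletionResult {
  usage?: { prompt_tokens: number; completion_tokens: number };
}
interface ChatClient {
  chat: { completions: { create(params: CompletionParams): Promise<CompletionResult> } };
}
interface LatencySpan {
  model: string;
  latencyMs: number;
  completionTokens?: number;
  error?: string;
}

function wrapClient<T extends ChatClient>(client: T, sink: LatencySpan[]): T {
  // Innermost proxy: replace only `create`; pass every other property through.
  const completions = new Proxy(client.chat.completions, {
    get(target, prop, receiver) {
      if (prop !== "create") return Reflect.get(target, prop, receiver);
      return async (params: CompletionParams): Promise<CompletionResult> => {
        const start = performance.now();
        try {
          const result = await target.create(params); // `this` stays bound to target
          sink.push({
            model: params.model,
            latencyMs: performance.now() - start,
            completionTokens: result.usage?.completion_tokens,
          });
          return result;
        } catch (err) {
          sink.push({ model: params.model, latencyMs: performance.now() - start, error: String(err) });
          throw err; // tracing must not swallow the caller's error
        }
      };
    },
  });
  // Outer proxies route client.chat.completions to the wrapped object.
  const chat = new Proxy(client.chat, {
    get: (target, prop, receiver) =>
      prop === "completions" ? completions : Reflect.get(target, prop, receiver),
  });
  return new Proxy(client, {
    get: (target, prop, receiver) =>
      prop === "chat" ? chat : Reflect.get(target, prop, receiver),
  });
}
```

Unlike patching a prototype, this approach leaves the original client untouched, so only calls made through the wrapped object are traced.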

View traces in Zespan

Open the Trace Explorer. Groq calls appear with their model, token usage, cost, and latency, and the Performance view shows Groq's sub-second latency distribution.

What's captured automatically

Accurate sub-second latency: time-to-first-token and total latency measured precisely
All Groq models: Llama 3, Mixtral, Gemma, Whisper, and future additions
Model comparison: compare cost and latency across Groq models in Cost Attribution
Rate limit tracking: 429 errors attributed with full context
Audio transcription: Whisper calls traced alongside chat completions
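Attributing 429 errors "with full context" typically means converting the failed call's error into a span that keeps the model, the status, and any Retry-After hint. The sketch below shows that idea. The error shape ({ status, headers }) and the ErrorSpan fields are assumptions chosen for illustration. They are not the Groq SDK's or Zespan's types.

```typescript
// Hypothetical sketch: attributing rate-limit (HTTP 429) failures to the call
// that hit them. The error shape and ErrorSpan fields are illustrative assumptions.

interface HttpLikeError {
  status?: number;
  headers?: Record<string, string>;
}

interface ErrorSpan {
  model: string;
  rateLimited: boolean;
  status?: number;
  retryAfterSeconds?: number;
  message: string;
}

function toErrorSpan(model: string, err: unknown): ErrorSpan {
  const e = (typeof err === "object" && err !== null ? err : {}) as HttpLikeError;
  const retryAfter = e.headers?.["retry-after"];
  // Retry-After may also be an HTTP date; Number() yields NaN then, so it is dropped.
  const parsed = retryAfter !== undefined ? Number(retryAfter) : NaN;
  return {
    model,
    rateLimited: e.status === 429,
    status: e.status,
    retryAfterSeconds: Number.isFinite(parsed) ? parsed : undefined,
    message: err instanceof Error ? err.message : String(err),
  };
}
```

Keeping the model on the error span is what lets rate-limit hits be grouped per model alongside the successful calls.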

FAQ

Does Zespan capture Groq's fast time-to-first-token accurately?

Yes. The SDK measures time from request start to first token received, which is the meaningful latency number for Groq's streaming completions.
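The measurement described above starts the clock before the request is sent, records the elapsed time when the first chunk arrives (time-to-first-token), and records it again when the stream ends (total latency). The sketch below applies that idea to a stream modeled as an AsyncIterable of text chunks. The chunk type and fakeStream are stand-ins, not the Groq SDK's streaming types.

```typescript
// Hypothetical sketch: measuring time-to-first-token (TTFT) and total latency
// over a streaming response modeled as an AsyncIterable of text chunks.

interface StreamTiming {
  ttftMs: number | null; // null if the stream produced no chunks
  totalMs: number;
  chunks: number;
  text: string;
}

async function measureStream(
  startRequest: () => Promise<AsyncIterable<string>>,
): Promise<StreamTiming> {
  const start = performance.now(); // clock starts before the request is sent
  const stream = await startRequest();
  let ttftMs: number | null = null;
  let chunks = 0;
  let text = "";
  for await (const chunk of stream) {
    if (ttftMs === null) ttftMs = performance.now() - start; // first chunk received
    chunks++;
    text += chunk;
  }
  return { ttftMs, totalMs: performance.now() - start, chunks, text };
}

// Fake stream that yields tokens after a small delay, for demonstration only.
async function* fakeStream(tokens: string[], delayMs: number): AsyncGenerator<string> {
  for (const t of tokens) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield t;
  }
}
```

For a fast provider, total latency can be dominated by generation time, so TTFT and total latency are worth tracking as separate numbers.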

Can I compare Groq vs OpenAI latency in Zespan?

Yes. Filter the Performance view by provider and compare P50/P95/P99 latency across providers and models side by side.
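A side-by-side P50/P95/P99 comparison comes down to grouping latency samples by provider and model, then taking percentiles within each group. The sketch below uses the nearest-rank percentile method. The LatencySample shape is a hypothetical assumption, and Zespan's own percentile method is not specified here.

```typescript
// Hypothetical sketch: P50/P95/P99 latency per provider/model using the
// nearest-rank percentile method. Sample shape is an illustrative assumption.

interface LatencySample { provider: string; model: string; latencyMs: number }
interface LatencyStats { p50: number; p95: number; p99: number; n: number }

// Nearest rank: the smallest value with at least p% of samples <= it.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  // Multiplying before dividing keeps the rank exact for integer p.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100)); // 1-based
  return sorted[rank - 1];
}

function latencySummary(samples: LatencySample[]): Map<string, LatencyStats> {
  const groups = new Map<string, number[]>();
  for (const s of samples) {
    const key = `${s.provider}/${s.model}`;
    const list = groups.get(key) ?? [];
    list.push(s.latencyMs);
    groups.set(key, list);
  }
  const summary = new Map<string, LatencyStats>();
  for (const [key, values] of groups) {
    const sorted = [...values].sort((a, b) => a - b); // numeric, ascending
    summary.set(key, {
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95),
      p99: percentile(sorted, 99),
      n: sorted.length,
    });
  }
  return summary;
}
```

Reporting the sample count n next to the percentiles matters: with only a few samples, P95 and P99 collapse onto the maximum.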

Start for free — 10K traces/month, no card needed

Groq integration works on all plans including the free tier.

