Use Case — RAG

Find where your RAG pipeline breaks — retrieval, reranking, or synthesis.

Hallucinations don't appear from nowhere. They come from bad retrievals, ignored context, or prompts that drift. Zespan shows you exactly which step is failing.

Start for free →For Teams building retrieval-augmented generation systems

The problem

01

Hallucinations with no trail

You can see a bad output but not the retrieved context that caused it. Was it a poor retrieval, a bad prompt, or the model ignoring context? You can't tell.

02

Retrieval quality is a black box

You don't know your average retrieval score, how many chunks are retrieved, or whether reranking actually improves results at scale.

03

Quality degrades silently

Faithfulness scores slip as your document base grows or prompts change. You find out from user reports, not metrics.

How to use Zespan for this

1

Open Trace Explorer — see every RAG pipeline run

In Trace Explorer, filter by operation=rag or by the model your pipeline uses. Every pipeline run is a trace. Click any one to open the span waterfall — query encoding, vector retrieval, reranking, and LLM synthesis each appear as a separate timed span.

zespan.com
Zespan trace explorer filtered to RAG operations showing latency, cost, and status
2

Inspect the span waterfall — see retrieval vs synthesis split

In the span detail view, the waterfall breaks your pipeline into steps. You can see if retrieval is taking 800ms while synthesis takes 200ms, or vice versa. Check retrieved chunk counts and scores per retrieval span to identify when low-score chunks enter the context window.

zespan.com
Zespan span waterfall showing RAG pipeline with retrieval, reranking, and synthesis spans
3

Check Evaluations — faithfulness on every response

Open Evaluations. If you've enabled the faithfulness auto-evaluator, every RAG response has a score: 1.0 means the answer is fully grounded in retrieved context; 0.0 means it came from model memory. Filter to faithfulness < 0.7 to find the responses where the model hallucinated.

zespan.com
Zespan evaluations showing faithfulness and relevance scores trending over time
4

Click into failing evaluations — trace the bad retrieval

In the evaluation detail view, click any low-faithfulness trace. You'll see the user question, the retrieved chunks, the synthesized answer, and the judge's reasoning. The gap between the retrieved context and the answer tells you exactly what broke — bad retrieval, low-quality chunks, or the model ignoring context.

zespan.com
Zespan evaluation detail view showing faithfulness score with retrieved context and judge reasoning
5

Add failing traces to a dataset — regression tests from real failures

Found a trace where retrieval went wrong? Click 'Add to dataset'. Build a dataset of real failures. Next time you change your retrieval config or prompt, run a batch simulation against this dataset — assertions catch regressions before any user sees them.

zespan.com
Zespan trace-to-dataset showing a failing RAG trace added as a regression test case

Start free — 10K traces/month, no card needed

See every agent decision, tool call, and handoff in production. Setup takes under 5 minutes.

Get started free →