Feature — Playground

Find prompt failures in the sandbox, not in production.

Test prompts across 4 providers with real streaming, tool calls, structured output, and your actual guardrails — before any code ships.

OpenAI, Anthropic, Google, OpenRouter. Tool calls. Structured output. Guardrail integration.

Works with: OpenAI, Anthropic, Google GenAI, OpenRouter, tool calls, structured output

4 providers · 100+ models · Live streaming

4 Providers, 100+ Models

The Playground connects to OpenAI, Anthropic, Google, and OpenRouter. Available models are fetched dynamically, so the list stays current. Examples per provider:

  • OpenAI: GPT-4o, GPT-4o-mini, o1, o3-mini, GPT-4-turbo
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
  • Google: Gemini 1.5 Pro, Gemini 1.5 Flash and all Google GenAI models
  • OpenRouter: 100+ models (Llama 3, Mixtral, Yi, DeepSeek, and more)

Tool Calls & Structured Output

Pass tool/function schemas to test tool-calling before wiring up real integrations. Pass a JSON schema to enforce structured output and validate compliance immediately. Catch schema mismatches and tool argument errors before they reach production.

  • Tool definitions: pass function schemas, see tool call arguments and results
  • Structured output: pass a JSON schema — see if the model complies
  • Multiple tool calls: test models that invoke multiple tools in sequence
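
To make the idea of catching tool argument errors concrete, here is a minimal sketch of a function schema and a check on the arguments a model returns for it. The `ToolDefinition` shape, the `get_weather` tool, and `checkArguments` are hypothetical names invented for this illustration; they are not the Playground's input format, and the check covers only a small subset of JSON Schema.

```typescript
// Hypothetical shapes for illustration only; not the Playground's actual format.
type JsonSchema = {
  type: "object";
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
  required: string[];
};

interface ToolDefinition {
  name: string;
  description: string;
  parameters: JsonSchema;
}

// An example tool a prompt might be allowed to call.
const getWeather: ToolDefinition = {
  name: "get_weather",
  description: "Look up the current weather for a city.",
  parameters: {
    type: "object",
    properties: {
      city: { type: "string" },
      celsius: { type: "boolean" },
    },
    required: ["city"],
  },
};

// Returns the problems found in the raw JSON arguments of a tool call.
// An empty array means the arguments match this (simplified) schema.
function checkArguments(schema: JsonSchema, rawArgs: string): string[] {
  let args: unknown;
  try {
    args = JSON.parse(rawArgs);
  } catch {
    return ["arguments are not valid JSON"];
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return ["arguments must be a JSON object"];
  }
  const record = args as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of schema.required) {
    if (!(key in record)) problems.push(`missing required field "${key}"`);
  }
  for (const [key, value] of Object.entries(record)) {
    const expected = schema.properties[key];
    if (!expected) {
      problems.push(`unexpected field "${key}"`);
    } else if (typeof value !== expected.type) {
      problems.push(`field "${key}" should be ${expected.type}`);
    }
  }
  return problems;
}
```

Calling `checkArguments(getWeather.parameters, '{"celsius":"yes"}')` reports both the missing `city` and the wrongly typed `celsius`, which is the kind of mismatch worth finding in a sandbox run rather than in production.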

Streaming & Config Overrides

Stream completions token by token — identical to production streaming behavior. Override temperature, max_tokens, top_p, and any provider-specific parameter to fine-tune behavior in the sandbox.

  • Real-time token streaming — same experience as production
  • Config overrides: temperature, max_tokens, top_p, and provider params
  • Text mode and Chat mode (system/user/assistant/tool message array)
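
As a rough illustration of the two input modes and of config overrides, the sketch below models a run as either a text prompt or a chat message array, and layers overrides on top of a base config. All type and function names (`PromptInput`, `RunConfig`, `withOverrides`) and the default values are assumptions made for this example, not Zespan's API.

```typescript
// Hypothetical types for illustration; not Zespan's API.
type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
}

// A run takes either a single string (Text mode) or a message array (Chat mode).
type PromptInput =
  | { mode: "text"; prompt: string }
  | { mode: "chat"; messages: ChatMessage[] };

interface RunConfig {
  temperature: number;
  max_tokens: number;
  top_p: number;
  // Provider-specific parameters are carried along without interpretation.
  providerParams?: Record<string, unknown>;
}

// Assumed defaults, chosen only for this example.
const baseConfig: RunConfig = { temperature: 0.7, max_tokens: 512, top_p: 1 };

// Later keys win, so each override replaces the base value of the same name.
// The base object is copied, not mutated.
function withOverrides(base: RunConfig, overrides: Partial<RunConfig>): RunConfig {
  return { ...base, ...overrides };
}

const chatInput: PromptInput = {
  mode: "chat",
  messages: [
    { role: "system", content: "You are a concise support assistant." },
    { role: "user", content: "How do I reset my password?" },
  ],
};

// A low-variance variant of the same run for side-by-side comparison.
const deterministic = withOverrides(baseConfig, { temperature: 0, max_tokens: 64 });
```

Keeping the base config immutable and deriving variants from it makes it easy to compare the same prompt under different sampling settings.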

Guardrail Integration

Apply your project's guardrails to Playground runs. The same PII, toxicity, topic boundary, and custom rules that run in production run in the sandbox. Test prompt safety interactively before deploying.

  • applyGuardrails: true — applies all project guardrails to playground runs
  • See block/warn/redact behavior before any prompt reaches production
  • Test guardrail rules against new prompts without a live request
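
The block/warn/redact behavior can be pictured with a toy rule engine. This is only a sketch: the `Rule` and `Outcome` types, the three example rules, and `applyRules` are invented for illustration and say nothing about how Zespan's guardrails are implemented or configured.

```typescript
// Toy model of block / warn / redact outcomes; not Zespan's implementation.
type Action = "block" | "warn" | "redact";

interface Rule {
  name: string;
  pattern: RegExp; // uses the global flag so redaction replaces every match
  action: Action;
}

interface Outcome {
  blocked: boolean;
  warnings: string[];
  text: string;
}

// Example rules, assumed for this sketch.
const rules: Rule[] = [
  { name: "email", pattern: /[\w.+-]+@[\w-]+\.[\w.]+/g, action: "redact" },
  { name: "competitor-mention", pattern: /\bacme corp\b/gi, action: "warn" },
  { name: "credentials", pattern: /\bpassword\s*[:=]/gi, action: "block" },
];

// Applies each rule in order: redactions rewrite the text, warnings are
// collected by rule name, and any block rule marks the whole input as blocked.
function applyRules(input: string, active: Rule[]): Outcome {
  const outcome: Outcome = { blocked: false, warnings: [], text: input };
  for (const rule of active) {
    if (outcome.text.search(rule.pattern) === -1) continue;
    if (rule.action === "block") outcome.blocked = true;
    if (rule.action === "warn") outcome.warnings.push(rule.name);
    if (rule.action === "redact") {
      outcome.text = outcome.text.replace(rule.pattern, `[${rule.name} redacted]`);
    }
  }
  return outcome;
}
```

Running a draft prompt through rules like these before deployment shows which outcome it would trigger, which is the same question the Playground answers with your real project guardrails.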

Get started

Set up in under 5 minutes

Playground (TypeScript)
// Playground is in-product — no SDK setup required.
// Access it from the sidebar: Playground.

// What you can test:
// - Text mode: single string prompt
// - Chat mode: multi-turn message array (system / user / assistant / tool)
// - Tool definitions: pass function schemas to test tool-calling
// - JSON schema output: validate structured output compliance
// - Guardrails: apply project guardrails to sandbox runs

Frequently asked

Do Playground runs appear in my trace data?

Yes. Playground runs are traced like any other LLM call. You can find them in the Trace Explorer filtered by environment=playground or operation=playground-run.

Do I need API keys for each provider?

Yes. Each provider (OpenAI, Anthropic, Google, OpenRouter) requires its own API key, which you configure in Project Settings → Providers. Zespan does not supply provider API keys of its own; Playground requests use the keys you configure.

What's the difference between Chat mode and Text mode?

Text mode is a single string prompt — equivalent to a completion or a system prompt. Chat mode is a multi-turn message array with system, user, assistant, and tool roles — equivalent to the chat completions API. Use chat mode to test multi-turn conversations and system prompt behavior.

Can I test a prompt in the Playground before promoting it to production?

Yes, and this is the intended workflow. Load the prompt version from Prompt Management into the Playground, test it with guardrails enabled, and if it passes, promote it to the production label. The Playground is your manual safety check; Simulations are your automated check.

Start free — 10K traces/month, no card needed

Setup takes under 5 minutes. Works with OpenAI, Anthropic, LangChain, and more.