Introducing HoneyHive v2

The observability layer for enterprise AI

HoneyHive unifies observability and evaluation into a continuous improvement loop, so every team can ship quality agents with confidence.

Partnering with leading teams.
From AI startups to Fortune 500 enterprises.

Distributed Tracing

See inside any agent, any framework, anywhere

Instrument any agent on any stack with OpenTelemetry — and give every team one shared source of truth to debug agent behavior.

OpenTelemetry-native. Works across 100+ LLMs & agent frameworks.
Online Evaluation. Run live evals to detect failures across agents.
Session Replays. Replay chat sessions in the Playground.
Filters and Groups. Quickly search across millions of traces and find outliers.
Graph and Timeline View. Debug complex multi-agent systems.
User Feedback. Capture implicit and explicit signals from your users.
Monitoring & Alerts

Continuously evaluate agent quality at scale

Continuously evaluate live traces, monitor real-world feedback from users, and alert on the failure modes that matter to your business.

Online Evaluation. Use LLM-as-a-judge or custom code to evaluate live traces.
Alerts and Drift Detection. Detect agent failures before your users do.
Automations. Add failing prompts to datasets or trigger human review.
Custom Dashboard. Get quick insights into the metrics that matter.
Discover. Find patterns across millions of traces with exploratory analytics.
Root-Cause with AI. Give your coding agents context to root cause and fix issues.
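The drift detection described above amounts to watching eval scores over a rolling window and alerting when quality dips. The sketch below is illustrative only: the function name, window size, and threshold are made-up defaults, not HoneyHive's API — in practice this runs server-side over your live traces.

```python
from collections import deque

def drift_alert(scores, window=5, threshold=0.7):
    """Return indices where the rolling mean of eval scores
    drops below the alert threshold.

    Illustrative sketch only: HoneyHive computes alerts
    server-side; these parameters are hypothetical.
    """
    recent = deque(maxlen=window)
    alerts = []
    for i, score in enumerate(scores):
        recent.append(score)
        if len(recent) == window and sum(recent) / window < threshold:
            alerts.append(i)
    return alerts

# A sudden dip in faithfulness scores trips the alert:
print(drift_alert([1.0, 1.0, 1.0, 0.0, 0.0, 0.0], window=3, threshold=0.7))
# → [3, 4, 5]
```

The same rolling check generalizes to any scalar metric — LLM-as-a-judge scores, user thumbs-down rates, or latency — which is why alerting and online evaluation share one pipeline.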
Experiments

Confidently ship changes with evals

Turn production failures into test suites, compare changes against a baseline, and catch regressions before every release.

Experiments. Test your agents offline against large datasets.
Datasets. Centrally manage test cases with domain experts.
Custom Evaluators. Write your own LLM-as-a-judge or code evaluators.
Human Review. Allow domain experts to grade outputs.
Regression Detection. Identify critical regressions as you iterate.
CI/CD Integration. Run automated test suites over every commit.
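As a sketch of what a custom code evaluator might look like, the snippet below scores an answer on whether it cites any of the retrieved sources. The function shape, names, and return fields here are hypothetical, not HoneyHive's actual evaluator interface.

```python
def citation_evaluator(output: str, retrieved_sources: list) -> dict:
    """Hypothetical code evaluator: pass if the answer cites
    at least one retrieved source, fail otherwise."""
    cited = [src for src in retrieved_sources if src in output]
    return {
        "score": 1.0 if cited else 0.0,
        "explanation": f"cited {len(cited)} of {len(retrieved_sources)} sources",
    }

result = citation_evaluator(
    "Per the 2023 10-K [doc-17], revenue grew 12%.",
    ["doc-17", "doc-42"],
)
print(result["score"])  # → 1.0
```

Deterministic checks like this complement LLM-as-a-judge evaluators: they are cheap enough to run on every commit in CI, while judge models handle the subjective criteria.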
Annotation Queues

Shape agent quality with expert feedback

Bring subject matter experts into the loop to review edge cases, define quality, and align your evals with real-world business context.

Queue Automation. Route flagged traces to the right reviewers.
Human Review. Bring domain experts into the loop in a friendly interface.
Custom Rubrics. Standardize review with business-specific criteria.
Dataset Curation. Git-native versioning across artifacts.
Audit Trail. Capture expert feedback alongside trace context.
Evaluator Alignment. Use feedback to align LLM evaluators with SMEs.
Integrations

Open standards, open ecosystem

OpenTelemetry-native and framework-agnostic. Get end-to-end visibility across every model, framework, and agent deployed within your company.

Trusted by Fortune 500 enterprises

Scaling AI agents responsibly at Australia's largest bank

HoneyHive powers observability and evaluation across mission-critical AI systems at CBA, enabling safe and responsible deployment of AI agents serving 17M+ consumers.

Under the hood

Engineered for scale. Private by default.

Agent traces carry rich, highly sensitive I/O that traditional observability can’t handle. HoneyHive is designed specifically for AI traces and keeps your data isolated — logically by default, physically when you need to.

HONEYHIVE v2

Virtual data planes. Logically isolated by default.

SaaS · We run both planes in HoneyHive's cloud. Each tenant is isolated in its own virtual data plane with dedicated security and governance rules. Fastest to start, with no infra to operate.

HoneyHive cloud · Control plane (multi-tenant)
UI & dashboards · Metrics & analytics · Metadata catalog · Access control
What we see: metadata and metrics. Never the payloads they describe.
↕   mTLS · non-PII analytics only · signed manifests · OIDC
HoneyHive cloud · Virtual data plane (isolated per tenant)
Raw traces · Sensitive payloads · Eval compute · Datasets
Isolated per tenant: logical separation with tenant-scoped security and governance rules.
↕   OTLP · customer-owned CMK · SIEM forwarding
Your stack · Applications
Frameworks · Models · Gateways · Coding agents
Agents in prod: any model, any framework.
Granular RBAC

Isolate projects and workspaces, and define custom roles across dozens of granular permissions.

SSO & SAML

Okta, Azure AD, Google, Ping. JIT provisioning, enforced MFA, and session policies managed by your IdP.

SOC 2 · GDPR · HIPAA

Audited to SOC 2 Type II. GDPR-compliant with EU data residency. HIPAA BAA available for SaaS customers.

Audit Logging

Stream audit logs to Splunk, Datadog, or any SIEM. Every access, change, and export is auditable upstream.

AI-native

Loved by developers and coding agents

Ready-made skills, a full-featured CLI, and a docs MCP server mean your coding agents can set up tracing, write evals, and drive improvements for you.

honeyhive CLI

Full API access from the terminal. Let coding agents manage HoneyHive for you, or script your workflows in GitHub Actions.

$ honeyhive metrics create --name faithfulness --type LLM --criteria "Is the answer grounded in the provided context?"
Docs MCP

Real-time doc search from your IDE. One config line for Cursor, Claude Code, VS Code, Windsurf, Codex, and more.

$ claude mcp add --transport http honeyhive-docs https://docs.honeyhive.ai/mcp
SKILL.md

Ready-made skills for your coding agent. Set up tracing and evals, root-cause prod alerts, categorize failures, and more — all using natural language.

Example prompt (Claude Code):

Install the HoneyHive tracing skill from github.com/honeyhive/skills and use it to add tracing to this agent.

Start your AI observability journey