Simple pricing, built to scale with your needs.

Pricing should never get in your way. This is why HoneyHive is free forever for individual developers and researchers.


For developers just getting started



Up to 1,000 user sessions per month

Up to 1 user seat

30-day data retention policy

Playground with version control, collaboration, external tools, and 100+ models

Full evaluation and observability suite

Pre-built evaluators for offline and online evaluation

Community and email support

"It's critical to ensure quality and performance across our LLM agents. With HoneyHive, we've not only improved the capabilities of our agents but also seamlessly deployed them to thousands of users — all while enjoying peace of mind."

Divyansh Garg

Co-Founder & CEO, MultiOn

Frequently asked questions
What can HoneyHive do for my company?

HoneyHive provides essential testing and observability tools that help teams test their apps, discover issues, and improve reliability and performance. This helps AI teams build safer, more trustworthy, and more reliable AI products that are optimized for production scale.

Our tools help you iterate faster, evaluate and benchmark performance as you iterate, monitor performance in production, and curate high-quality datasets for fine-tuning and continuous evaluation, all within a unified, collaborative workspace.

What is a user session?

A user session is a trace of the entire sequence of events that occur during a user's interaction with your LLM app.

Each event (i.e., a span) in a user session is a log of an API call or function in your LLM app's orchestration that can be stored in HoneyHive (e.g., your LLM requests, a retrieval step from a vector database or external tool, or any pre- or post-processing steps). Both user sessions and the events within them can also carry custom properties (e.g., configuration, metadata, user feedback, user properties, and evaluation results) that can be used for deeper analysis and data segmentation.
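To make the session and event relationship concrete, here is a minimal Python sketch of the data model described above. It is an illustration only: the class names, field names, and event types are hypothetical and are not HoneyHive's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Event:
    """One logged step (span) in a session. All field names are hypothetical."""
    event_type: str                 # assumed categories, e.g. "model" or "tool"
    name: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Session:
    """A trace of every event in one user interaction. Hypothetical shape."""
    session_id: str
    events: List[Event] = field(default_factory=list)
    user_properties: Dict[str, Any] = field(default_factory=dict)
    feedback: Dict[str, Any] = field(default_factory=dict)

    def events_of_type(self, event_type: str) -> List[Event]:
        # Segment the trace by event type for deeper analysis.
        return [e for e in self.events if e.event_type == event_type]


# Build a two-step session: a retrieval step followed by an LLM request.
session = Session(session_id="demo-session", user_properties={"plan": "free"})
session.events.append(
    Event("tool", "vector_search", {"query": "refund policy"}, {"chunks": 3})
)
session.events.append(
    Event("model", "chat_completion", {"prompt": "..."}, {"text": "..."},
          metadata={"model": "example-model"})
)
session.feedback["rating"] = 1  # custom property attached at the session level
```

Custom properties sit on both levels here, so you could filter sessions by user properties or drill into individual events by their metadata.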

What is an evaluator?

An evaluator is a function that computes a metric or heuristic to measure the performance of your LLM app.

We allow users to design their own custom evaluators in Python using popular libraries such as Transformers, NumPy, and scikit-learn, or alternatively to use an LLM as a judge to grade specific events within a session or the session as a whole.

Evaluators can be used to judge subjective traits like coherence or truthfulness, detect if your agent went off track, check JSON schema validity, and more. This allows you to monitor and evaluate your applications with quantitative rigor and understand precisely where your LLM apps fail.

Evaluators can be defined at both the user session and event level, and are automatically computed as you log data to HoneyHive via any of our logging methods.
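As a rough illustration of what a custom Python evaluator might look like, the sketch below checks whether an event's output is valid JSON with the expected keys, then aggregates that check across a session. The function names and signatures are assumptions made for this example; they are not HoneyHive's evaluator interface.

```python
import json
from typing import Callable, Iterable, Sequence


def valid_json_with_keys(output: str, required_keys: Sequence[str] = ("answer",)) -> bool:
    """Event-level check: output parses as a JSON object containing required_keys."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and all(key in parsed for key in required_keys)


def session_pass_rate(event_outputs: Iterable[str], evaluator: Callable[[str], bool]) -> float:
    """Session-level metric: fraction of event outputs that pass the evaluator."""
    outputs = list(event_outputs)
    if not outputs:
        return 0.0
    return sum(1 for output in outputs if evaluator(output)) / len(outputs)


# Hypothetical outputs from three events in one session.
outputs = ['{"answer": "42"}', "not json at all", '{"reply": "missing key"}']
rate = session_pass_rate(outputs, valid_json_with_keys)
```

The same pattern extends to other heuristics: swap in any function that maps an output to a score, and aggregate it at whichever level you want to monitor.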

Can HoneyHive help me fine-tune custom models?

HoneyHive allows you to filter and curate datasets from your production logs. These datasets can be annotated by domain experts within the platform and exported programmatically for fine-tuning open-source models.

Team plan users can export datasets curated within HoneyHive via our SDK and use their preferred fine-tuning provider and optimization method (such as DPO or KTO) to fine-tune custom, open-source models. For enterprise users, we provide custom 1-click fine-tuning integrations with third-party providers.
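For a sense of what happens after export, here is a hedged sketch that turns rated completions into preference pairs for a method like DPO. The input record fields (`prompt`, `completion`, `rating`) are hypothetical and do not reflect HoneyHive's export format; the `prompt`/`chosen`/`rejected` output layout is a common convention among preference-tuning libraries, so check what your fine-tuning provider expects.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def build_preference_pairs(records: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Pair the best- and worst-rated completion for each prompt."""
    by_prompt = defaultdict(list)
    for record in records:
        by_prompt[record["prompt"]].append(record)

    pairs = []
    for prompt, group in by_prompt.items():
        ranked = sorted(group, key=lambda r: r["rating"])
        worst, best = ranked[0], ranked[-1]
        # Skip prompts with a single completion or tied ratings: no preference signal.
        if best["rating"] > worst["rating"]:
            pairs.append({
                "prompt": prompt,
                "chosen": best["completion"],
                "rejected": worst["completion"],
            })
    return pairs


def to_jsonl(pairs: List[Dict[str, str]]) -> str:
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)


# Hypothetical exported records with annotator ratings.
records = [
    {"prompt": "Summarize the ticket", "completion": "Clear summary", "rating": 5},
    {"prompt": "Summarize the ticket", "completion": "Off-topic reply", "rating": 1},
    {"prompt": "Draft a greeting", "completion": "Hello!", "rating": 4},
]
pairs = build_preference_pairs(records)
jsonl = to_jsonl(pairs)
```

Prompts with only one rated completion are dropped, since a preference pair needs both a better and a worse example.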

Is my data secure? 

All data is secure, encrypted, and private to your tenant. We conduct regular penetration tests, are currently undergoing a SOC 2 audit, and provide flexible hosting options (VPC and on-prem) to meet your security and privacy needs.

Does HoneyHive proxy requests?

By default, we do not proxy your requests via our servers. That said, we do provide an optional proxy for teams looking to manage their prompts via HoneyHive. This proxy can be hosted via HoneyHive or within your private cloud environment.

How do I log my data? 

You can log your production requests or any evaluation runs in real time using our logging endpoints and proxy, or asynchronously via our batch ingestion endpoints. We offer native SDKs in Python and TypeScript, and provide additional integrations with popular open-source orchestration frameworks like LangChain and LlamaIndex.

For Enterprise customers, we also offer support for additional languages like Go, Java, and Rust via our API endpoints.

Our distributed tracing architecture generalizes across multiple orchestration frameworks (LlamaIndex, LangChain, AutoGen, etc.), models, and hosting environments (cloud, local, on-prem). This allows you to trace any LLM app, no matter how complex or custom it is.

How long does it take to integrate the SDK?

Integrating the SDK with your application can take anywhere from a few minutes to a couple of hours, depending on the complexity of your application and your orchestration framework.

If you're currently using LangChain or LlamaIndex, you can get started in under 5 minutes with our 1-click LlamaIndex integration or LangChain tracer.

For Team and Enterprise plan users, our team is happy to provide hands-on support and instrumentation advice to get your team set up.

Ship reliable AI products that your users trust