Pricing should never get in your way. This is why HoneyHive is free forever for individual developers and researchers.
Free
No credit card required
Get started10k events per month
365d log retention
Up to 2 users
Full evaluation and observability suite
Starting $99 / month
Perfect for small teams
Talk to founders50k+ events per month
Data exports
Unlimited users
Email support
Contact us
Ideal for large organizations
Talk to foundersCustom usage limits
SSO & SAML
VPC hosting add-on
Dedicated support and SLA
An event refers to a single trace span, structured log, or metric label combination sent to our API as OTLP or JSON. It captures any relevant data from your system, including all context fields generated by your application's instrumentation. Each event can be up to 256KB in size and can contain any number of properties.
Automated Evaluators: An automated evaluator is a function (code or LLM) that helps you unit test any arbitrary event or combinations of events to generate a measurable score. Common examples of evaluators include Context Precision, ROUGE, Coherence, BERT Score, and more. We provide many common evaluators out-of-the-box and allow defining custom evaluators within the platform.
Human Evaluators: We strongly encourage a hybrid-evaluation approach, i.e. combining automated techniques with human evaluation. This helps you account for metric bias and better align your evaluators with your domain experts' scoring rubric. To enable this, you can define custom scoring rubrics in HoneyHive for graders to use when evaluating traces.
HoneyHive allows you to filter and curate datasets from your production logs. These datasets can be annotated by domain experts within the platform and exported programmatically for fine-tuning open-source models.
You can export datasets curated within HoneyHive using our SDK and use your preferred GPU cloud and optimization method (such as SFT, DPO, etc.) to fine-tune custom models. You can optionally also build active learning pipelines using our SDK to periodically export logs and run fine-tuning and validation jobs with your preferred fine-tuning providers. Contact us to learn more.
All data is secure and encrypted in transit and at rest, managed by AWS and Clickhouse Cloud. We conduct regular penetration tests, are currently undergoing SOC-2 audit, and provide flexible hosting solutions (cloud-hosted or VPC) to meet your security and compliance needs.
By default, we do not proxy your requests via our servers. Instead, we store prompts as configurations, which can be deployed and used in your application logic using the GET Configuration API endpoint.
You can log traces and any batch evaluation runs using our tracers and API endpoints, or async via our batch ingestion API endpoint. We offer native SDKs in Python and Typescript with OpenTelemetry support, and provide additional integrations with popular frameworks like LangChain and LlamaIndex.
We use OpenTelemetry (OTEL) to auto-instrument applications in Python and Typescript.
For users using other languages, you can ingest your OpenTelemetry traces to our OTEL endpoint or manually instrument your application using our APIs.