Pricing should never get in your way. This is why HoneyHive is free forever for individual developers and researchers.
Free
10k events per month
365-day log retention
Up to 2 users
Full evaluation and observability suite
Starting at $99/month
50k or 500k events per month
Data exports
Unlimited users
Increased rate limits
An event is a single trace span, structured log, or metric label combination sent to our API as OTLP or JSON. It captures any relevant data from your system, including all context fields generated by your application's instrumentation. Each event can be up to 256 KB in size and can contain any number of fields.
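To make the event definition concrete, here is a minimal sketch of one JSON event. The field names (`event_type`, `inputs`, `outputs`, etc.) are illustrative assumptions, not HoneyHive's actual schema; consult the API reference for the real fields.

```python
import json

# Illustrative event payload -- field names are assumptions, not the real schema.
event = {
    "event_type": "model",                 # e.g. a span wrapping an LLM call
    "event_name": "generate_answer",
    "inputs": {"question": "What is OTLP?"},
    "outputs": {"answer": "The OpenTelemetry wire protocol."},
    "metadata": {"model": "gpt-4o", "temperature": 0.2},
    "duration_ms": 812,
}

payload = json.dumps(event).encode("utf-8")
# Each event may be up to 256 KB once serialized.
assert len(payload) <= 256 * 1024
```

Whether sent as raw JSON or as an OTLP span attribute bundle, the same size limit applies to the serialized event.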
Automated Evaluators: An automated evaluator is a function (code or LLM) that helps you unit test any arbitrary event or combination of events to generate a measurable score. Common examples of evaluators include Context Precision, ROUGE, Coherence, BERT Score, and more. We provide many common evaluators out-of-the-box and let you define custom evaluators within the platform.
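As a sketch of what a custom code evaluator might look like, here is a toy context-precision check: the fraction of retrieved chunks that appear in the final answer. The function signature and event shape are assumptions for illustration, not the platform's actual evaluator interface.

```python
def context_precision(event: dict) -> float:
    """Toy code evaluator: fraction of retrieved chunks that appear
    (case-insensitively) in the answer. Event shape is an assumption."""
    chunks = event.get("inputs", {}).get("retrieved_chunks", [])
    answer = event.get("outputs", {}).get("answer", "").lower()
    if not chunks:
        return 0.0
    hits = sum(1 for chunk in chunks if chunk.lower() in answer)
    return hits / len(chunks)

score = context_precision({
    "inputs": {"retrieved_chunks": ["Paris", "London"]},
    "outputs": {"answer": "The capital of France is Paris."},
})
# score == 0.5: one of the two retrieved chunks made it into the answer
```

A real evaluator would typically run over every matching event in a run and aggregate the per-event scores.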
Human Evaluators: We strongly encourage a hybrid evaluation approach, i.e. combining automated techniques with human evaluation. This helps you account for metric bias and better align your evaluators with your domain experts' scoring rubric. To enable this, you can define custom scoring rubrics in HoneyHive for graders to use when evaluating traces.
HoneyHive allows you to filter and curate datasets from your production logs. These datasets can be annotated by domain experts within the platform and exported programmatically for fine-tuning open-source models.
You can export datasets curated within HoneyHive using our SDK and fine-tune custom models on your preferred GPU cloud with your preferred optimization method (such as SFT or DPO). You can also build active learning pipelines with our SDK to periodically export logs and run fine-tuning and validation jobs with your preferred fine-tuning provider. Contact us to learn more.
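A common last step in such a pipeline is converting exported records into a chat-format JSONL file for supervised fine-tuning. The record shape below (`inputs.prompt`, `outputs.completion`) is an assumption for illustration; map your actual exported fields accordingly.

```python
import json

def to_sft_jsonl(records: list[dict]) -> str:
    """Convert exported log records (shape assumed for illustration)
    into chat-format JSONL lines for supervised fine-tuning."""
    lines = []
    for record in records:
        lines.append(json.dumps({
            "messages": [
                {"role": "user", "content": record["inputs"]["prompt"]},
                {"role": "assistant", "content": record["outputs"]["completion"]},
            ]
        }))
    return "\n".join(lines)

exported = [
    {"inputs": {"prompt": "Summarize OTLP in one line."},
     "outputs": {"completion": "OTLP is OpenTelemetry's wire protocol."}},
]
jsonl = to_sft_jsonl(exported)
```

The resulting file can be handed directly to most fine-tuning providers that accept chat-formatted JSONL.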
All data is secure and encrypted in transit and at rest, managed by AWS. We conduct regular penetration tests, are currently undergoing a SOC 2 audit, and provide flexible hosting solutions (cloud-hosted or in your VPC) to meet your security and privacy needs.
By default, we do not proxy your requests via our servers. Instead, we store prompts as configurations, which can be deployed and used in your application logic using the GET /Configuration API endpoint.
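A minimal sketch of this flow, fetching a deployed configuration and rendering the prompt client-side. The base URL, endpoint path, auth header, and response shape are all assumptions here; check the API reference for the actual GET /Configuration contract.

```python
import json
import urllib.request

API_BASE = "https://api.honeyhive.ai"  # assumption: base URL may differ

def get_configuration(project: str, api_key: str) -> dict:
    """Fetch the deployed prompt configuration for a project.
    Path and response shape are assumptions, not the documented contract."""
    req = urllib.request.Request(
        f"{API_BASE}/configuration?project={project}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def render_prompt(config: dict, **variables) -> str:
    """Fill the fetched prompt template client-side; since requests are not
    proxied, your app then calls the model provider directly."""
    return config["template"].format(**variables)

# Rendering with a sample config (no network needed):
sample = {"template": "Answer concisely: {question}"}
prompt = render_prompt(sample, question="What is SOC 2?")
```

Because the model call stays in your application, swapping prompts is a configuration change rather than a code deploy.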
You can log traces and batch evaluation runs using our tracers and API endpoints, or asynchronously via our batch ingestion endpoint. We offer native SDKs in Python and TypeScript with OpenTelemetry support, plus integrations with popular frameworks like LangChain and LlamaIndex.
We use OpenTelemetry (OTEL) to auto-instrument applications in Python and TypeScript.
If you're using another language, you can send your OpenTelemetry traces to our OTEL endpoint or manually instrument your application using our APIs.
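Pointing an existing OpenTelemetry setup at a custom endpoint typically needs no code changes, only the standard OTEL SDK environment variables. The endpoint URL and auth header values below are placeholders, not HoneyHive's actual ingestion details; use the values from your project settings.

```shell
# Standard OpenTelemetry SDK environment variables; the endpoint URL and
# header value are placeholders -- substitute your project's actual values.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://<your-honeyhive-otel-endpoint>"
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer <your-api-key>"
export OTEL_SERVICE_NAME="my-app"
```

Any OTEL SDK that honors these variables will then export its spans to the configured endpoint regardless of language.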