Hyperparam, the workbench for LLM datasets
Inspect, curate, and compare LLM datasets directly in the browser
Explore chat logs, agent traces, tool calls, and other LLM outputs using natural language queries. Fast, intuitive, and built for real AI workflows.
Hyperparam runs entirely client-side, without servers or heavy infrastructure.
Live Demo: All of Wikipedia in the Hyperparam Viewer
This demo shows what is possible when Parquet files are read directly in the browser. A Parquet file containing every English Wikipedia article is stored in S3, and rows are fetched on demand using hyparquet.
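Why on-demand row retrieval works at all comes down to the Parquet file layout: the file ends with its metadata footer, a 4-byte little-endian footer length, and the magic bytes `PAR1`, so a reader can locate the footer with one small tail read and then request only the byte ranges it needs. The sketch below illustrates that mechanism with a fake in-memory "remote" file; `range_read` stands in for an HTTP Range request, and none of these names are hyparquet's actual API.

```python
import struct

def range_read(remote: bytes, start: int, end: int) -> bytes:
    """Simulate `GET ... Range: bytes=start-(end-1)` against object storage."""
    return remote[start:end]

def locate_footer(remote: bytes) -> tuple:
    """Return (offset, length) of the footer without reading the whole file."""
    # Per the Parquet spec, the last 8 bytes are the footer length + b"PAR1".
    tail = range_read(remote, len(remote) - 8, len(remote))
    assert tail[4:] == b"PAR1", "not a Parquet file"
    (footer_len,) = struct.unpack("<I", tail[:4])
    return len(remote) - 8 - footer_len, footer_len

# Build a fake "remote" file with the same trailer layout as Parquet.
footer = b"{fake thrift metadata}"
fake_file = b"row group bytes..." + footer + struct.pack("<I", len(footer)) + b"PAR1"

offset, length = locate_footer(fake_file)
print(range_read(fake_file, offset, offset + length))  # only the footer is fetched
```

A browser reader repeats the same trick for row groups: once the footer is parsed, it knows the byte offsets of every column chunk and can fetch just the slices the current view needs.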
Debug failures across real-world LLM outputs
- Open multi-gigabyte Parquet or JSONL files directly in the browser.
- Inspect full LLM datasets to understand how models behave across real production inputs.
- Apply LLM-based scoring and filtering to surface failures and regressions.
- Compare outputs before and after prompt, tool, or model changes.
- Export curated subsets, no backend needed.
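The workflow in the bullets above (open logs, score rows, filter failures, export the subset) has roughly this shape. Hyperparam does this interactively in the browser without code; the sketch below is only an illustration of the operation, with made-up field names and a crude heuristic standing in for LLM-based scoring.

```python
import io
import json

# Toy chat-log rows in JSONL form (one JSON object per line), as produced
# by many LLM logging pipelines. Field names here are illustrative.
log = io.StringIO("\n".join(json.dumps(r) for r in [
    {"prompt": "2+2?", "response": "4", "tool_calls": 0},
    {"prompt": "capital of France?", "response": "I'm sorry, I can't help.", "tool_calls": 0},
    {"prompt": "weather?", "response": "Sunny, 22C", "tool_calls": 1},
]))

def looks_like_failure(row: dict) -> bool:
    """Crude refusal detector standing in for an LLM judge."""
    return "I'm sorry" in row["response"] or "can't help" in row["response"]

rows = [json.loads(line) for line in log]
failures = [r for r in rows if looks_like_failure(r)]

# "Export" the curated subset back to JSONL for downstream use.
curated = "\n".join(json.dumps(r) for r in failures)
print(curated)
```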
More insight with less effort
Use AI-assisted scoring, labeling, and filtering to surface patterns, identify failures, and understand LLM behavior across entire datasets. Use LLM-as-a-judge to validate updates consistently and at scale.
Performance and security start in your browser
Run everything directly in your browser for fast, responsive workflows and a local-first approach that avoids unnecessary infrastructure.
Bring the file formats you already work with
Work with Parquet, JSONL, or CSV files, and export curated datasets in the same formats for downstream evaluation or training workflows.
Scroll billions of rows like it's nothing
Advanced virtualization lets you scroll, filter, and inspect multi-gigabyte LLM datasets smoothly, even at very large row counts.
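Row virtualization is what makes this possible: only the rows intersecting the viewport (plus a small overscan buffer) are ever materialized, so rendering cost is independent of total row count. A minimal sketch of the window arithmetic, with illustrative names rather than Hyperparam's internals:

```python
def visible_window(scroll_top: int, viewport_h: int, row_h: int,
                   total_rows: int, overscan: int = 5) -> range:
    """Rows to render for the current scroll position, plus overscan."""
    first = max(0, scroll_top // row_h - overscan)
    last = min(total_rows, (scroll_top + viewport_h) // row_h + 1 + overscan)
    return range(first, last)

# A billion rows deep into a 2-billion-row table, a 600px viewport of
# 24px rows still materializes only a few dozen rows.
w = visible_window(scroll_top=24_000_000_000, viewport_h=600, row_h=24,
                   total_rows=2_000_000_000)
print(len(w))
```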
Query your data the way you think
Use natural-language queries to explore, filter, and compare LLM outputs without writing SQL or custom scripts.
Human-in-the-loop dataset curation
Combine AI-assisted workflows with manual review to make curation decisions visible, inspectable, and reversible at the row level.
If you work with LLM datasets, Hyperparam is your AI workbench
- AI engineers — Debug failures, evaluate changes, and compare outputs before shipping.
- Product teams — Review conversations, identify failure patterns, and assess user experience impact.
- Data scientists — Extract insight from unstructured text data at scale.
- ML researchers — Prepare and validate datasets for training and evaluation.
Watch a Demo
FAQ
What makes Hyperparam different from other dataset tools?
Hyperparam was built on a browser-first architecture. That means it's designed for interactive work on very large datasets, so you can inspect LLM logs, trace behavior, and filter and score rows without defaulting to notebooks or scripts. It also lets you capture the workflows you build as skills, then rerun them later on new data.
How does LLM-as-a-judge work in Hyperparam?
Hyperparam can score model outputs using LLM-as-a-judge across the full dataset. The resulting scores appear at the row level, so you can filter responses, analyze patterns, and identify low-quality outputs in production logs.
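Structurally, LLM-as-a-judge means a judge model maps each (prompt, response) row to a score, the score is attached to the row, and low scorers are filtered out. The sketch below uses a stubbed judge; in practice the judge is an LLM call with a grading rubric in its prompt, and all names here are illustrative rather than Hyperparam's API.

```python
RUBRIC = "Score 1-5 for helpfulness; refusals and empty answers score low."

def judge(prompt: str, response: str, rubric: str = RUBRIC) -> int:
    """Stub judge; a real one would send the rubric and the row to an LLM."""
    if not response.strip() or "can't help" in response:
        return 1
    return 5

rows = [
    {"prompt": "Summarize this doc", "response": "The doc describes..."},
    {"prompt": "Fix my code", "response": "I can't help with that."},
]

# Attach scores at the row level, then filter by threshold.
for row in rows:
    row["judge_score"] = judge(row["prompt"], row["response"])

low_quality = [r for r in rows if r["judge_score"] <= 2]
print(len(low_quality))  # flags the refusal
```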
Do I need to install anything?
No. Hyperparam runs in the browser, so you can open the app and load a dataset to begin inspection.
Can Hyperparam handle large LLM logs?
Yes. It is built for interactive inspection of multi-gigabyte datasets, including large LLM logs and model outputs.
Can I export curated datasets?
Yes. You can export filtered, scored, or labeled datasets as Parquet, JSONL, or CSV files for downstream analysis or training workflows.
Does Hyperparam cost anything?
Hyperparam is free while it's in beta. After that, AI-assisted features may require usage-based billing depending on dataset size and analysis volume.
Try it now
Sign in now for early access.