Datasets are at the core of the Optimization Studio's functionality. When working with non-deterministic systems like LLMs, running your tests across multiple examples is crucial for having confidence in your results: while you might get lucky with a single successful test, running your LLM against hundreds of examples provides much more reliable validation of your solution. The good news is that you don't need an enormous dataset to get started. As few as 20 examples can already provide meaningful results with the DSPy optimizers, thanks to their intelligent use of LLM capabilities.
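To give a sense of that scale (this is an illustration, not something the Studio requires you to write by hand), here is what a starting set of roughly 20 examples looks like when expressed as DSPy examples. The `question`/`answer` field names are placeholders; use whatever inputs and outputs your own workflow expects:

```python
import dspy

# Illustrative only: around 20 input/output pairs is a practical starting point.
# Field names are placeholders; match them to your workflow's inputs and outputs.
raw_pairs = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote 'Pride and Prejudice'?", "Jane Austen"),
    # ... roughly 18 more pairs
]

trainset = [
    dspy.Example(question=q, answer=a).with_inputs("question")
    for q, a in raw_pairs
]
```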
If you already use LangWatch for monitoring, you can import the production data generated by your LLMs as a dataset. Otherwise, you can create or import a new dataset directly in the Optimization Studio.
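If you are building a dataset from scratch, a simple tabular file with one column per input and expected output is enough to import. A minimal sketch, with hypothetical column names (align them with the fields your workflow actually uses):

```python
import pandas as pd

# Hypothetical column names; rename them to match your workflow's fields.
rows = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "Who wrote 'Pride and Prejudice'?", "expected_output": "Jane Austen"},
]

# Save as CSV and import it through the Studio's dataset panel.
pd.DataFrame(rows).to_csv("my_dataset.csv", index=False)
```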
While detailed evaluation is covered in later tutorials, the basic workflow involves:
1. Clicking the Evaluate button
2. Documenting the changes made to your pipeline
3. Selecting which dataset partition to evaluate against (see the partitioning sketch after this list)
4. Adding the necessary LLM API keys
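The partitions themselves are managed for you in the Studio, but conceptually a partition is just a reproducible split of the dataset rows. A minimal sketch of what that means, assuming a hypothetical 80/20 split:

```python
import random

def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle rows deterministically and split them into train and held-out test partitions."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train_rows, test_rows = split_dataset(list(range(100)))
print(len(train_rows), len(test_rows))  # 80 20
```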
The evaluation panel provides (the sketch after this list shows how these figures relate):
- Total entries processed
- Average cost per entry
- Total runtime
- Overall experiment costs
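These figures are simple aggregates over the per-entry results. A minimal sketch of how they relate, using hypothetical per-entry numbers and assuming entries run sequentially (so total runtime is the sum of per-entry runtimes):

```python
from dataclasses import dataclass

@dataclass
class EntryResult:
    cost_usd: float   # LLM spend for this single entry
    runtime_s: float  # wall-clock time for this single entry

# Hypothetical per-entry results, just to show how the panel's numbers relate.
results = [EntryResult(0.002, 1.4), EntryResult(0.003, 1.1), EntryResult(0.0025, 1.6)]

total_entries = len(results)
overall_cost = sum(r.cost_usd for r in results)
average_cost = overall_cost / total_entries
total_runtime = sum(r.runtime_s for r in results)  # assumes sequential execution

print(total_entries, round(average_cost, 4), round(total_runtime, 1), round(overall_cost, 4))
```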
This foundation in dataset management sets you up for evaluating output quality and running automated optimizations, both of which are covered in the subsequent tutorials.