How to evaluate an LLM when you don't have defined answers
Measuring your LLM performance using an LLM-as-a-judge
For some AI applications, it’s not really possible to define a golden answer. This happens, for example, in creative tasks, where there is no single correct answer. In the video below, we show how to use the LangWatch Evaluation Wizard to evaluate a Business Coaching Agent, where we don’t have defined answers but can use an LLM-as-a-judge to evaluate the quality of the answers:
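To make the idea concrete, here is a minimal sketch of the LLM-as-a-judge pattern, written directly against the OpenAI Python SDK rather than the LangWatch wizard itself. The `judge_answer` helper, the rubric criteria (actionability, relevance), and the choice of judge model are all illustrative assumptions, not part of the LangWatch product:

```python
# Minimal LLM-as-a-judge sketch (illustrative; not the LangWatch wizard itself).
import json
from openai import OpenAI  # assumes the `openai` package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """\
You are evaluating an answer from a business coaching assistant.
There is no single correct answer; judge the response on its own merits.

Question: {question}
Answer: {answer}

Rate the answer from 1 (poor) to 5 (excellent) on:
- actionability: does it give concrete next steps?
- relevance: does it address the user's actual question?

Respond with JSON only:
{{"actionability": <1-5>, "relevance": <1-5>, "reasoning": "<one sentence>"}}
"""

def judge_answer(question: str, answer: str) -> dict:
    """Ask a judge model to score an answer against the rubric above."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works here
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,  # keep the judge as deterministic as possible
        response_format={"type": "json_object"},  # force parseable JSON output
    )
    return json.loads(response.choices[0].message.content)

scores = judge_answer(
    "How do I prioritize tasks when everything feels urgent?",
    "Start by listing every task, then rank each by impact and deadline...",
)
print(scores)
```

The key design choice is that the judge scores each answer against a rubric instead of comparing it to a reference answer, which is exactly what makes the approach usable when no golden answer exists.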