
scenarioId, allowing you to see how its behavior has changed over time across different batches. This is invaluable for tracking regressions or improvements.
Test Report
At the bottom of the conversation, you’ll find the Scenario Test Report. This block provides a summary of the scenario’s execution and its final outcome.
- Status: The final result of the run (e.g., PASSED, FAILED).
- Success Criteria: The total number of criteria that were met.
- Duration: The total time the scenario took to execute.
- Met Criteria: A list of the specific evaluation criteria that were satisfied.
- Reasoning: The explanation provided by the Judge Agent for its final verdict.
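
As an illustration only, the report fields above could be modeled like this if you collect reports yourself to compare runs of the same scenarioId across batches. This is a minimal sketch under stated assumptions: the class `ScenarioTestReport`, the `batch` index, and the helper `find_regressions` are hypothetical names introduced here, not part of the tool's actual schema or API. It also assumes that the Success Criteria count equals the length of the Met Criteria list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScenarioTestReport:
    """Hypothetical mirror of the report fields; not the tool's real schema."""
    scenario_id: str
    batch: int                 # assumed: an orderable batch index
    status: str                # e.g. "PASSED" or "FAILED"
    duration_seconds: float    # Duration: total execution time
    met_criteria: List[str] = field(default_factory=list)
    reasoning: str = ""        # the Judge Agent's explanation

    @property
    def success_criteria(self) -> int:
        # Assumption: the count shown in the report is the number of met criteria.
        return len(self.met_criteria)


def find_regressions(reports: List[ScenarioTestReport]) -> Dict[str, List[int]]:
    """Return, per scenario_id, the batches where PASSED turned into FAILED."""
    by_scenario: Dict[str, List[ScenarioTestReport]] = {}
    for report in sorted(reports, key=lambda r: r.batch):
        by_scenario.setdefault(report.scenario_id, []).append(report)

    regressions: Dict[str, List[int]] = {}
    for scenario_id, runs in by_scenario.items():
        # Compare each run with the next run of the same scenario.
        for previous, current in zip(runs, runs[1:]):
            if previous.status == "PASSED" and current.status == "FAILED":
                regressions.setdefault(scenario_id, []).append(current.batch)
    return regressions
```

Grouping by `scenario_id` and comparing consecutive batches is one simple way to surface regressions; a real setup might also compare durations or the set of met criteria.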