POST
request to the following endpoint:
type
: The specific type of the event.timestamp
: A Unix timestamp (in milliseconds) of when the event occurred.batchRunId
: An ID that groups all scenarios run within the same test execution or process.scenarioId
: A stable identifier for a specific scenario (e.g., “test_vegetarian_recipe”).scenarioRunId
: A unique ID for a single execution of a scenario.scenarioSetId
: The top-level grouping for a collection of scenarios, which defaults to "default"
.SCENARIO_RUN_STARTED
metadata
:
name
: The display name of the scenario.description
: A longer description of what the scenario tests.SCENARIO_MESSAGE_SNAPSHOT
messages
: An array of message objects. The schema for these messages (user, assistant, tool, etc.) is detailed in the OpenAPI specification.SCENARIO_RUN_FINISHED
status
: The final status of the run (SUCCESS
, FAILED
, ERROR
, etc.).results
: An object containing the final verdict from a Judge Agent, including:
verdict
: The final outcome (success
, failure
).reasoning
: The explanation for the verdict.metCriteria
: A list of criteria that were satisfied.unmetCriteria
: A list of criteria that were not met.