Checks whether the SQL query is equivalent to a reference query by using an LLM to infer whether both would produce the same results given the table schemas.
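To make "equivalent" concrete, here is a minimal sketch that compares two queries by actually running them on a small in-memory SQLite database. This is not how the evaluator works (it asks an LLM to reason over the schemas instead of executing anything); the table, the data, and the helper name `same_results` are all hypothetical and exist only for illustration.

```python
import sqlite3


def same_results(query_a: str, query_b: str, setup_sql: str) -> bool:
    """Run both queries on a throwaway in-memory database and compare rows, ignoring order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)
        rows_a = sorted(conn.execute(query_a).fetchall())
        rows_b = sorted(conn.execute(query_b).fetchall())
        return rows_a == rows_b
    finally:
        conn.close()


# Hypothetical schema and sample rows, used only for this illustration.
SETUP = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
INSERT INTO orders VALUES (1, 'ana', 40.0), (2, 'bo', 15.0), (3, 'ana', 5.0);
"""

REFERENCE = "SELECT customer FROM orders WHERE total > 10 AND total < 50"
# Written differently, but selects the same rows as REFERENCE.
CANDIDATE = (
    "SELECT customer FROM orders "
    "WHERE total BETWEEN 10 AND 50 AND total NOT IN (10, 50)"
)
# Looser filter, so it also returns the row with total = 5.
DIFFERENT = "SELECT customer FROM orders WHERE total >= 5"

if __name__ == "__main__":
    print(same_results(REFERENCE, CANDIDATE, SETUP))
    print(same_results(REFERENCE, DIFFERENT, SETUP))
```

Matching output on one sample dataset does not prove two queries are equivalent in general, which is one reason an evaluator might reason over the schemas rather than rely on a single execution.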
Successful evaluation
Evaluation status: processed, skipped, or error
Evaluation score
Whether the evaluation passed
Evaluation label
Additional details about the evaluation
Raw response from the evaluator
Type of error if status is 'error'
Error traceback if status is 'error'
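The result fields described above could be modeled on the client side roughly as follows. This is a sketch under assumptions: only `status` and its three values come from the descriptions; every other field name (`score`, `passed`, `label`, `details`, `raw_response`, `error_type`, `traceback`), the field types, and the `summarize` helper are hypothetical stand-ins, since the actual names are not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationResult:
    """Hypothetical container for one evaluation result; field names are assumed."""

    status: str  # "processed", "skipped", or "error"
    score: Optional[float] = None  # evaluation score
    passed: Optional[bool] = None  # whether the evaluation passed
    label: Optional[str] = None  # evaluation label
    details: Optional[str] = None  # additional details about the evaluation
    raw_response: Optional[dict] = None  # raw response from the evaluator
    error_type: Optional[str] = None  # set only when status is "error"
    traceback: Optional[str] = None  # set only when status is "error"


def summarize(result: EvaluationResult) -> str:
    """Return a one-line summary, branching on the three documented status values."""
    if result.status == "error":
        return f"error: {result.error_type or 'unknown'}"
    if result.status == "skipped":
        return "skipped"
    if result.status == "processed":
        verdict = "passed" if result.passed else "failed"
        return f"{verdict} (score={result.score})"
    raise ValueError(f"unexpected status: {result.status!r}")


if __name__ == "__main__":
    print(summarize(EvaluationResult(status="processed", score=1.0, passed=True)))
    print(summarize(EvaluationResult(status="error", error_type="TimeoutError")))
```

Checking `status` before reading the other fields keeps the error-only fields from being consulted on successful results, and the reverse.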