A developer guide for evaluating structured data extraction using LangWatch
"Schiphol, 2 people"
or "Herengracht 500 now".
pickup_address, airport_found, and passenger_count, even when the input is incomplete.
user_message and a ground_truth column containing the expected JSON output.
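As a rough illustration of that dataset shape, the sketch below builds two rows from the example messages. The expected values inside each ground_truth string are assumptions made for illustration, not labels taken from the guide, and `DATASET` and `load_expected` are hypothetical names.

```python
import json

# Hypothetical evaluation rows. Each row pairs a raw user message with the
# expected extraction, stored as a JSON string in the ground_truth column.
# The expected values are illustrative assumptions, not the guide's labels.
DATASET = [
    {
        "user_message": "Schiphol, 2 people",
        "ground_truth": json.dumps(
            {"pickup_address": None, "airport_found": True, "passenger_count": 2}
        ),
    },
    {
        "user_message": "Herengracht 500 now",
        "ground_truth": json.dumps(
            {
                "pickup_address": "Herengracht 500",
                "airport_found": False,
                "passenger_count": None,
            }
        ),
    },
]


def load_expected(row: dict) -> dict:
    """Parse the ground_truth JSON string of one row into a dict."""
    return json.loads(row["ground_truth"])
```

Storing the expected output as a JSON string keeps the dataset flat (two text columns), so it can be exported to CSV or uploaded to an evaluation tool without nested types.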
extract_booking_details(), that simulates our LLM pipeline. This function takes a user message and returns a JSON object with the extracted details.
This is where you would integrate your actual LLM calls (e.g., using OpenAI, Anthropic, or a local model).
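A minimal sketch of such a stand-in might look like the following. It replaces the LLM call with simple regular-expression heuristics so the example runs offline. The keyword list, the patterns, and the choice to return a plain dict are all assumptions, not the guide's implementation.

```python
import re

# Assumed keyword list for detecting an airport mention.
AIRPORT_KEYWORDS = ("schiphol", "airport")

# Assumed pattern: one or more capitalized words followed by a house number.
ADDRESS_PATTERN = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)* \d+)\b")

# Assumed pattern: a number followed by a word for passengers.
PASSENGER_PATTERN = re.compile(r"(\d+)\s*(?:people|persons|passengers)")


def extract_booking_details(user_message: str) -> dict:
    """Simulate the LLM pipeline with heuristics; swap in a real model call here."""
    lowered = user_message.lower()

    address_match = ADDRESS_PATTERN.search(user_message)
    passenger_match = PASSENGER_PATTERN.search(lowered)

    return {
        # Missing fields are returned as None rather than guessed.
        "pickup_address": address_match.group(1) if address_match else None,
        "airport_found": any(keyword in lowered for keyword in AIRPORT_KEYWORDS),
        "passenger_count": int(passenger_match.group(1)) if passenger_match else None,
    }
```

Returning None for fields the message does not mention mirrors the goal of handling incomplete input without inventing values.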
hallucination_check failed to debug why your model is inventing a destination_address. This level of detail is crucial for iterating on your prompts and improving model reliability.
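One way such a check could work is sketched below: it flags address fields whose value does not appear anywhere in the user message. This is an assumed grounding rule for illustration only, not LangWatch's evaluator or the guide's own logic, and the returned field names are hypothetical.

```python
# Assumed set of free-text fields that must be grounded in the message.
GROUNDED_FIELDS = ("pickup_address", "destination_address")


def hallucination_check(user_message: str, extracted: dict) -> dict:
    """Flag address fields whose value never appears in the user message."""
    lowered = user_message.lower()
    ungrounded = []
    for field in GROUNDED_FIELDS:
        value = extracted.get(field)
        # A None or absent field cannot be a hallucination; skip it.
        if value is not None and str(value).lower() not in lowered:
            ungrounded.append(field)
    return {"passed": not ungrounded, "ungrounded_fields": ungrounded}


# Example: the model invents a destination the user never mentioned.
result = hallucination_check(
    "Schiphol, 2 people",
    {
        "pickup_address": None,
        "airport_found": True,
        "passenger_count": 2,
        "destination_address": "Dam Square 1",
    },
)
```

Here `result["passed"]` is False and `result["ungrounded_fields"]` names destination_address, which is the per-row detail that makes a failing case easy to trace back to the prompt.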