Evaluation by Thread

With LangWatch, you can evaluate your LLM applications at the level of a conversation thread. Thread-based evaluation lets you analyze performance across entire conversations and identify which threads are performing well and which are performing poorly. To set it up, toggle the thread-based mapping option when creating an evaluation.

With this option enabled, each time a trace is evaluated, the full context of the thread it belongs to is retrieved and passed to the evaluation function. The evaluation therefore reflects the complete conversation rather than the individual trace in isolation. By default, the trace INPUT and OUTPUT fields are included in the evaluation; to include additional fields, add them to your dataset.
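To make the mechanism concrete, here is a minimal Python sketch of what "retrieving the full thread context" for a trace might look like. The `Trace` record, its field names, and `build_thread_context` are hypothetical illustrations, not part of the LangWatch SDK or its data model. The sketch only assumes that each trace carries a thread identifier and a timestamp, always includes INPUT and OUTPUT, and copies any extra fields you ask for.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Trace:
    # Hypothetical trace record; field names are illustrative, not LangWatch's schema.
    trace_id: str
    thread_id: str
    timestamp: float
    input: str
    output: str
    extra: Dict[str, str] = field(default_factory=dict)


def build_thread_context(
    traces: Sequence[Trace], target: Trace, extra_fields: Sequence[str] = ()
) -> List[Dict[str, str]]:
    """Collect every trace in the target's thread, oldest first, as evaluator input."""
    thread = sorted(
        (t for t in traces if t.thread_id == target.thread_id),
        key=lambda t: t.timestamp,
    )
    context: List[Dict[str, str]] = []
    for t in thread:
        # INPUT and OUTPUT are always included, mirroring the default behavior.
        entry = {"input": t.input, "output": t.output}
        # Additional fields are copied only when the trace actually has them.
        for name in extra_fields:
            if name in t.extra:
                entry[name] = t.extra[name]
        context.append(entry)
    return context


# Example data: two traces in "thread-a" (deliberately out of order) and one in "thread-b".
traces = [
    Trace("t2", "thread-a", 2.0, "And in Celsius?", "About 22 C.", {"user_id": "u1"}),
    Trace("t1", "thread-a", 1.0, "What's the temperature?", "About 72 F.", {"user_id": "u1"}),
    Trace("t3", "thread-b", 1.5, "Hi", "Hello!"),
]

# Evaluating trace t2 pulls in the whole of thread-a, ordered by timestamp.
context = build_thread_context(traces, traces[0], extra_fields=["user_id"])
```

Evaluating `t2` yields both turns of `thread-a` in chronological order, so an evaluator can judge the follow-up question "And in Celsius?" against the earlier turn it depends on, which a single-trace evaluation would miss.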
