Quickstart
1. Install the Python library
2. Login to LangWatch
Import and authenticate the LangWatch SDK:
3. Start tracking
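A minimal sketch of the three steps combined, assuming the `langwatch` package name, a `langwatch.login()` authentication helper, and an evaluation entry point along the lines of `langwatch.evaluation.init(...)` (check the SDK reference for the exact names):

```python
# 1. Install the Python library
#    pip install langwatch

# 2. Login to LangWatch (assumed helper; reads or prompts for your API key)
import langwatch

langwatch.login()

# 3. Start tracking: create an evaluation session and log a metric per entry
evaluation = langwatch.evaluation.init("my-first-evaluation")  # assumed entry point

examples = [
    {"question": "What is LangWatch?", "expected_output": "An LLM observability and evaluation platform"},
]

def my_agent(question: str) -> str:
    # Placeholder for your own pipeline
    return "An LLM observability and evaluation platform"

for index, example in evaluation.loop(enumerate(examples)):
    response = my_agent(example["question"])
    evaluation.log("exact_match", index=index, passed=response == example["expected_output"])
```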

Core Concepts
Evaluation Initialization
The evaluation is started by creating an evaluation session with a descriptive name:
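For example (the `langwatch.evaluation.init(...)` entry point is assumed here; the name you pass is how the run shows up in LangWatch):

```python
import langwatch

# A descriptive name makes the evaluation run easy to find later
evaluation = langwatch.evaluation.init("rag-answer-quality-v1")
```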
Loop wrapping
Use `evaluation.loop()` around your iterator so the entries are tracked:
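A sketch using a pandas DataFrame as the data source; `evaluation.loop()` wraps whatever iterator you give it, so each yielded entry is tracked:

```python
import pandas as pd

# `evaluation` is the session created in the initialization step above
df = pd.DataFrame([
    {"question": "What does LangWatch do?", "expected_output": "LLM observability and evaluation"},
])

for index, row in evaluation.loop(df.iterrows()):
    response = my_agent(row["question"])  # your own pipeline function
    # ... log metrics here (see the next section)
```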
Metrics logging
Track any metric you want with `evaluation.log()`:
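For example (the `index` ties each metric back to the entry from the loop; keyword arguments such as `passed` and `score` are illustrative):

```python
for index, row in evaluation.loop(df.iterrows()):
    response = my_agent(row["question"])

    # A boolean-style metric
    evaluation.log(
        "exact_match",
        index=index,
        passed=response.strip() == row["expected_output"],
    )

    # A numeric metric
    evaluation.log("response_length", index=index, score=len(response))
```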
Use LangWatch Datasets
Collaborate with your team using datasets stored in LangWatch, as shown in the sketch below. Create and manage datasets in the LangWatch UI. See our Datasets Overview for more details.
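A sketch, assuming a dataset-fetching helper along the lines of `langwatch.dataset.get_dataset(...)` that converts to a pandas DataFrame (the dataset ID is illustrative; see the Datasets Overview for the exact API):

```python
import langwatch

# Fetch a dataset created and managed in the LangWatch UI
df = langwatch.dataset.get_dataset("my-dataset-id").to_pandas()  # assumed helper

evaluation = langwatch.evaluation.init("my-evaluation")

for index, row in evaluation.loop(df.iterrows()):
    response = my_agent(row["question"])
    evaluation.log("exact_match", index=index, passed=response == row["expected_output"])
```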
Capture Full Pipeline
Add Custom Data
Beyond just metrics, you can capture outputs and other relevant data for analysis:
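For example, attaching the model output alongside the metric (the `data` keyword argument is an assumption; adapt it to the SDK's actual signature):

```python
for index, row in evaluation.loop(df.iterrows()):
    response = my_agent(row["question"])

    evaluation.log(
        "exact_match",
        index=index,
        passed=response == row["expected_output"],
        # Extra fields captured for later analysis (assumed keyword)
        data={"question": row["question"], "response": response},
    )
```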
Trace Your LLM Pipeline
To get complete visibility into your LLM pipeline, trace your agent with the `@langwatch.trace()` decorator:
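For example (the `call_llm` helper stands in for your own pipeline logic):

```python
import langwatch

@langwatch.trace()
def my_agent(question: str) -> str:
    # Everything that happens inside this function is captured in the trace
    # linked to the corresponding evaluation entry
    answer = call_llm(question)  # hypothetical helper for your LLM call
    return answer
```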
With tracing enabled, you can click through from any evaluation result to see the complete execution trace, including all LLM calls, prompts, and intermediate steps. Learn more in our Python Integration Guide.
Parallel Execution
LLM calls can be slow. To speed up your evaluations, use the built-in parallelization: put the body of the loop in a function and submit it to the evaluation for parallel execution, as in the sketch below.
By default, `threads=4`. Adjust this based on your API rate limits and system resources.
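A sketch of the pattern, assuming a `threads` argument on `evaluation.loop()` and a `submit`-style method on the evaluation object (the exact names and signature are assumptions):

```python
for index, row in evaluation.loop(df.iterrows(), threads=8):  # override the default of 4
    # Default arguments bind the current index/row, avoiding late-binding closure issues
    def run_entry(index=index, row=row):
        response = my_agent(row["question"])
        evaluation.log(
            "exact_match",
            index=index,
            passed=response == row["expected_output"],
        )

    # Submit the loop body so entries run in parallel
    evaluation.submit(run_entry)  # assumed method name
```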
Built-in Evaluators
LangWatch provides a comprehensive suite of evaluation metrics out of the box. Use `evaluation.run()` to leverage pre-built evaluators:
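For example (the evaluator slug and the `data` fields are illustrative; pick an evaluator from the list linked below and pass the fields it expects):

```python
for index, row in evaluation.loop(df.iterrows()):
    response, contexts = my_rag_agent(row["question"])  # your pipeline, returning answer + retrieved contexts

    # Run a pre-built evaluator on this entry
    evaluation.run(
        "ragas/faithfulness",  # illustrative evaluator slug
        index=index,
        data={
            "input": row["question"],
            "output": response,
            "contexts": contexts,
        },
    )
```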
Browse our complete list of available evaluators, including metrics for RAG quality, hallucination detection, safety, and more.
Complete Example
Here’s a full example combining all the features:
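A sketch that ties the pieces together; entry points such as `langwatch.evaluation.init(...)`, `langwatch.dataset.get_dataset(...)`, `evaluation.submit(...)`, and the keyword arguments are assumptions based on the steps above:

```python
import langwatch

langwatch.login()  # assumed authentication helper

# Trace the agent so each evaluation entry links to a full execution trace
@langwatch.trace()
def my_agent(question: str) -> str:
    return call_llm(question)  # hypothetical helper for your own pipeline

# Load a dataset managed in the LangWatch UI (dataset ID is illustrative)
df = langwatch.dataset.get_dataset("my-dataset-id").to_pandas()

evaluation = langwatch.evaluation.init("my-agent-eval-v1")

for index, row in evaluation.loop(df.iterrows(), threads=4):
    def run_entry(index=index, row=row):
        response = my_agent(row["question"])

        # Custom metric plus extra data for analysis
        evaluation.log(
            "exact_match",
            index=index,
            passed=response == row["expected_output"],
            data={"response": response},
        )

        # Pre-built evaluator (slug and fields are illustrative)
        evaluation.run(
            "ragas/faithfulness",
            index=index,
            data={"input": row["question"], "output": response},
        )

    evaluation.submit(run_entry)
```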
What’s Next?
- View Evaluators - Explore all available evaluation metrics
- Python Integration - Set up comprehensive tracing
- Datasets - Learn about dataset management