Test Report

Individual Run View

LangWatch

Welcome to LangWatch, the all-in-one [open-source](https://github.com/langwatch/langwatch) LLMops platform.

Introduction

Introduction to Agent Testing

Overview

Getting Started

Simulation Sets

Batch Runs

Easily integrate LangWatch with your Python, TypeScript, or REST API projects.

LLM tracing and observability conceptual guide

Concepts

Quick Start

Examples of LangWatch integrated applications

Code Examples

Overview of LLM evaluation features in LangWatch

LLM Evaluation Overview

Evaluate and visualize your LLM evals with LangWatch

Evaluation Tracking API

List of Evaluators

Collaborate with domain experts using annotations

Annotations

Use an agent to debug your LLM applications and fix the issues for you

LangWatch MCP

LangWatch MCP Server

Manage and version your prompts for consistent AI interactions

Prompt Versioning

Troubleshooting and Support

Status Page

LangWatch offers a fully self-hosted version of the platform for companies that require strict data control and compliance.

Setup for Azure AD

LangWatch offers a hybrid setup for companies that require strict data control and compliance.

Elasticsearch Setup for LangWatch Hybrid Deployment

Elasticsearch

S3 Storage Setup for LangWatch Hybrid Deployment

S3 Storage

SSO Setup for LangWatch Hybrid Deployment

SSO - Azure AD

Discover how to measure the performance of Retrieval-Augmented Generation (RAG) systems using metrics like retrieval precision, answer accuracy, and latency.

Measuring RAG Performance

Learn how to optimize embedding models for better retrieval in RAG systems—covering model selection, dimensionality, and domain-specific tuning.

Optimizing Embeddings

Learn the key differences between vector search and hybrid search in RAG applications. Use cases, performance tradeoffs, and when to choose each.

Vector Search vs Hybrid Search

Understand how to evaluate tools and components in your RAG pipeline—covering retrievers, embedding models, chunking strategies, and vector stores.

Evaluating Tool Selection

Capture the RAG documents used in your LLM pipelines

RAG Context Tracking

LangWatch is the best observability integration for Langflow

Langflow Integration

Capture LLM traces and send them to LangWatch from Flowise

Flowise Integration

Integrate LangWatch with any language by using the REST API

REST API Integration

Track user interactions with your LLM applications

Be alerted when something goes wrong and trigger actions automatically

Alerts and Triggers

Build and integrate LangWatch graphs on your own systems and applications

Exporting Analytics

Measuring your LLM performance with Offline Evaluations

How to evaluate whether your LLM answers correctly

Measuring your LLM performance using an LLM-as-a-judge

How to evaluate when you don't have defined answers

How to evaluate an LLM when you don't have defined answers

Setting up Real-Time Evaluations

Add your own evaluation results to a LangWatch trace

Instrumenting Custom Evaluator

Create and manage datasets with LangWatch

Datasets

Bootstrap your evaluations by generating sample data

Generating a dataset with AI

Continuously populate your datasets with incoming data from production

Automatically build datasets from real-time traces

Create, evaluate, and optimize your LLM workflows

Optimization Studio

LLM Nodes

Define the data used for testing and optimization

Measure the quality of your LLM workflows

Evaluating

Find the best prompts with DSPy optimizers

Optimizing

Visualize your DSPy notebook experiments to better track and debug the optimization process

Get Started

Agent Simulations

LLM Observability

LLM Evaluation

LLM Development

API Endpoints

Support
