Better Agents
Better Agents is a CLI tool and a set of standards for building reliable, testable, production-grade agents, independent of which framework you use. It supercharges your coding assistant (Kilocode, Claude Code, Cursor, etc.), making it an expert in any agent framework you choose (Agno, Mastra, LangGraph, etc.) and all their best practices. Use your preferred stack (Agno, Mastra, Vercel AI, Google ADK, or anything else): Better Agents doesn't replace your stack, it stabilizes it.

Already have a project? Add evaluations, observability, and scenarios to your existing agent project. See the Integration Guide to get started.
Quick Start
Installation
Install Better Agents globally:
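The install command itself isn't shown here. Assuming the CLI is distributed on npm under the name better-agents (an assumption; check the official installation docs), a global install would look like:

```bash
# Assumed package name; verify against the official installation docs
npm install -g better-agents
```

Initialize a New Project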
After installation, create a new Better Agents project:
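This guide refers to the step below as "the init command"; assuming a conventional CLI shape of a subcommand plus a project-name argument (my-agent is a placeholder):

```bash
# "init" comes from the step name in this guide; the argument form is an assumption
better-agents init my-agent
```

Create Your First Project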
After running the init command, navigate to your project:
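Using the placeholder project name from the previous step:

```bash
cd my-agent
```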
Run Your First Scenario Test
Better Agents projects come with example scenario tests (the templates ship both Python and TypeScript variants). Run them to see how agent testing works.
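Assuming the Python template wires scenarios into pytest (TypeScript templates would instead use the project's npm test script; both are assumptions, so check the generated project's README):

```bash
# Assumed test-runner wiring for the Python template
pytest tests/scenarios/
```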
Once you run your first scenario, you’ll see results appear in your LangWatch project dashboard under the Simulations section.
Project Structure
Every Better Agents project follows a tested, scalable, maintainable layout. Here's what each directory does:
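At a glance, the layout described below looks like this (my-agent is a placeholder name):

```text
my-agent/
├── app/              # or src/: your agent code, in your chosen framework
├── tests/
│   ├── scenarios/    # conversational scenario tests
│   └── evaluations/  # component benchmarks (RAG, retrieval, routing)
├── prompts/          # versioned prompt files in YAML
├── prompts.json      # prompt registry
├── .mcp.json         # MCP server configuration
└── AGENTS.md         # development guidelines for your coding assistant
```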
app/ or src/
Your actual agent code, written using your chosen framework. This is where you implement your agent's logic, tools, and workflows.
tests/scenarios/
The core of real agent reliability. These aren’t unit tests—they’re conversational test cases that simulate real tasks and validate agent behavior across iterations, updates, or model swaps.
Scenarios answer the most important question in AI engineering: Does the agent still behave the way we expect?
Example scenario structure (the templates include both Python and TypeScript variants; the Python one is sketched below):
tests/scenarios/example_scenario.test.py
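The original file contents aren't reproduced in this guide. Here is a minimal sketch using LangWatch's Scenario library for Python (the agent logic, scenario name, and judge criteria are illustrative stand-ins for your own):

```python
import pytest
import scenario

# Model used for the simulated user and the judge (illustrative choice)
scenario.configure(default_model="openai/gpt-4o-mini")


def my_agent_reply(message: str) -> str:
    # Stand-in for your real agent; replace with your framework's entry point
    return "Could you share your order number so I can look into the refund?"


class CustomerSupportAgent(scenario.AgentAdapter):
    """Adapter that bridges your agent (whatever framework) into the simulation."""

    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        return my_agent_reply(input.last_new_user_message_str())


@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_refund_request():
    result = await scenario.run(
        name="refund request",
        description="A customer wants a refund for an order that arrived damaged.",
        agents=[
            CustomerSupportAgent(),
            scenario.UserSimulatorAgent(),  # plays the customer side
            scenario.JudgeAgent(
                criteria=[
                    "Agent asks for the order number",
                    "Agent does not promise a refund before verifying the order",
                ]
            ),
        ],
    )
    assert result.success
```

The user simulator drives a realistic multi-turn conversation, and the judge scores the transcript against your criteria, which is what lets the same test keep guarding behavior across model swaps.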
tests/evaluations/
Structured benchmarking for components like RAG correctness, retrieval F1 score, classification accuracy, and routing accuracy. LangWatch provides an extensive library of evaluators including answer correctness, LLM-as-judge, RAG quality metrics, safety checks, and more.
See the complete list of available evaluators in Evaluators List.
tests/evaluations/rag_correctness.ipynb
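The notebook itself isn't reproduced here. As a flavor of what such an evaluation computes, this is a small, framework-agnostic sketch of retrieval F1 (function name and data are illustrative):

```python
def retrieval_f1(retrieved: set[str], relevant: set[str]) -> float:
    """F1 between the document IDs a retriever returned and the gold set."""
    if not retrieved or not relevant:
        return 0.0
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Example: 2 of 3 retrieved docs are relevant, out of 4 relevant overall
print(retrieval_f1({"d1", "d2", "d3"}, {"d1", "d2", "d4", "d5"}))  # ≈ 0.571
```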
prompts/
Versioned prompt files in YAML format for team collaboration. Prompts are tracked, shared, and collaboratively improved—like real software.
Example prompt structure:
prompts/customer-support.yaml
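The file contents aren't reproduced in this guide. A plausible sketch, assuming a schema of model settings plus chat messages (the field names are assumptions; see Prompt Management for the real format):

```yaml
# Sketch only: field names are assumed, check the Prompt Management docs
model: openai/gpt-4o-mini
modelParameters:
  temperature: 0.2
messages:
  - role: system
    content: |
      You are a customer support agent for Acme Inc.
      Always ask for the order number before discussing refunds.
```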
prompts.json
Prompt registry that controls which prompts are active and versioned. This file is versioned along with your codebase while also syncing to the LangWatch platform playground for collaboration.
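An illustrative sketch of what the registry might contain (the actual schema isn't shown in this guide and may differ):

```json
{
  "prompts": {
    "customer-support": {
      "file": "prompts/customer-support.yaml",
      "version": 3
    }
  }
}
```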
.mcp.json
MCP server configuration that comes with all the right MCPs set up so your coding assistant becomes an expert in your framework of choice and in writing Scenario tests for your agent. It automatically discovers MCP tools and knows where to find new capabilities.
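The file follows the standard MCP client configuration shape (an mcpServers map from server names to launch commands); the specific server package below is an assumption:

```json
{
  "mcpServers": {
    "langwatch": {
      "command": "npx",
      "args": ["-y", "@langwatch/mcp-server"]
    }
  }
}
```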
AGENTS.md
Development guidelines that ensure every new feature is properly tested, evaluated, and that prompts are versioned. This file guides your coding assistant to follow Better Agents best practices.
Core Concepts
Scenarios
Scenarios are end-to-end conversational tests that validate agent behavior in realistic, multi-turn conversations. Unlike static input-output tests, scenarios simulate how real users interact with your agent.

Why scenarios matter:
- Test agent behavior as a complete system
- Catch regressions before they reach production
- Validate complex workflows and edge cases
- Ensure consistency across model updates
For detailed scenario testing documentation, see Agent Simulations.
Evaluations
Evaluations provide structured benchmarking for specific components of your agent pipeline. Examples include:
- RAG correctness - Measure retrieval and generation accuracy
- Retrieval F1 score - Evaluate search quality
- Classification accuracy - Test routing and categorization
- Routing accuracy - Validate decision-making logic
Learn more about evaluations in LLM Evaluation.
Prompt Versioning
Prompts are no longer ad-hoc artifacts. With Better Agents, they become:
- Tracked - Full version history with easy rollback
- Reviewable - Team collaboration on prompt improvements
- Documented - Clear structure and purpose
- Synced - Controlled by prompts-lock.json, versioned with your codebase, and synced to the platform
For comprehensive prompt management features, see Prompt Management.
MCP Integration
The .mcp.json configuration enables your coding assistant to understand your agent framework and Better Agents standards.
Learn more about MCP integration in LangWatch MCP.