Capture the RAG documents used in your LLM pipelines
Retrieval Augmented Generation (RAG) is a common way to augment your LLM's generation: a set of documents is retrieved based on the user query, whether from a vector database, an API, or integrated agent files and memory, and given to the LLM as context for answering.
It can be challenging, however, to build a good quality RAG pipeline: you need to make sure the right data was retrieved, prevent the LLM from hallucinating, monitor which documents are used the most, and keep iterating to improve it. This is where integrating with LangWatch can help: by integrating your RAG you unlock a series of Guardrails, Measurements and Analytics for RAGs on LangWatch.
To capture a RAG span, you can use the @langwatch.span(type="rag") decorator, along with a call to .update() to add the contexts to the span:
@langwatch.span(type="rag")def rag_retrieval(): # the documents you retrieved from your vector database search_results = ["France is a country in Europe.", "Paris is the capital of France."] # capture them on the span contexts before returning langwatch.get_current_span().update(contexts=search_results) return search_results
If you have document or chunk ids in the results, we recommend capturing the content together with those ids using RAGChunk, as this allows the chunks to be grouped by document and generates document analytics on the LangWatch dashboard:
```python
import langwatch
from langwatch.types import RAGChunk

@langwatch.span(type="rag")
def rag_retrieval():
    # the documents you retrieved from your vector database
    search_results = [
        {
            "id": "doc-1",
            "content": "France is a country in Europe.",
        },
        {
            "id": "doc-2",
            "content": "Paris is the capital of France.",
        },
    ]

    # capture them on the span contexts with RAGChunk before returning
    langwatch.get_current_span().update(
        contexts=[
            RAGChunk(
                document_id=document["id"],
                content=document["content"],
            )
            for document in search_results
        ]
    )

    return search_results
```
Then you’ll be able to see the captured contexts on the LangWatch dashboard, where they will also be used later on for evaluations.
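Note that spans are only recorded as part of a trace, so the decorated function above should be called from inside a function traced with @langwatch.trace(). A minimal sketch of the wiring (the main entry point here is illustrative):

```python
import langwatch

@langwatch.trace()  # the RAG span below is recorded as a child of this trace
def main():
    contexts = rag_retrieval()
    # then pass the retrieved contexts to your LLM call as usual
    ...
```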
When using LangChain, your RAG generally happens by calling a Retriever.
We provide a utility, langwatch.langchain.capture_rag_from_retriever, which captures the documents found by the retriever and converts them into a LangWatch-compatible format for tracking. Pass the retriever as the first argument, followed by a function that maps each document to a RAGChunk, as in the example below:
```python
import langwatch
from langwatch.types import RAGChunk
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools.retriever import create_retriever_tool
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI

@langwatch.trace()
def main(user_input: str):
    retriever = ...  # your retriever, e.g. from a vector store

    retriever_tool = create_retriever_tool(
        langwatch.langchain.capture_rag_from_retriever(
            retriever,
            lambda document: RAGChunk(
                document_id=document.metadata["source"],
                content=document.page_content,
            ),
        ),
        "langwatch_search",
        "Search for information about LangWatch. For any questions about LangWatch, use this tool if you didn't already",
    )

    tools = [retriever_tool]
    model = ChatOpenAI(streaming=True)
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a helpful assistant that only reply in short tweet-like responses, using lots of emojis and use tools only once.\n\n{agent_scratchpad}",
            ),
            ("human", "{question}"),
        ]
    )
    agent = create_tool_calling_agent(model, tools, prompt)
    executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

    return executor.invoke(
        {"question": user_input},
        config=RunnableConfig(
            callbacks=[langwatch.get_current_trace().get_langchain_callback()]
        ),
    )
```
Alternatively, if you don’t use retrievers but still want to capture the context, for example from a tool call that you make, we also provide the utility langwatch.langchain.capture_rag_from_tool to capture RAG contexts around a tool. Pass the tool as the first argument, followed by a function that maps the tool’s output to RAGChunks, as in the example below:
```python
import langwatch
from langwatch.types import RAGChunk
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI

@langwatch.trace()
def main(user_input: str):
    my_custom_tool = ...  # the tool whose output you want to capture as RAG contexts
    wrapped_tool = langwatch.langchain.capture_rag_from_tool(
        my_custom_tool,
        lambda response: [
            RAGChunk(
                document_id=response["id"],  # optional
                chunk_id=response["chunk_id"],  # optional
                content=response["content"],
            )
        ],
    )

    # use the new wrapped tool in your agent instead of the original one
    tools = [wrapped_tool]
    model = ChatOpenAI(streaming=True)
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a helpful assistant that only reply in short tweet-like responses, using lots of emojis and use tools only once.\n\n{agent_scratchpad}",
            ),
            ("human", "{question}"),
        ]
    )
    agent = create_tool_calling_agent(model, tools, prompt)
    executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

    return executor.invoke(
        {"question": user_input},
        config=RunnableConfig(
            callbacks=[langwatch.get_current_trace().get_langchain_callback()]
        ),
    )
```
Then you’ll be able to see the captured contexts on the LangWatch dashboard, where they will also be used later on for evaluations.
When using the TypeScript SDK, you can capture a RAG by simply starting a RAG span inside the trace, giving it the input query being used:
```typescript
const ragSpan = trace.startRAGSpan({
  name: "my-vectordb-retrieval", // optional
  input: { type: "text", value: "search query" },
});

// proceed to do the retrieval normally
```
Then, after doing the retrieval, you can end the RAG span with the contexts that were retrieved and will be used by the LLM:
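A minimal sketch of what that could look like, assuming the span's end() accepts contexts in a camelCase shape mirroring the Python RAGChunk fields (documentId, content); check your SDK's types for the exact signature:

```typescript
// hypothetical retrieval results, mirroring the Python examples above
const searchResults = [
  { id: "doc-1", content: "France is a country in Europe." },
  { id: "doc-2", content: "Paris is the capital of France." },
];

// end the RAG span, capturing the retrieved documents as contexts
ragSpan.end({
  contexts: searchResults.map((document) => ({
    documentId: document.id,
    content: document.content,
  })),
});
```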