The azure-ai-inference Python SDK provides a unified way to interact with various AI models deployed on Azure, including those on Azure OpenAI Service, GitHub Models, and Azure AI Foundry Serverless/Managed Compute endpoints. For more details on the SDK, refer to the official Azure AI Inference client library documentation.

LangWatch can capture traces generated by the azure-ai-inference SDK by leveraging its built-in OpenTelemetry support. This guide will show you how to set it up.

Prerequisites

  1. Install LangWatch SDK:

    pip install langwatch
    
  2. Install the Azure AI Inference SDK with OpenTelemetry support: Install the azure-ai-inference package with its opentelemetry extra, together with the azure-core-tracing-opentelemetry package, which plugs Azure SDK tracing into OpenTelemetry.

    pip install azure-ai-inference[opentelemetry] azure-core-tracing-opentelemetry
    

    Refer to the Azure SDK documentation for the most up-to-date installation instructions. A quick import check is sketched after this list.
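To confirm both packages are in place, you can run an optional import check. This is just a sketch; the import paths below come from azure-ai-inference[opentelemetry] and azure-core-tracing-opentelemetry respectively:

# Optional sanity check: both imports should succeed once the packages above are installed.
from azure.ai.inference.tracing import AIInferenceInstrumentor           # from azure-ai-inference[opentelemetry]
from azure.core.tracing.ext.opentelemetry_span import OpenTelemetrySpan  # from azure-core-tracing-opentelemetry

print("Azure AI Inference tracing support is available.")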

Instrumentation with AIInferenceInstrumentor

The azure-ai-inference SDK provides an AIInferenceInstrumentor that, once enabled, automatically creates OpenTelemetry spans for its operations. When LangWatch is set up, it configures an OpenTelemetry exporter that collects these spans and sends them to LangWatch.

Here’s how to instrument your application:

import langwatch
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.tracing import AIInferenceInstrumentor
from azure.core.credentials import AzureKeyCredential
import os

# 1. Initialize LangWatch and enable the Azure AI Inference instrumentor
langwatch.setup(
    instrumentors=[AIInferenceInstrumentor()]
)

# 2. Configure your Azure AI Inference client
azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
azure_openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
azure_openai_api_version = "2024-06-01"

chat_client = ChatCompletionsClient(
    endpoint=azure_openai_endpoint,
    credential=AzureKeyCredential(azure_openai_api_key),
    api_version=azure_openai_api_version
)

@langwatch.trace(name="Azure AI Inference Chat")
def get_ai_response(prompt: str):
    # This call will now be automatically traced by the AIInferenceInstrumentor and
    # captured by LangWatch as a span within the "Azure AI Inference Chat" trace.
    response = chat_client.complete(
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def main():
    user_prompt = "What is the Azure AI Inference SDK?"

    try:
        ai_reply = get_ai_response(user_prompt)
        print(f"User: {user_prompt}")
        print(f"AI: {ai_reply}")
    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    main()

The example uses the synchronous ChatCompletionsClient for simplicity. The azure-ai-inference SDK also provides asynchronous clients under the azure.ai.inference.aio namespace (e.g., azure.ai.inference.aio.ChatCompletionsClient); if your application uses async/await, use those instead. The AIInferenceInstrumentor works with both synchronous and asynchronous clients, as sketched below.
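Here is a minimal sketch of the async variant. It assumes the same environment variables as the example above and only swaps in the client from azure.ai.inference.aio:

import asyncio
import os

import langwatch
from azure.ai.inference.aio import ChatCompletionsClient
from azure.ai.inference.tracing import AIInferenceInstrumentor
from azure.core.credentials import AzureKeyCredential

langwatch.setup(instrumentors=[AIInferenceInstrumentor()])

@langwatch.trace(name="Azure AI Inference Chat (async)")
async def get_ai_response_async(prompt: str) -> str:
    # The async client is an async context manager, so it is closed cleanly on exit.
    async with ChatCompletionsClient(
        endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        credential=AzureKeyCredential(os.getenv("AZURE_OPENAI_API_KEY")),
        api_version="2024-06-01",
    ) as client:
        response = await client.complete(
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

if __name__ == "__main__":
    print(asyncio.run(get_ai_response_async("What is the Azure AI Inference SDK?")))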

How it Works

  1. langwatch.setup(): Initializes the LangWatch SDK, which includes setting up an OpenTelemetry trace exporter. This exporter is ready to receive spans from any OpenTelemetry-instrumented library in your application.
  2. AIInferenceInstrumentor(): Passing this instrumentor, provided by the azure-ai-inference SDK, to langwatch.setup() calls its instrument() method, which patches the relevant Azure AI clients (like ChatCompletionsClient or EmbeddingsClient) to automatically create OpenTelemetry spans for their operations (e.g., a complete or embed call). You can also manage the instrumentor yourself, as sketched after this list.
  3. @langwatch.trace(): By decorating your own functions (like get_ai_response in the example), you create a parent trace in LangWatch. The spans automatically generated by the AIInferenceInstrumentor for calls made within this decorated function will then be nested under this parent trace. This provides a full end-to-end view of your operation.
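If you prefer to enable instrumentation explicitly rather than through langwatch.setup(instrumentors=[...]), a minimal sketch of the direct approach looks like this. langwatch.setup() still has to run so the spans have an exporter to go to; instrument(), is_instrumented(), and uninstrument() are the methods exposed by azure.ai.inference.tracing:

import langwatch
from azure.ai.inference.tracing import AIInferenceInstrumentor

# Initialize LangWatch first so an OpenTelemetry exporter is in place.
langwatch.setup()

# Patch the Azure AI Inference clients manually.
instrumentor = AIInferenceInstrumentor()
instrumentor.instrument()

# ... create clients and make complete()/embed() calls here; each call
# emits OpenTelemetry spans that LangWatch collects ...

# Remove the patches when tracing is no longer needed.
if instrumentor.is_instrumented():
    instrumentor.uninstrument()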

With this setup, calls made using the azure-ai-inference clients will be automatically traced and sent to LangWatch, providing visibility into the performance and behavior of your AI model interactions.