Instructor AI is a library that adds structured output capabilities to LLMs, making it easy to extract typed, validated data from language models. For more details on Instructor AI, refer to the official Instructor documentation. LangWatch can capture traces generated by Instructor AI by leveraging OpenInference’s dedicated Instructor instrumentation. This guide will show you how to set it up.

Prerequisites

  1. Install LangWatch SDK:
    pip install langwatch
    
  2. Install Instructor AI and OpenInference instrumentor:
    pip install instructor openinference-instrumentation-instructor
    
  3. Set up your OpenAI API key: You’ll need to configure your OpenAI API key in your environment.
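To verify the installation, a quick import check is a minimal sanity test (it only confirms the packages resolve, nothing more):
# Confirm that the required packages are importable
import langwatch
import instructor
from openinference.instrumentation.instructor import InstructorInstrumentor

print("All required packages are installed")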

Instrumentation with OpenInference

LangWatch supports seamless observability for Instructor AI using the OpenInference Instructor AI instrumentor. This dedicated instrumentor automatically captures traces from your Instructor AI calls and sends them to LangWatch.

Basic Setup (Automatic Tracing)

Here’s the simplest way to instrument your application:
import langwatch
import instructor
from openinference.instrumentation.instructor import InstructorInstrumentor
from openai import OpenAI
import os
from pydantic import BaseModel
from typing import List

# Initialize LangWatch with the Instructor AI instrumentor
langwatch.setup(
    instrumentors=[InstructorInstrumentor()]
)

# Set up environment variables
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# Create an OpenAI client
client = OpenAI()

# Patch the client with Instructor
client = instructor.patch(client)

# Define your Pydantic models for structured output
class User(BaseModel):
    name: str
    age: int
    email: str

class UserList(BaseModel):
    users: List[User]

# Use the client as usual—traces will be sent to LangWatch automatically
def extract_user_info(text: str) -> User:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=User,
        messages=[
            {"role": "user", "content": f"Extract user information from: {text}"}
        ]
    )

def extract_multiple_users(text: str) -> UserList:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=UserList,
        messages=[
            {"role": "user", "content": f"Extract all users from: {text}"}
        ]
    )

# Example usage
if __name__ == "__main__":
    text = "John is 25 years old and his email is [email protected]"
    user = extract_user_info(text)
    print(f"Extracted user: {user}")
    
    multiple_text = "Alice is 30 ([email protected]) and Bob is 28 ([email protected])"
    users = extract_multiple_users(multiple_text)
    print(f"Extracted users: {users}")
That’s it! All Instructor AI calls will now be traced and sent to your LangWatch dashboard automatically.

Optional: Using Decorators for Additional Context

If you want to add additional context or metadata to your traces, you can optionally use the @langwatch.trace() decorator:
import langwatch
import instructor
from openinference.instrumentation.instructor import InstructorInstrumentor
from openai import OpenAI
from pydantic import BaseModel

langwatch.setup(
    instrumentors=[InstructorInstrumentor()]
)

client = OpenAI()
client = instructor.patch(client)

class Product(BaseModel):
    name: str
    price: float
    category: str

@langwatch.trace(name="Product Information Extraction")
def extract_product_info(text: str) -> Product:
    # Update the current trace with additional metadata
    current_trace = langwatch.get_current_trace()
    if current_trace:
        current_trace.update(
            metadata={
                "extraction_type": "product_info",
                "model": "gpt-4o-mini",
                "source_text_length": len(text)
            }
        )
    
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=Product,
        messages=[
            {"role": "user", "content": f"Extract product information from: {text}"}
        ]
    )
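As with the basic example, you can call this function directly; a hypothetical invocation (the product description below is illustrative) might look like:
if __name__ == "__main__":
    description = "The Acme X100 wireless keyboard costs $49.99 and belongs to the electronics category"
    product = extract_product_info(description)
    print(f"Extracted product: {product}")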

How it Works

  1. langwatch.setup(): Initializes the LangWatch SDK, which includes setting up an OpenTelemetry trace exporter. This exporter is ready to receive spans from any OpenTelemetry-instrumented library in your application.
  2. InstructorInstrumentor(): The OpenInference instrumentor automatically patches Instructor AI to create OpenTelemetry spans for its operations, including:
    • Structured output generation
    • Model calls with response models
    • Validation and parsing
    • Error handling
  3. Instructor AI Integration: Because the instrumentor is dedicated to Instructor AI, every operation (structured output generation, validation, error handling) is captured as a span with no manual span creation required.
  4. Optional Decorators: You can optionally use @langwatch.trace() to add additional context and metadata to your traces, but it’s not required for basic functionality.
With this setup, all Instructor AI operations, including structured output generation, validation, and error handling, will be automatically traced and sent to LangWatch, providing comprehensive visibility into your structured data extraction applications.
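For example, validation and error handling become visible when Instructor retries a failed parse. Here is a minimal sketch, assuming the patched client from the basic setup above and Instructor’s max_retries parameter (the Age model and its validator are illustrative):
from pydantic import BaseModel, field_validator

class Age(BaseModel):
    value: int

    @field_validator("value")
    @classmethod
    def must_be_positive(cls, v: int) -> int:
        # A validation error here triggers Instructor's retry loop
        if v <= 0:
            raise ValueError("age must be positive")
        return v

# Each retry attempt is captured as part of the trace
age = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Age,
    max_retries=2,  # re-ask the model up to 2 times if validation fails
    messages=[{"role": "user", "content": "Tom is 25 years old. How old is Tom?"}],
)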

Notes

  • You do not need to set any OpenTelemetry environment variables or configure exporters manually—langwatch.setup() handles everything.
  • You can combine Instructor AI instrumentation with other instrumentors (e.g., LangChain, DSPy) by adding them to the instrumentors list; see the sketch after this list.
  • The @langwatch.trace() decorator is optional; the OpenInference instrumentor captures all Instructor AI activity automatically.
  • For advanced configuration (custom attributes, endpoint, etc.), see the Python integration guide.
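As an illustration of the second note, combining instrumentors is a one-line change. A minimal sketch, assuming you have also installed openinference-instrumentation-dspy:
import langwatch
from openinference.instrumentation.instructor import InstructorInstrumentor
from openinference.instrumentation.dspy import DSPyInstrumentor

# Pass multiple instrumentors to trace several libraries in one application
langwatch.setup(
    instrumentors=[InstructorInstrumentor(), DSPyInstrumentor()]
)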

Troubleshooting

  • Make sure your LANGWATCH_API_KEY is set in the environment (see the sanity check after this list).
  • If you see no traces in LangWatch, check that the instrumentor is included in langwatch.setup() and that your Instructor AI code is being executed.
  • Ensure you have the correct OpenAI API key set.
  • Verify that your Pydantic models are properly defined and compatible with Instructor AI.
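A quick sanity check for the first and third points might look like this:
import os

# Fail fast if the required credentials are missing from the environment
for var in ("LANGWATCH_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(var):
        raise RuntimeError(f"{var} is not set; tracing or model calls will fail")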

Interoperability with LangWatch SDK

You can use this integration together with the LangWatch Python SDK to add additional attributes to the trace:
import langwatch
import instructor
from openinference.instrumentation.instructor import InstructorInstrumentor
from openai import OpenAI
from pydantic import BaseModel
from typing import List

langwatch.setup(
    instrumentors=[InstructorInstrumentor()]
)

client = OpenAI()
client = instructor.patch(client)

class Task(BaseModel):
    title: str
    priority: str
    due_date: str

@langwatch.trace(name="Task Extraction Pipeline")
def extract_tasks_from_text(text: str) -> List[Task]:
    # Update the current trace with additional metadata
    current_trace = langwatch.get_current_trace()
    if current_trace:
        current_trace.update(
            metadata={
                "pipeline_type": "task_extraction",
                "model": "gpt-4o-mini",
                "input_length": len(text)
            }
        )
    
    # Instructor parses the completion into a validated list of Task objects
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=List[Task],
        messages=[
            {"role": "user", "content": f"Extract tasks from: {text}"}
        ]
    )
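A hypothetical invocation of this pipeline (the input text is illustrative) might look like:
if __name__ == "__main__":
    notes = "Finish the quarterly report by Friday (high priority) and book flights by Monday (low priority)"
    tasks = extract_tasks_from_text(notes)
    for task in tasks:
        print(f"{task.title} (priority: {task.priority}, due: {task.due_date})")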
This approach allows you to combine the automatic tracing provided by the OpenInference instrumentor with the rich metadata and custom attributes available through the LangWatch SDK.