Evaluate on CI/CD Pipeline
A good AI application deserves good software engineering, and good software is built on a reliable CI/CD pipeline. Here we explain how to test your LLM outputs at the level of the deployment pipeline.
Prepare the Test Suite
The first step is to create the test suite. You can add as many tests as you want and import the required parts of your application. In this example we create a simple test case with handcrafted examples; alternatively, you can load a prepared dataset and test your application against it (see the sketch after the example below).
```python
import os

import pandas as pd
import pytest
from dotenv import load_dotenv
from langevals import expect
from langevals_langevals.off_topic import (
    AllowedTopic,
    OffTopicEvaluator,
    OffTopicSettings,
)

# Load the OpenAI API key from a local .env file
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

# Handcrafted example inputs to evaluate
entries = pd.DataFrame(
    {
        "input": [
            "What is unique about your product?",
            "Write a python code that prints fibonacci numbers.",
            "What are the core features of your platform?",
        ],
    }
)


@pytest.mark.parametrize("entry", entries.itertuples())
def test_query_is_on_topic(entry):
    # Only these topics are considered on-topic for the application
    settings = OffTopicSettings(
        allowed_topics=[
            AllowedTopic(topic="our_product", description="Questions about our company and product"),
            AllowedTopic(topic="small_talk", description="Small talk questions"),
            AllowedTopic(topic="book_a_call", description="Requests to book a call"),
        ],
        model="openai/gpt-3.5-turbo-1106",
    )
    off_topic_evaluator = OffTopicEvaluator(settings=settings)
    expect(input=entry.input).to_pass(off_topic_evaluator)
```
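If you would rather evaluate a prepared dataset than handcrafted examples, you can build the `entries` DataFrame from a file instead. This is a minimal sketch, assuming a hypothetical `dataset.csv` with an `input` column; the same `parametrize` decorator then generates one test per row:

```python
import pandas as pd

# Hypothetical file name and column layout -- adjust to your own dataset.
# The resulting DataFrame works with the same
# @pytest.mark.parametrize("entry", entries.itertuples()) decorator as above.
entries = pd.read_csv("dataset.csv")  # expects an "input" column
```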
Run with GitHub Actions
The second step is to create a GitHub Actions workflow specifying how your tests should run. To do that, create a `.github/workflows/` folder and add a `.yaml` file to it. Here is an example workflow that runs automatically on every new push to the repository:
```yaml
name: Run LangEvals

on:
  push:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pandas pytest "langevals[all]"

      - name: Run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          pytest ./test_script.py
```
Pay attention: the install step must include every library used in your test case, and the `pytest` command must point to the path of your test file (here `./test_script.py`).
After writing the workflow, go to your GitHub repository, navigate to Settings > Secrets and variables > Actions (in the left menu), and press the New repository secret button. If you want to use evaluators that employ the LLM-as-a-judge approach, you need to add the API key of your LLM provider as a repository secret.
Now you can automatically evaluate whether changes to your prompts or a switch of LLM provider degrade your application.
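As a possible extension (not part of the example above), you can also trigger the workflow on pull requests, so regressions are caught before they reach main. A minimal sketch of the `on:` block, assuming the same workflow file:

```yaml
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
```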