Evaluate in a CI/CD Pipeline
A good AI application is, first of all, good software. And good software is built with a reliable CI/CD pipeline. Here we will explain how to test your LLM outputs at the level of the deployment pipeline.
Prepare the Test Suite
The first step is to create the test suite. You can add as many tests as you need and import the required parts of your application. In this example we will create a simple test case with handcrafted examples; you can also load a prepared dataset and run your tests against it.
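A minimal sketch of such a test suite, written for pytest. The function `generate_answer` is a hypothetical stand-in for your application's entry point; in your project you would import your own function instead of defining the placeholder below:

```python
def generate_answer(question: str) -> str:
    # Placeholder: in a real project this would call your LLM chain.
    # Replace with an import from your application.
    return "Paris is the capital of France."

# Handcrafted examples: (question, substring the answer must contain)
EXAMPLES = [
    ("What is the capital of France?", "Paris"),
]

def test_contains_expected_substring():
    # A simple deterministic check; LLM-as-a-Judge evaluators
    # can be plugged in here in the same way.
    for question, expected in EXAMPLES:
        answer = generate_answer(question)
        assert expected.lower() in answer.lower(), (
            f"Expected {expected!r} in the answer for {question!r}, got {answer!r}"
        )
```

Substring checks like this are intentionally loose: they tolerate rephrasing by the model while still catching regressions where the key fact disappears from the answer.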
Run with GitHub Actions
The second step is to create a GitHub Actions workflow specifying the details of your test. You can do that by creating a .github/workflows/ folder and adding a .yaml file there. Here is an example script that will run automatically on every new push to the repo:
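A sketch of such a workflow file, assuming a Python project with pytest; the dependency file, test path, and secret name are placeholders to adapt to your project:

```yaml
# .github/workflows/ — example evaluation workflow (adapt names and paths)
name: LLM evaluation

on: [push]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run the test suite
        run: pytest tests/test_llm_outputs.py
        env:
          # Exposes the repository secret to the tests (name is an example)
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```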
Note that the workflow must install every library your test case imports, and the path to your test file must be specified correctly.
After writing the script, go to your GitHub repository, navigate to Settings > Secrets and variables (left menu) > Actions, and press the New repository secret button. If you want to use evaluators that employ the LLM-as-a-Judge approach, you need to add the API key of your LLM provider as a repository secret.
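Inside the workflow, a repository secret reaches your tests as an environment variable. A small sketch of reading it defensively (the variable name `OPENAI_API_KEY` is an example; use whatever name you gave your secret):

```python
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    # Fail fast with a clear message if the secret was not configured,
    # instead of letting the LLM client fail with a cryptic auth error.
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; add it as a repository secret")
    return value
```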
Now you can automatically evaluate whether changes to your prompt, or a switch of LLM provider, degrade your application.