When exploring, for example in a Jupyter Notebook, it is common to generate multiple outputs from your LLM and then score them. LangEvals provides the evaluate() function to score these outputs in batch with several evaluators at once. This section walks through batch evaluation with multiple evaluators and shows how to access the results.

Importing the Library

First, import langevals along with the evaluators that you will use.

import langevals
from langevals_ragas.answer_relevancy import RagasAnswerRelevancyEvaluator
from langevals_langevals.competitor_blocklist import (
    CompetitorBlocklistEvaluator,
    CompetitorBlocklistSettings,
)

Creating Evaluation Dataset

Next, create a pandas DataFrame with input and output columns. Each row holds one LLM input and the output it produced. Name these columns exactly as shown so that the evaluators can find them. Some evaluators may require additional fields such as contexts and expected_output.

import pandas as pd

entries = pd.DataFrame(
    {
        "input": ["hello", "how are you?", "what is your name?"],
        "output": ["hi", "I am a chatbot, no feelings", "My name is Bob"],
    }
)
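
Because the evaluators depend on exact column names, it can help to check the data before building the DataFrame. The following is a minimal sketch, not part of LangEvals: check_columns is a hypothetical helper, and the set of required columns passed to it is an assumption you would adjust for the evaluators you use.

```python
# Hypothetical helper, not part of LangEvals: validates a dict of columns
# before it is passed to pd.DataFrame().
def check_columns(data, required=("input", "output")):
    """Return a list of problems found in a dict of column name -> list of values."""
    problems = []
    # Every required column name must be present, spelled exactly.
    for name in required:
        if name not in data:
            problems.append(f"missing column: {name}")
    # All columns must have the same number of rows.
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"columns differ in length: {lengths}")
    return problems


data = {
    "input": ["hello", "how are you?", "what is your name?"],
    "output": ["hi", "I am a chatbot, no feelings", "My name is Bob"],
}

# No problems for the two default columns.
problems = check_columns(data)

# An evaluator that also needs contexts would be flagged as missing a column.
missing_contexts = check_columns(data, required=("input", "output", "contexts"))
```

An empty list means the data has the expected shape. A non-empty list names each problem, so a misspelled column shows up before any evaluator runs.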

Run Evaluations

With a single call to the evaluate() function, you can evaluate all data entries using the specified evaluators. Note that certain evaluators require a settings parameter. Here, it defines the list of competitor names to blocklist. For further documentation, refer to Evaluators.

results = langevals.evaluate(
    entries,
    [
        RagasAnswerRelevancyEvaluator(),
        CompetitorBlocklistEvaluator(
            settings=CompetitorBlocklistSettings(competitors=["Bob"])
        ),
    ],
)

Access the Results

Finally, the results can be accessed as a pandas DataFrame.

results.to_pandas()

Results:

| input | output | answer_relevancy | competitor_blocklist | competitor_blocklist_details |
| --- | --- | --- | --- | --- |
| hello | hi | 0.800714 | True | None |
| how are you? | I am a chatbot, no feelings | 0.813168 | True | None |
| what is your name? | My name is Bob | 0.971663 | False | Competitors mentioned: Bob |
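
Once the results are in a DataFrame, ordinary pandas operations apply. The sketch below does not call LangEvals: it rebuilds the results shown above by hand as a stand-in for results.to_pandas(), assuming the column names match those in the output, and then picks out the rows that failed the blocklist check and averages the relevancy score.

```python
import pandas as pd

# Stand-in for results.to_pandas(), using the values from the results above.
df = pd.DataFrame(
    {
        "input": ["hello", "how are you?", "what is your name?"],
        "output": ["hi", "I am a chatbot, no feelings", "My name is Bob"],
        "answer_relevancy": [0.800714, 0.813168, 0.971663],
        "competitor_blocklist": [True, True, False],
        "competitor_blocklist_details": [None, None, "Competitors mentioned: Bob"],
    }
)

# Rows where the blocklist check did not pass (the column is False).
flagged = df[~df["competitor_blocklist"]]

# Average answer relevancy across all entries.
mean_relevancy = df["answer_relevancy"].mean()

print(flagged[["input", "output", "competitor_blocklist_details"]])
print(f"mean answer_relevancy: {mean_relevancy:.3f}")
```

Filtering on the boolean column keeps the details text next to each failing entry, which makes it easy to see why an output was flagged.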