Inspect AI
Inspect AI is an open-source toolkit for standardized evaluation of LLM behavior across a wide range of capabilities. It provides a modular, efficient framework and is widely used to run many of the most popular LLM benchmarks.
The avidtools.connectors.inspectai module generates AVID reports from evaluation runs performed using Inspect AI. To do so, you need to provide the path of the saved evaluation log.
Here is a minimal example of exporting Inspect AI evaluation logs as AVID reports, based on an evaluation run using OpenAI's gpt-4o-mini on the bold benchmark.
Exporting to AVID reports
To begin, ensure that you have avidtools installed in your environment:
pip install avidtools
Next, run the evaluation for your model using the inspect_evals setup. This step assumes you also have Inspect AI and the inspect_evals package installed and, for OpenAI models, your OPENAI_API_KEY set. Use a command like the one below, specifying the directory where the evaluation logs will be saved; in this example, logs are stored in the ./experiment-log directory:
inspect eval inspect_evals/bold --model openai/gpt-4o-mini --log-dir ./experiment-log
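Inspect AI typically names each saved log with a timestamp and the task name, so the exact filename will vary between runs. The short sketch below (assuming a single task was run and that logs live in ./experiment-log) locates the most recently written .eval file; the latest_log variable is only for illustration.
from pathlib import Path

# Find the most recently written Inspect AI log in the output directory.
# Log filenames include a timestamp and the task name, so they differ per run.
log_dir = Path("./experiment-log")
eval_logs = sorted(log_dir.glob("*.eval"), key=lambda p: p.stat().st_mtime)
latest_log = eval_logs[-1]
print(f"Latest evaluation log: {latest_log}")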
Once you have run the Inspect AI evaluation, you can export the results as an AVID report. Use the code snippet below for the export:
from avidtools.connectors.inspect import convert_eval_log
# Export the logs as an AVID report
avid_report = convert_eval_log("experiment-log/bold_logs.eval")
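If you want to persist the result, here is a minimal sketch. It assumes the value returned by convert_eval_log is a pydantic-based AVID Report object (or a list of them); the avid_report_*.json filenames are arbitrary.
# A minimal sketch of saving the exported report(s) to disk, assuming the
# returned value is a pydantic model (or a list of pydantic models).
reports = avid_report if isinstance(avid_report, list) else [avid_report]
for i, report in enumerate(reports):
    with open(f"avid_report_{i}.json", "w") as f:
        # model_dump_json is the pydantic v2 serializer; older versions use .json()
        f.write(report.model_dump_json(indent=2))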