Inspect AI
Inspect AI is an open-source toolkit for standardized evaluation of LLM behavior across a wide range of capabilities. It provides a modular, efficient framework and is widely used to run many of the most popular LLM benchmarks.
The avidtools.connectors.inspectai module generates AVID reports from evaluation runs performed using Inspect AI. To do so, you need to provide the path of the saved evaluation log.
Here is a minimal example of exporting Inspect AI evaluation logs as AVID reports, based on an evaluation run using OpenAI's gpt-4o-mini on the bold benchmark.
Exporting to AVID reports
To begin, ensure that you have avidtools installed in your environment:
pip install avidtools
Next, run the evaluation for your model using the inspect_evals setup. This step assumes you also have Inspect AI and the inspect_evals package installed and, for OpenAI models, your OPENAI_API_KEY set. Use a command like the one below, specifying the directory where the evaluation logs will be saved; in this example, logs are stored in the ./experiment-log directory:
inspect eval inspect_evals/bold --model openai/gpt-4o-mini --log-dir ./experiment-log
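Inspect AI typically names each saved log with a timestamp and the task name, so the exact filename will vary between runs. The short sketch below (assuming a single task was run and that logs live in ./experiment-log) locates the most recently written .eval file; the latest_log variable is only for illustration.
from pathlib import Path

# Find the most recently written Inspect AI log in the output directory.
# Log filenames include a timestamp and the task name, so they differ per run.
log_dir = Path("./experiment-log")
eval_logs = sorted(log_dir.glob("*.eval"), key=lambda p: p.stat().st_mtime)
latest_log = eval_logs[-1]
print(f"Latest evaluation log: {latest_log}")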
Once you have run the Inspect AI evaluation, you can export the results as an AVID report. Use the code snippet below for the export:
from avidtools.connectors.inspect import convert_eval_log
# Export the logs as an AVID report
avid_report = convert_eval_log("experiment-log/bold_logs.eval")
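If you want to persist the result, here is a minimal sketch. It assumes the value returned by convert_eval_log is a pydantic-based AVID Report object (or a list of them); the avid_report_*.json filenames are arbitrary.
# A minimal sketch of saving the exported report(s) to disk, assuming the
# returned value is a pydantic model (or a list of pydantic models).
reports = avid_report if isinstance(avid_report, list) else [avid_report]
for i, report in enumerate(reports):
    with open(f"avid_report_{i}.json", "w") as f:
        # model_dump_json is the pydantic v2 serializer; older versions use .json()
        f.write(report.model_dump_json(indent=2))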