Inspect AI
Inspect AI is an open-source LLM evaluation toolkit that enables standardized assessment of model behavior across a wide range of capabilities. It provides a modular, efficient framework and is widely used for evaluating LLMs on many of the most popular benchmarks.
The avidtools.connectors.inspectai module generates AVID reports from evaluation runs performed with Inspect AI. To do so, you need to provide the path to the saved evaluation log.
Here is a minimal example of exporting Inspect AI evaluation logs as AVID reports, based on an evaluation run using OpenAI's gpt-4o-mini on the bold benchmark.
To begin, ensure that you have avidtools installed in your environment:
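For example, installing from PyPI:

```bash
# Install avidtools from PyPI.
pip install avidtools

# Running the evaluation itself also requires Inspect AI (plus the
# implementation of the benchmark task you plan to run).
pip install inspect-ai
```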
Next, run the Inspect AI evaluation for your model. Use a command like the one below, specifying the directory where the evaluation logs will be saved; in this example, logs are stored in the ./experiment-log directory:
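A sketch of the command, assuming the bold benchmark is registered as the inspect_evals/bold task (adjust the task path to wherever your bold task is defined) and that OPENAI_API_KEY is set in your environment; --model and --log-dir are standard Inspect AI CLI options:

```bash
# Run the bold benchmark against gpt-4o-mini and write logs to ./experiment-log.
# The task path inspect_evals/bold is an assumption -- point it at wherever
# your bold task is registered.
inspect eval inspect_evals/bold \
  --model openai/gpt-4o-mini \
  --log-dir ./experiment-log
```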
Once you have run the Inspect AI evaluation, you can export the results as an AVID report. Use the code snippet below for the export:
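A minimal sketch of the export, assuming the connector exposes a convert_eval_log function that takes the log path and returns an AVID Report object; check the avidtools API reference for the exact function name, and replace the placeholder log filename with your own:

```python
# Hypothetical import: consult the avidtools API reference for the exact
# name exposed by the Inspect AI connector.
from avidtools.connectors.inspectai import convert_eval_log

# Path to the evaluation log written by `inspect eval` into ./experiment-log.
# Log filenames are timestamped; replace this placeholder with your own file
# (newer Inspect AI versions write .eval logs, older ones write .json).
log_path = "./experiment-log/<your-eval-log-file>.eval"

# Convert the Inspect AI evaluation log into an AVID report.
report = convert_eval_log(log_path)

# The report is a pydantic model; dump it to JSON for storage or submission.
# (On pydantic v2, use report.model_dump_json(indent=2) instead.)
with open("avid-report.json", "w") as f:
    f.write(report.json(indent=2))
```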