garak
The open-source Large Language Model (LLM) vulnerability scanner garak provides modular and standardized detectors (tests), probes (groups of tests looking for a vulnerability), and harnesses (end-to-end pipelines) to test LLMs for different vulnerabilities and downstream harms.
AVID resources are integrated with garak in two ways.
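As context for both integrations, a scan is typically launched through garak's command-line interface. The sketch below shells out to that entry point from Python; the model type, model name, and probe group are illustrative choices, and flag spellings should be checked against the output of python -m garak --help for the installed version.

```python
# A hedged sketch of launching a garak scan by shelling out to its CLI
# entry point. Model and probe values here are examples, not recommendations.
import subprocess

subprocess.run(
    [
        "python", "-m", "garak",
        "--model_type", "huggingface",    # generator family (example value)
        "--model_name", "gpt2",           # model to scan (example value)
        "--probes", "knownbadsignatures", # probe group discussed below
    ],
    check=True,  # raise if the scan exits with an error
)
```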
Taxonomy
The tags attribute in the Python class defining a garak probe stores AVID taxonomy classifications in MISP format.
As an example, the knownbadsignatures group of probes checks whether an LLM can be made to generate the signatures of known malware, and the knownbadsignatures.EICAR probe checks for leakage of the signature of the EICAR test file. This probe carries two taxonomy tags, which correspond to the AVID taxonomy categories Information Leak and Adversarial Example, respectively.
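As a minimal sketch of what this looks like in code (the class body and the two tag values are assumptions modeled on the AVID categories named above, not garak's actual EICAR probe):

```python
# Illustrative only: a probe class carrying AVID taxonomy tags in MISP format.
from garak.probes.base import Probe

class EICARSketch(Probe):
    """Try to elicit the signature of the EICAR test file (sketch)."""
    goal = "make the model output a known-bad signature"
    tags = [
        "avid-effect:security:S0301",  # assumed tag for AVID: Information Leak
        "avid-effect:security:S0403",  # assumed tag for AVID: Adversarial Example
    ]
```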
In a similar manner, garak detectors also have the tags attribute. In line with the flexibility of the MISP format, any taxonomy classification expressed in that format can be stored as a tag. For example, the lmrc.Bullying probe has the tags risk-cards:lmrc:bullying and avid-effect:ethics:E0301, corresponding to the LMRC category Bullying and the AVID SEP category E0301: Toxicity.
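Since tags is a plain class attribute, it can be read directly at runtime. A quick check, assuming garak is installed and using the module path implied by the probe name above:

```python
# Print the MISP-format tags attached to the lmrc.Bullying probe.
from garak.probes.lmrc import Bullying

print(Bullying.tags)
# Expected to include 'risk-cards:lmrc:bullying' and 'avid-effect:ethics:E0301'
```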
Reporting
Scans by garak generate log files in JSONL format that store model metadata, prompt information, and evaluation results. This information can be structured into one or more AVID reports. Check out the following example, which uses a sample run.
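As a rough stand-in for that sample run, the sketch below loads such a JSONL log and pulls out the evaluation records, which are the natural starting point for assembling an AVID report. The file name and the entry_type/eval field values are assumptions to verify against a log from your own run.

```python
# A minimal sketch: read a garak JSONL log and separate out evaluation entries.
import json

with open("garak.report.jsonl") as f:  # assumed log file name
    entries = [json.loads(line) for line in f if line.strip()]

# Each line is one JSON object; evaluation records summarize per-probe results.
evals = [e for e in entries if e.get("entry_type") == "eval"]  # assumed field
print(f"{len(entries)} log entries, {len(evals)} evaluation records")
```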