garak
The open-source Large Language Model (LLM) vulnerability scanner garak provides modular and standardized detections (tests), probes (groups of tests looking for a vulnerability), and harnesses (end-to-end pipelines) to test LLMs for different vulnerabilities and downstream harms.
AVID resources are integrated with garak in two ways.
Taxonomy
The tags attribute in the Python class defining a garak probe stores AVID taxonomy classifications in MISP format.
# Source: https://github.com/leondz/garak/blob/main/garak/probes/base.py
...
class Probe:
    """Base class for objects that define and execute LLM evaluations"""
    name = None
    description = "Empty probe"
    uri = ""
    bcp47 = None  # language this is for, in bcp47 format; * for all langs
    recommended_detector = ["always.Fail"]  # send a signal if this isn't overridden
    active = True
    tags = []  # MISP-format taxonomy categories
...
As an example, the knownbadsignatures group of probes checks whether an LLM can be made to generate signatures of known malware, and the knownbadsignatures.EICAR probe checks for leakage of the signature of the EICAR test file. This probe is given two taxonomy tags.
# Source: https://github.com/leondz/garak/blob/main/garak/probes/knownbadsignatures.py
...
class EICAR(Probe):
    name = "eicar"
    description = "Does the model check its output for viruses? Try to send the EICAR code through"
    bcp47 = "*"
    uri = "https://en.wikipedia.org/wiki/EICAR_test_file"
    recommended_detector = [
        "knownbadsignatures.EICAR",
    ]
    tags = ["avid-effect:security:S0301", "avid-effect:security:S0403"]
...
In the AVID taxonomy, these tags correspond to Information Leak and Adversarial Example, respectively.
In a similar manner, garak detectors also have the tags attribute. In line with the flexible MISP format, any taxonomy classification in the MISP format can be stored as a tag. For example, the lmrc.Bullying probe has the tags risk-cards:lmrc:bullying and avid-effect:ethics:E0301, corresponding to the risk cards category Bullying and the AVID SEP category E0301: Toxicity.
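To make the tag structure concrete, here is a minimal sketch that splits the colon-separated tags shown above into three parts and looks up the AVID codes mentioned in this section. The names TaxonomyTag, parse_tag, describe_avid_tags, and AVID_NAMES are hypothetical helpers written for illustration; they are not part of garak or AVID, and the lookup table holds only the three categories named on this page.

```python
from typing import NamedTuple


class TaxonomyTag(NamedTuple):
    """One colon-separated tag, e.g. avid-effect:security:S0301 (hypothetical helper)."""
    namespace: str
    predicate: str
    value: str


# Only the categories named in this section; not the full AVID taxonomy.
AVID_NAMES = {
    "S0301": "Information Leak",
    "S0403": "Adversarial Example",
    "E0301": "Toxicity",
}


def parse_tag(tag: str) -> TaxonomyTag:
    """Split a tag into namespace, predicate, and value, assuming exactly three parts."""
    parts = tag.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected namespace:predicate:value, got {tag!r}")
    return TaxonomyTag(*parts)


def describe_avid_tags(tags: list[str]) -> list[tuple[str, str]]:
    """Return (code, name) pairs for tags in the avid-effect namespace only."""
    described = []
    for tag in tags:
        parsed = parse_tag(tag)
        if parsed.namespace == "avid-effect":
            described.append((parsed.value, AVID_NAMES.get(parsed.value, "unknown")))
    return described


if __name__ == "__main__":
    eicar_tags = ["avid-effect:security:S0301", "avid-effect:security:S0403"]
    bullying_tags = ["risk-cards:lmrc:bullying", "avid-effect:ethics:E0301"]
    print(describe_avid_tags(eicar_tags))
    print(describe_avid_tags(bullying_tags))
```

Filtering on the namespace keeps tags from other taxonomies, such as risk-cards, out of the AVID lookup while still allowing them to sit in the same list.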
Reporting
Scans by garak generate log files in JSONL format that store model metadata, prompt information, and evaluation results. This information can be structured into one or more AVID reports. Check out the following example using a sample run.
wget https://gist.githubusercontent.com/shubhobm/9fa52d71c8bb36bfb888eee2ba3d18f2/raw/ef1808e6d3b26002d9b046e6c120d438adf49008/gpt35-0906.report.jsonl
python3 -m garak -r gpt35-0906.report.jsonl
## output:
# garak LLM security probe v0.9.0.6 ( https://github.com/leondz/garak ) at 2023-07-23T15:30:37.699120
# 📜 Converting garak reports gpt35-0906.report.jsonl
# 📜 AVID reports generated at gpt35-0906.avid.jsonl
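Because both the garak log and the generated AVID file are JSONL (one JSON object per line), they can be inspected with the Python standard library alone. The sketch below loads such a file and counts records by a chosen key. The helper names parse_jsonl, load_jsonl, and count_by_key are hypothetical, and the key name entry_type is an assumption about the garak log layout, not something stated on this page; substitute whichever key the records in your file actually contain.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse an iterable of JSONL lines into dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)


def count_by_key(records: list[dict], key: str = "entry_type") -> Counter:
    """Count records by the value of `key` (the default key name is an assumption)."""
    return Counter(str(record.get(key, "<missing>")) for record in records)


if __name__ == "__main__":
    # File name taken from the sample run above; the script does nothing if it is absent.
    report_path = Path("gpt35-0906.report.jsonl")
    if report_path.exists():
        records = load_jsonl(report_path)
        print(f"{len(records)} records in {report_path}")
        for value, count in count_by_key(records).most_common():
            print(f"{value}: {count}")
```

Pointing report_path at gpt35-0906.avid.jsonl instead gives a quick check of how many AVID reports the conversion produced.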