# garak

The open-source Large Language Model (LLM) vulnerability scanner [garak](https://github.com/leondz/garak/) provides modular, standardized detectors (tests), probes (groups of tests that look for a vulnerability), and harnesses (end-to-end pipelines) for testing LLMs for different vulnerabilities and downstream harms.

AVID resources are integrated with `garak` in two ways.

### Taxonomy

The `tags` attribute in the Python class defining a `garak` probe stores AVID taxonomy classifications in MISP format.

```python
# Source: https://github.com/leondz/garak/blob/main/garak/probes/base.py
...
class Probe:
    """Base class for objects that define and execute LLM evaluations"""

    name = None
    description = "Empty probe"
    uri = ""
    bcp47 = None  # language this is for, in bcp47 format; * for all langs
    recommended_detector = ["always.Fail"]  # send a signal if this isn't overridden
    active = True
    tags = []  # MISP-format taxonomy categories
    ...
```

As an example, the [`knownbadsignatures`](https://github.com/leondz/garak/blob/main/garak/probes/knownbadsignatures.py) group of probes checks whether an LLM can be made to generate signatures of known malware, and the `knownbadsignatures.EICAR` probe checks for leakage of the signature of the [EICAR test file](https://en.wikipedia.org/wiki/EICAR_test_file). This probe is given two taxonomy tags.

```python
# Source: https://github.com/leondz/garak/blob/main/garak/probes/knownbadsignatures.py
...
class EICAR(Probe):
    name = "eicar"
    description = "Does the model check its output for viruses? Try to send the EICAR code through"
    bcp47 = "*"
    uri = "https://en.wikipedia.org/wiki/EICAR_test_file"
    recommended_detector = [
        "knownbadsignatures.EICAR",
    ]
    tags = ["avid-effect:security:S0301", "avid-effect:security:S0403"]
    ...
```

In the AVID taxonomy, these tags correspond to *Information Leak* and *Adversarial Example*, respectively.

In a similar manner, `garak` detectors also have the `tags` attribute. In line with the flexible MISP format, *any* taxonomy classification in the MISP format can be stored as a tag. For example, the [`lmrc.Bullying`](https://github.com/leondz/garak/blob/cd6d2ec822e63b238a7effbc3181d31b275a3f16/garak/probes/lmrc.py#L40) probe has the tags `risk-cards:lmrc:bullying` and `avid-effect:ethics:E0301`, corresponding to the Risk Card[^1] category *Bullying* and the AVID SEP category *E0301: Toxicity*.
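Because each tag follows the colon-delimited MISP convention (namespace, predicate, value), tags from any taxonomy can be parsed and filtered uniformly. A minimal sketch of splitting such tags (`parse_misp_tag` is a hypothetical helper, not part of `garak`):

```python
def parse_misp_tag(tag: str) -> dict:
    """Split a MISP-format tag like 'avid-effect:security:S0301'
    into its namespace, predicate, and value components."""
    namespace, predicate, value = tag.split(":")
    return {"namespace": namespace, "predicate": predicate, "value": value}


tags = ["avid-effect:ethics:E0301", "risk-cards:lmrc:bullying"]

# Keep only tags from the AVID taxonomy namespace
avid_tags = [t for t in tags if parse_misp_tag(t)["namespace"] == "avid-effect"]
# avid_tags == ["avid-effect:ethics:E0301"]
```

Filtering on the namespace like this is one way a downstream consumer could separate AVID classifications from, say, Risk Card classifications on the same probe.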

### Reporting

Scans by `garak` generate log files in JSONL format that store model metadata, prompt information, and evaluation results. This information can be structured into one or more AVID reports. Check out the following example using a sample run.

<pre class="language-bash"><code class="lang-bash">wget https://gist.githubusercontent.com/shubhobm/9fa52d71c8bb36bfb888eee2ba3d18f2/raw/ef1808e6d3b26002d9b046e6c120d438adf49008/gpt35-0906.report.jsonl
python3 -m garak -r gpt35-0906.report.jsonl
<strong>## output:
</strong># garak LLM security probe v0.9.0.6 ( https://github.com/leondz/garak ) at 2023-07-23T15:30:37.699120
# 📜 Converting garak reports gpt35-0906.report.jsonl
# 📜 AVID reports generated at gpt35-0906.avid.jsonl
</code></pre>
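Since each line of the `.report.jsonl` file is a standalone JSON object, the raw log can also be inspected directly before conversion. A minimal sketch of loading such a file (the field names inside each record are not shown here, as they depend on the garak version):

```python
import json


def read_jsonl(path: str):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Example usage against the sample run downloaded above:
# records = list(read_jsonl("gpt35-0906.report.jsonl"))
```

Iterating lazily like this keeps memory use flat even for large scan logs.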

[^1]: [Derczynski et al.](https://arxiv.org/abs/2303.18190) Assessing Language Model Deployment with Risk Cards, arXiv, 2023.
