# Datamodels

In `avidtools`, AVID schema objects are represented as Pydantic datamodels. The top-level models are `Report` and `Vulnerability`.

Let's initialize an empty report:

```python
from avidtools.datamodels.report import Report

report = Report()
report
```

This gives the following output:

```text
Report(data_type='AVID', data_version=None, metadata=None, affects=None, problemtype=None, metrics=None, references=None, description=None, impact=None, credit=None, reported_date=None)
```

Let's now populate a couple of fields, replicating the data for the AVID report [AVID-2022-R0003](https://github.com/avidml/avid-db/blob/main/reports/2022/AVID-2022-R0003.json).

```python
from avidtools.datamodels.components import Affects, Artifact, LangValue, Problemtype
from avidtools.datamodels.enums import ArtifactTypeEnum, ClassEnum, TypeEnum

report.affects = Affects(
    developer = [],
    deployer = ['Hugging Face'],
    artifacts = [
        Artifact(type = ArtifactTypeEnum.model, name = 'bert-base-uncased'),
        Artifact(type = ArtifactTypeEnum.dataset, name = 'sasha/wino_bias_cloze1'),
        Artifact(type = ArtifactTypeEnum.dataset, name = 'sasha/wino_bias_cloze2')
    ]
)

report.problemtype = Problemtype(
    classof = ClassEnum.llm,
    type = TypeEnum.detection,
    description = LangValue(
        lang = 'eng',
        value = 'Profession bias reinforcing gender stereotypes found in bert-base-uncased, as measured on the Winobias dataset'
    )
)
```

### Auxiliary Classes

In the example above, apart from the base `Report` class, we also use enumerations (`ArtifactTypeEnum`, `ClassEnum`, `TypeEnum`) and components (`Affects`, `Problemtype`, `LangValue`, `Artifact`).

An **enumeration**, or `Enum`, standardizes the allowed values of a field. For example, `TypeEnum` is defined as:

```python
class TypeEnum(str, Enum):
    """All report/vulnerability types."""
    issue = 'Issue'
    advisory = 'Advisory'
    measurement = 'Measurement'
    detection = 'Detection'
```
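As a minimal sketch of what this definition buys you, the block below copies `TypeEnum` locally (so it runs without `avidtools` installed) and exercises standard-library `Enum` behavior only. The value `'Rumor'` is a made-up example of an unsupported type.

```python
import json
from enum import Enum


class TypeEnum(str, Enum):
    """Local copy of the definition above, for illustration only."""
    issue = 'Issue'
    advisory = 'Advisory'
    measurement = 'Measurement'
    detection = 'Detection'


# Look a member up by its stored value, e.g. when parsing a JSON report.
parsed = TypeEnum('Detection')

# Because the class also subclasses str, members compare equal to plain
# strings and serialize to JSON as their value.
equals_plain_string = TypeEnum.detection == 'Detection'
serialized = json.dumps({'type': TypeEnum.detection})

# Values outside the enumeration are rejected with a ValueError.
try:
    TypeEnum('Rumor')
    rejected = False
except ValueError:
    rejected = True
```

Mixing in `str` is what lets enum members pass through JSON serialization without a custom encoder.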

Enums live under `avidtools.datamodels.enums`. The report/vulnerability classes currently supported by `ClassEnum` are:

* `AIID Incident`
* `ATLAS Case Study`
* `CVE Entry`
* `LLM Evaluation`
* `Third-party Report`
* `Undefined`

The remaining auxiliary classes are **components**, the Pydantic counterparts of auxiliary schema fields.

| Schema field         | Pydantic data type |
| -------------------- | ------------------ |
| `description`        | `LangValue`        |
| `problemtype`        | `Problemtype`      |
| `affects`            | `Affects`          |
| `metrics`            | `List[Metric]`     |
| `references`         | `List[Reference]`  |
| `impact`             | `Impact`           |
| `credit`             | `List[LangValue]`  |

These live under `avidtools.datamodels.components`.

The `Impact` component can include AVID and ATLAS taxonomy mappings, as well as CVSS/CWE/0DIN structures:

* `AvidTaxonomy`
* `AtlasTaxonomy`
* `CVSSScores`
* `CWETaxonomy`
* `OdinTaxonomy`

### Write and Read

In practice, report fields are usually populated programmatically at the end of an evaluation pipeline and then handed off to downstream processes. You can save a report to JSON and load it back as follows.

```python
import json

# Save the report to a JSON file
report.save('test.json')

# Load the JSON back and rebuild a Report from its fields
with open('test.json', 'r') as f:
    datax = json.load(f)
report2 = Report(**datax)
```
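To make the loading step concrete, here is a minimal sketch that uses only the standard library and does not call `avidtools`. The `payload` dictionary is a hypothetical, heavily trimmed stand-in for a saved report. It shows that `json.load` returns a plain dictionary whose top-level keys become the keyword arguments in `Report(**datax)`; Pydantic then validates nested dictionaries into component models.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, trimmed stand-in for the contents of a saved report.
payload = {
    'data_type': 'AVID',
    'problemtype': {
        'type': 'Detection',
        'description': {'lang': 'eng', 'value': 'Profession bias found in bert-base-uncased'},
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'test.json'
    # Write the JSON file, then read it back the same way as the example above.
    path.write_text(json.dumps(payload, indent=4))
    with open(path, 'r') as f:
        datax = json.load(f)

# The top-level keys are what Report(**datax) would receive as keyword arguments.
keyword_names = sorted(datax)
round_trip_ok = datax == payload
```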

For an example of how the `Report` datamodel can be used in tandem with AI evaluation pipelines, check out [this space](https://huggingface.co/spaces/avid-ml/bias-detection) on Hugging Face.

### Ingest

The `Vulnerability` datamodel works similarly to `Report`. It also supports ingesting information from an existing `Report`. This is useful to:

1. form a new vulnerability from a report, or
2. incorporate report-derived fields into an existing vulnerability.

Here's a small example.

```python
from avidtools.datamodels.vulnerability import Vulnerability

vuln = Vulnerability()
vuln.ingest(report)
vuln
# Vulnerability(data_type='AVID', data_version=None, metadata=None, affects=None, problemtype=None, references=None, description=None, reports=None, impact=None, credit=None, published_date=datetime.date(2023, 6, 14), last_modified_date=datetime.date(2023, 6, 14))
```

After `ingest`, the vulnerability fields are updated from the input report and `published_date` / `last_modified_date` are set to today.
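The behavior described above can be sketched with standard-library dataclasses. `MiniReport` and `MiniVulnerability` are hypothetical stand-ins, not the `avidtools` classes, and this `ingest` is an assumption based on the description on this page rather than the library's implementation.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class MiniReport:
    """Hypothetical stand-in holding a few report fields."""
    affects: Optional[dict] = None
    problemtype: Optional[dict] = None
    description: Optional[dict] = None


@dataclass
class MiniVulnerability:
    """Hypothetical stand-in for a vulnerability with the same shared fields."""
    affects: Optional[dict] = None
    problemtype: Optional[dict] = None
    description: Optional[dict] = None
    published_date: Optional[date] = None
    last_modified_date: Optional[date] = None

    def ingest(self, report: MiniReport) -> None:
        """Copy the shared fields from a report and stamp both dates with today."""
        for field in fields(report):
            setattr(self, field.name, getattr(report, field.name))
        today = date.today()
        self.published_date = today
        self.last_modified_date = today


mini_report = MiniReport(affects={'deployer': ['Hugging Face']})
mini_vuln = MiniVulnerability()
mini_vuln.ingest(mini_report)
```

After the call, `mini_vuln.affects` holds the report's value and both date fields equal `date.today()`.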

