# Datamodels

In `avidtools`, AVID schema objects are represented as Pydantic datamodels. The top-level models are `Report` and `Vulnerability`.

Let's initialize an empty report:

```python
from avidtools.datamodels.report import Report

report = Report()
report
```

This gives the following output:

```text
Report(data_type='AVID', data_version=None, metadata=None, affects=None, problemtype=None, metrics=None, references=None, description=None, impact=None, credit=None, reported_date=None)
```

Let's now populate a couple of fields, replicating the data for the AVID report [AVID-2022-R0003](https://github.com/avidml/avid-db/blob/main/reports/2022/AVID-2022-R0003.json).

```python
from avidtools.datamodels.components import Affects, Artifact, LangValue, Problemtype
from avidtools.datamodels.enums import ArtifactTypeEnum, ClassEnum, TypeEnum

report.affects = Affects(
    developer = [],
    deployer = ['Hugging Face'],
    artifacts = [
        Artifact(type = ArtifactTypeEnum.model, name = 'bert-base-uncased'),
        Artifact(type = ArtifactTypeEnum.dataset, name = 'sasha/wino_bias_cloze1'),
        Artifact(type = ArtifactTypeEnum.dataset, name = 'sasha/wino_bias_cloze2')
    ]
)

report.problemtype = Problemtype(
    classof = ClassEnum.llm,
    type = TypeEnum.detection,
    description = LangValue(
        lang = 'eng',
        value = 'Profession bias reinforcing gender stereotypes found in bert-base-uncased, as measured on the Winobias dataset'
    )
)
```

### Auxiliary Classes

In the above example, apart from the base `Report` class we also use enumerators (`ArtifactTypeEnum`, `ClassEnum`, `TypeEnum`) and components (`Affects`, `Problemtype`, `LangValue`, `Artifact`).

An **enumerator**, or `Enum`, is used to standardize allowed values. For example, `TypeEnum` is defined as:

```python
class TypeEnum(str, Enum):
    """All report/vulnerability types."""
    issue = 'Issue'
    advisory = 'Advisory'
    measurement = 'Measurement'
    detection = 'Detection'
```
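
To illustrate what the `str, Enum` pattern provides, here is a minimal standalone sketch that copies the definition above rather than importing it from `avidtools`. The value `'Rumor'` is a made-up example of an input that is not an allowed type.

```python
from enum import Enum


class TypeEnum(str, Enum):
    """Standalone copy of the `TypeEnum` definition shown above."""
    issue = 'Issue'
    advisory = 'Advisory'
    measurement = 'Measurement'
    detection = 'Detection'


# Look up a member from its stored string value, e.g. when parsing JSON.
parsed = TypeEnum('Detection')

# Because the enum subclasses `str`, members compare equal to plain strings.
is_detection = parsed == 'Detection'

# A value outside the allowed set raises ValueError.
try:
    TypeEnum('Rumor')  # hypothetical invalid value
    rejected = False
except ValueError:
    rejected = True
```

Subclassing `str` is what lets enum members be written to JSON as their plain string values, while the lookup step keeps unknown values out.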

Enums live under `avidtools.datamodels.enums`. The currently supported report/vulnerability classes, which are the values of `ClassEnum`, are:

* `AIID Incident`
* `ATLAS Case Study`
* `CVE Entry`
* `LLM Evaluation`
* `Third-party Report`
* `Undefined`

The remaining auxiliary classes are **components**: Pydantic models that correspond to auxiliary schema fields.

| Schema field         | Pydantic data type |
| -------------------- | ------------------ |
| `description`        | `LangValue`        |
| `problemtype`        | `Problemtype`      |
| `affects`            | `Affects`          |
| `metrics`            | `List[Metric]`     |
| `references`         | `List[Reference]`  |
| `impact`             | `Impact`           |
| `credit`             | `List[LangValue]`  |

These live under `avidtools.datamodels.components`.

The `Impact` component can include AVID and ATLAS taxonomy mappings, as well as CVSS/CWE/0DIN structures:

* `AvidTaxonomy`
* `AtlasTaxonomy`
* `CVSSScores`
* `CWETaxonomy`
* `OdinTaxonomy`

### Write and Read

In practice, fields in a report are populated programmatically at the end of evaluation pipelines, then handed off to downstream processes. You can save and load a report as JSON as follows.

```python
import json

# Save the report to a JSON file
report.save('test.json')

# Load the JSON back and rebuild a Report from its fields
with open('test.json', 'r') as f:
    datax = json.load(f)
report2 = Report(**datax)
```
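
When a report is handed off to another process, it can be worth checking that the saved file reloads to the same content. The sketch below shows that round-trip check with the standard library only; `payload` is a hypothetical stand-in for a report's JSON content, not the output of `report.save`, and the file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a report's JSON content. The field names follow
# the examples above; the full schema has more fields.
payload = {
    "data_type": "AVID",
    "affects": {"developer": [], "deployer": ["Hugging Face"]},
    "problemtype": {
        "description": {"lang": "eng", "value": "Example description"},
    },
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "test.json"

    # Write the payload to disk as JSON.
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    # Read it back and parse it.
    loaded = json.loads(path.read_text(encoding="utf-8"))

# The round trip should preserve every field.
round_trip_ok = loaded == payload
```

With the real datamodels, the analogous check would be comparing `report2` against `report`, assuming the models support field-wise equality as Pydantic models generally do.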

For an example of how the `Report` datamodel can be used in tandem with AI evaluation pipelines, check out [this space](https://huggingface.co/spaces/avid-ml/bias-detection) on Hugging Face.

### Ingest

The `Vulnerability` datamodel works similarly to `Report`. It also supports ingesting information from an existing `Report`. This is useful to:

1. form a new vulnerability from a report, or
2. incorporate report-derived fields into an existing vulnerability.

Here's a small example.

```python
from avidtools.datamodels.vulnerability import Vulnerability

vuln = Vulnerability()
vuln.ingest(report)
vuln
# Vulnerability(data_type='AVID', data_version=None, metadata=None, affects=None, problemtype=None, references=None, description=None, reports=None, impact=None, credit=None, published_date=datetime.date(2023, 6, 14), last_modified_date=datetime.date(2023, 6, 14))
```

After `ingest`, the vulnerability fields are updated from the input report, and `published_date` / `last_modified_date` are set to the date on which `ingest` was called (June 14, 2023 in the output above).
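
The behavior described above can be pictured with a small standalone sketch. `MiniReport` and `MiniVulnerability` are hypothetical stand-ins written with dataclasses; they are not the `avidtools` classes, and this is not the library's implementation of `ingest`, only an illustration of copying shared fields and stamping the dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MiniReport:
    """Hypothetical stand-in for `Report`, reduced to two fields."""
    affects: Optional[dict] = None
    description: Optional[str] = None


@dataclass
class MiniVulnerability:
    """Hypothetical stand-in for `Vulnerability`."""
    affects: Optional[dict] = None
    description: Optional[str] = None
    published_date: Optional[date] = None
    last_modified_date: Optional[date] = None

    def ingest(self, report: MiniReport) -> None:
        # Copy the fields that both models share.
        self.affects = report.affects
        self.description = report.description
        # Stamp both dates with the current date.
        today = date.today()
        self.published_date = today
        self.last_modified_date = today


report_stub = MiniReport(
    affects={"deployer": ["Hugging Face"]},
    description="Example description",
)
vuln_stub = MiniVulnerability()
vuln_stub.ingest(report_stub)
```

After the call, `vuln_stub` carries the report's `affects` and `description`, and both date fields hold the same `date` value.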
