Datamodels

In avidtools, AVID schema objects are represented as Pydantic datamodels. The top-level models are Report and Vulnerability.

Let's initialize an empty report:

from avidtools.datamodels.report import Report

report = Report()
report

This gives the following output:

Report(data_type='AVID', data_version=None, metadata=None, affects=None, problemtype=None, metrics=None, references=None, description=None, impact=None, credit=None, reported_date=None)

Let's now populate a couple of fields, replicating the data for the AVID report AVID-2022-R0003.

from avidtools.datamodels.components import Affects, Artifact, LangValue, Problemtype
from avidtools.datamodels.enums import ArtifactTypeEnum, ClassEnum, TypeEnum

report.affects = Affects(
    developer=[],
    deployer=['Hugging Face'],
    artifacts=[
        Artifact(type=ArtifactTypeEnum.model, name='bert-base-uncased'),
        Artifact(type=ArtifactTypeEnum.dataset, name='sasha/wino_bias_cloze1'),
        Artifact(type=ArtifactTypeEnum.dataset, name='sasha/wino_bias_cloze2'),
    ],
)

report.problemtype = Problemtype(
    classof=ClassEnum.llm,
    type=TypeEnum.detection,
    description=LangValue(
        lang='eng',
        value='Profession bias reinforcing gender stereotypes found in bert-base-uncased, as measured on the Winobias dataset',
    ),
)

Auxiliary Classes

In the above example, apart from the base Report class we also use enumerators (ArtifactTypeEnum, ClassEnum, TypeEnum) and components (Affects, Problemtype, LangValue, Artifact).

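An enumerator, or Enum, standardizes the values a field may take. For example, TypeEnum restricts the type field of a Problemtype to a fixed set of members, such as the TypeEnum.detection used above.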

Enums live under avidtools.datamodels.enums. The currently supported report/vulnerability classes are:

  • AIID Incident

  • ATLAS Case Study

  • CVE Entry

  • LLM Evaluation

  • Third-party Report

  • Undefined
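To illustrate how an Enum standardizes allowed values, here is a minimal sketch built on the class labels listed above. SketchClassEnum and parse_class are hypothetical names, not part of avidtools, and every member name except llm (which appears in the earlier example as ClassEnum.llm) is an assumption.

```python
from enum import Enum


class SketchClassEnum(str, Enum):
    """Hypothetical stand-in for ClassEnum; member names other than `llm` are assumed."""

    aiid = "AIID Incident"
    atlas = "ATLAS Case Study"
    cve = "CVE Entry"
    llm = "LLM Evaluation"
    third_party = "Third-party Report"
    na = "Undefined"


def parse_class(value: str) -> SketchClassEnum:
    """Convert a raw string to an enum member, rejecting anything outside the allowed set."""
    try:
        return SketchClassEnum(value)
    except ValueError:
        allowed = ", ".join(member.value for member in SketchClassEnum)
        raise ValueError(
            f"{value!r} is not an allowed class; expected one of: {allowed}"
        ) from None
```

Because the enum also subclasses str, a member compares equal to its label, so it serializes to JSON as plain text while still rejecting unknown values at construction time.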

The remaining auxiliary classes are components, which are the Pydantic counterparts of auxiliary schema fields.

Auxiliary data class    Pydantic data type
--------------------    ------------------
description             LangValue
problemtype             Problemtype
affects                 Affects
metrics                 List[Metric]
references              List[Reference]
impact                  Impact
credit                  List[LangValue]

These live under avidtools.datamodels.components.

The Impact component can include AVID and ATLAS taxonomy mappings, as well as CVSS/CWE/0DIN structures:

  • AvidTaxonomy

  • AtlasTaxonomy

  • CVSSScores

  • CWETaxonomy

  • OdinTaxonomy

Write and Read

In practice, report fields are usually populated programmatically at the end of an evaluation pipeline and then handed off to downstream processes. To support this, a report can be saved to and loaded from JSON.
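A minimal sketch of such a JSON round trip, using only the Python standard library. MiniReport, save_report, and load_report are hypothetical stand-ins written for illustration; they are not the avidtools Report class or its own save and load helpers, and the three fields shown are a simplified subset.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class MiniReport:
    """Hypothetical, heavily simplified stand-in for a report."""

    data_type: str = "AVID"
    description: Optional[str] = None
    deployer: List[str] = field(default_factory=list)


def save_report(report: MiniReport, path: Path) -> None:
    """Write the report as JSON, leaving out fields that are still None."""
    payload = {key: value for key, value in asdict(report).items() if value is not None}
    path.write_text(json.dumps(payload, indent=4), encoding="utf-8")


def load_report(path: Path) -> MiniReport:
    """Rebuild a report from JSON; omitted fields fall back to their defaults."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    return MiniReport(**payload)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.json"
    original = MiniReport(
        description="Profession bias found in bert-base-uncased",
        deployer=["Hugging Face"],
    )
    save_report(original, path)
    restored = load_report(path)
```

Dropping None-valued fields on save keeps the file small, and the defaults restore them on load, so the restored object compares equal to the original.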

For an example of how the Report datamodel can be used in tandem with AI evaluation pipelines, check out this space on Hugging Face.

Ingest

The Vulnerability datamodel works similarly to Report. It also supports ingesting information from an existing Report. This is useful to:

  1. form a new vulnerability from a report, or

  2. incorporate report-derived fields into an existing vulnerability.

Here's a small example.
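The sketch below imitates this ingest step with plain dataclasses rather than the real avidtools models. MiniReport and MiniVulnerability are hypothetical stand-ins, the method name ingest is an assumption, and only two report fields are copied to keep the example short.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class MiniReport:
    """Hypothetical stand-in for a report with two fields."""

    description: Optional[str] = None
    references: Optional[List[str]] = None


@dataclass
class MiniVulnerability:
    """Hypothetical stand-in for a vulnerability."""

    description: Optional[str] = None
    references: Optional[List[str]] = None
    published_date: Optional[date] = None
    last_modified_date: Optional[date] = None

    def ingest(self, report: MiniReport) -> None:
        """Overwrite report-derived fields and stamp both dates with today."""
        self.description = report.description
        # Copy the list so later edits to the report do not leak into the vulnerability.
        self.references = list(report.references) if report.references is not None else None
        today = date.today()
        self.published_date = today
        self.last_modified_date = today


report = MiniReport(
    description="Profession bias found in bert-base-uncased",
    references=["Winobias evaluation"],
)
vuln = MiniVulnerability()
vuln.ingest(report)
```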

After ingest, the vulnerability fields are updated from the input report and published_date / last_modified_date are set to today.
