Datamodels

In avidtools, we define the base and auxiliary data classes mentioned earlier, as well as a few other fields, as Pydantic datamodels. The top-level models are Report and Vulnerability. Each of them contains a number of fields, representing the auxiliary classes.

Let's initialize an empty report:

from avidtools.datamodels.report import Report
report = Report()
report

This gives the following output:

Report(data_type='AVID', data_version=None, metadata=None, affects=None, problemtype=None, metrics=None, references=None, description=None, impact=None, credit=None, reported_date=None)

Let's now populate a couple of fields, replicating the data for the AVID report AVID-2022-R0003.

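# Components and enumerators used to fill in the report fields
from avidtools.datamodels.components import Affects, Artifact, LangValue, Problemtype
from avidtools.datamodels.enums import ArtifactTypeEnum, ClassEnum, TypeEnum

report.affects = Affects(
    developer = [],
    deployer = ['Hugging Face'],
    artifacts = [
        Artifact(type = ArtifactTypeEnum.model, name = 'bert-base-uncased'),
        Artifact(type = ArtifactTypeEnum.dataset, name = 'sasha/wino_bias_cloze1'),
        Artifact(type = ArtifactTypeEnum.dataset, name = 'sasha/wino_bias_cloze2')
    ]
)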

report.problemtype = Problemtype(
    classof = ClassEnum.llm,
    type = TypeEnum.detection,
    description = LangValue(
        lang = 'eng',
        value = 'Profession bias reinforcing gender stereotypes found in bert-base-uncased, as measured on the Winobias dataset'
    )
)

Auxiliary Classes

In the above code example, apart from the base Report class, we also use enumerators (ArtifactTypeEnum, ClassEnum, TypeEnum) and other components (Affects, Artifact, Problemtype, LangValue). These supply the details of the auxiliary data classes mentioned earlier.

An enumerator, or Enum, restricts a field to a fixed list of allowed values. For example, TypeEnum is defined as

from enum import Enum

class TypeEnum(str, Enum):
    """All report/vulnerability types."""
    issue = 'Issue'
    advisory = 'Advisory'
    measurement = 'Measurement'
    detection = 'Detection'

These live under avidtools.datamodels.enums.
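
Because TypeEnum mixes in str, its members behave like ordinary strings when compared or serialized, while still rejecting values outside the allowed list. The sketch below redefines TypeEnum locally, copied from the definition above, so that it runs without avidtools installed. It illustrates standard Python Enum behavior rather than anything specific to avidtools.

```python
import json
from enum import Enum


class TypeEnum(str, Enum):
    """Local copy of the definition above, for illustration only."""
    issue = 'Issue'
    advisory = 'Advisory'
    measurement = 'Measurement'
    detection = 'Detection'


# Look up a member from its string value, e.g. when parsing a JSON file.
parsed = TypeEnum('Detection')

# Members compare equal to their plain string values because of the str mixin.
same_as_string = parsed == 'Detection'

# The str mixin also lets the standard json module serialize members directly.
encoded = json.dumps({'type': TypeEnum.advisory})

# Values outside the allowed list raise ValueError.
try:
    TypeEnum('Rumor')
    rejected = False
except ValueError:
    rejected = True
```

This is why a field such as type can be written as TypeEnum.detection in code yet appear as the plain string 'Detection' in a saved file.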

The rest are components, which correspond to Pydantic versions of our auxiliary data classes. Here's a 1-1 mapping of them.

| Auxiliary data class | Pydantic data type |
|----------------------|--------------------|
| description          | LangValue          |
| problemtype          | Problemtype        |
| affects              | Affects            |
| metrics              | List[Metric]       |
| references           | List[Reference]    |
| impact               | Impact             |
| credit               | List[LangValue]    |

These live under avidtools.datamodels.components.

Write and Read

Realistically, fields in a report would be programmatically populated at the end of evaluation pipelines, then handed off to downstream processes. To this end, reports need to be saved and loaded. To save a report, call its save method. To load one, read the file with json.load and pass the resulting dictionary to Report.

import json

# Saves a report
report.save('test.json')

# Loads a report (open the file for reading, not writing)
with open('test.json', 'r') as f:
    datax = json.load(f)
report2 = Report(**datax)
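
The load step can be wrapped in a small helper. The sketch below rests on assumptions: load_model is a hypothetical helper, not part of avidtools, and MiniReport is a stand-in dataclass used so the example runs without avidtools installed. With avidtools available, Report would take the place of MiniReport.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MiniReport:
    """Hypothetical stand-in for Report; not the avidtools class."""
    data_type: str = 'AVID'
    description: Optional[str] = None

    def save(self, location):
        # Write the fields to a JSON file, mimicking the role of Report.save.
        with open(location, 'w') as f:
            json.dump(asdict(self), f)


def load_model(model_cls, location):
    """Hypothetical helper: read a JSON file and rebuild a model from it."""
    with open(location, 'r') as f:
        data = json.load(f)
    return model_cls(**data)


# Round trip through a temporary directory so no files are left behind.
original = MiniReport(description='Profession bias in bert-base-uncased')
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.json')
    original.save(path)
    restored = load_model(MiniReport, path)
```

Comparing restored with original after a round trip like this is a cheap check that nothing was lost between the evaluation pipeline and the downstream process.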

For an example of how the Report datamodel can be used in tandem with AI evaluation pipelines, check out this space on Hugging Face.

Ingest

The Vulnerability datamodel works similarly to Report. The only extra functionality it provides is ingesting the information of an existing Report. Ingestion serves two purposes: (a) forming a new vulnerability out of a report, or (b) incorporating the report into an existing vulnerability.

Currently, avidtools only supports (a). Here's a small example.

from avidtools.datamodels.vulnerability import Vulnerability
vuln = Vulnerability()
vuln.ingest(report)
vuln
# Vulnerability(data_type='AVID', data_version=None, metadata=None, affects=None, problemtype=None, references=None, description=None, reports=None, impact=None, credit=None, published_date=datetime.date(2023, 6, 14), last_modified_date=datetime.date(2023, 6, 14))
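
Conceptually, case (a) amounts to carrying over the fields that a report shares with a vulnerability. The sketch below illustrates that idea with stand-in dataclasses. MiniReport, MiniVulnerability, and the body of ingest are assumptions made for illustration, including the choice to set both dates to the day of ingestion. The actual avidtools implementation may differ.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class MiniReport:
    """Hypothetical stand-in for Report."""
    description: Optional[str] = None
    credit: Optional[list] = None
    reported_date: Optional[date] = None


@dataclass
class MiniVulnerability:
    """Hypothetical stand-in for Vulnerability."""
    description: Optional[str] = None
    credit: Optional[list] = None
    published_date: Optional[date] = None
    last_modified_date: Optional[date] = None

    def ingest(self, report):
        # Copy every field that exists on both models (assumed behavior).
        own_names = {f.name for f in fields(self)}
        for f in fields(report):
            if f.name in own_names:
                setattr(self, f.name, getattr(report, f.name))
        # Assumption: both dates are set to the day of ingestion.
        today = date.today()
        self.published_date = today
        self.last_modified_date = today


report = MiniReport(description='Profession bias', credit=['AVID'])
vuln = MiniVulnerability()
vuln.ingest(report)
```

Fields that exist only on the report, such as reported_date in this stand-in, are simply not copied.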
