AVID (AI Vulnerability Database) collects, presents, and persists information about weaknesses and flaws in AI systems — “vulnerabilities.” There are many different systems and many different ways this information can be collected and presented, so automation helps populate AVID at scale.
Apollo has announced a partnership with AVID to facilitate open collaboration, and our open-source project ModsysML has started building integrations with existing AVID resources. As a first integration, we’ve linked ModsysML, the open-source toolkit for improving AI models developed by the Apollo team, to AVID’s reporting. We are building a workflow that quickly converts the vulnerabilities ModsysML finds into informative, evidence-based reports on model accuracy.
What is ModsysML?
ModsysML is an open-source toolbox for automating evaluations, building insights, and scoring model responses across relevant test cases. Prompt tuning is a way to improve AI models using a small-to-medium context window of data: instead of foundational training, it lets users take pre-trained models and tweak them for a specific use case. We didn't stop there; ModsysML also automates evaluations, can even suggest model responses, and lets users connect to any database or API resource to curate analytics alongside their evaluation runs.
ModsysML provides a framework for building relevant test cases across four domains: semantic tests, LLM-based tests, human-in-the-loop tests, and programmatic tests. It is a collection of Python plugins wrapped into an SDK; the plugins provide the basic functionality but may not fit every use case out of the box. You can craft custom test cases, run evaluations on model responses, curate automated scripts, and generate data frames. A sketch of the test-case format follows below.
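For instance, a programmatic toxicity test case is just a plain dictionary. This minimal sketch reuses the item / __expected / __trend keys from the walkthrough later in this post; the threshold value here is illustrative:

# A toxicity test case: "item" is the text under test, "__expected"
# holds the provider score we expect, and "__trend" states which
# direction the live score is expected to move relative to it.
toxicity_case = {
    "item": "You suck at this game.",
    "__expected": {"TOXICITY": {"value": "0.50"}},
    "__trend": "higher",  # live toxicity should trend above 0.50
}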
ModsysML strictly uses official provider integrations via their APIs, along with psycopg and a few other modules, to provide the building blocks. ModsysML is the upstream project to the Apollo interface, which serves as a UI for your development needs.
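To illustrate the database side, here is a minimal sketch that pulls test items from Postgres with psycopg and reshapes them into the test-case format shown above. The connection string, table, and column names (test_items, text, expected_toxicity) are assumptions for illustration, not part of the ModsysML API:

import psycopg

# Connect to Postgres and load rows to evaluate
# (connection string, table, and columns are hypothetical placeholders).
with psycopg.connect("dbname=analytics user=apollo") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT text, expected_toxicity FROM test_items")
        rows = cur.fetchall()

# Reshape rows into the dictionary format ModsysML evaluates.
test_cases = [
    {
        "item": text,
        "__expected": {"TOXICITY": {"value": str(score)}},
        "__trend": "higher",
    }
    for text, score in rows
]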
How does it work?
First, install the modsys package:
pip install modsys
Next, import the Modsys client:
from modsys.client import Modsys
Finally, create the report:
def evaluate_and_report():
    # Initialize the SDK and select the Google Perspective provider
    sdk = Modsys()
    sdk.use("google_perspective:analyze", google_perspective_api_key="#API-KEY")

    # Run the test suite: each case pairs an input item with an expected
    # toxicity score and the direction the live score is expected to trend
    results = sdk.evaluate(
        [
            {
                "item": "This is hate speech",
                "__expected": {"TOXICITY": {"value": "0.78"}},
                "__trend": "lower",
            },
            {
                "item": "You suck at this game.",
                "__expected": {"TOXICITY": {"value": "0.50"}},
                "__trend": "higher",
            },
        ],
        "community_id_can_be_None",
    )

    # Export a report based on output accuracy
    return sdk.create_report(
        "provider_name",
        "ai-model",
        "dataset_name",
        "link_to_data_set",
        "summary",
        "./path_to_json_file.json",
    )
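With those pieces in place, a minimal usage sketch is a single call (assuming the placeholder strings above are swapped for your real provider, model, and dataset values):

# Run the evaluation suite and export the report to the JSON path above.
report = evaluate_and_report()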
We’re building this!
Check out Apollo, an open-source toolkit for integrity engineering that implements the process above and provides a way to extend model management with automation & data integrations.
Highlights & Events
Insights | Surprising cases with LLM from the ActiveFence team
Event | Trust & Safety Research Conference
Event | Marketplace Risk Global Summit
Coffee Hours | TSPA Coffee Hours With Me