Toxicity Detection Operations
Toxicity | Detection
The Toxicity | Detection operation classifies and scores harmful content produced by either the user or the LLM.
Input Configuration
Module Configuration
This refers to the MuleSoft AI Chain LLM Configuration set up in the Getting Started section.
General Operation Fields
- input: The text to be checked for harmful content.
XML Configuration
Below is the XML configuration for this operation:
<ms-aichain:toxicity-detection
doc:name="Toxicity detection"
doc:id="c7e47148-ef40-4ec4-9b54-65f5c0a03b15"
config-ref="MISTRAL_AI"
input="#[payload.prompt]"
/>
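In a complete flow, the operation typically sits right after the entry point, with the user's prompt mapped into the input expression. Below is a minimal sketch, assuming an HTTP entry point that receives a JSON body of the form {"prompt": "..."}; the listener configuration name HTTP_Listener_config and the /moderate path are illustrative:
<flow name="toxicity-detection-flow">
  <!-- Illustrative HTTP entry point; the request body is expected to contain a "prompt" field -->
  <http:listener config-ref="HTTP_Listener_config" path="/moderate"/>
  <!-- Same operation configuration as shown above -->
  <ms-aichain:toxicity-detection
    doc:name="Toxicity detection"
    config-ref="MISTRAL_AI"
    input="#[payload.prompt]"/>
</flow>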
Output Configuration
Response Payload
This operation responds with a JSON payload containing the toxicity categories and their scores.
Example Response Payload (Mistral)
This is an example from the Mistral Moderation Model.
{
"response": {
"model": "mistral-moderation-latest",
"id": "d664b37f6cff48b7a762aba1a2dfb12d",
"results": [
{
"category_scores": {
"pii": 0.00017952919006347656,
"violence_and_threats": 0.0006070137023925781,
"law": 0.00000476837158203125,
"dangerous_and_criminal_content": 0.0002613067626953125,
"selfharm": 0.00006604194641113281,
"financial": 0.0000311732292175293,
"hate_and_discrimination": 0.7958984375,
"health": 0.00003319978713989258,
"sexual": 0.00015842914581298828
},
"categories": {
"pii": false,
"violence_and_threats": false,
"law": false,
"dangerous_and_criminal_content": false,
"selfharm": false,
"financial": false,
"hate_and_discrimination": true,
"health": false,
"sexual": false
}
}
]
}
}
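The Mistral payload reports one boolean per category rather than a single aggregate flag, so a typical follow-up step collects the names of the categories that were flagged. Below is a minimal sketch using a Set Variable component; the variable name flaggedCategories is illustrative, and the JSON payload is assumed to be navigable by DataWeave as shown above:
<!-- Collect the names of all categories reported as true in the first result -->
<set-variable variableName="flaggedCategories"
  value="#[payload.response.results[0].categories filterObject ((flagged, category) -> flagged) pluck ((flagged, category) -> category as String)]"/>
For the example payload above, vars.flaggedCategories would evaluate to ["hate_and_discrimination"].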
Example Response Payload (OpenAI)
This is an example from the OpenAI Moderation Model.
{
"response": {
"model": "omni-moderation-latest",
"id": "modr-7aaa8518ac1c3886d2dcb6c43e833f12",
"results": [
{
"category_scores": {
"illicit/violent": 0.0000022474089854291757,
"self-harm/instructions": 0.0000027535691114583474,
"harassment": 0.5699403734868076,
"violence/graphic": 0.000006302758756443267,
"illicit": 0.000016865275398914433,
"self-harm/intent": 0.000002840974294610978,
"hate/threatening": 0.0000010783312222985275,
"sexual/minors": 0.000002355261854303796,
"harassment/threatening": 0.0008374506033373245,
"hate": 0.007200823403173291,
"self-harm": 0.000007662101864956481,
"sexual": 0.00012448433020883747,
"violence": 0.0005300709180442588
},
"flagged": true,
"category_applied_input_types": {
"illicit/violent": [
"text"
],
"self-harm/instructions": [
"text"
],
"harassment": [
"text"
],
"violence/graphic": [
"text"
],
"illicit": [
"text"
],
"self-harm/intent": [
"text"
],
"hate/threatening": [
"text"
],
"sexual/minors": [
"text"
],
"harassment/threatening": [
"text"
],
"hate": [
"text"
],
"self-harm": [
"text"
],
"sexual": [
"text"
],
"violence": [
"text"
]
},
"categories": {
"illicit/violent": false,
"self-harm/instructions": false,
"harassment": true,
"violence/graphic": false,
"illicit": false,
"self-harm/intent": false,
"hate/threatening": false,
"sexual/minors": false,
"harassment/threatening": false,
"hate": false,
"self-harm": false,
"sexual": false,
"violence": false
}
}
]
}
}
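Since the OpenAI payload includes an aggregate flagged boolean per result, a downstream router can test that field directly. Below is a minimal sketch of such a check; the APP:TOXIC_CONTENT error type is illustrative, and the JSON payload is assumed to be navigable by DataWeave as shown above:
<choice doc:name="Route on moderation result">
  <when expression="#[payload.response.results[0].flagged]">
    <!-- Stop processing when the moderation model flags the content -->
    <raise-error type="APP:TOXIC_CONTENT" description="Content flagged by the moderation model"/>
  </when>
  <otherwise>
    <logger level="INFO" message="Content passed moderation"/>
  </otherwise>
</choice>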
Attributes
Along with the JSON payload, the operation also returns attributes, which include metadata such as token usage:
{
"attributes": {
"tokenUsage": null,
"additionalAttributes": {}
}
}
Example Use Cases
This operation can be particularly useful in various scenarios, such as:
- Detecting Toxic Inputs: Detect and block toxic user input before it is sent to the LLM.
- Detecting Harmful Responses: Filter out toxic LLM responses that could be harmful to the user (see the sketch after this list).
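Below is a minimal sketch of the second use case, where the LLM's answer (held here in an illustrative variable vars.llmAnswer, set earlier in the flow) is checked and replaced with a neutral message when any category is flagged; the Mistral-style categories structure from the first example payload is assumed:
<!-- Check the LLM's answer instead of the user's prompt -->
<ms-aichain:toxicity-detection
  doc:name="Toxicity detection"
  config-ref="MISTRAL_AI"
  input="#[vars.llmAnswer]"/>
<choice doc:name="Filter harmful responses">
  <when expression="#[(payload.response.results[0].categories pluck $) contains true]">
    <!-- Withhold the harmful answer and return a safe fallback instead -->
    <set-payload value="This response was withheld because it may contain harmful content."/>
  </when>
  <otherwise>
    <!-- Pass the original answer through unchanged -->
    <set-payload value="#[vars.llmAnswer]"/>
  </otherwise>
</choice>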