Toxicity Detection Operations
Toxicity Detection
The Toxicity Detection operation classifies and scores harmful content produced by either the user or the LLM.
![Toxicity Detection](/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ftoxicity-detection.0581175e.png&w=3840&q=75)
Input Configuration
Module Configuration
This refers to the MuleSoft AI Chain LLM Configuration set up in the Getting Started section.
General Operation Fields
- input: The input to be checked for harmful content.
XML Configuration
Below is the XML configuration for this operation:
```xml
<ms-aichain:toxicity-detection
    doc:name="Toxicity detection"
    doc:id="c7e47148-ef40-4ec4-9b54-65f5c0a03b15"
    config-ref="MISTRAL_AI"
    input="#[payload.prompt]"/>
```
Output Configuration
Response Payload
This operation responds with a JSON payload containing the toxicity detection categories and their scores.
Example Response Payload (Mistral)
This is an example from the Mistral Moderation Model.
```json
{
  "response": {
    "model": "mistral-moderation-latest",
    "id": "d664b37f6cff48b7a762aba1a2dfb12d",
    "results": [
      {
        "category_scores": {
          "pii": 0.00017952919006347656,
          "violence_and_threats": 0.0006070137023925781,
          "law": 0.00000476837158203125,
          "dangerous_and_criminal_content": 0.0002613067626953125,
          "selfharm": 0.00006604194641113281,
          "financial": 0.0000311732292175293,
          "hate_and_discrimination": 0.7958984375,
          "health": 0.00003319978713989258,
          "sexual": 0.00015842914581298828
        },
        "categories": {
          "pii": false,
          "violence_and_threats": false,
          "law": false,
          "dangerous_and_criminal_content": false,
          "selfharm": false,
          "financial": false,
          "hate_and_discrimination": true,
          "health": false,
          "sexual": false
        }
      }
    ]
  }
}
```
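Note that the Mistral payload has no top-level flagged field, so a downstream check typically inspects the booleans in the categories object. Below is a minimal sketch of such a check, assuming the JSON payload shown above is the current message payload; the doc:name values and the APP:TOXIC_CONTENT error type are illustrative, not part of the connector.

```xml
<!-- Sketch: fail the flow if the Mistral moderation model flagged any category -->
<choice doc:name="Check Mistral moderation result">
    <when expression="#[(payload.response.results[0].categories pluck $) contains true]">
        <raise-error type="APP:TOXIC_CONTENT" description="Content was flagged by the moderation model"/>
    </when>
    <otherwise>
        <logger level="INFO" doc:name="Content passed moderation" message="No toxicity categories were flagged"/>
    </otherwise>
</choice>
```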
Example Response Payload (OpenAI)
This is an example from the OpenAI Moderation Model.
```json
{
  "response": {
    "model": "omni-moderation-latest",
    "id": "modr-7aaa8518ac1c3886d2dcb6c43e833f12",
    "results": [
      {
        "category_scores": {
          "illicit/violent": 0.0000022474089854291757,
          "self-harm/instructions": 0.0000027535691114583474,
          "harassment": 0.5699403734868076,
          "violence/graphic": 0.000006302758756443267,
          "illicit": 0.000016865275398914433,
          "self-harm/intent": 0.000002840974294610978,
          "hate/threatening": 0.0000010783312222985275,
          "sexual/minors": 0.000002355261854303796,
          "harassment/threatening": 0.0008374506033373245,
          "hate": 0.007200823403173291,
          "self-harm": 0.000007662101864956481,
          "sexual": 0.00012448433020883747,
          "violence": 0.0005300709180442588
        },
        "flagged": true,
        "category_applied_input_types": {
          "illicit/violent": ["text"],
          "self-harm/instructions": ["text"],
          "harassment": ["text"],
          "violence/graphic": ["text"],
          "illicit": ["text"],
          "self-harm/intent": ["text"],
          "hate/threatening": ["text"],
          "sexual/minors": ["text"],
          "harassment/threatening": ["text"],
          "hate": ["text"],
          "self-harm": ["text"],
          "sexual": ["text"],
          "violence": ["text"]
        },
        "categories": {
          "illicit/violent": false,
          "self-harm/instructions": false,
          "harassment": true,
          "violence/graphic": false,
          "illicit": false,
          "self-harm/intent": false,
          "hate/threatening": false,
          "sexual/minors": false,
          "harassment/threatening": false,
          "hate": false,
          "self-harm": false,
          "sexual": false,
          "violence": false
        }
      }
    ]
  }
}
```
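The OpenAI payload additionally carries a top-level flagged field alongside the per-category booleans. As a sketch (assuming the payload shown above and the standard Mule EE ee namespace for the Transform Message component), the result can be reduced to a compact summary for logging or routing:

```xml
<!-- Sketch: reduce the OpenAI moderation payload to a flagged flag plus the list of flagged categories -->
<ee:transform doc:name="Summarize moderation result">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
output application/json
var result = payload.response.results[0]
---
{
    flagged: result.flagged,
    flaggedCategories: keysOf(result.categories filterObject ((value, key) -> value)) map ($ as String)
}]]></ee:set-payload>
    </ee:message>
</ee:transform>
```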
Attributes
Along with the JSON payload, the operation also returns attributes, which include information about token usage and any additional attributes:
```json
{
  "attributes": {
    "tokenUsage": null,
    "additionalAttributes": {}
  }
}
```
Example Use Cases
This operation can be particularly useful in various scenarios, such as:
- Detecting Toxic Inputs: Detect and block toxic user input before it is sent to the LLM (see the flow sketch below).
- Detecting Harmful Responses: Filter out toxic LLM responses that could be harmful to the user.
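For the first use case, the operation can sit in front of the actual LLM call and reject flagged prompts. The sketch below is illustrative: the flow name, HTTP listener configuration, and error type are placeholders, and it assumes the standard target attribute is used so that the original request stays in the payload while the moderation result is stored in a variable.

```xml
<!-- Sketch: guard an LLM call by running toxicity detection on the incoming prompt first -->
<flow name="guarded-chat-flow">
    <http:listener config-ref="HTTP_Listener_config" path="/chat"/>
    <!-- Store the moderation result in a variable; the incoming request remains the payload -->
    <ms-aichain:toxicity-detection config-ref="MISTRAL_AI" input="#[payload.prompt]" target="moderation"/>
    <choice>
        <when expression="#[(vars.moderation.response.results[0].categories pluck $) contains true]">
            <raise-error type="APP:TOXIC_CONTENT" description="Prompt rejected by toxicity detection"/>
        </when>
        <otherwise>
            <!-- Safe to forward payload.prompt to the LLM, e.g. via another MuleSoft AI Chain operation -->
            <logger level="INFO" message="Prompt passed toxicity detection"/>
        </otherwise>
    </choice>
</flow>
```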