Agent Operations
Image | Read by (URL or Base64 Format)
The Image | Read by (URL or Base64) operation reads an image, supplied as a URL or a Base64 string, and interprets it according to a prompt.

Input Configuration
Module Configuration
This refers to the MuleSoft Inference Vision Model Configuration set up in the Getting Started section.
General Operation Fields
- Prompt: The instruction or question the model should answer about the image.
- Image: The URL or Base64 string of the image file to read.
XML Configuration
Below is the XML configuration for this operation:
<mac-inference:read-image
    doc:id="dfbd1a61-6e98-4b5b-b77a-bfe031e70d45"
    config-ref="VisionHuggingFace"
    doc:name="Read image" >
    <mac-inference:prompt >
        <![CDATA[#[payload.prompt]]]>
    </mac-inference:prompt>
    <mac-inference:image-url >
        <![CDATA[#[payload.imageUrl]]]>
    </mac-inference:image-url>
</mac-inference:read-image>
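The configuration above reads the prompt from payload.prompt and the image from payload.imageUrl. As a minimal sketch of how a calling client might build such a request body, the Python helper below passes a URL through unchanged and Base64-encodes raw image bytes. The function name build_read_image_request and the example URL are hypothetical, and the sketch assumes the operation accepts a bare Base64 string rather than a data URI, which this page does not specify.

```python
import base64
import json
from typing import Union


def build_read_image_request(prompt: str, image: Union[str, bytes]) -> str:
    """Return a JSON body with the `prompt` and `imageUrl` keys the flow reads.

    `image` is either a URL (str), which is passed through unchanged, or raw
    image bytes, which are Base64-encoded. Assumption: a bare Base64 string
    (no "data:image/...;base64," prefix) is what the operation expects.
    """
    if isinstance(image, bytes):
        image_value = base64.b64encode(image).decode("ascii")
    else:
        image_value = image
    return json.dumps({"prompt": prompt, "imageUrl": image_value})


if __name__ == "__main__":
    # Image referenced by URL (hypothetical address).
    print(build_read_image_request("Describe this image.",
                                   "https://example.com/eiffel.jpg"))

    # Image supplied as bytes; in practice these would typically come from
    # open(path, "rb").read().
    print(build_read_image_request("Describe this image.", b"\x89PNG\r\n"))
```

Keeping both input forms behind one helper mirrors the single Image field of the operation, which takes either a URL or a Base64 string.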
Output Configuration
Response Payload
This operation responds with a JSON payload containing the main LLM response. Token usage and other metadata are returned as attributes, not within the main payload.
Example Response Payload
{
  "payload": {
    "response": "The image depicts the Eiffel Tower in Paris during a snowy day. The tower is partially covered in snow, and the surrounding trees and ground are also blanketed in snow. There is a pathway leading towards the Eiffel Tower, with a lamppost and some fencing along the sides. The overall scene has a serene and picturesque winter atmosphere."
  }
}
Attributes
Along with the JSON payload, the operation also returns attributes, which include token usage and additional details such as the finish reason, model, and response ID:
{
  "attributes": {
    "tokenUsage": {
      "outputCount": 68,
      "totalCount": 335,
      "inputCount": 267
    },
    "additionalAttributes": {
      "finish_reason": "stop",
      "model": "grok-vision-beta",
      "id": "604ae573-8265-4dc0-b06e-457422f2fbd8"
    }
  }
}
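To illustrate how the attributes might be consumed downstream, the following Python sketch parses the example above and summarizes the token usage. It assumes the attributes have been serialized to JSON exactly as shown; the function name summarize_usage is hypothetical, and the consistency check (input plus output equals total) reflects the example values rather than a guarantee documented for every provider.

```python
import json

# The example attributes from this page, serialized as JSON.
ATTRIBUTES_JSON = """
{
  "attributes": {
    "tokenUsage": {"outputCount": 68, "totalCount": 335, "inputCount": 267},
    "additionalAttributes": {
      "finish_reason": "stop",
      "model": "grok-vision-beta",
      "id": "604ae573-8265-4dc0-b06e-457422f2fbd8"
    }
  }
}
"""


def summarize_usage(doc: dict) -> dict:
    """Collect token counts and key metadata from an attributes document."""
    usage = doc["attributes"]["tokenUsage"]
    extra = doc["attributes"]["additionalAttributes"]
    return {
        "input": usage["inputCount"],
        "output": usage["outputCount"],
        "total": usage["totalCount"],
        # True when the reported total equals input + output tokens.
        "consistent": usage["inputCount"] + usage["outputCount"]
        == usage["totalCount"],
        "model": extra["model"],
        "finish_reason": extra["finish_reason"],
    }


if __name__ == "__main__":
    print(summarize_usage(json.loads(ATTRIBUTES_JSON)))
```

A summary like this can feed usage logging or cost tracking without touching the main response payload.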
Example Use Cases
This operation is particularly useful in scenarios where you need to interpret or describe an image, such as:
- Image Analysis: Analyzing images in business reports, presentations, or customer service scenarios.
- Content Generation: Describing images for blog posts, articles, or social media.
- Visual Insights: Extracting insights from images in research or design projects.