Image | Read by (URL or Base64 Format)

The Image | Read by (URL or Base64) operation reads an image and interprets it according to a prompt.

Input Configuration

Module Configuration

Select the MuleSoft Inference Vision Model configuration that was set up in the Getting Started section.

General Operation Fields

Prompt: The prompt that tells the model how to interpret the image.
Image: The URL or Base64-encoded string of the image file to read.
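
If the image arrives as binary content rather than as a URL, it has to be encoded before it is passed to the Image field. The fragment below is a minimal sketch, assuming the incoming payload holds the raw image bytes and that the operation accepts a plain Base64 string (whether a data-URI prefix is required is not stated here). The variable name imageBase64 is hypothetical.

```xml
<!-- Assumption: payload holds the raw image bytes (for example, from a file read or an upload). -->
<!-- dw::core::Binaries::toBase64 is the DataWeave function that encodes Binary content as a Base64 string. -->
<set-variable variableName="imageBase64"
    value="#[dw::core::Binaries::toBase64(payload as Binary)]" />
```

The resulting vars.imageBase64 value could then be supplied to the Image field in place of a URL expression.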

XML Configuration

Below is the XML configuration for this operation:

  <mac-inference:read-image
      doc:id="dfbd1a61-6e98-4b5b-b77a-bfe031e70d45"
      config-ref="VisionHuggingFace"
      doc:name="Read image">
    <mac-inference:prompt>
      <![CDATA[#[payload.prompt]]]>
    </mac-inference:prompt>
    <mac-inference:image-url>
      <![CDATA[#[payload.imageUrl]]]>
    </mac-inference:image-url>
  </mac-inference:read-image>
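
The expressions payload.prompt and payload.imageUrl imply that the operation receives a payload carrying both fields. The flow below is a sketch of one way that could be wired up, assuming an HTTP listener that accepts a JSON body such as {"prompt": "Describe this image", "imageUrl": "https://example.com/image.jpg"}. The flow name, the listener configuration name, the path, and the sample URL are all hypothetical, and the namespace declarations on the enclosing mule element are omitted.

```xml
<!-- Sketch only: flow name, listener config name, and path are hypothetical. -->
<flow name="read-image-flow">
    <!-- Receives a JSON body with "prompt" and "imageUrl" fields. -->
    <http:listener config-ref="HTTP_Listener_config" path="/read-image" />

    <!-- Same operation as above, fed from the incoming payload. -->
    <mac-inference:read-image config-ref="VisionHuggingFace" doc:name="Read image">
        <mac-inference:prompt>
            <![CDATA[#[payload.prompt]]]>
        </mac-inference:prompt>
        <mac-inference:image-url>
            <![CDATA[#[payload.imageUrl]]]>
        </mac-inference:image-url>
    </mac-inference:read-image>
</flow>
```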

Output Configuration

Response Payload

This operation responds with a JSON payload containing the main LLM response. Metadata such as token usage is returned in the message attributes, not in the payload.

Example Response Payload

{
    "payload": {
        "response": "The image depicts the Eiffel Tower in Paris during a snowy day. The tower is partially covered in snow, and the surrounding trees and ground are also blanketed in snow. There is a pathway leading towards the Eiffel Tower, with a lamppost and some fencing along the sides. The overall scene has a serene and picturesque winter atmosphere."
    }
}
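
A downstream step often needs only the generated text rather than the whole JSON object. A minimal sketch, assuming the payload field in the example above corresponds to the Mule message payload, so that the text is reachable as payload.response:

```xml
<!-- Assumption: runs directly after the read-image operation. -->
<!-- Replaces the JSON payload with just the model's text response. -->
<set-payload value="#[payload.response]" />
```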

Attributes

Along with the JSON payload, the operation returns attributes, which include token usage and additional details such as the finish reason, model, and response ID:

{
  "attributes": {
      "tokenUsage": {
          "outputCount": 68,
          "totalCount": 335,
          "inputCount": 267
      },
      "additionalAttributes": {
          "finish_reason": "stop",
          "model": "grok-vision-beta",
          "id": "604ae573-8265-4dc0-b06e-457422f2fbd8"
      }
  }
}
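
Because these values travel in the attributes rather than in the payload, they can be read without touching the response text. The fragment below is a sketch of logging them right after the operation, assuming the attribute names match the example above. The default values only guard against a provider that omits a field.

```xml
<!-- Assumption: runs directly after the read-image operation, before attributes are overwritten. -->
<logger level="INFO"
    message="#['Read image used ' ++ ((attributes.tokenUsage.totalCount default 0) as String) ++ ' tokens, finish_reason=' ++ (attributes.additionalAttributes.finish_reason default 'unknown')]" />
```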

Example Use Cases

This operation is particularly useful in scenarios where you need to interpret or describe an image, such as:

Image Analysis: Analyzing images in business reports, presentations, or customer service scenarios.
Content Generation: Describing images for blog posts, articles, or social media.
Visual Insights: Extracting insights from images in research or design projects.
