Transform Operations

Transform operations process either document files or media files. You can:

  • Parse a non-text document file.
  • Split text into chunks of the provided size.
  • Process a media file, for example to resize an image.

These operations are typically followed by either an [Embedding] Generate from text operation or an [Embedding] Generate from media operation.

Transform | Parse document

The [Transform] Parse document operation parses a document sent as payload in either base64 or binary format and returns its text content. To split the parsed text into chunks of a given size, follow it with the [Transform] Chunk text operation.

Transform Parse Document

How to Use

Add Document to Store

The [Transform] Parse document operation should be followed by [Transform] Chunk text and/or [Embedding] Generate from text operations.

Transform Parse Document Use Case

Input Fields

General Fields

  • Document Binary: The document represented in binary format.

  • Document Parser: The type of the document parser to be used to parse the document file. Currently, two document parser types are supported:

    • Multiformat document parser: Use for any file type other than text files.
      • Include metadata: [False (Default)] Whether to include metadata in the parsed document.
    • Text document parser: Use for any text-based file (JSON, XML, TXT, CSV, and so on).
      • Charset: [UTF-8 (Default)] The charset of the text file.
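
To make the payload formats and the Charset setting concrete, here is a minimal Python sketch of what parsing a text-based file amounts to conceptually: obtain raw bytes (from base64 or binary input), then decode them with the configured charset. The function names `to_bytes` and `parse_text_document` are hypothetical and used only for illustration; this is not the connector's implementation.

```python
import base64


def to_bytes(document, is_base64):
    """Return raw bytes from either a base64 string or a binary payload."""
    if is_base64:
        return base64.b64decode(document)
    return bytes(document)


def parse_text_document(document, is_base64=False, charset="utf-8"):
    """Decode a text-based file (txt, json, xml, csv) into a string.

    Hypothetical helper: mirrors the idea of the Text document parser,
    where the Charset field defaults to UTF-8.
    """
    return to_bytes(document, is_base64).decode(charset)


# The same content supplied as binary and as base64 yields the same text.
raw = "café;1\n".encode("utf-8")
encoded = base64.b64encode(raw).decode("ascii")
print(parse_text_document(raw))
print(parse_text_document(encoded, is_base64=True))
```

Decoding with the wrong charset either fails or produces garbled characters, which is why the Charset field matters for files that are not UTF-8.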

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:transform-parse-document doc:name="[Transform] Parse document" 
                                     doc:id="28951baf-f388-4735-8da1-8388a6913b29" 
                                     config-ref="Transform_config">
    <ms-vectors:document-binary >
        <![CDATA[#[payload.parts.documentBinary.content]]]>
    </ms-vectors:document-binary>
    <ms-vectors:document-parser-parameters >
        <ms-vectors:multiformat-document-parser-parameters />
    </ms-vectors:document-parser-parameters>
</ms-vectors:transform-parse-document>

Output Fields

Payload

This operation responds with a String payload.

Attributes

  • StorageResponseAttributes:
    • N/A

Transform | Chunk text

The [Transform] Chunk text operation takes text content and splits it into smaller chunks based on the provided size parameters.

Transform Chunk Text

How to Use

Add Document to Store

The [Transform] Chunk text operation should be followed by an [Embedding] Generate from text operation.

Transform Chunk Text Use Case

Input Fields

General Fields

  • Text: The text content to be chunked.

Segmentation Fields

  • Max Segment Size (Characters): The maximum size, in characters, of each segment the text is split into.
  • Max Overlap Size (Characters): The maximum number of characters shared between consecutive segments. Tune this value to improve similarity search results.
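
The interaction between the two segmentation fields can be sketched in Python as a simple sliding character window. This is a simplified illustration under stated assumptions: the function name `chunk_text` and its parameters are hypothetical, and the connector's actual splitter may behave differently (for example, by preferring paragraph or sentence boundaries).

```python
def chunk_text(text, max_segment_size, max_overlap_size):
    """Split text into segments of at most max_segment_size characters.

    Consecutive segments share max_overlap_size characters.
    Hypothetical sketch, not the connector's splitting algorithm.
    """
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("max_overlap_size must be >= 0 and smaller than max_segment_size")

    # Each new segment starts this many characters after the previous one.
    step = max_segment_size - max_overlap_size
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # the last segment already reaches the end of the text
        start += step
    return chunks


chunks = chunk_text("abcdefghij", max_segment_size=4, max_overlap_size=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

A larger overlap keeps more shared context between neighboring segments at the cost of producing more segments, and therefore more embeddings to generate and store.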

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:transform-chunk-text doc:name="[Transform] Chunk text" 
                                 doc:id="03fa98ce-319b-4aa2-b318-d953dfb17dd7" 
                                 maxOverlapSizeInChars="#[attributes.queryParams.maxOverlapSize]" 
                                 maxSegmentSizeInChar="#[attributes.queryParams.maxSegmentSize]">
    <ms-vectors:text >
        <![CDATA[#[payload.text]]]>
    </ms-vectors:text>
</ms-vectors:transform-chunk-text>

Output Fields

Payload

This operation responds with an Array<String> payload.

Attributes

  • StorageResponseAttributes:
    • N/A

Transform | Process Media

The [Transform] Process media operation processes media content (for example, by resizing an image) and prepares it for embedding generation.

Transform Process Media

How to Use

The [Transform] Process media operation can be followed by an [Embedding] Generate from media operation. The output payload is ready to be used by the embedding operation without any further transformation.

Add Media to Store

The [Transform] Process media operation processes media content and prepares it for embedding generation. The processed media can then be used by [Embedding] Generate from media to create vector embeddings.

Transform Process Media - Query from Store

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Transform Configuration set up in the Getting Started section.

General

  • Binary: The media binary to process before generating embeddings.
  • Media Type: The type of the media. The default value is image.
  • Processor Settings:
    • Target Width (pixels): The target width of the processed image, in pixels.
    • Target Height (pixels): The target height of the processed image, in pixels.
    • Compression Quality: The compression quality for media (between 0.0 and 1.0, where 1.0 is highest quality).
    • Scale Strategy:
      • Fit (Default): Resizes the image to fit within the specified width and height while maintaining the aspect ratio. The image is then padded with a background color to reach the specified width and height.
      • Fill: Resizes the image to cover the specified width and height while maintaining the aspect ratio. The image is then cropped to the target width and height.
      • Stretch: Resizes the image to exactly the specified width and height without maintaining the aspect ratio.
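
The difference between the three scale strategies comes down to which scale factor is applied before padding or cropping. The Python sketch below works through that arithmetic; the function name `scaled_size` and the lowercase strategy strings are hypothetical, and the connector's exact rounding behavior is not documented here.

```python
def scaled_size(src_w, src_h, target_w, target_h, strategy="fit"):
    """Return the (width, height) the image is resized to,
    before any padding (fit) or cropping (fill) is applied.

    Hypothetical sketch of the arithmetic behind each strategy.
    """
    if strategy == "stretch":
        # Aspect ratio is ignored: the image takes the target size directly.
        return target_w, target_h

    ratio_w = target_w / src_w
    ratio_h = target_h / src_h
    if strategy == "fit":
        scale = min(ratio_w, ratio_h)  # whole image stays inside the target box
    elif strategy == "fill":
        scale = max(ratio_w, ratio_h)  # image covers the whole target box
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return round(src_w * scale), round(src_h * scale)


# An 800x400 source image with a 200x200 target:
print(scaled_size(800, 400, 200, 200, "fit"))      # (200, 100), then padded to 200x200
print(scaled_size(800, 400, 200, 200, "fill"))     # (400, 200), then cropped to 200x200
print(scaled_size(800, 400, 200, 200, "stretch"))  # (200, 200), distorted
```

Fit preserves the entire image at the cost of padding, Fill preserves the target area at the cost of cropping, and Stretch preserves both the content and the target size at the cost of distortion.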

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:transform-process-media doc:name="[Transform] Process media" 
                                    doc:id="d4755123-dcc2-41ce-b7f7-7f7c4c4a1679">
    <ms-vectors:binary >
        <![CDATA[#[payload.parts.image.content]]]>
    </ms-vectors:binary>
    <ms-vectors:media-processor-parameters >
        <ms-vectors:image-processor-parameters />
    </ms-vectors:media-processor-parameters>
</ms-vectors:transform-process-media>

Output Fields

Payload

This operation responds with a Binary payload.

Attributes

  • StorageResponseAttributes:
    • mediaType: The type of the media (for example, image).


© Copyright 2025 Salesforce, Inc. All rights reserved. Various trademarks held by their respective owners. Salesforce Tower, 415 Mission Street, 3rd Floor, San Francisco, CA 94105, United States.