Transform Operations
Transform operations process either document files or media files. You can:
- Parse a non-text document file.
- Split text into chunks of the provided size.
- Process a media file, for example to resize an image.
These operations are meant to be followed by either an [Embedding] Generate from text operation or an [Embedding] Generate from media operation.
Transform | Parse document
The [Transform] Parse document operation parses a document sent as payload in either base64 or binary format and optionally splits it into text chunks based on the provided size.
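As an illustration of the two accepted payload formats, the following is a minimal Python sketch of how a caller might normalize a document to raw bytes before sending it. The function name `to_document_bytes` is hypothetical and is not part of the connector. The sketch assumes that a text value is always base64 and that a bytes value is already binary.

```python
import base64
import binascii


def to_document_bytes(payload):
    """Return raw document bytes from a base64 string or from binary input.

    Hypothetical helper for illustration only; not a connector API.
    """
    if isinstance(payload, (bytes, bytearray)):
        # Binary input is passed through unchanged.
        return bytes(payload)
    if isinstance(payload, str):
        try:
            # validate=True rejects characters outside the base64 alphabet.
            return base64.b64decode(payload, validate=True)
        except binascii.Error as exc:
            raise ValueError("payload is not valid base64") from exc
    raise TypeError("payload must be a base64 string or bytes")
```

Either form of the same document yields identical bytes, so downstream handling does not need to know which format the sender chose.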

How to Use
Add Document to Store
The [Transform] Parse document operation should be followed by [Transform] Chunk text and/or [Embedding] Generate from text operations.

Input Fields
General Fields
- Document Binary: The document represented in binary format.
- Document Parser: The type of the document parser to be used to parse the document file. Currently, two document parser types are supported:
  - Multiformat document parser: Should be used for any type except txt.
    - Include metadata: [False (Default)] Whether to include metadata in the parsed document.
  - Text document parser: Should be used for any type of text files (json, xml, txt, csv, etc.)
    - Charset: [UTF-8 (Default)] The charset of the text file.
XML Configuration
Below is the XML configuration for this operation:
<ms-vectors:transform-parse-document doc:name="[Transform] Parse document"
    doc:id="28951baf-f388-4735-8da1-8388a6913b29"
    config-ref="Transform_config">
    <ms-vectors:document-binary>
        <![CDATA[#[payload.parts.documentBinary.content]]]>
    </ms-vectors:document-binary>
    <ms-vectors:document-parser-parameters>
        <ms-vectors:multiformat-document-parser-parameters />
    </ms-vectors:document-parser-parameters>
</ms-vectors:transform-parse-document>
Output Fields
Payload
This operation responds with a String payload.
Attributes
- StorageResponseAttributes:
  - N/A
Transform | Chunk text
The [Transform] Chunk text operation takes text content and splits it into smaller chunks based on the provided size parameters.

How to Use
Add Document to Store
The [Transform] Chunk text operation should be followed by an [Embedding] Generate from text operation.

Input Fields
General Fields
- Text: The text content to be chunked.
Segmentation Fields
- Max Segment Size (Characters): The maximum size, in characters, of each segment the text is split into.
- Max Overlap Size (Characters): The maximum number of characters that consecutive segments share. Adjust it to fine-tune the similarity search.
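To show how the two segmentation fields interact, here is a minimal Python sketch of fixed-width chunking with overlap. It is an assumption for illustration, not the connector's implementation: the actual operation may split on sentence or paragraph boundaries, and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, max_segment_size, max_overlap_size):
    """Split text into windows of at most max_segment_size characters.

    Consecutive windows share max_overlap_size characters.
    Illustrative sketch only; not the connector's algorithm.
    """
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("max_overlap_size must be in [0, max_segment_size)")

    # Each new window starts this many characters after the previous one.
    step = max_segment_size - max_overlap_size
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # The last window already reached the end of the text.
        start += step
    return chunks
```

With a segment size of 4 and an overlap of 1, the text "abcdefghij" becomes "abcd", "defg", and "ghij": each chunk repeats the last character of the previous one, so content that straddles a boundary still appears whole in at least one neighbor when the overlap is large enough.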
XML Configuration
Below is the XML configuration for this operation:
<ms-vectors:transform-chunk-text doc:name="[Transform] Chunk text"
    doc:id="03fa98ce-319b-4aa2-b318-d953dfb17dd7"
    maxOverlapSizeInChars="#[attributes.queryParams.maxOverlapSize]"
    maxSegmentSizeInChar="#[attributes.queryParams.maxSegmentSize]">
    <ms-vectors:text>
        <![CDATA[#[payload.text]]]>
    </ms-vectors:text>
</ms-vectors:transform-chunk-text>
Output Fields
Payload
This operation responds with an Array<String> payload.
Attributes
- StorageResponseAttributes:
  - N/A
Transform | Process media
The [Transform] Process media operation processes the media (for example, to resize an image) and prepares it for embedding generation.

How to Use
The [Transform] Process media operation can be followed by the [Embedding] Generate from media operation.
The output payload is ready to be used by the embedding operation without any transformation.
Add Media to Store
The [Transform] Process media operation processes media content and prepares it for embedding generation. The processed media can then be used by [Embedding] Generate from media to create vector embeddings.

Input Fields
Module Configuration
This refers to the MuleSoft Vectors Transform Configuration set up in the Getting Started section.
General
- Binary: The media binary to generate embeddings for.
- Media Type: The type of the media. The default value is image.
- Processor Settings:
  - Target Width (pixels): The width of the output image in pixels.
  - Target Height (pixels): The height of the output image in pixels.
  - Compression Quality: The compression quality for the media (between 0.0 and 1.0, where 1.0 is the highest quality).
  - Scale Strategy:
    - Fit (Default): Resizes the image to fit within the specified width and height while maintaining the aspect ratio. The image is padded with a background color to fill the specified width and height.
    - Fill: Resizes the image to cover the specified width and height while maintaining the aspect ratio. The image is then cropped to the target width and height.
    - Stretch: Resizes the image to exactly the specified width and height without maintaining the aspect ratio.
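The difference between the three scale strategies comes down to which scale factor is chosen. The following Python sketch computes the size an image is resized to before any padding (Fit) or cropping (Fill). It is an assumption based on the descriptions above, not the connector's code, and the function name `scaled_size` is hypothetical.

```python
def scaled_size(src_w, src_h, target_w, target_h, strategy="fit"):
    """Return (width, height) after resizing, before padding or cropping.

    Illustrative sketch only; not the connector's implementation.
    """
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")
    if strategy == "stretch":
        # Aspect ratio is ignored: the image takes the target size exactly.
        return target_w, target_h

    ratio_w = target_w / src_w
    ratio_h = target_h / src_h
    if strategy == "fit":
        # Smaller factor: the whole image stays inside the target box.
        scale = min(ratio_w, ratio_h)
    elif strategy == "fill":
        # Larger factor: the image covers the target box; the excess is cropped.
        scale = max(ratio_w, ratio_h)
    else:
        raise ValueError("unknown strategy: " + repr(strategy))
    return round(src_w * scale), round(src_h * scale)
```

For an 800x600 source and a 512x512 target, this gives 512x384 for Fit (the remaining 128 rows would be padding), 683x512 for Fill (the extra width would be cropped), and 512x512 for Stretch.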
XML Configuration
Below is the XML configuration for this operation:
<ms-vectors:transform-process-media doc:name="[Transform] Process media"
    doc:id="d4755123-dcc2-41ce-b7f7-7f7c4c4a1679">
    <ms-vectors:binary>
        <![CDATA[#[payload.parts.image.content]]]>
    </ms-vectors:binary>
    <ms-vectors:media-processor-parameters>
        <ms-vectors:image-processor-parameters />
    </ms-vectors:media-processor-parameters>
</ms-vectors:transform-process-media>
Output Fields
Payload
This operation responds with a Binary payload.
Attributes
- StorageResponseAttributes:
  - mediaType: The type of the media (e.g., image).