Transform Operations

Transform operations process either document files or media files. You can:

- Parse a non-text document file.
- Split text into chunks of the provided size.
- Process a media file, for example to resize an image.

These operations should be followed by either an [Embedding] Generate from text operation or an [Embedding] Generate from media operation.

Transform | Parse document

The [Transform] Parse document operation parses a document sent as payload in either base64 or binary format and optionally splits it into text chunks based on the provided size.

How to Use

Add Document to Store

The [Transform] Parse document operation should be followed by [Transform] Chunk text and/or [Embedding] Generate from text operations.

Input Fields

General Fields

Document Binary: The document represented in binary format.
Document Parser: The type of the document parser to be used to parse the document file. Currently, two document parser types are supported:
- Multiformat document parser: Should be used for any type except txt.
  - Include metadata: [False (Default)] Whether to include metadata in the parsed document.
- Text document parser: Should be used for any type of text files (json, xml, txt, csv, etc.)
  - Charset: [UTF-8 (Default)] The charset of the text file.
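
The choice between the two parser types can be sketched as a small helper. This is only an illustration of the rule described above, not part of the connector: the function name `choose_parser`, the extension set, and the returned labels are all hypothetical.

```python
from pathlib import Path

# Assumption: these extensions stand in for "any type of text files".
# The list in the documentation ends with "etc.", so it is not exhaustive.
TEXT_EXTENSIONS = {".txt", ".json", ".xml", ".csv"}


def choose_parser(filename: str) -> str:
    """Return a hypothetical label for the parser type that suits the file."""
    suffix = Path(filename).suffix.lower()
    # Plain-text formats go to the text parser; everything else
    # (for example PDF or DOCX) goes to the multiformat parser.
    return "text" if suffix in TEXT_EXTENSIONS else "multiformat"


print(choose_parser("report.PDF"))  # multiformat
print(choose_parser("data.csv"))    # text
```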

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:transform-parse-document doc:name="[Transform] Parse document" 
                                     doc:id="28951baf-f388-4735-8da1-8388a6913b29" 
                                     config-ref="Transform_config">
    <ms-vectors:document-binary >
        <![CDATA[#[payload.parts.documentBinary.content]]]>
    </ms-vectors:document-binary>
    <ms-vectors:document-parser-parameters >
        <ms-vectors:multiformat-document-parser-parameters />
    </ms-vectors:document-parser-parameters>
</ms-vectors:transform-parse-document>

Output Fields

Payload

This operation responds with a String payload.

Attributes

StorageResponseAttributes:
- N/A

Transform | Chunk text

The [Transform] Chunk text operation takes text content and splits it into smaller chunks based on the provided size parameters.

How to Use

Add Document to Store

The [Transform] Chunk text operation should be followed by an [Embedding] Generate from text operation.

Input Fields

General Fields

Text: The text content to be chunked.

Segmentation Fields

Max Segment Size (Characters): The maximum size, in characters, of each segment the text is split into.
Max Overlap Size (Characters): The maximum number of characters that consecutive segments share. Adjust it to fine-tune the similarity search.
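
To show how the two segmentation fields interact, here is a minimal sketch of a fixed-size sliding-window splitter. It is an assumption made for illustration: the connector may split differently (for example, along paragraph or sentence boundaries), and the function name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text: str, max_segment_size: int, max_overlap_size: int) -> list[str]:
    """Split text into segments of at most max_segment_size characters,
    where each segment repeats the last max_overlap_size characters
    of the previous one."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("max_overlap_size must be >= 0 and < max_segment_size")

    step = max_segment_size - max_overlap_size  # how far the window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", 4, 1))  # ['abcd', 'defg', 'ghij']
```

In this sketch a larger overlap produces more segments that share context with their neighbours, while a larger segment size produces fewer, longer segments.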

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:transform-chunk-text doc:name="[Transform] Chunk text" 
                                 doc:id="03fa98ce-319b-4aa2-b318-d953dfb17dd7" 
                                 maxOverlapSizeInChars="#[attributes.queryParams.maxOverlapSize]" 
                                 maxSegmentSizeInChar="#[attributes.queryParams.maxSegmentSize]">
    <ms-vectors:text >
        <![CDATA[#[payload.text]]]>
    </ms-vectors:text>
</ms-vectors:transform-chunk-text>

Output Fields

Payload

This operation responds with an Array<String> payload.

Attributes

StorageResponseAttributes:
- N/A

Transform | Process Media

The [Transform] Process media operation processes the media (for example to resize an image) and prepares it for embedding generation.

How to Use

The [Transform] Process media operation can be followed by an [Embedding] Generate from media operation. The output payload is ready to be used by the embedding operation without any transformation.

Add Media to Store

The [Transform] Process media operation processes media content and prepares it for embedding generation. The processed media can then be used by [Embedding] Generate from media to create vector embeddings.

Transform Process Media - Query from Store

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Transform Configuration set up in the Getting Started section.

General

Binary: The media binary to generate embeddings for.
Media Type: The type of the media. The default value is image.

Processor Settings:
- Target Width (pixels): The target width of the processed image, in pixels.
- Target Height (pixels): The target height of the processed image, in pixels.
- Compression Quality: The compression quality for the media, between 0.0 and 1.0, where 1.0 is the highest quality.
- Scale Strategy:
  - Fit (Default): Resizes the image to fit within the specified width and height while maintaining the aspect ratio. The image is padded with a background color to fill the remaining space.
  - Fill: Resizes the image to cover the specified width and height while maintaining the aspect ratio. Any part that extends beyond the target width and height is cropped.
  - Stretch: Resizes the image to exactly the specified width and height without maintaining the aspect ratio.
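
The difference between the three scale strategies comes down to which scale factor is applied. The sketch below computes the size of the resized image before any padding (Fit) or cropping (Fill). It is an illustration under stated assumptions, not the connector's implementation: the function name `scaled_size` and the lowercase strategy labels are hypothetical.

```python
def scaled_size(src_w: int, src_h: int, target_w: int, target_h: int,
                strategy: str = "fit") -> tuple[int, int]:
    """Return the (width, height) of the resized image before padding or cropping."""
    if strategy == "stretch":
        # Aspect ratio is ignored: the image takes the target size directly.
        return target_w, target_h

    ratio_w = target_w / src_w
    ratio_h = target_h / src_h
    if strategy == "fit":
        scale = min(ratio_w, ratio_h)  # whole image fits inside; the rest is padded
    elif strategy == "fill":
        scale = max(ratio_w, ratio_h)  # image covers the target; the excess is cropped
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return round(src_w * scale), round(src_h * scale)


# An 800x400 source resized for a 200x200 target:
print(scaled_size(800, 400, 200, 200, "fit"))      # (200, 100)
print(scaled_size(800, 400, 200, 200, "fill"))     # (400, 200)
print(scaled_size(800, 400, 200, 200, "stretch"))  # (200, 200)
```

With Fit, the 200x100 result would be padded to 200x200. With Fill, the 400x200 result would be cropped to 200x200.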

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:transform-process-media doc:name="[Transform] Process media" 
                                    doc:id="d4755123-dcc2-41ce-b7f7-7f7c4c4a1679">
    <ms-vectors:binary >
        <![CDATA[#[payload.parts.image.content]]]>
    </ms-vectors:binary>
    <ms-vectors:media-processor-parameters >
        <ms-vectors:image-processor-parameters />
    </ms-vectors:media-processor-parameters>
</ms-vectors:transform-process-media>

Output Fields

Payload

This operation responds with a Binary payload.

Attributes

StorageResponseAttributes:
- mediaType: The type of the media (e.g., image).
