Embedding Operations

Supported Model Providers

Azure OpenAI (Microsoft): Managed access to OpenAI models through Azure's cloud platform.
Azure Vision AI (Microsoft): Provides developers with access to advanced algorithms for processing images and returning information.
Einstein (Salesforce): AI platform integrated into Salesforce for CRM automation and insights.
Google Vertex AI (Google): Generative AI models hosted on Google's advanced, global infrastructure.
Hugging Face (Hugging Face): Community-driven hub for machine learning models, datasets, and tools.
Mistral AI (Mistral): Open-weight LLMs optimized for efficiency and customization in AI projects.
Nomic (Nomic): Tools for understanding and visualizing large datasets with embeddings and AI.
Ollama (Ollama): Platform offering tools and APIs for embedding-based search and AI-driven insights.
OpenAI (OpenAI): Developer of advanced AI models like GPT and DALL·E for diverse applications.

Embedding | Generate from Text

The [Embedding] Generate from text operation optionally splits the text into chunks of the provided size and creates a numeric vector for each text chunk.

How to Use

The [Embedding] Generate from text operation can be followed by either the [Store] Add or the [Store] Query operations. The output payload is ready to be used by both [Store] operations without any transformation.

Add Text to Store

When used in combination with the [Store] Add operation, the text and its generated embeddings can be ingested into a vector store.

Generate Embeddings from Text - Add to Store

Query from Store

⚠️

When generating an embedding from text for query purposes, do not provide any segmentation parameters: leave Max Segment Size (Characters) and Max Overlap Size (Characters) blank.

When used in combination with the [Store] Query operation, the provided text is first used to generate an embedding, which is then used to query the vector store.

Generate Embeddings from Text - Query from Store
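To make the query flow concrete, the sketch below ranks stored vectors against a single query embedding by cosine similarity. It is illustrative only: the similarity metric and index actually used depend on the vector store behind the [Store] Query operation, and the names `cosine_similarity` and `rank`, as well as the toy three-dimensional vectors, are assumptions that are not part of the connector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query_embedding, stored, top_k=2):
    """Return the top_k (score, text) pairs, best match first."""
    scored = [(cosine_similarity(query_embedding, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy data standing in for a vector store (hypothetical values).
stored = [
    ("shipping policy", [0.9, 0.1, 0.0]),
    ("refund policy", [0.1, 0.9, 0.0]),
    ("office hours", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]
results = rank(query, stored)
```

Because the query text is embedded as one vector, a single comparison per stored entry is enough, which is why segmentation is left blank for queries.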

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Embedding Configuration set up in the Getting Started section.

General

Text: The text to generate embeddings for.

Segmentation Fields

Max Segment Size (Characters): The maximum size, in characters, of each segment the text is split into.
Max Overlap Size (Characters): The maximum number of characters that consecutive segments share, used to fine-tune the similarity search.
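As a rough illustration of how the two segmentation fields interact, the sketch below splits text with a plain sliding window. It is a simplification under stated assumptions: the connector's actual splitter is not described here and may, for instance, prefer paragraph or sentence boundaries. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, max_segment_size, max_overlap_size):
    """Split text into chunks of at most max_segment_size characters,
    where each chunk repeats the last max_overlap_size characters of
    the previous one."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("overlap must be >= 0 and smaller than the segment size")
    step = max_segment_size - max_overlap_size
    segments = []
    start = 0
    while start < len(text):
        segments.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return segments

segments = split_with_overlap("abcdefghij", max_segment_size=4, max_overlap_size=1)
```

In this sketch, a segment size of 3000 and an overlap of 300 would start each new segment 2700 characters after the previous one.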

Embedding Model

Embedding Model Name: Indicates the embedding model to be used.

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:embedding-generate-from-text
  doc:name="[Embedding] Generate from text"
  doc:id="92c7a561-7b99-4840-8ffb-f680c9e392dc"
  config-ref="MuleSoft_Vectors_Connector_Embedding_config"
  maxSegmentSizeInChar="3000"
  maxOverlapSizeInChars="300"
  embeddingModelName="sfdc_ai__DefaultOpenAITextEmbeddingAda_002">
  <ms-vectors:text ><![CDATA[#[payload.text]]]></ms-vectors:text>
</ms-vectors:embedding-generate-from-text>

Output Fields

Payload

This operation responds with a JSON payload.

Example

Here is an example of the JSON output.

{
    "embeddings": [
      [-0.00683132, -0.0033572172, 0.02698761, -0.01291587, ...],
      [-0.0047172513, -0.03481483, 0.02046227, -0.037395656, ...],
      ...
    ],
    "text-segments": [
        {
            "metadata": {
                "index": "0"
            },
            "text": "In the modern world, technological advancements have become ."
        },
        {
            "metadata": {
                "index": "1"
            },
            "text": "E-commerce giants like Amazon and Alibaba have redefined .."
        },
        ...
    ],
    "dimension": 1536
}

embeddings: The list of generated embeddings
- list-item (embedding)
text-segments: The list of segments.
- list-item (text-segment):
  - text: The text segment
  - metadata: The metadata key-value pairs.
    - index: The segment/chunk number for the uploaded data source.
dimension: The dimension of the selected embedding model.
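For readers consuming this payload outside a Mule flow, the sketch below pairs each embedding with its text segment and checks every vector against `dimension`. The payload literal is a shortened, hypothetical stand-in (three-dimensional vectors and placeholder text), and the helper name `pair_segments` is an assumption.

```python
import json

# Hypothetical, shortened payload with the same shape as the example above.
raw = """
{
  "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
  "text-segments": [
    {"metadata": {"index": "0"}, "text": "first segment"},
    {"metadata": {"index": "1"}, "text": "second segment"}
  ],
  "dimension": 3
}
"""

def pair_segments(payload):
    """Return (index, text, vector) tuples, validating counts and dimension."""
    embeddings = payload["embeddings"]
    segments = payload["text-segments"]
    if len(embeddings) != len(segments):
        raise ValueError("expected one embedding per text segment")
    pairs = []
    for vector, segment in zip(embeddings, segments):
        if len(vector) != payload["dimension"]:
            raise ValueError("vector length does not match 'dimension'")
        # The example shows index as a string, so normalise it to int.
        pairs.append((int(segment["metadata"]["index"]), segment["text"], vector))
    return pairs

pairs = pair_segments(json.loads(raw))
```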

Attributes

EmbeddingResponseAttributes:
- embeddingModelDimension: The dimension for the embedding model used.
- embeddingModelName: The embedding model name used.
- tokenUsage: The token usage for the embedding model used.
  - inputCount: The number of tokens used as input.
  - outputCount: The number of tokens used as output.
  - totalCount: The total number of tokens used.

Embedding | Generate from Document

The [Embedding] Generate from document operation creates numeric vectors for the provided document's text segments.

How to Use

Add Document to Store

The [Embedding] Generate from document operation can be preceded by either the [Document] Load single or the [Document] Load list operation and followed by the [Store] Add operation to ingest the document into a vector store.

Generate Embeddings from Document - Add to Store

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Embedding Configuration set up in the Getting Started section.

General

Text Segments: The document's text segments to generate embeddings for. Typically the output of the [Document] Load single or [Document] Load list operations.

ℹ️

[Document] Load single output payload.

text-segments: The segments of the text of the document / file.
- list-item (text-segment):
  - text: The text segment
  - metadata: The metadata key-value pairs.
    - index: The segment/chunk number for the uploaded data source.
    - absolute_directory_path: The absolute path of the directory that contains the source file.
    - file_name: The name of the file in which the text segment was found.
    - full_path: The full path to the file.
    - file_type: The file/source type.
    - source: File path set by cloud storage services (e.g., Amazon S3).
    - url: The web page URL, when the file type is url.
    - title: Web page title

Embedding Model

Embedding Model Name: Indicates the embedding model to be used.

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:embedding-generate-from-document
  doc:name="[Embedding] Generate from document"
  doc:id="5c20d635-8684-4022-927c-2410869e2e81"
  config-ref="MuleSoft_Vectors_Connector_Embedding_config"
  embeddingModelName="sfdc_ai__DefaultOpenAITextEmbeddingAda_002"/>

Output Fields

Payload

This operation responds with a JSON payload.

Example

Here is an example of the JSON output.

{
    "embeddings": [
      [-0.00683132, -0.0033572172, 0.02698761, -0.01291587, ...],
      [-0.0047172513, -0.03481483, 0.02046227, -0.037395656, ...],
      ...
    ],
    "text-segments": [
        {
            "metadata": {
                "index": "0"
            },
            "text": "In the modern world, technological advancements have become ."
        },
        {
            "metadata": {
                "index": "1"
            },
            "text": "E-commerce giants like Amazon and Alibaba have redefined .."
        },
        ...
    ],
    "dimension": 1536
}

embeddings: The list of generated embeddings
- list-item (embedding)
text-segments: The list of segments.
- list-item (text-segment):
  - text: The text segment
  - metadata: The metadata key-value pairs.
    - index: The segment/chunk number for the uploaded data source.
    - absolute_directory_path: The absolute path of the directory that contains the source file.
    - file_name: The name of the file in which the text segment was found.
    - full_path: The full path to the file.
    - file_type: The file/source type.
    - source: File path set by cloud storage services (e.g., Amazon S3).
    - url: The web page URL, when the file type is url.
    - title: Web page title
dimension: The dimension of the selected embedding model.

Attributes

EmbeddingResponseAttributes:
- embeddingModelDimension: The dimension for the embedding model used.
- embeddingModelName: The embedding model name used.
- tokenUsage: The token usage for the embedding model used.
  - inputCount: The number of tokens used as input.
  - outputCount: The number of tokens used as output.
  - totalCount: The total number of tokens used.

Embedding | Generate from Binary

The [Embedding] Generate from binary operation optionally processes the media (for example, to resize an image) and creates numeric vectors for it.

How to Use

The [Embedding] Generate from binary operation can be followed by either the [Store] Add or the [Store] Query operations. The output payload is ready to be used by both [Store] operations without any transformation.

Query from Store

When used in combination with the [Store] Query operation, the provided binary is first used to generate an embedding, which is then used to query the vector store.

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Embedding Configuration set up in the Getting Started section.

General

Binary: The media binary to generate embeddings for.
Media Label: A short description/caption for the media.
Media Type: The type of the media. The default value is image.

Processor Settings:
- Target Width (pixels): The width, in pixels, that the image is resized to.
- Target Height (pixels): The height, in pixels, that the image is resized to.
- Compression Quality: The compression quality for media (between 0.0 and 1.0, where 1.0 is highest quality).
- Scale Strategy:
  - Fit (Default): Resizes the image to fit within the specified width and height while maintaining the aspect ratio. The image is padded with a background color to fit the specified width and height.
  - Fill: Resizes the image to cover the specified width and height while maintaining the aspect ratio. The image is then cropped to the target width and height.
  - Stretch: Resizes the image to fit within the specified width and height without maintaining the aspect ratio.
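The three scale strategies differ only in how the scale factor is chosen. The sketch below computes the resized dimensions for each as plain arithmetic; it does not touch pixels, and the function name `scaled_size` and the rounding behavior are assumptions rather than the connector's implementation.

```python
def scaled_size(src_w, src_h, target_w, target_h, strategy="fit"):
    """Return the (width, height) the source image is resized to before
    any padding (fit) or cropping (fill) is applied."""
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")
    if strategy == "stretch":
        # Aspect ratio is ignored: the image takes the target size exactly.
        return target_w, target_h
    ratio_w = target_w / src_w
    ratio_h = target_h / src_h
    if strategy == "fit":
        # Smaller ratio: the whole image fits inside the target; the rest is padded.
        scale = min(ratio_w, ratio_h)
    elif strategy == "fill":
        # Larger ratio: the image covers the target; the overflow is cropped.
        scale = max(ratio_w, ratio_h)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return round(src_w * scale), round(src_h * scale)

fit = scaled_size(1000, 500, 200, 200, "fit")
fill = scaled_size(1000, 500, 200, 200, "fill")
stretch = scaled_size(1000, 500, 200, 200, "stretch")
```

For a 1000×500 source and a 200×200 target, fit resizes to 200×100 and pads the remainder, fill resizes to 400×200 and crops the overflow, and stretch resizes directly to 200×200.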

Embedding Model

Embedding Model Name: Indicates the embedding model to be used.

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:embedding-generate-from-binary
  doc:name="[Embedding] Generate from binary"
  doc:id="3d4a32b7-dd47-465a-bd93-0c4cb0af84c2"
  config-ref="Embedding_Config_Vertex_AI"
  embeddingModelName="multimodalembedding" >
    <ms-vectors:binary ><![CDATA[#[payload.parts.image.content]]]></ms-vectors:binary>
    <ms-vectors:label ><![CDATA[#[payload.parts.question.content]]]></ms-vectors:label>
    <ms-vectors:media-processor-parameters >
        <ms-vectors:image-processor-parameters />
    </ms-vectors:media-processor-parameters>
</ms-vectors:embedding-generate-from-binary>

Output Fields

Payload

This operation responds with a JSON payload.

Example

Here is an example of the JSON output.

{
    "embeddings": [
      [-0.00683132, -0.0033572172, 0.02698761, -0.01291587, ...],
      [-0.0047172513, -0.03481483, 0.02046227, -0.037395656, ...],
      ...
    ],
    "text-segments": [
        {
            "metadata": {
                "index": 0
            },
            "text": "<The provided media label>"
        }
    ],
    "dimension": 1408
}

embeddings: The list of generated embeddings
- list-item (embedding)
text-segments: The list of segments.
- list-item (text-segment):
  - text: The media label.
  - metadata: The metadata key-value pairs.
    - index: The segment/chunk number for the uploaded data source.
dimension: The dimension of the selected embedding model.

Attributes

EmbeddingResponseAttributes:
- embeddingModelDimension: The dimension for the embedding model used.
- embeddingModelName: The embedding model name used.
- tokenUsage: The token usage for the embedding model used.
  - inputCount: The number of tokens used as input.
  - outputCount: The number of tokens used as output.
  - totalCount: The total number of tokens used.

Embedding | Generate from Media

The [Embedding] Generate from media operation creates numeric vectors for the provided media.

How to Use

Add Media to Store

The [Embedding] Generate from media operation can be preceded by either the [Media] Load single or the [Media] Load list operation and followed by the [Store] Add operation to ingest the media into a vector store.

Generate Embeddings from Media - Add to Store

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Embedding Configuration set up in the Getting Started section.

General

Media: The media to generate embeddings for. Typically the output of the [Media] Load single or [Media] Load list operations.

ℹ️

[Media] Load single output payload.

metadata: The metadata key-value pairs.
- absolute_directory_path: The absolute path of the directory that contains the media file.
- source: File path set by cloud storage services (e.g., Amazon S3).
- media_type: The type of the media (e.g., image).
- mime_type: The media MIME type.
- file_type: The file/source type.
- file_name: The name of the media file.
base64Data: The base64 encoded media data.
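When inspecting a [Media] Load single payload outside a flow, the sketch below decodes `base64Data` back into bytes and reads the MIME type from the metadata. The payload literal is hypothetical (the bytes are only the 8-byte PNG signature, not a full image), and the helper name `decode_media` is an assumption.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Hypothetical payload in the shape described above.
payload = {
    "metadata": {
        "media_type": "image",
        "mime_type": "image/png",
        "file_name": "example.png",
    },
    "base64Data": base64.b64encode(PNG_SIGNATURE).decode("ascii"),
}

def decode_media(media):
    """Return (mime_type, raw_bytes) for a loaded media payload."""
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(media["base64Data"], validate=True)
    return media["metadata"]["mime_type"], raw

mime_type, raw = decode_media(payload)
```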

Media Label: A short description/caption for the media.

Embedding Model

Embedding Model Name: Indicates the embedding model to be used.

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:embedding-generate-from-media
  doc:name="[Embedding] Generate from media"
  doc:id="6139edc8-f378-446b-a71e-4498dd9698fb"
  config-ref="Embedding_Config_Vertex_AI"
  embeddingModelName="multimodalembedding">
    <ms-vectors:media ><![CDATA[#[payload]]]></ms-vectors:media>
    <ms-vectors:label ><![CDATA[An image of an oar]]></ms-vectors:label>
</ms-vectors:embedding-generate-from-media>

Output Fields

Payload

This operation responds with a JSON payload.

Example

Here is an example of the JSON output.

{
    "embeddings": [
      [-0.00683132, -0.0033572172, 0.02698761, -0.01291587, ...],
      [-0.0047172513, -0.03481483, 0.02046227, -0.037395656, ...],
      ...
    ],
    "text-segments": [
        {
            "metadata": {
                "absolute_directory_path": "/Users/tbolis/Downloads",
                "media_type": "image",
                "mime_type": "image/png",
                "file_type": "png",
                "file_name": "4866963-200.png",
                "index": 0,
                "source": "file:///Users/tbolis/Downloads/4866963-200.png"
            },
            "text": "An image of an oar"
        }
    ],
    "dimension": 1408
}

embeddings: The list of generated embeddings
- list-item (embedding)
text-segments: The list of segments.
- list-item (text-segment):
  - text: The media label
  - metadata: The metadata key-value pairs.
    - absolute_directory_path: The absolute path of the directory that contains the media file.
    - source: File path set by cloud storage services (e.g., Amazon S3).
    - media_type: The type of the media (e.g., image).
    - mime_type: The media MIME type.
    - file_type: The file/source type.
    - file_name: The name of the media file.
    - index: The segment/chunk number for the uploaded data source.
dimension: The dimension of the selected embedding model.

Attributes

EmbeddingResponseAttributes:
- embeddingModelDimension: The dimension for the embedding model used.
- embeddingModelName: The embedding model name used.
- tokenUsage: The token usage for the embedding model used.
  - inputCount: The number of tokens used as input.
  - outputCount: The number of tokens used as output.
  - totalCount: The total number of tokens used.
