Store Operations

ℹ️

Supported Vector Stores

Azure AI Search (Microsoft): Cloud-based AI-powered search with semantic search capabilities.
AlloyDB (Google): AlloyDB is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL.
Chroma (Open Source): Open-source vector database for AI and embeddings management.
Elasticsearch (Elastic): Scalable search engine for structured/unstructured data and analytics.
Milvus (Zilliz): Vector database optimized for similarity search and AI workloads.
Amazon OpenSearch (Amazon Web Services): Managed search service for full-text and structured data queries.
PGVector (Open Source): PostgreSQL extension for storing and querying vector embeddings.
Pinecone (Pinecone): Scalable vector database with high-speed similarity search capabilities.
Qdrant (Qdrant): Vector database with advanced filtering for semantic search applications.
Weaviate (Weaviate): Weaviate is an open-source, cloud-native vector database that enables storing, indexing, and searching high-dimensional data objects and vector embeddings for AI applications.
MongoDB Atlas (MongoDB): MongoDB Atlas is a fully managed, multi-cloud database service built on MongoDB's flexible document model, offering automated scaling, security, and integrated features like full-text and vector search to simplify and accelerate modern application development across AWS, Azure, and Google Cloud.
Ephemeral File (LangChain4j): Ephemeral file-based vector store, implemented on top of the LangChain4j in-memory store.

Store | Add

The [Store] Add operation adds a document or text into an embedding store.

How to Use

Add Text to Store

The [Store] Add operation can be preceded by the [Embedding] Generate from text operation to ingest the text into a vector store.

Add Document to Store

The [Store] Add operation can be preceded by the [Document] Load single/list and [Embedding] Generate from document operations to ingest the document into a vector store.

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Store Configuration set up in the Getting Started section.

General

Store Name: The name of the collection in the external Vector Database.
Text Segments and Embeddings: The text segments and embeddings to be ingested into the Vector Database. Typically the output of the [Embedding] Generate from text or [Embedding] Generate from document operations.

ℹ️

[Embedding] Generate from document output payload.

embeddings: The list of generated embeddings
- list-item (embedding)
text-segments: The list of segments.
- list-item (text-segment):
  - text: The text segment
  - metadata: The metadata key-value pairs.
    - index: The segment/chunk number for the uploaded data source.
    - absolute_directory_path: The full path to the directory of the file that contains the relevant text segment.
    - file_name: The name of the file where the text segment was found.
    - full_path: The full path to the file.
    - file_type: The file/source type.
    - source: The file path set by cloud storage services (e.g. Amazon S3).
    - url: Web page URL when processing file type url
    - title: Web page title
dimension: The dimension of the selected embedding model.
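
As a rough illustration of the payload shape described above, the following Python sketch checks that a payload carries one embedding per text segment and that every embedding has the declared dimension. The one-to-one pairing, the function name `check_embedding_payload`, and the sample values are assumptions made for illustration; they are not confirmed connector behavior.

```python
def check_embedding_payload(payload: dict) -> int:
    """Return the number of segments if the payload looks consistent.

    Assumes (not confirmed by the connector docs) that embeddings and
    text-segments are paired one-to-one and that each embedding has
    exactly `dimension` values.
    """
    embeddings = payload["embeddings"]
    segments = payload["text-segments"]
    if len(embeddings) != len(segments):
        raise ValueError("embeddings and text-segments differ in length")
    for vector in embeddings:
        if len(vector) != payload["dimension"]:
            raise ValueError("embedding length does not match dimension")
    return len(segments)


# Made-up sample with a tiny dimension, for illustration only.
sample_payload = {
    "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "text-segments": [
        {"text": "first chunk", "metadata": {"index": "0", "file_name": "example.pdf"}},
        {"text": "second chunk", "metadata": {"index": "1", "file_name": "example.pdf"}},
    ],
    "dimension": 3,
}

segment_count = check_embedding_payload(sample_payload)
```

A check like this could run before the [Store] Add operation to catch a malformed payload early.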

Custom Metadata

list-item (metadata entry):
- Key: The custom metadata key.
- Value: The custom metadata value.

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:store-add
  doc:name="[Store] Add"
  doc:id="7ca3df80-8cac-44dc-ad49-860a6f682d04"
  config-ref="MuleSoft_Vectors_Connector_Store_config"
  storeName="gettingstarted" />

Output Fields

Payload

This operation responds with a JSON payload.

Example

Here is an example of the JSON output.

{
  "sourceId": "af44c7ef-4562-4712-af09-4498fc7f29a2",
  "embeddingIds": [
    "81f257c6-6406-4936-8c22-0ae523cce5fd",
    "2127ef9b-08f4-4bfc-b769-1f488cdbf835",
    "639e9994-f406-4481-a08a-0058ed3d781e"
  ],
  "status": "updated"
}

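sourceId: The UUID of the ingested data source.
embeddingIds: The IDs of the embeddings added to the store.
status: The status of the operation.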
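
A downstream consumer might read this payload as in the minimal Python sketch below. It parses the example output shown above; the helper name `summarize_add_response` is hypothetical.

```python
import json

# The example [Store] Add output from this page, as a JSON string.
raw_response = """
{
  "sourceId": "af44c7ef-4562-4712-af09-4498fc7f29a2",
  "embeddingIds": [
    "81f257c6-6406-4936-8c22-0ae523cce5fd",
    "2127ef9b-08f4-4bfc-b769-1f488cdbf835",
    "639e9994-f406-4481-a08a-0058ed3d781e"
  ],
  "status": "updated"
}
"""


def summarize_add_response(response: dict) -> str:
    """Build a one-line, log-friendly summary of an Add response."""
    count = len(response["embeddingIds"])
    return f"{response['status']}: {count} embeddings for source {response['sourceId']}"


add_response = json.loads(raw_response)
summary = summarize_add_response(add_response)
```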

Attributes

StoreResponseAttributes:
- storeName: The name of the vector store collection

Store | Query

The [Store] Query operation retrieves information from the embedding store based on an embedding (previously generated from a text prompt) and, optionally, a filter on metadata.

How to Use

⚠️

When generating an embedding with the [Embedding] Generate from text operation for query purposes, do not provide any segmentation parameters. Leave Max Segment Size (Characters) and Max Overlap Size (Characters) blank.

This keeps the original prompt intact and prevents chunking, which could otherwise cause the search to run on only part of the user's request.

This operation can be used in combination with the [Embedding] Generate from text operation. The plain text used to query the store is first processed by the [Embedding] Generate from text operation, which generates an embedding. That embedding is the input of the [Store] Query operation and is used to perform the actual query.

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Store Configuration set up in the Getting Started section.

General

Store Name: The name of the vector collection in the Vector database.
Text Segment and Embedding: The text segment and embedding to use when querying the vector store. Typically the output of the [Embedding] Generate from text operation.

⚠️

embeddings and text-segments must each contain exactly one element. To ensure this, leave the Max Segment Size (Characters) and Max Overlap Size (Characters) parameters blank.

ℹ️

[Embedding] Generate from text output payload.

embeddings: The list of generated embeddings
- list-item (embedding)
text-segments: The list of segments.
- list-item (text-segment):
  - text: The text segment
  - metadata: The metadata key-value pairs.
    - index: The segment/chunk number for the uploaded data source.
dimension: The dimension of the selected embedding model.

Max Results: The maximum number of results to return. Default: 3.
Min Score: The minimum similarity score for a result (0 to 1). Default: 0.8.
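
The single-element requirement on the query input can be checked before calling the operation. The Python sketch below is one possible guard; the function name `is_valid_query_input` and the sample payloads are assumptions for illustration.

```python
def is_valid_query_input(payload: dict) -> bool:
    """True when the payload holds exactly one embedding and one text segment."""
    return (
        len(payload.get("embeddings", [])) == 1
        and len(payload.get("text-segments", [])) == 1
    )


# Prompt embedded without segmentation: one embedding, one segment.
unsegmented = {
    "embeddings": [[0.12, -0.07, 0.33]],
    "text-segments": [{"text": "What is high availability?", "metadata": {"index": "0"}}],
    "dimension": 3,
}

# Prompt that was chunked: two embeddings, so it is not usable as query input.
segmented = {
    "embeddings": [[0.12, -0.07, 0.33], [0.05, 0.21, -0.4]],
    "text-segments": [
        {"text": "What is high", "metadata": {"index": "0"}},
        {"text": "availability?", "metadata": {"index": "1"}},
    ],
    "dimension": 3,
}

unsegmented_ok = is_valid_query_input(unsegmented)
segmented_ok = is_valid_query_input(segmented)
```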

Filter

Metadata Condition: The condition used for filtering results based on metadata.

ℹ️

It supports SQL-like syntax.

Comparison operators are =, !=, <, <=, > and >=.
Special operators:
- CONTAINS(field_name, 'value') - Checks whether the field contains the value.
Logical operators are AND and OR.

Here is an example: index=1 AND (CONTAINS(file_name,'example.pdf') OR file_type='any')

⚠️

CONTAINS(field_name, 'value') usually behaves the same as field_name LIKE '%value%', but the behaviour may differ between store providers.

For example, for Azure AI Search it maps to search.ismatch('value', field_name).
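
When conditions are assembled from variables, small helper functions can keep the string well-formed. The Python sketch below rebuilds the example condition shown above. All helper names (`eq`, `contains`, `all_of`, `any_of`) are hypothetical and not part of the connector. Because the escaping rule for quotes inside values is not documented here, the sketch rejects such values instead of guessing.

```python
def _literal(value) -> str:
    """Render a value for use in a condition: numbers bare, strings quoted."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not covered by this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    if "'" in text:
        # Escaping rules are not documented here, so refuse rather than guess.
        raise ValueError("values containing single quotes are not supported")
    return f"'{text}'"


def eq(field: str, value) -> str:
    return f"{field}={_literal(value)}"


def contains(field: str, value: str) -> str:
    return f"CONTAINS({field},{_literal(value)})"


def all_of(*conditions: str) -> str:
    return " AND ".join(conditions)


def any_of(*conditions: str) -> str:
    # Parenthesized so the OR group can be combined safely with AND.
    return "(" + " OR ".join(conditions) + ")"


condition = all_of(
    eq("index", 1),
    any_of(contains("file_name", "example.pdf"), eq("file_type", "any")),
)
```

The resulting string would then be supplied as the Metadata Condition value.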

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:query
  doc:name="[Store] Query"
  doc:id="b74a5c37-6ea9-42bf-907f-c27183007ec7"
  config-ref="MuleSoft_Vectors_Connector_Store_config"
  storeName="web_pages"
  maxResults="5"
  minScore="0.85"
  metadataKey="url"
  filterMethod="isEqualTo"
  metadataValue="www.salesforce.com"/>

Output Fields

Payload

This operation responds with a JSON payload.

Example

Here is an example of the JSON output.

{
    "question": "Tell me more about Cloudhub High Availability Feature",
    "sources": [
        {
            "embeddingId": "",
            "text": "= CloudHub High Availability Features\nifndef::env-site,env-github[]\ninclude::_attributes.adoc[]\nendif::[]\n:page-aliases: runtime-manager::cloudhub-fabric.adoc,\....\n\n== Worker Scale-out",
            "score": 0.9282029356714594,
            "metadata": {
                "source_id": "c426a871-1a6e-4a47-a8ab-027eec9303e1",
                "index": "0",
                "absolute_directory_path": "/Users/<user>/Documents/Downloads/patch 8",
                "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc",
                "full_path": "/Users/<user>/Documents/Downloads/patch 8docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc",
                "file_type": "any",
                "ingestion_datetime": "2024-11-20T20:34:41.691Z",
                "ingestion_timestamp": "1732134881691"
            }
        },
        {
          ...
        },
        {
          ...
        }
    ],
    "response": "= CloudHub High Availability Features\.. (...) \..distributes HTTP requests among your assigned workers.\n. Persistent message queues (see below)",
    "maxResults": 3,
    "storeName": "gettingstarted",
    "minimumScore": 0.7
}

question: The question of the request.
sources: The sources identified by the similarity search.
- embeddingId: The embedding UUID.
- text: The relevant text segment.
- score: The score of the similarity search based on the question.
- metadata: The metadata key-value pairs.
  - source_id: The UUID for the uploaded data source.
  - index: The segment/chunk number for the uploaded data source.
  - absolute_directory_path: The full path to the directory of the file that contains the relevant text segment.
  - file_name: The name of the file where the text segment was found.
  - full_path: The full path to the file.
  - file_type: The file type.
  - ingestion_datetime: The ingestion date and time in ISO 8601 format (UTC).
  - ingestion_timestamp: The ingestion time in milliseconds.
response: The collected response of all relevant text segments. This is the response that is sent to the LLM.
maxResults: The maximum number of text segments considered.
storeName: The name of the vector store.
minimumScore: The minimum score for the result.
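
A caller may want to post-process the returned sources, for example to apply a stricter score cut-off before building an LLM prompt. The Python sketch below does this on a made-up response that follows the field layout described above; `top_sources`, the threshold, and the sample scores are assumptions for illustration.

```python
def top_sources(query_response: dict, threshold: float) -> list:
    """Return sources scoring at least `threshold`, best match first."""
    hits = [s for s in query_response["sources"] if s["score"] >= threshold]
    return sorted(hits, key=lambda s: s["score"], reverse=True)


# Made-up response using the documented field names.
sample_response = {
    "question": "Tell me more about Cloudhub High Availability Feature",
    "sources": [
        {"text": "segment A", "score": 0.72, "metadata": {"file_name": "a.adoc"}},
        {"text": "segment B", "score": 0.93, "metadata": {"file_name": "b.adoc"}},
        {"text": "segment C", "score": 0.88, "metadata": {"file_name": "c.adoc"}},
    ],
    "maxResults": 3,
    "storeName": "gettingstarted",
    "minimumScore": 0.7,
}

best = top_sources(sample_response, threshold=0.85)

# Join the surviving segments into a single context string.
context = "\n\n".join(source["text"] for source in best)
```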

Attributes

StoreResponseAttributes:
- storeName: The name of the vector store collection
- metadataCondition (Optional): Filter condition used to query embeddings

Example Use Cases

This operation can be particularly useful in scenarios such as:

Knowledge Management Systems: Adding new documents to an organizational knowledge base.
Customer Support: Storing customer interaction documents for quick retrieval and analysis.
Content Management: Ingesting various types of documents (text, PDF, URL) into a centralized repository for easy access and searchability.

Store | Query All

The [Store] Query All operation lists all sources in the embedding store.

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Store Configuration set up in the Getting Started section.

General

Store Name: The name of the vector collection in the Vector database.

Query Parameters

Retrieve embeddings: If true, retrieves embeddings from the store.

When retrieving embeddings from Azure AI Search, the connector may return the following error:

Invalid expression: 'content_vector' is not a retrievable field. Only fields marked as retrievable in the index can be used in $select.\r\nParameter name: $select.

To resolve the issue, mark content_vector as a retrievable field in the index.

Page size: Page size to use when querying the store.

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:query-all
  doc:name="[Store] Query All"
  doc:id="4ba6854a-0580-46de-9c36-a4843abf6fb7"
  config-ref="MuleSoft_Vectors_Connector_Store_config"
  storeName="gettingStarted"
  pageSize="5000"
  retrieveEmbeddings="false"/>

Output Fields

Payload

This operation responds with a JSON payload.

Example

Here is an example of the JSON output.

[
  {
    "embeddingId": "81f257c6-6406-4936-8c22-0ae523cce5fd",
    "text": "E-commerce giants like Amazon and Alibaba have redefined ..",
    "metadata": {
        "index": "0",
        "source": "s3://ms-vectors/invoicesample.pdf",
        "file_type": "any",
        "file_name": "invoicesample.pdf"
        ...
    },
    "embeddings": [-0.00683132, -0.0033572172, 0.02698761, -0.01291587, ...]
  }
]

sourceCount: The number of sources within the embedding store.
sources: The list of sources within the embedding store.
- absolute_directory_path: The full path to the directory of the file that contains the relevant text segment.
- file_name: The name of the file where the text segment was found.
- source_id: The source UUID.
- full_path: The full path to the file.
- segmentCount: The number of segments/chunks the source is split into.
- ingestion_datetime: The ingestion date and time in ISO 8601 format (UTC)
- ingestion_timestamp: The ingestion time in milliseconds
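
When the output is a flat list of entries, as in the JSON example above, a per-source segment count can be derived on the client side. The Python sketch below groups entries by `metadata.file_name`. The function name `segments_per_file` and the sample entries are made up for illustration, and grouping by file name rather than by source ID is an assumption.

```python
from collections import Counter


def segments_per_file(entries: list) -> Counter:
    """Count how many stored segments belong to each file name."""
    return Counter(
        entry["metadata"].get("file_name", "<unknown>") for entry in entries
    )


# Made-up entries shaped like the Query All example output.
sample_entries = [
    {"embeddingId": "id-1", "text": "chunk 0", "metadata": {"index": "0", "file_name": "invoicesample.pdf"}},
    {"embeddingId": "id-2", "text": "chunk 1", "metadata": {"index": "1", "file_name": "invoicesample.pdf"}},
    {"embeddingId": "id-3", "text": "chunk 0", "metadata": {"index": "0", "file_name": "report.pdf"}},
]

counts = segments_per_file(sample_entries)
```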

Attributes

StoreResponseAttributes:
- storeName: The name of the vector store collection

Store | Remove

The [Store] Remove operation removes all embeddings from the store that match a metadata filter.

Input Fields

Module Configuration

This refers to the MuleSoft Vectors Store Configuration set up in the Getting Started section.

General

Store Name: The name of the collection in the Vector database.

Filter

The following fields are optional and mutually exclusive.

Ids: The list of ids to be deleted.
Metadata Condition: The condition used for filtering results based on metadata.

ℹ️

It supports SQL-like syntax.

Comparison operators are =, !=, <, <=, > and >=.
Special operators:
- CONTAINS(field_name, 'value') - Checks whether the field contains the value. It works the same as field_name LIKE '%value%'.
Logical operators are AND and OR.

Here is an example: index=1 AND (CONTAINS(file_name,'example.pdf') OR file_type='any')
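
Because Ids and Metadata Condition are mutually exclusive, a caller that prepares the Remove input may want to validate it first. The Python sketch below shows one way to do so; `select_remove_filter` is a hypothetical helper. How the connector behaves when neither field is set is not described here, so the sketch only reports that case.

```python
from typing import Optional, Sequence


def select_remove_filter(
    ids: Optional[Sequence[str]] = None,
    metadata_condition: Optional[str] = None,
) -> str:
    """Return which filter is in effect, rejecting the invalid combination."""
    if ids and metadata_condition:
        raise ValueError("Ids and Metadata Condition are mutually exclusive")
    if ids:
        return "ids"
    if metadata_condition:
        return "metadataCondition"
    # Neither filter set: the resulting behavior is not documented here.
    return "none"


by_ids = select_remove_filter(ids=["81f257c6-6406-4936-8c22-0ae523cce5fd"])
by_condition = select_remove_filter(metadata_condition="file_name='example.pdf'")
```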

XML Configuration

Below is the XML configuration for this operation:

<vectors:store-remove
  doc:name="Embedding remove documents by filter"
  doc:id="c6b9ec97-1224-445e-ab02-f598d6fff7d7"
  config-ref="MAC_Vectors_Config"
  storeName="mulechaindemo"
  metadataKey="file_name"
  filterMethod="isEqualTo"
  metadataValue="docs-accelerators__accelerators-cim_1.3_modules_ROOT_pages_cim-setup.adoc"
  embeddingModelName="text-embedding-3-small"/>

Output Fields

Payload

This operation responds with a JSON payload.

Example

Here is an example of the JSON output.

{
    "status": "deleted"
}

status: The operation status.

Attributes

StoreResponseAttributes:
- storeName: The name of the vector store collection
- ids (Optional): Ids of the embeddings to be removed
- metadataCondition (Optional): Filter condition used to remove embeddings
