Store Operations
Supported Vector Stores
- Azure AI Search (Microsoft): Cloud-based AI-powered search with semantic search capabilities.
- Chroma (Open Source): Open-source vector database for AI and embeddings management.
- Elasticsearch (Elastic): Scalable search engine for structured/unstructured data and analytics.
- Milvus (Zilliz): Vector database optimized for similarity search and AI workloads.
- Amazon OpenSearch (Amazon Web Services): Managed search service for full-text and structured data queries.
- PGVector (Open Source): PostgreSQL extension for storing and querying vector embeddings.
- Pinecone (Pinecone): Scalable vector database with high-speed similarity search capabilities.
- Qdrant (Qdrant): Vector database with advanced filtering for semantic search applications.
Store | Add
The [Store] Add operation adds a document or text to an embedding store.
How to Use
Add Text to Store
The [Store] Add operation can be preceded by the [Embedding] Generate from text operation to ingest the text into a vector store.
Add Document to Store
The [Store] Add operation can be preceded by the [Document] Load single/list and [Embedding] Generate from document operations to ingest the document into a vector store.
Input Fields
Module Configuration
This refers to the MuleSoft Vectors Store Configuration set up in the Getting Started section.
General
- Store Name: The name of the collection in the external Vector Database.
- Text Segments and Embeddings: The text segments and embeddings to be ingested into the Vector Database. Typically the output of the [Embedding] Generate from text or [Embedding] Generate from document operations.
[Embedding] Generate from document output payload:
- embeddings: The list of generated embeddings.
  - list-item (embedding)
- text-segments: The list of segments.
  - list-item (text-segment):
    - text: The text segment.
    - metadata: The metadata key-value pairs.
      - index: The segment/chunk number for the uploaded data source.
      - absolute_directory_path: The full path to the directory that contains the file.
      - file_name: The name of the file in which the text segment was found.
      - full_path: The full path to the file.
      - file_type: The file/source type.
      - source: The file path set by cloud storage services (e.g. Amazon S3).
      - url: The web page URL when processing file type url.
      - title: The web page title.
  - list-item (text-segment):
- dimension: The dimension of the selected embedding model.
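The payload layout above can be checked before it is handed to [Store] Add. The following is a minimal Python sketch, not connector code. It assumes that each embedding is a list of numbers whose length equals dimension and that embeddings and text-segments are aligned one-to-one; neither assumption is stated on this page. The helper name check_add_payload and the sample values are hypothetical.

```python
def check_add_payload(payload):
    """Return a list of problems found in an embedding payload (empty if none)."""
    problems = []
    embeddings = payload.get("embeddings", [])
    segments = payload.get("text-segments", [])
    dimension = payload.get("dimension")

    # Assumption: one embedding per text segment.
    if len(embeddings) != len(segments):
        problems.append(
            f"{len(embeddings)} embeddings but {len(segments)} text segments"
        )

    # Assumption: every embedding has exactly `dimension` components.
    for i, vector in enumerate(embeddings):
        if dimension is not None and len(vector) != dimension:
            problems.append(
                f"embedding {i} has {len(vector)} values, expected {dimension}"
            )

    # Each text segment should carry its text and a metadata object.
    for i, segment in enumerate(segments):
        if not segment.get("text"):
            problems.append(f"text segment {i} has no text")
        if not isinstance(segment.get("metadata"), dict):
            problems.append(f"text segment {i} has no metadata")
    return problems


# Hypothetical sample payload with a three-dimensional embedding.
sample = {
    "embeddings": [[0.1, 0.2, 0.3]],
    "text-segments": [{"text": "hello", "metadata": {"index": "0"}}],
    "dimension": 3,
}
print(check_add_payload(sample))
```

An empty list means the payload has the expected shape under the assumptions listed above.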
XML Configuration
Below is the XML configuration for this operation:
<ms-vectors:store-add
    doc:name="[Store] Add"
    doc:id="7ca3df80-8cac-44dc-ad49-860a6f682d04"
    config-ref="MuleSoft_Vectors_Connector_Store_config"
    storeName="gettingstarted" />
Output Fields
Payload
This operation responds with a JSON payload.
Example
Here is an example of the JSON output:
{
  "status": "updated"
}
- status: The status of the operation.
Attributes
- StoreResponseAttributes:
  - storeName: The name of the vector store collection.
  - filter (Optional): The filter used to query or remove embeddings.
    - Metadata key: The metadata key used for filtering results.
    - Filter method: The conditional operator to use for filtering.
    - Metadata value: The metadata value to evaluate.
Store | List Sources
The [Store] List sources operation lists all sources in the embedding store.
Input Fields
Module Configuration
This refers to the MuleSoft Vectors Store Configuration set up in the Getting Started section.
General
- Store Name: The name of the vector collection in the Vector database.
Querying Strategy
- Embedding Page Size: Page size to use when querying the store.
XML Configuration
Below is the XML configuration for this operation:
<ms-vectors:store-list-sources
    doc:name="[Store] List sources"
    doc:id="4ba6854a-0580-46de-9c36-a4843abf6fb7"
    config-ref="MuleSoft_Vectors_Connector_Store_config"
    storeName="gettingStarted"
    embeddingPageSize="5000"/>
Output Fields
Payload
This operation responds with a JSON payload.
Example
Here is an example of the JSON output:
{
  "sourceCount": 3,
  "sources": [
    {
      "absolute_directory_path": "/Users/tbolis/Downloads/RFP Docs/batch 1",
      "file_name": "docs-accelerators__financial-services_1.11_modules_ROOT_pages_prerequisites.adoc",
      "source_id": "d6d2e426-8da6-4454-a723-202e1bfb1114",
      "full_path": "/Users/tbolis/Downloads/RFP Docs/batch 1/docs-accelerators__financial-services_1.11_modules_ROOT_pages_prerequisites.adoc",
      "segmentCount": 1,
      "ingestion_datetime": "2024-11-20T20:34:41.691Z",
      "ingestion_timestamp": "1732134881691"
    },
    {
      "absolute_directory_path": "/Users/tbolis/Downloads/RFP Docs/batch 1",
      "file_name": "docs-accelerators__healthcare_2.20_modules_ROOT_pages_fhir-r4-us-core-profiles.adoc",
      "source_id": "37789839-7685-46b5-bc39-6f47db3e2921",
      "full_path": "/Users/tbolis/Downloads/RFP Docs/batch 1/docs-accelerators__healthcare_2.20_modules_ROOT_pages_fhir-r4-us-core-profiles.adoc",
      "segmentCount": 3,
      "ingestion_datetime": "2024-11-12T14:28:17.274Z",
      "ingestion_timestamp": "1732134881691"
    },
    {
      ...
    },
    {
      ...
    }
  ]
}
- sourceCount: The number of sources within the embedding store.
- sources: The list of sources within the embedding store.
  - absolute_directory_path: The full path to the directory that contains the file.
  - file_name: The name of the file in which the text segment was found.
  - source_id: The source UUID.
  - full_path: The full path to the file.
  - segmentCount: The number of segments (chunks) the source is split into.
  - ingestion_datetime: The ingestion date and time in ISO 8601 format (UTC).
  - ingestion_timestamp: The ingestion time as a Unix epoch timestamp in milliseconds.
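To work with this payload downstream, the ingestion_timestamp string can be converted into a timezone-aware date. The following is a minimal Python sketch that uses the first source from the example above; the function name summarize_sources is hypothetical and the code is not part of the connector.

```python
import json
from datetime import datetime, timedelta, timezone

# First source from the example payload above, reduced to the fields used here.
payload = json.loads("""
{
  "sourceCount": 1,
  "sources": [
    {
      "file_name": "docs-accelerators__financial-services_1.11_modules_ROOT_pages_prerequisites.adoc",
      "source_id": "d6d2e426-8da6-4454-a723-202e1bfb1114",
      "segmentCount": 1,
      "ingestion_timestamp": "1732134881691"
    }
  ]
}
""")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_sources(payload):
    """Return (file_name, segmentCount, ingestion time in UTC) for each source."""
    rows = []
    for source in payload["sources"]:
        # The timestamp is a string holding epoch milliseconds.
        millis = int(source["ingestion_timestamp"])
        ingested = EPOCH + timedelta(milliseconds=millis)
        rows.append((source["file_name"], source["segmentCount"], ingested))
    return rows


for file_name, segment_count, ingested in summarize_sources(payload):
    print(file_name, segment_count, ingested.isoformat())
```

For the first source, the converted value corresponds to the ingestion_datetime shown in the example (2024-11-20T20:34:41.691Z).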
Attributes
- StoreResponseAttributes:
  - storeName: The name of the vector store collection.
  - filter (Optional): The filter used to query or remove embeddings.
    - Metadata key: The metadata key used for filtering results.
    - Filter method: The conditional operator to use for filtering.
    - Metadata value: The metadata value to evaluate.
Store | Query
The [Store] Query operation retrieves information from the embedding store based on an embedding (previously generated from a text prompt) and, optionally, a filter on metadata.
How to Use
When generating an embedding with the [Embedding] Generate from text operation for query purposes, do not provide any segmentation parameters: leave Max Segment Size (Characters) and Max Overlap Size (Characters) blank.
This operation can be used in combination with the [Embedding] Generate from text operation. The plain text used to query the store is first processed by the [Embedding] Generate from text operation, which generates an embedding. That embedding is the input of the [Store] Query operation and is used to perform the actual query.
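Because the query input must contain exactly one embedding and one text segment, a payload can be checked before it reaches [Store] Query. The following is a minimal Python sketch of such a check, not connector code; the function name is_valid_query_input and the sample payloads are hypothetical.

```python
def is_valid_query_input(payload):
    """True when the payload carries exactly one embedding and one text segment."""
    return (
        len(payload.get("embeddings", [])) == 1
        and len(payload.get("text-segments", [])) == 1
    )


# Hypothetical payload produced without segmentation parameters: one element each.
single = {
    "embeddings": [[0.1, 0.2]],
    "text-segments": [{"text": "What is high availability?", "metadata": {}}],
    "dimension": 2,
}

# Hypothetical payload produced with segmentation: the text was split in two.
segmented = {
    "embeddings": [[0.1, 0.2], [0.3, 0.4]],
    "text-segments": [
        {"text": "What is", "metadata": {}},
        {"text": "high availability?", "metadata": {}},
    ],
    "dimension": 2,
}

print(is_valid_query_input(single))
print(is_valid_query_input(segmented))
```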
Input Fields
Module Configuration
This refers to the MuleSoft Vectors Store Configuration set up in the Getting Started section.
General
- Store Name: The name of the vector collection in the Vector database.
- Text Segment and Embedding: The text segment and embedding to use when querying the vector store. Typically the output of the [Embedding] Generate from text operation. embeddings and text-segments must each contain exactly one element. To ensure this, leave the Max Segment Size (Characters) and Max Overlap Size (Characters) parameters blank.
[Embedding] Generate from text output payload:
- embeddings: The list of generated embeddings.
  - list-item (embedding)
- text-segments: The list of segments.
  - list-item (text-segment):
    - text: The text segment.
    - metadata: The metadata key-value pairs.
      - index: The segment/chunk number for the uploaded data source.
  - list-item (text-segment):
- dimension: The dimension of the selected embedding model.
- Max results: The maximum number of results to return. Default: 3.
- Min Score: The minimum similarity score (0 to 1) a result must reach. Default: 0.8.
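The interaction between Max results and Min Score can be illustrated with a small similarity search. The following is a minimal Python sketch under stated assumptions: it uses plain cosine similarity and two-dimensional hypothetical vectors, whereas the actual scoring function depends on the vector store and embedding model in use. The function names cosine_similarity and top_matches are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query, stored, max_results=3, min_score=0.8):
    """Return up to max_results (score, text) pairs scoring at least min_score."""
    scored = [(cosine_similarity(query, vector), text) for text, vector in stored]
    # Drop everything below the minimum score, then keep the best matches first.
    kept = [item for item in scored if item[0] >= min_score]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:max_results]


# Hypothetical stored segments with toy embeddings.
stored = [
    ("segment A", [1.0, 0.0]),
    ("segment B", [0.6, 0.8]),
    ("segment C", [0.0, 1.0]),
    ("segment D", [3.0, 1.0]),
]

matches = top_matches([1.0, 0.0], stored)
print([text for _, text in matches])
```

With the defaults, segments B and C fall below the minimum score and are dropped, so fewer than Max results entries are returned.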
Filter
- Metadata key: The metadata key used for filtering results.
- Filter method: The conditional operator to use for filtering.
- Metadata value: The metadata value to evaluate.
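The three filter fields combine into a single condition on a segment's metadata. The following is a minimal Python sketch of that idea, not connector code. It covers only the isEqualTo method, which is the one that appears in the XML examples on this page; the function name matches_filter and the second URL are hypothetical.

```python
def matches_filter(metadata, key, method, value):
    """Evaluate a metadata filter; only isEqualTo is covered by this sketch."""
    if method != "isEqualTo":
        raise ValueError(f"filter method not covered by this sketch: {method}")
    return metadata.get(key) == value


# Hypothetical stored segments with url metadata.
segments = [
    {"text": "a", "metadata": {"url": "www.salesforce.com"}},
    {"text": "b", "metadata": {"url": "www.example.com"}},
]

kept = [
    segment
    for segment in segments
    if matches_filter(segment["metadata"], "url", "isEqualTo", "www.salesforce.com")
]
print([segment["text"] for segment in kept])
```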
XML Configuration
Below is the XML configuration for this operation:
<ms-vectors:query
    doc:name="[Store] Query"
    doc:id="b74a5c37-6ea9-42bf-907f-c27183007ec7"
    config-ref="MuleSoft_Vectors_Connector_Store_config"
    storeName="web_pages"
    maxResults="5"
    minScore="0.85"
    metadataKey="url"
    filterMethod="isEqualTo"
    metadataValue="www.salesforce.com"/>
Output Fields
Payload
This operation responds with a JSON payload.
Example
Here is an example of the JSON output:
{
  "question": "Tell me more about Cloudhub High Availability Feature",
  "sources": [
    {
      "embeddingId": "",
      "text": "= CloudHub High Availability Features\nifndef::env-site,env-github[]\ninclude::_attributes.adoc[]\nendif::[]\n:page-aliases: runtime-manager::cloudhub-fabric.adoc,\....\n\n== Worker Scale-out",
      "score": 0.9282029356714594,
      "metadata": {
        "source_id": "c426a871-1a6e-4a47-a8ab-027eec9303e1",
        "index": "0",
        "absolute_directory_path": "/Users/<user>/Documents/Downloads/patch 8",
        "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc",
        "full_path": "/Users/<user>/Documents/Downloads/patch 8/docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc",
        "file_type": "any",
        "ingestion_datetime": "2024-11-20T20:34:41.691Z",
        "ingestion_timestamp": "1732134881691"
      }
    },
    {
      ...
    },
    {
      ...
    }
  ],
  "response": "= CloudHub High Availability Features\.. (...) \..distributes HTTP requests among your assigned workers.\n. Persistent message queues (see below)",
  "maxResults": 3,
  "storeName": "gettingstarted",
  "minimumScore": 0.7
}
- question: The question of the request.
- sources: The sources identified by the similarity search.
  - embeddingId: The embedding UUID.
  - text: The relevant text segment.
  - score: The similarity score of the text segment with respect to the question.
  - metadata: The metadata key-value pairs.
    - source_id: The UUID for the uploaded data source.
    - index: The segment/chunk number for the uploaded data source.
    - absolute_directory_path: The full path to the directory that contains the file.
    - file_name: The name of the file in which the text segment was found.
    - full_path: The full path to the file.
    - file_type: The file type.
    - ingestion_datetime: The ingestion date and time in ISO 8601 format (UTC).
    - ingestion_timestamp: The ingestion time as a Unix epoch timestamp in milliseconds.
- response: The collected text of all relevant text segments. This is the response that is sent to the LLM.
- maxResults: The maximum number of text segments considered.
- storeName: The name of the vector store.
- minimumScore: The minimum score for the result.
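A consumer of this payload typically inspects the sources and their scores before passing the response on. The following is a minimal Python sketch of that step, not connector code. The first source reuses the score and file name from the example above; the second source, the threshold of 0.8, and the function name summarize_matches are hypothetical.

```python
def summarize_matches(response, threshold=None):
    """Return (score, file_name) pairs at or above the threshold, best first."""
    # Fall back to the minimumScore reported in the payload itself.
    if threshold is None:
        threshold = response["minimumScore"]
    rows = [
        (source["score"], source["metadata"].get("file_name", ""))
        for source in response["sources"]
        if source["score"] >= threshold
    ]
    rows.sort(reverse=True)
    return rows


response = {
    "sources": [
        {
            # Hypothetical second match with a lower score.
            "score": 0.75,
            "metadata": {"file_name": "hypothetical-other-page.adoc"},
        },
        {
            "score": 0.9282029356714594,
            "metadata": {
                "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
            },
        },
    ],
    "maxResults": 3,
    "storeName": "gettingstarted",
    "minimumScore": 0.7,
}

for score, file_name in summarize_matches(response, threshold=0.8):
    print(f"{score:.3f} {file_name}")
```

Calling summarize_matches(response) without a threshold keeps both sources, because both scores are above the minimumScore of 0.7 in the payload.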
Attributes
- StoreResponseAttributes:
  - storeName: The name of the vector store collection.
  - filter (Optional): The filter used to query or remove embeddings.
    - Metadata key: The metadata key used for filtering results.
    - Filter method: The conditional operator to use for filtering.
    - Metadata value: The metadata value to evaluate.
Example Use Cases
The store operations can be particularly useful in scenarios such as:
- Knowledge Management Systems: Adding new documents to an organizational knowledge base.
- Customer Support: Storing customer interaction documents for quick retrieval and analysis.
- Content Management: Ingesting various types of documents (text, PDF, URL) into a centralized repository for easy access and searchability.
Store | Remove
The [Store] Remove operation removes all embeddings that match a metadata filter from the store.
Input Fields
Module Configuration
This refers to the MuleSoft Vectors Store Configuration set up in the Getting Started section.
General
- Store Name: The name of the collection in the Vector database.
Filter
- Metadata key: The metadata key used for filtering results.
- Filter method: The conditional operator to use for filtering.
- Metadata value: The metadata value to evaluate.
XML Configuration
Below is the XML configuration for this operation:
<vectors:store-remove
    doc:name="Embedding remove documents by filter"
    doc:id="c6b9ec97-1224-445e-ab02-f598d6fff7d7"
    config-ref="MAC_Vectors_Config"
    storeName="mulechaindemo"
    metadataKey="file_name"
    filterMethod="isEqualTo"
    metadataValue="docs-accelerators__accelerators-cim_1.3_modules_ROOT_pages_cim-setup.adoc"
    embeddingModelName="text-embedding-3-small"/>
Output Fields
Payload
This operation responds with a JSON payload.
Example
Here is an example of the JSON output:
{
  "status": "deleted"
}
- status: The operation status.
Attributes
- StoreResponseAttributes:
  - storeName: The name of the vector store collection.
  - filter (Optional): The filter used to query or remove embeddings.
    - Metadata key: The metadata key used for filtering results.
    - Filter method: The conditional operator to use for filtering.
    - Metadata value: The metadata value to evaluate.