Embedding Operations

Embedding | New Store

The Embedding new store operation creates a new in-memory embedding store and exports it to a physical file. In-memory embedding stores in MuleSoft AI Chain persist their data by exporting to a file whenever the store changes.

Embedding New Store

Input Configuration

Module Configuration

This refers to the MuleSoft AI Chain LLM Configuration set up in the Getting Started section.

General Operation Fields

  • Store Name: Contains the full file path for the embedding store to be saved. If the file already exists, it will be overwritten. Ensure the file path is accessible. You can also use a DataWeave expression for this field, e.g., #[mule.home ++ "/apps/" ++ app.name ++ payload.storeName].

XML Configuration

Below is the XML configuration for this operation:

<ms-aichain:embedding-new-store 
  doc:name="Embedding new store" 
  doc:id="e3b52f5f-b765-4fad-9ecc-34755f386db4" 
  storeName='#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]'
/>
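
At runtime, the storeName expression above resolves to a file under the deployed application's directory, for example <mule.home>/apps/<app name>/knowledge-center.store. As an illustrative sketch (the value is hypothetical), an input payload matching this expression could look like:

{
    "storeName": "/knowledge-center.store"
}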

Output Configuration

Payload

This operation responds with a JSON payload containing the status of the operation.

Example Payload

{
    "status": "created"
}

Attributes

Along with the JSON payload, the operation also returns attributes, which include metadata about the operation, such as the store name:

{
  "storeName": "/Applications/AnypointStudio.app/Contents/../mule/apps/mac-knowledge-store-webcrawler/store"
}
 

Example Use Cases

  • Create a new in-memory vector store for storing document embeddings.
  • Export an in-memory embedding to a physical file for persistence across sessions.

Embedding | Add Document to Store

The Embedding add document to store operation adds a document to an embedding store and exports the store to a file. The document is ingested using the in-memory embedding model. In-memory embedding stores in MuleSoft AI Chain persist their data by exporting to a file whenever the store changes.

Embedding Add Document to Store

Input Configuration

Module Configuration

This refers to the MuleSoft AI Chain LLM Configuration set up in the Getting Started section.

General Operation Fields

  • Store Name: Contains the full file path for the embedding store to be saved. If the file already exists, it will be overwritten. Ensure the file path is accessible.
  • Context Path: Contains the full file path for the document to be ingested into the embedding store.
  • Max Segment Size: Specifies the maximum size, in tokens or characters, of each segment that the document will be split into before being ingested into the embedding store.
  • Max Overlap Size: Defines the number of tokens or characters that will overlap between consecutive segments when splitting the document.

Context Operation Field

  • File Type: The type of the document to be ingested into the embedding store. Supported types:
    • any: Automatically detects the file format and processes it accordingly. This option allows for flexibility when the file type is unknown or varied.
    • text: Text files, such as json, xml, txt, csv, etc.
    • url: A single URL pointing to web content that will be ingested.

XML Configuration

Below is the XML configuration for this operation:

<ms-aichain:embedding-add-document-to-store
  doc:name="Embedding add document to store"
  doc:id="e8b73dbe-c897-4f77-85d0-aaf59476c408"
  storeName='#["/Users/john.wick/Desktop/mac-demo/stores/" ++ payload.storeName]'
  contextPath="#[payload.filePath]" 
  maxSegmentSizeInChars="#[payload.maxChunkSize]" 
  maxOverlapSizeInChars="#[payload.maxOverlapSize]" 
  fileType="#[payload.fileType]"
/>
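
The expressions above read all operation parameters from the incoming payload. As an illustrative sketch (field names and values are hypothetical and must match whatever your flow actually provides), the expected input payload could look like the following. With these settings, each segment holds at most 1000 characters and consecutive segments share 100 overlapping characters:

{
    "storeName": "customer-faq.store",
    "filePath": "/Users/john.wick/Downloads/mulechain.txt",
    "maxChunkSize": 1000,
    "maxOverlapSize": 100,
    "fileType": "text"
}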

Output Configuration

Payload

This operation responds with a JSON payload containing the status of the operation.

Example Payload

{
    "status": "updated"
}

Attributes

Along with the JSON payload, the operation also returns attributes, which include metadata about the ingestion, such as the file path, store name, and file type:

{
  "filePath": "/Users/john.wick/Downloads/mulechain.txt",
  "storeName": "/Users/john.wick/Downloads/embedding.store",
  "fileType": "text"
}

Example Use Cases

  • Add a document (e.g., a PDF or text file) into an embedding store for future retrieval.

Embedding | Add Folder to Store

The Embedding add folder to store operation adds an entire folder, including files in its subfolders, to an embedding store and exports the store to a file. The documents are ingested using the in-memory embedding model. In-memory embedding stores in MuleSoft AI Chain persist their data by exporting to a file whenever the store changes.

Embedding Add Folder to Store

Input Configuration

Module Configuration

This refers to the MuleSoft AI Chain LLM Configuration set up in the Getting Started section.

General Operation Fields

  • Store Name: Contains the full file path for the embedding store to be saved. If the file already exists, it will be overwritten. Ensure the file path is accessible.
  • Context Path: Contains the full path of the folder whose files will be ingested into the embedding store.
  • Max Segment Size: Specifies the maximum size, in tokens or characters, of each segment that each document will be split into before being ingested into the embedding store.
  • Max Overlap Size: Defines the number of tokens or characters that will overlap between consecutive segments when splitting each document.

Context Operation Field

  • File Type: The type of the document to be ingested into the embedding store. Supported types:
    • any: Automatically detects the file format and processes it accordingly. This option allows for flexibility when the file type is unknown or varied.
    • text: Text files, such as json, xml, txt, csv, etc.
    • url: A single URL pointing to web content that will be ingested.

XML Configuration

Below is the XML configuration for this operation:

<ms-aichain:embedding-add-folder-to-store 
  doc:name="Embedding add folder to store" 
  doc:id="231a2afd-8cec-4a70-96c1-3ecef19d02db" 
  config-ref="MAC_AI_Llm_configuration" 
  storeName='#[mule.home ++ "/apps/" ++ app.name ++ "/knowledge-center.store"]' 
  folderPath="#[payload.folderPath]" 
/>
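
In this example, only the folder path is taken from the incoming payload; the store name is built from a fixed expression. As an illustrative sketch (the value is hypothetical), the expected input payload could look like:

{
    "folderPath": "/Users/john.wick/Downloads/files"
}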

Output Configuration

Payload

This operation responds with a JSON payload containing the status of the operation.

Example Payload

{
    "status": "updated"
}

Attributes

Along with the JSON payload, the operation also returns attributes, which include metadata about the ingestion, such as the folder path, number of files, and store name:

{
  "folderPath": "/Users/john.wick/Downloads/files",
  "filesCount": 3,
  "storeName": "/Users/john.wick/Downloads/embedding.store"
}

Example Use Cases

  • Add a folder of documents (e.g., PDFs, text files) into an embedding store for future retrieval.
  • Ingest documents from a specific directory into an in-memory embedding store for contextual analysis.

Embedding | Query from Store

The Embedding query from store operation retrieves information based on a plain text prompt using semantic search from an in-memory embedding store. This operation does not involve the use of a large language model (LLM). Instead, it directly searches the embedding store for relevant text segments based on the prompt. The embedding store is loaded into memory prior to retrieval.

Embedding Query from Store

Input Configuration

Module Configuration

This refers to the MuleSoft AI Chain LLM Configuration set up in the Getting Started section.

General Operation Fields

  • Store Name: Contains the full file path of the embedding store to query. Ensure the file path is accessible. You can also use a DataWeave expression for this field, e.g., #[mule.home ++ "/apps/" ++ app.name ++ payload.storeName].
  • Question: The plain text prompt to be sent to the in-memory vector store. It is converted into an embedding and used for semantic search to find similar text segments.
  • Max Results: Specifies the maximum number of results to be returned by the query.
  • Min Score: Defines the minimum similarity score a result must have to be returned by the query.
  • Get Latest: If true, the store file is loaded each time before running this operation, which might slow down performance. We recommend using this flag only when building the knowledge store. Once deployed with the Mule App, set it to false for better performance.

XML Configuration

Below is the XML configuration for this operation:

<ms-aichain:embedding-query-from-store 
  doc:name="Embedding query from store" 
  doc:id="1ee361ea-e62a-4e0f-9c74-0363f8721052" 
  storeName="#[mule.home ++ "/apps/" ++ app.name ++ payload.storeName]" 
  question="#[payload.question]" 
  maxResults="#[payload.maxResults]" 
  minScore="#[payload.minScore]" 
  getLatest="true"
/>
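
The query parameters above are read from the incoming payload. As an illustrative sketch (values are hypothetical), the expected input payload could look like the following; maxResults is an integer and minScore is a decimal, typically between 0 and 1 as in the scores shown in the example output below:

{
    "storeName": "/knowledge-center.store",
    "question": "How does CloudHub provide high availability?",
    "maxResults": 3,
    "minScore": 0.7
}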

Output Configuration

Payload

This operation responds with a JSON payload containing the response from the knowledge base. Additionally, the query parameters are returned as attributes.

Example Payload

{
  "response": "Networking Guide for more information on how to access an application in a specific CloudHub worker.",
  "sources": [
      {
          "absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
          "textSegment": "Networking Guide for more information on how to access an application in a specific CloudHub worker.",
          "individualScore": 0.7865373025380039,
          "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
      },
      {
          "absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
          "textSegment": "= CloudHub High Availability Features",
          "individualScore": 0.7845498154294348,
          "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
      },
      {
          "absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
          "textSegment": "[%header,cols=\"2*a\"]|===|VM Queues in On-Premises Applications |VM Queues in Applications deployed to CloudHub",
          "individualScore": 0.757268680397361,
          "file_name": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc"
      }
  ]
}

Attributes

Along with the JSON payload, the operation also returns attributes, which include the query parameters, such as the question, minimum score, maximum results, and store name:

{
  "minScore": 0.7,
  "question": "Who is Amir",
  "maxResults": 3,
  "storeName": "/Users/john.wick/Downloads/embedding.store"
}

Example Use Cases

  • Query a vector store for specific information using semantic search.
  • Retrieve multiple relevant documents or text segments from an embedding store based on a given prompt.

Embedding | Get Info from Store

The Embedding get info from store operation retrieves information from an in-memory embedding store based on a plain text prompt. Unlike the query operation, this operation uses a large language model (LLM) to interpret the retrieved information and generate a more comprehensive, contextually enriched answer. The embedding store is loaded into memory prior to retrieval, and the LLM processes the results to refine the final response.

Embedding Get Info from Store

Input Configuration

Module Configuration

This refers to the MuleSoft AI Chain LLM Configuration set up in the Getting Started section.

General Operation Fields

  • Data: The plain text prompt to be sent to the in-memory vector store. It is converted into an embedding and used for semantic search to find similar text segments.
  • Store Name: Contains the full file path of the embedding store to query. Ensure the file path is accessible, e.g. #[mule.home ++ "/apps/" ++ app.name ++ payload.storeName].
  • Get Latest: If true, the store file is loaded each time before running this operation, which might slow down performance. We recommend using this flag only when building the knowledge store. Once deployed with the Mule App, set it to false for better performance.

XML Configuration

Below is the XML configuration for this operation:

<ms-aichain:embedding-get-info-from-store
    doc:name="Embedding get info from store"
    doc:id="913ed660-0b4a-488a-8931-26c599e859b5"
    config-ref="MuleSoft_AI_Chain_Config"
    storeName='#["/Users/john.wick/Desktop/mac-demo/stores/" ++ payload.storeName]'
    getLatest="true">
    <ms-aichain:data><![CDATA[#[payload.prompt]]]></ms-aichain:data>
</ms-aichain:embedding-get-info-from-store>
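
Here the prompt and the store name suffix are read from the incoming payload. As an illustrative sketch (values are hypothetical), the expected input payload could look like:

{
    "storeName": "customer-faq.store",
    "prompt": "What is MuleSoft AI Chain?"
}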

Output Configuration

Payload

This operation responds with a JSON payload containing the main LLM response. Additionally, token usage and other metadata are returned as attributes.

Example Payload

{
  "response": "Runtime Manager is a feature within CloudHub that provides scalability, workload distribution, and added reliability to applications.",
  "sources": [
      {
          "absoluteDirectoryPath": "/Users/john.wick/Documents/Downloads/patch 8",
          "fileName": "docs-runtime-manager__cloudhub_modules_ROOT_pages_cloudhub-fabric.adoc",
          "textSegment": "= CloudHub High Availability Features..."
      }
  ]
}

Attributes

Along with the JSON payload, the operation also returns attributes, which include information about token usage:

{
  "tokenUsage": {
      "outputCount": 89,
      "totalCount": 702,
      "inputCount": 613
  },
  "additionalAttributes": {
      "getLatest": "true",
      "question": "What is MuleChain",
      "storeName": "/Users/john.wick/Downloads/embedding.store"
  }
}

Example Use Cases

  • Query a knowledge store with a plain text prompt and receive a refined response powered by an LLM.
  • Retrieve and interpret data from documents in an embedding store, with enhanced context from the LLM.