Document Operations

Document | Parser

The Document parser operation parses a document of a supported type (text, PDF, URL) and returns the parsed content as its output.


Input Fields

Module Configuration

This refers to the MAC Vectors Configuration set up in the Getting Started section.

General

  • Storage (Override Module Configuration): The required parameters you are presented with depend on the storage option you select.
    • None: Select this when there is no need to define or override the storage configuration at operation level. Note: when no storage configuration is defined at either module or operation level, the connector behaves as per the Local configuration.
    • Expression or Bean reference: Lets you define the storage using a DataWeave expression. This is particularly helpful when the storage needs to be defined dynamically. More details on how to do it are available here
    • AWS S3: Loads data from AWS S3 buckets.
    • Azure Blob: Loads data from Azure Blob Storage.
    • Local: Loads data from the application's local storage.

Document Fields

  • File Type: Contains the type of the document to be ingested into the embedding store. Currently, four file types are supported:

    • any: Any type except text, url or crawl.
    • text: Any type of text file (json, xml, txt, csv, etc.).
    • url: Only a single URL is supported.
    • crawl: The file type created by the webcrawler connector.
  • Context Path: The expected value depends on the storage type.

    • Local: Contains the path of the documents to be ingested into the embedding store. Ensure the file path is accessible. You can also use a DataWeave expression for this field, e.g., mule.home ++ "/apps/" ++ app.name ++ "/".
    • AZURE_BLOB: Contains the container name and blob item name in the format <container-name>/<blob-item-name> (e.g. ms-vectors-container/invoicesample.pdf, ms-vectors-container/folder/invoicesample.pdf, ...)
    • S3: Contains the AWS S3 bucket and AWS S3 object key in the format s3://<s3-bucket>/<s3-object-key> (e.g. s3://ms-vectors-bucket/setup.adoc, s3://ms-vectors-bucket/folder/setup.adoc, ...)
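The two remote Context Path formats can be illustrated with a minimal Python sketch. It is not the connector's implementation: the helper split_context_path and the ObjectLocation type are hypothetical names introduced here, and the sketch only assumes the <container-name>/<blob-item-name> and s3://<s3-bucket>/<s3-object-key> formats described above.

```python
from typing import NamedTuple


class ObjectLocation(NamedTuple):
    """Hypothetical holder for the two parts of a remote context path."""
    container: str  # S3 bucket or Azure Blob container name
    key: str        # S3 object key or blob item name (may contain folders)


def split_context_path(context_path: str) -> ObjectLocation:
    """Split '<container>/<item>' or 's3://<bucket>/<key>' into its two parts."""
    path = context_path
    # Drop the S3 scheme prefix when present; Azure Blob paths have none.
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    # Everything before the first slash is the container; the rest is the key.
    container, separator, key = path.partition("/")
    if not container or not separator or not key:
        raise ValueError(f"expected <container>/<item>, got {context_path!r}")
    return ObjectLocation(container, key)
```

For instance, split_context_path("s3://ms-vectors-bucket/folder/setup.adoc") separates the bucket ms-vectors-bucket from the object key folder/setup.adoc, and the same call works on an Azure Blob path such as ms-vectors-container/invoicesample.pdf.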

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:document-parser
  doc:name="Document parser"
  doc:id="d0454666-014d-4e98-8178-8ce43cec469c"
  config-ref="MuleSoft_Vectors_Connector_Config"
  fileType="any"
  contextPath="#[payload.contextPath]">
    <ms-vectors:storage>
      <ms-vectors:local />
    </ms-vectors:storage>
</ms-vectors:document-parser>

Output Fields

Payload

This operation responds with a JSON payload.

Example

This output has been converted to JSON.

{
    "text": "\nProduct Name: Mule TXY Pro\n\nDescription:\n\nIntroducing Mule TXY Pro, the latest innovation in wireless audio technology. Engineered for\nthe ultimate auditory experience, Mule TXY Pro combines cutting-edge features with sleek,\nergonomic design to deliver unparalleled sound quality and convenience.\n\nKey Features:\n\n1. Adaptive Noise Cancellation: Mule TXY Pro features advanced adaptive noise\ncancellation technology that dynamically adjusts to your surroundings, ensuring an\nimmersive sound experience even in the noisiest environments.\n\n2. 360-Degree Spatial Audio: Experience music and media like never before with\n360-degree spatial audio, creating a theater-like soundscape that surrounds you in rich,\nmulti-dimensional sound.\n\n3. AI-Powered Smart Assist: Integrated AI assistants provide seamless control over your\nmusic, calls, and notifications. Voice-activated commands make hands-free operation a\nbreeze.\n\n4. Crystal Clear Call Quality: Equipped with multiple beamforming microphones, Mule\nTXY Pro ensures crystal clear voice calls by isolating your voice and reducing\nbackground noise.\n\n5. Extended Battery Life: Enjoy up to 12 hours of continuous playback on a single charge,\nwith the sleek charging case providing an additional 24 hours of battery life. Fast\ncharging capabilities mean you can get 3 hours of playback from just 15 minutes of\ncharging.\n\n6. Ergonomic Design: Mule TXY Pro's ergonomic design and customizable ear tips\nguarantee a secure and comfortable fit, making them perfect for extended wear during\nworkouts, travel, or daily commutes.\n\n7. Sweat and Water Resistant: With an IPX7 rating, Mule TXY Pro is built to withstand\nintense workouts and adverse weather conditions, making them the ideal companion for\nany activity.\n\n8. High-Fidelity Sound: Powered by graphene-enhanced drivers, Mule TXY Pro delivers\nstunning high-fidelity sound with deep bass, crisp mids, and sparkling trebles, ensuring\nevery note is heard with perfect clarity.\n\n9. Intuitive Touch Controls: Effortlessly control your music, manage calls, and activate\nvoice assistants with intuitive touch-sensitive controls on each earbud.\n\n10. Seamless Connectivity: Bluetooth 5.3 technology ensures a stable, low-latency\nconnection with your devices, while multi-device pairing lets you switch seamlessly\nbetween your phone, tablet, and laptop.\n\nWhy Mule TXY Pro is the Best:\n\n\n\n● Unmatched Sound Quality: The combination of graphene drivers and 360-degree\nspatial audio offers an audio experience that rivals high-end studio headphones.\n\n● Cutting-Edge Noise Cancellation: Adaptive noise cancellation provides a personalized\nlistening environment, making them ideal for frequent travelers and commuters.\n\n● Smart and Intuitive: AI-powered features and intuitive touch controls offer a level of\nconvenience and usability that is unmatched by other earpods.\n\n● Durability and Comfort: The IPX7 rating and ergonomic design ensure that Mule TXY\nPro is both durable and comfortable for all-day wear.\n\n● Exceptional Battery Life: Extended battery life with quick charging capabilities means\nyou spend less time charging and more time enjoying your music.\n\nMule TXY Pro: Elevate your auditory experience to new heights with the most advanced\nearpods on the market. Whether you're a music enthusiast, a busy professional, or an avid\ntraveler, Mule TXY Pro delivers the performance and convenience you need.\n\n\n"
}
  • text: The parsed text of the document / file.

Attributes

Example

Document | Splitter

The Document splitter operation splits a document into text segments, using the segment size and overlap defined on the operation.


Input Fields

Module Configuration

This refers to the MAC Vectors Configuration set up in the Getting Started section.

General

  • Storage (Override Module Configuration): The required parameters you are presented with depend on the storage option you select.
    • None: Select this when there is no need to define or override the storage configuration at operation level. Note: when no storage configuration is defined at either module or operation level, the connector behaves as per the Local configuration.
    • Expression or Bean reference: Lets you define the storage using a DataWeave expression. This is particularly helpful when the storage needs to be defined dynamically. More details on how to do it are available here
    • AWS S3: Loads data from AWS S3 buckets.
    • Azure Blob: Loads data from Azure Blob Storage.
    • Local: Loads data from the application's local storage.

Document Fields

  • File Type: Contains the type of the document to be ingested into the embedding store. Currently, four file types are supported:

    • any: Any type except text, url or crawl.
    • text: Any type of text file (json, xml, txt, csv, etc.).
    • url: Only a single URL is supported.
    • crawl: The file type created by the webcrawler connector.
  • Context Path: The expected value depends on the storage type.

    • Local: Contains the path of the documents to be ingested into the embedding store. Ensure the file path is accessible. You can also use a DataWeave expression for this field, e.g., mule.home ++ "/apps/" ++ app.name ++ "/".
    • AZURE_BLOB: Contains the container name and blob item name in the format <container-name>/<blob-item-name> (e.g. ms-vectors-container/invoicesample.pdf, ms-vectors-container/folder/invoicesample.pdf, ...)
    • S3: Contains the AWS S3 bucket and AWS S3 object key in the format s3://<s3-bucket>/<s3-object-key> (e.g. s3://ms-vectors-bucket/setup.adoc, s3://ms-vectors-bucket/folder/setup.adoc, ...)

Segmentation Fields

  • Max Segment Size (Characters): The maximum size, in characters, of each segment the document is split into.
  • Max Overlap Size (Characters): The maximum number of characters shared between consecutive segments. Use it to fine-tune the similarity search.
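How the two segmentation fields interact can be sketched in Python. This is a simplified fixed-window illustration, not the connector's splitter: the real operation may prefer paragraph or sentence boundaries, and the function name split_with_overlap is hypothetical. It only assumes that each segment holds at most Max Segment Size characters and repeats at most Max Overlap Size characters from the previous one.

```python
from typing import Dict, List


def split_with_overlap(text: str, max_segment_size: int,
                       max_overlap_size: int) -> List[Dict[str, object]]:
    """Cut text into fixed-size character windows that overlap by a fixed amount."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("max_overlap_size must be >= 0 and < max_segment_size")

    # Each new window starts this many characters after the previous one.
    step = max_segment_size - max_overlap_size
    segments: List[Dict[str, object]] = []
    start = 0
    while start < len(text):
        segments.append({
            "index": len(segments),
            "text": text[start:start + max_segment_size],
        })
        # Stop once the current window already reaches the end of the text.
        if start + max_segment_size >= len(text):
            break
        start += step
    return segments
```

Under these assumptions, a segment size of 3000 and an overlap of 300 advance the window by 2700 characters per segment, so a larger overlap yields more segments with more shared context between neighbours.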

XML Configuration

Below is the XML configuration for this operation:

<ms-vectors:document-split
  doc:name="Document splitter"
  doc:id="d0454666-014d-4e98-8178-8ce43cec469c"
  config-ref="MuleSoft_Vectors_Connector_Config"
  storageType="Local"
  fileType="any"
  contextPath="#[payload.contextPath]"
  maxSegmentSizeInChar="3000"
  maxOverlapSizeInChars="300">
    <ms-vectors:storage>
      <ms-vectors:azure-blob
        azureName="${azureBlob.accountName}"
        azureKey="${azureBlob.accountKey}" />
    </ms-vectors:storage>
</ms-vectors:document-split>

Output Fields

Payload

This operation responds with a JSON payload.

Example

This output has been converted to JSON.

{
    "segments": [
        {
            "index": 0,
            "text": "= CIM Setup Subject Area\n\nDefines technical resources such as software components and document images.\n\n== Model diagrams\n\nThe following diagrams show the entities and relationships in this subject area, organized by entity group.\n\n=== Document\n\nimage:https://www.mulesoft.com/ext/solutions/images/cim1.3/ac..."
        },
        {
            "index": 1,
            ...
        },
        ...
    ]
}
  • segments: The text segments of the document / file.
    • index: The index of the segment.
    • text: The text of the segment.
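A downstream consumer of this payload might read the segments as in the following Python sketch. It assumes only the payload shape shown above (a segments array whose items carry index and text); the function name ordered_segment_texts is hypothetical, and the sample strings in the usage note are placeholders rather than real connector output.

```python
import json
from typing import List


def ordered_segment_texts(payload_json: str) -> List[str]:
    """Return the segment texts from a splitter payload, ordered by index."""
    payload = json.loads(payload_json)
    # Sort defensively in case the segments arrive out of order.
    segments = sorted(payload["segments"], key=lambda segment: segment["index"])
    return [segment["text"] for segment in segments]
```

For example, a payload such as {"segments": [{"index": 1, "text": "second"}, {"index": 0, "text": "first"}]} would be returned as the list of texts in index order.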

Attributes

Example