[Document] parser
The Document parser operation parses a document of type text, pdf, or url and returns its parsed content as output.
Input Fields
Module Configuration
This refers to the MAC Vector LLM Configuration set up in the Getting Started section.
General Operation Fields
- Context Path: Contains the full file path of the document to be parsed. Ensure the file path is accessible. You can also use a DataWeave expression for this field, e.g., mule.home ++ "/apps/" ++ app.name ++ "/customer-service.pdf".
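The Context Path must resolve to a file the Mule runtime can read. The following minimal Python sketch, which runs outside Mule, shows the string that the DataWeave expression above produces and a simple readability check. The mule_home and app_name arguments stand in for the runtime values mule.home and app.name, and both helper functions are hypothetical, not part of the connector.

```python
import os


def build_context_path(mule_home: str, app_name: str, file_name: str) -> str:
    # Same concatenation as the DataWeave expression:
    # mule.home ++ "/apps/" ++ app.name ++ "/" ++ <file name>
    return mule_home + "/apps/" + app_name + "/" + file_name


def is_accessible(path: str) -> bool:
    # True only if the path is an existing regular file that the
    # current process has permission to read.
    return os.path.isfile(path) and os.access(path, os.R_OK)


if __name__ == "__main__":
    path = build_context_path("/opt/mule", "customer-app", "customer-service.pdf")
    print(path)  # /opt/mule/apps/customer-app/customer-service.pdf
    print(is_accessible(path))
```

A check like this is most useful in local testing, where a wrong working directory or missing file is the usual cause of a failed parse.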
Context Operation Field
- File Type: Contains the type of the document to be parsed. Currently, three file types are supported:
- text: Any text-based file (json, xml, txt, csv, etc.)
- pdf: Only system-generated PDFs
- url: Only a single URL is supported
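When the Context Path follows a predictable naming scheme, the File Type value can be derived from it. The Python sketch below is a hypothetical helper, not part of the connector: it maps a path or URL to one of the three supported values, treats the text extensions listed above as text, and rejects anything that is not a single http(s) URL. It cannot tell a system-generated PDF from a scanned one, so that restriction still has to be checked separately.

```python
import os
from urllib.parse import urlparse

# Illustrative subset only; the connector accepts any text-based file as "text".
TEXT_EXTENSIONS = {".txt", ".json", ".xml", ".csv"}


def infer_file_type(context_path: str) -> str:
    """Return "url", "pdf", or "text" for a Context Path (hypothetical helper)."""
    parsed = urlparse(context_path)
    if parsed.scheme in ("http", "https"):
        # Only a single URL is supported, so reject space- or comma-separated lists.
        if not parsed.netloc or any(sep in context_path for sep in (" ", ",")):
            raise ValueError("url file type expects exactly one URL")
        return "url"
    extension = os.path.splitext(context_path)[1].lower()
    if extension == ".pdf":
        return "pdf"
    if extension in TEXT_EXTENSIONS:
        return "text"
    raise ValueError(f"unsupported file extension: {extension or '(none)'}")
```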
Additional Properties
- Model Name: Indicates the embedding model to be used (default is text-embedding-ada-002).
XML Configuration
Below is the XML configuration for this operation:
<vectors:document-parser
    doc:name="Document parser"
    doc:id="d0454666-014d-4e98-8178-8ce43cec469c"
    config-ref="<YOUR_CONFIG>"
    contextPath="#[payload.filepath]"
    fileType="pdf"
/>
Output Field
This operation responds with a JSON payload.
Example Output
This output has been converted to JSON.
{
"metadata": "Metadata { metadata = {absolute_directory_path=/Users/<user>/Documents/Downloads, file_name=Mule TXY Pro Product Description.pdf} }",
"contextPath": "/Users/<user>/Documents/Downloads/Mule TXY Pro Product Description.pdf",
"documentText": "\nProduct Name: Mule TXY Pro\n\nDescription:\n\nIntroducing Mule TXY Pro, the latest innovation in wireless audio technology. Engineered for\nthe ultimate auditory experience, Mule TXY Pro combines cutting-edge features with sleek,\nergonomic design to deliver unparalleled sound quality and convenience.\n\nKey Features:\n\n1. Adaptive Noise Cancellation: Mule TXY Pro features advanced adaptive noise\ncancellation technology that dynamically adjusts to your surroundings, ensuring an\nimmersive sound experience even in the noisiest environments.\n\n2. 360-Degree Spatial Audio: Experience music and media like never before with\n360-degree spatial audio, creating a theater-like soundscape that surrounds you in rich,\nmulti-dimensional sound.\n\n3. AI-Powered Smart Assist: Integrated AI assistants provide seamless control over your\nmusic, calls, and notifications. Voice-activated commands make hands-free operation a\nbreeze.\n\n4. Crystal Clear Call Quality: Equipped with multiple beamforming microphones, Mule\nTXY Pro ensures crystal clear voice calls by isolating your voice and reducing\nbackground noise.\n\n5. Extended Battery Life: Enjoy up to 12 hours of continuous playback on a single charge,\nwith the sleek charging case providing an additional 24 hours of battery life. Fast\ncharging capabilities mean you can get 3 hours of playback from just 15 minutes of\ncharging.\n\n6. Ergonomic Design: Mule TXY Pro's ergonomic design and customizable ear tips\nguarantee a secure and comfortable fit, making them perfect for extended wear during\nworkouts, travel, or daily commutes.\n\n7. Sweat and Water Resistant: With an IPX7 rating, Mule TXY Pro is built to withstand\nintense workouts and adverse weather conditions, making them the ideal companion for\nany activity.\n\n8. High-Fidelity Sound: Powered by graphene-enhanced drivers, Mule TXY Pro delivers\nstunning high-fidelity sound with deep bass, crisp mids, and sparkling trebles, ensuring\nevery note is heard with perfect clarity.\n\n9. Intuitive Touch Controls: Effortlessly control your music, manage calls, and activate\nvoice assistants with intuitive touch-sensitive controls on each earbud.\n\n10. Seamless Connectivity: Bluetooth 5.3 technology ensures a stable, low-latency\nconnection with your devices, while multi-device pairing lets you switch seamlessly\nbetween your phone, tablet, and laptop.\n\nWhy Mule TXY Pro is the Best:\n\n\n\n● Unmatched Sound Quality: The combination of graphene drivers and 360-degree\nspatial audio offers an audio experience that rivals high-end studio headphones.\n\n● Cutting-Edge Noise Cancellation: Adaptive noise cancellation provides a personalized\nlistening environment, making them ideal for frequent travelers and commuters.\n\n● Smart and Intuitive: AI-powered features and intuitive touch controls offer a level of\nconvenience and usability that is unmatched by other earpods.\n\n● Durability and Comfort: The IPX7 rating and ergonomic design ensure that Mule TXY\nPro is both durable and comfortable for all-day wear.\n\n● Exceptional Battery Life: Extended battery life with quick charging capabilities means\nyou spend less time charging and more time enjoying your music.\n\nMule TXY Pro: Elevate your auditory experience to new heights with the most advanced\nearpods on the market. Whether you're a music enthusiast, a busy professional, or an avid\ntraveler, Mule TXY Pro delivers the performance and convenience you need.\n\n\n",
"fileType": "any"
}
- metadata: The metadata for the file (its directory path and file name), returned as a string.
- contextPath: The file path of the parsed file.
- documentText: The parsed text of the document or file.
- fileType: The file type selected on the operation.
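Because metadata is returned as a single string (the text form of a metadata object) rather than as a nested JSON object, a flow that needs, for example, the file name has to extract it from that string. The Python sketch below assumes the "Metadata { metadata = {key=value, ...} }" shape shown in the example output; the helper name is hypothetical, and the format could differ between connector versions.

```python
import json
import re


def parse_metadata(metadata: str) -> dict:
    """Turn 'Metadata { metadata = {k=v, k=v} }' into a dict (format assumed from the example)."""
    match = re.search(r"metadata = \{(.*)\} \}", metadata)
    if match is None:
        return {}
    # Split only on ", " that starts a new "key=" pair, so values containing ", " stay intact.
    pairs = re.split(r", (?=\w+=)", match.group(1))
    return dict(pair.split("=", 1) for pair in pairs if "=" in pair)


if __name__ == "__main__":
    output = json.loads(
        '{"metadata": "Metadata { metadata = {absolute_directory_path=/Users/<user>/Documents/Downloads, '
        'file_name=Mule TXY Pro Product Description.pdf} }", "fileType": "any"}'
    )
    print(parse_metadata(output["metadata"])["file_name"])  # Mule TXY Pro Product Description.pdf
```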
[Document] splitter
The Document splitter operation splits a document into text segments according to the segment size and overlap size defined on the operation.
Input Fields
Module Configuration
This refers to the MAC Vector LLM Configuration set up in the Getting Started section.
General Operation Fields
- Folder Path: Contains the path for the documents to be split. Ensure the file path is accessible. You can also use a DataWeave expression for this field, e.g., mule.home ++ "/apps/" ++ app.name ++ "/".
- Max segment size: The maximum size of each segment the document is split into.
- Max overlap size: The maximum overlap between consecutive segments, used to fine-tune the similarity search.
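To show how the two sizes interact, the Python sketch below splits text into fixed-size windows in which each segment repeats the last max_overlap_size characters of the previous one, so content that straddles a boundary appears in both neighboring segments. It assumes both sizes are measured in characters and is only an illustration: the connector's own splitter may instead break on paragraph or sentence boundaries, as the segments in the example output below suggest. The function name is hypothetical.

```python
def split_with_overlap(text: str, max_segment_size: int, max_overlap_size: int) -> list[str]:
    """Fixed-size character windows; consecutive windows share max_overlap_size characters."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("max_overlap_size must be >= 0 and smaller than max_segment_size")
    step = max_segment_size - max_overlap_size  # how far each new window advances
    segments = []
    start = 0
    while start < len(text):
        segments.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return segments


if __name__ == "__main__":
    print(split_with_overlap("abcdefghij", 4, 1))  # ['abcd', 'defg', 'ghij']
```

A larger overlap preserves more context across boundaries at the cost of more segments (and more embeddings) per document.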
Context Operation Field
- File Type: Contains the type of the document to be split. Currently, the following file types are supported:
- text: Any text-based file (json, xml, txt, csv, etc.)
- pdf: Only system-generated PDFs
Additional Properties
- Model Name: Indicates the embedding model to be used (default is text-embedding-ada-002).
XML Configuration
Below is the XML configuration for this operation:
<vectors:document-parser
    doc:name="Document parser"
    doc:id="d0454666-014d-4e98-8178-8ce43cec469c"
    config-ref="<YOUR_CONFIG>"
    contextPath="#[payload.filepath]"
    fileType="pdf"
/>
Output Field
This operation responds with a JSON payload.
Example Output
This output has been converted to JSON.
{
"contextPath": "/Users/amir.khan/Documents/Downloads/mulechain.txt",
"fileType": "com.mule.mulechain.vectors.internal.helpers.fileTypeParameters@4a2d91d7",
"segments": "[TextSegment { text = \"MAC Project is an open source project to enable langchain like capabilities for the MuleSoft Anypoint Platform. The main solution within the MAC Project is the MuleSoft AI Chain Connector.\" metadata = {absolute_directory_path=/Users/amir.khan/Documents/Downloads, file_name=mulechain.txt, index=0} }, TextSegment { text = \"It unifies the interaction to LLM and vector stores and help to build agents and agent workflows in a no code manner.\" metadata = {absolute_directory_path=/Users/amir.khan/Documents/Downloads, file_name=mulechain.txt, index=1} }, TextSegment { text = \"The MAC Project consist of multiple solutions. While the project has started with MuleSoft AI Chain as MuleSoft Custom Connector, the vision evolved and it transitioned to a project.\" metadata = {absolute_directory_path=/Users/amir.khan/Documents/Downloads, file_name=mulechain.txt, index=62}}, TextSegment { text = \"You can speed up RAG implementation by referring to existing SageMaker notebooks and code examples.\" metadata = {absolute_directory_path=/Users/amir.khan/Documents/Downloads, file_name=mulechain.txt, index=63} }]"
}
- contextPath: The file path of the split file.
- segments: The text segments of the document or file, returned as a single string.
- fileType: The file type selected on the operation.
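As with the parser's metadata field, segments is returned as one string holding the text form of a list of TextSegment entries, not as a JSON array. The Python sketch below recovers each segment's index and text. It assumes the TextSegment { text = "..." metadata = {..., index=N} } shape from the example output and segment texts that do not themselves contain the sequence " metadata = {; the helper name is hypothetical.

```python
import re

# One TextSegment entry; the metadata block may close with "} }" or "}}".
SEGMENT_PATTERN = re.compile(
    r'TextSegment \{ text = "(.*?)" metadata = \{(.*?)\} ?\}', re.DOTALL
)


def parse_segments(segments: str) -> list[tuple[int, str]]:
    """Return (index, text) pairs from the stringified segment list (format assumed)."""
    result = []
    for text, metadata in SEGMENT_PATTERN.findall(segments):
        index_match = re.search(r"index=(\d+)", metadata)
        # Use -1 when a segment carries no index in its metadata.
        index = int(index_match.group(1)) if index_match else -1
        result.append((index, text))
    return result
```

The recovered (index, text) pairs can then be embedded or stored individually, for example one record per segment.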