Crawl Website
The Crawl website operation crawls a website's content to a specified depth. This operation also allows you to:
- set a crawl delay so that you do not overload the web server with requests
- download images from the crawled web pages during the crawl
Input Fields
Module Configuration
This refers to the MAC Web Crawler Configuration set up in the Getting Started section.
Operation Fields
- Website URL: The website to be crawled. The crawl starts from this URL and, by default, also follows any links found on crawled pages that match the base URL, up to the specified Maximum Depth.
- Maximum Depth: The crawl is limited to the specified maximum depth.
- Delay (millisecs): The time to wait between crawling pages on a website, in milliseconds. Add a delay to avoid overloading the website with requests. Specify 0 for no delay.
- Restrict Crawl under URL: If set to True, the crawler only crawls and fetches content from pages that match the specified Website URL.
- Retrieve Meta Tags: If set to True, the crawler also retrieves metadata from each crawled page, including the title, description, keywords, and other SEO-related information that the page contains.
- Download Images: If set to True, the crawler also downloads images found on each crawled page.
- Download Location: The path where the crawler saves retrieved webpage content, including any images.
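To illustrate how Maximum Depth, Delay, and Restrict Crawl under URL interact, here is a minimal conceptual sketch in Python. It is not the connector's implementation: the `crawl` function, its parameter names, and the in-memory `links` map (which stands in for real page fetching) are hypothetical. It also assumes the starting page counts as depth 0, which the documentation above does not state.

```python
import time
from collections import deque


def crawl(start_url, links, max_depth, delay_millis=0, restrict_to_path=False):
    """Breadth-first crawl over an in-memory link graph.

    `links` maps each URL to the URLs its page links to. It replaces
    real HTTP fetching so the sketch runs offline.
    """
    visited = [start_url]
    queue = deque([(start_url, 0)])  # assumption: the start page is depth 0
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the maximum depth
        for child in links.get(url, []):
            if child in visited:
                continue  # skip pages that were already crawled
            if restrict_to_path and not child.startswith(start_url):
                continue  # stay under the starting URL
            if delay_millis > 0:
                time.sleep(delay_millis / 1000)  # pause between pages
            visited.append(child)
            queue.append((child, depth + 1))
    return visited
```

With a depth of 1, only pages linked directly from the starting page are visited. Enabling `restrict_to_path` additionally drops links that do not begin with the starting URL.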
XML Configuration
Below is the XML configuration for this operation:
<mac-web-crawler:crawl-website
    doc:name="Crawl website"
    doc:id="0c0598a6-5ab7-4934-9b4c-fa3a36e545ff"
    config-ref="MAC_WebCrawler_Config"
    url="#[payload.url]"
    maxDepth="#[payload.maxDepth]"
    downloadPath="#[payload.path]"
    delayMillis="#[payload.delay]"
    restrictToPath="true"/>
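The configuration above reads `url`, `maxDepth`, `path`, and `delay` from the incoming payload. As a rough sketch, a client could build a matching JSON body as follows. The helper name `build_crawl_request` and all sample values (including the download path) are hypothetical; only the four key names come from the expressions in the configuration.

```python
import json


def build_crawl_request(url, max_depth, download_path, delay_millis=0):
    """Build a payload whose keys match the expressions in the XML above."""
    return {
        "url": url,              # read by #[payload.url]
        "maxDepth": max_depth,   # read by #[payload.maxDepth]
        "path": download_path,   # read by #[payload.path]
        "delay": delay_millis,   # read by #[payload.delay]
    }


# Sample values only; the download path here is an assumption.
body = json.dumps(
    build_crawl_request("https://mac-project.ai/docs", 1, "/tmp/crawl-output", 500)
)
```

How this body reaches the flow (for example, through an HTTP listener) depends on your application and is not covered here.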
Output Field
This operation responds with a JSON payload.
Example Output
{
"url": "https://mac-project.ai/docs",
"children": [
{
"url": "https://mac-project.ai/docs/mulechain-ai/showcase",
"children": [],
"fileName": "ExampleShowcases_20241022152911020.json"
},
{
"url": "https://mac-project.ai/docs/mac-webcrawler/getting-started",
"children": [],
"fileName": "GettingStarted_20241022152911045.json"
},
{
"url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/agent",
"children": [],
"fileName": "[Agent]DefinePromptTemplate_20241022152911072.json"
},
{
"url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/sentiment-analysis",
"children": [],
"fileName": "[Sentiment]Analyzer_20241022152911098.json"
},
{
"url": "https://mac-project.ai/docs/contribute",
"children": [],
"fileName": "Contribute_20241022152911127.json"
},
{
"url": "https://mac-project.ai/docs/mac-webcrawler/supported-operations",
"children": [],
"fileName": "MACWebCrawlerConnectorOperations_20241022152911152.json"
},
{
"url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/embedding",
"children": [],
"fileName": "[Embedding]Generatefromtext_20241022152911189.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/sentiment-analysis",
"children": [],
"fileName": "SentimentOperations_20241022152911218.json"
},
{
"url": "https://mac-project.ai/docs/einstein-ai/supported-operations/embeddings",
"children": [],
"fileName": "[Embedding]Operations_20241022152911246.json"
},
{
"url": "https://mac-project.ai/docs/einstein-ai",
"children": [],
"fileName": "MACEinsteinAIConnector_20241022152911271.json"
},
{
"url": "https://mac-project.ai/docs/mac-whisperer/supported-operations/speech",
"children": [],
"fileName": "[Speech]toText_20241022152911515.json"
},
{
"url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/image-generation",
"children": [],
"fileName": "[Image]Generate_20241022152911765.json"
},
{
"url": "https://mac-project.ai/docs/mac-whisperer/connector-overview",
"children": [],
"fileName": "MACWhispererConnectorOverview_20241022152911791.json"
},
{
"url": "https://mac-project.ai/docs/einstein-ai/supported-operations",
"children": [],
"fileName": "MACEinsteinAIConnectorOperations_20241022152912422.json"
},
{
"url": "https://mac-project.ai/docs/einstein-ai/supported-operations/chat",
"children": [],
"fileName": "[Chat]Operations_20241022152912669.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/connector-overview",
"children": [],
"fileName": "MuleSoftAIChain(MAC)ConnectorOverview_20241022152912954.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/tools",
"children": [],
"fileName": "ToolsOperations_20241022152913214.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/chat",
"children": [],
"fileName": "ChatOperations_20241022152913456.json"
},
{
"url": "https://mac-project.ai/docs/mac-webcrawler/connector-overview",
"children": [],
"fileName": "MACWebCrawlerConnectorOverview_20241022152913493.json"
},
{
"url": "https://mac-project.ai/docs",
"children": [],
"fileName": "Duplicate."
},
{
"url": "https://mac-project.ai/docs/einstein-ai/supported-operations/rag",
"children": [],
"fileName": "[RAG]Operations_20241022152913543.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/",
"children": [],
"fileName": "MuleSoftAIChainConnector_20241022152913902.json"
},
{
"url": "https://mac-project.ai/docs/ms-vectors/supported-operations",
"children": [],
"fileName": "MACVectorsConnectorOperations_20241022152913929.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/agent",
"children": [],
"fileName": "AgentOperations_20241022152914173.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/getting-started",
"children": [],
"fileName": "GettingStarted_20241022152914422.json"
},
{
"url": "https://mac-project.ai/docs/einstein-ai/getting-started",
"children": [],
"fileName": "GettingStarted_20241022152914668.json"
},
{
"url": "https://mac-project.ai/docs/mac-whisperer/supported-operations",
"children": [],
"fileName": "MACWhispererConnectorOperations_20241022152914938.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/supported-operations",
"children": [],
"fileName": "MuleSoftAIChainConnectorOperations_20241022152915197.json"
},
{
"url": "https://mac-project.ai/docs/ms-vectors/connector-overview",
"children": [],
"fileName": "MACVectorsConnectorOverview_20241022152915441.json"
},
{
"url": "https://mac-project.ai/docs/einstein-ai/supported-operations/agent",
"children": [],
"fileName": "[Agent]DefinePromptTemplate_20241022152915682.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai",
"children": [],
"fileName": "MuleSoftAIChainConnector_20241022152915707.json"
},
{
"url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/platform",
"children": [],
"fileName": "[Agent]List_20241022152915977.json"
},
{
"url": "https://mac-project.ai/docs/ms-vectors/getting-started",
"children": [],
"fileName": "GettingStarted_20241022152916005.json"
},
{
"url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/chat",
"children": [],
"fileName": "[Chat]Answerprompt_20241022152916058.json"
},
{
"url": "https://mac-project.ai/docs/aws-bedrock/connector-overview",
"children": [],
"fileName": "AWSBedrockOverview_20241022152916302.json"
},
{
"url": "https://mac-project.ai/docs/mac-whisperer/getting-started",
"children": [],
"fileName": "GettingStarted_20241022152916552.json"
},
{
"url": "https://mac-project.ai/docs/ms-vectors/supported-operations/embeddings",
"children": [],
"fileName": "[Embedding]generatefromtext_20241022152916802.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/rag",
"children": [],
"fileName": "RAGOperations_20241022152917050.json"
},
{
"url": "https://mac-project.ai/docs/ms-vectors/supported-operations/documents",
"children": [],
"fileName": "[Document]parser_20241022152917082.json"
},
{
"url": "https://mac-project.ai/docs/aws-bedrock/getting-started",
"children": [],
"fileName": "GettingStarted_20241022152917110.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-vectors",
"children": [],
"fileName": "MACVectorsConnector_20241022152917138.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/image-generation",
"children": [],
"fileName": "ImageOperations_20241022152917392.json"
},
{
"url": "https://mac-project.ai/docs/einstein-ai/",
"children": [],
"fileName": "MACEinsteinAIConnector_20241022152917696.json"
},
{
"url": "https://mac-project.ai/docs/einstein-ai/connector-overview",
"children": [],
"fileName": "MACEinsteinAIConnectorOverview_20241022152917938.json"
},
{
"url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/embeddings",
"children": [],
"fileName": "EmbeddingOperations_20241022152918185.json"
},
{
"url": "https://mac-project.ai/docs/mac-whisperer/supported-operations/text",
"children": [],
"fileName": "[Text]tospeech_20241022152918213.json"
}
],
"fileName": "Introduction_20241022152910742.json"
}
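The output is a tree: each node has a `url`, a `fileName`, and a list of `children` with the same shape. The following Python sketch shows one way to flatten such a response into a list of saved pages. The function name is hypothetical, and treating a `fileName` of "Duplicate." as a marker for a page that was already crawled is an inference from the example above, not documented behavior.

```python
def list_downloaded_pages(node):
    """Return (url, fileName) pairs for every node in the crawl tree.

    Nodes whose fileName is "Duplicate." are skipped, on the assumption
    that they mark pages that were already crawled elsewhere in the tree.
    """
    pages = []
    if node.get("fileName") != "Duplicate.":
        pages.append((node["url"], node["fileName"]))
    for child in node.get("children", []):
        pages.extend(list_downloaded_pages(child))
    return pages
```

Applied to the example output, this would return the root page followed by each child page, leaving out the repeated `https://mac-project.ai/docs` entry.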