Crawl Website

The Crawl Website operation crawls a website for content, down to a specified depth. It also lets you:

  • set a crawl delay so that you do not overload the web server with requests
  • download images from the crawled web pages during the crawl

Input Fields

Module Configuration

This refers to the MAC Web Crawler Configuration set up in the Getting Started section.

Operation Fields

  • Website URL: The website to be crawled. The crawl starts from this URL and, by default, links found in crawled pages that match the base URL are also crawled, up to the specified Maximum Depth.
  • Maximum Depth: The crawl is limited to the specified maximum depth.
  • Delay (millisecs): The time to wait between crawling pages on a website, in milliseconds. Use it to avoid overloading the website. Specify 0 for no delay.
  • Restrict Crawl under URL: If set to True, the crawler only crawls and fetches content from pages that match the specified Website URL.
  • Retrieve Meta Tags: If set to True, the crawler also retrieves metadata from each crawled page, including the title, description, keywords, and other SEO-related information that the page contains.
  • Download Images: If set to True, the crawler also downloads images found on each crawled page.
  • Download Location: The path where the crawler saves retrieved webpage content, including any images.
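
To illustrate how Maximum Depth, Delay, and Restrict Crawl under URL interact, here is a minimal Python sketch of a depth-limited crawl. It is not the connector's implementation: the `crawl` function, its parameter names, and the in-memory `LINKS` graph on example.com are all hypothetical, and it assumes that a depth of 0 means only the start URL is visited.

```python
import time
from collections import deque

# Hypothetical in-memory link graph standing in for real pages (no network access).
LINKS = {
    "https://example.com/docs": [
        "https://example.com/docs/a",
        "https://example.com/blog",
    ],
    "https://example.com/docs/a": ["https://example.com/docs/a/b"],
    "https://example.com/docs/a/b": [],
    "https://example.com/blog": [],
}


def crawl(start_url, max_depth, delay_millis=0, restrict_to_path=False):
    """Breadth-first crawl up to max_depth; returns URLs in visit order."""
    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth < max_depth:  # only follow links while under the depth limit
            for link in LINKS.get(url, []):
                if link in seen:
                    continue  # never queue the same URL twice
                if restrict_to_path and not link.startswith(start_url):
                    continue  # skip links outside the start URL
                seen.add(link)
                queue.append((link, depth + 1))
        if delay_millis > 0 and queue:
            time.sleep(delay_millis / 1000)  # pause before the next page
    return visited
```

In this sketch, `crawl("https://example.com/docs", 1)` visits the start page and both pages it links to, while adding `restrict_to_path=True` drops the `/blog` page because it does not sit under the start URL.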

XML Configuration

Below is the XML configuration for this operation:

<mac-web-crawler:crawl-website
    doc:name="Crawl website"
    doc:id="0c0598a6-5ab7-4934-9b4c-fa3a36e545ff"
    config-ref="MAC_WebCrawler_Config"
    url="#[payload.url]"
    maxDepth="#[payload.maxDepth]"
    downloadPath="#[payload.path]"
    delayMillis="#[payload.delay]"
    restrictToPath="true"/>
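
The expressions in this configuration read four fields from the incoming payload: `url`, `maxDepth`, `path`, and `delay`. As a rough sketch, a client could assemble a matching request body as follows. The `build_crawl_request` helper, its validation rules, and the `/tmp/crawl` path are hypothetical and only illustrate the shape of the payload.

```python
import json


def build_crawl_request(url, max_depth, download_path, delay_millis=0):
    """Build a payload with the four fields the XML expressions read."""
    if max_depth < 0:
        raise ValueError("max_depth must be 0 or greater")
    if delay_millis < 0:
        raise ValueError("delay_millis must be 0 or greater")
    return {
        "url": url,                # read by url="#[payload.url]"
        "maxDepth": max_depth,     # read by maxDepth="#[payload.maxDepth]"
        "path": download_path,     # read by downloadPath="#[payload.path]"
        "delay": delay_millis,     # read by delayMillis="#[payload.delay]"
    }


# Hypothetical values; the download path must exist where the Mule app runs.
request_body = build_crawl_request("https://mac-project.ai/docs", 1, "/tmp/crawl", 500)
print(json.dumps(request_body, indent=2))
```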

Output Field

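This operation returns a JSON payload. The payload is a tree: each node holds the url of a crawled page, the fileName under which its content was saved, and a children array with the pages reached from it.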

Example Output

{
    "url": "https://mac-project.ai/docs",
    "children": [
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/showcase",
            "children": [],
            "fileName": "ExampleShowcases_20241022152911020.json"
        },
        {
            "url": "https://mac-project.ai/docs/mac-webcrawler/getting-started",
            "children": [],
            "fileName": "GettingStarted_20241022152911045.json"
        },
        {
            "url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/agent",
            "children": [],
            "fileName": "[Agent]DefinePromptTemplate_20241022152911072.json"
        },
        {
            "url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/sentiment-analysis",
            "children": [],
            "fileName": "[Sentiment]Analyzer_20241022152911098.json"
        },
        {
            "url": "https://mac-project.ai/docs/contribute",
            "children": [],
            "fileName": "Contribute_20241022152911127.json"
        },
        {
            "url": "https://mac-project.ai/docs/mac-webcrawler/supported-operations",
            "children": [],
            "fileName": "MACWebCrawlerConnectorOperations_20241022152911152.json"
        },
        {
            "url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/embedding",
            "children": [],
            "fileName": "[Embedding]Generatefromtext_20241022152911189.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/sentiment-analysis",
            "children": [],
            "fileName": "SentimentOperations_20241022152911218.json"
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai/supported-operations/embeddings",
            "children": [],
            "fileName": "[Embedding]Operations_20241022152911246.json"
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai",
            "children": [],
            "fileName": "MACEinsteinAIConnector_20241022152911271.json"
        },
        {
            "url": "https://mac-project.ai/docs/mac-whisperer/supported-operations/speech",
            "children": [],
            "fileName": "[Speech]toText_20241022152911515.json"
        },
        {
            "url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/image-generation",
            "children": [],
            "fileName": "[Image]Generate_20241022152911765.json"
        },
        {
            "url": "https://mac-project.ai/docs/mac-whisperer/connector-overview",
            "children": [],
            "fileName": "MACWhispererConnectorOverview_20241022152911791.json"
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai/supported-operations",
            "children": [],
            "fileName": "MACEinsteinAIConnectorOperations_20241022152912422.json"
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai/supported-operations/chat",
            "children": [],
            "fileName": "[Chat]Operations_20241022152912669.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/connector-overview",
            "children": [],
            "fileName": "MuleSoftAIChain(MAC)ConnectorOverview_20241022152912954.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/tools",
            "children": [],
            "fileName": "ToolsOperations_20241022152913214.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/chat",
            "children": [],
            "fileName": "ChatOperations_20241022152913456.json"
        },
        {
            "url": "https://mac-project.ai/docs/mac-webcrawler/connector-overview",
            "children": [],
            "fileName": "MACWebCrawlerConnectorOverview_20241022152913493.json"
        },
        {
            "url": "https://mac-project.ai/docs",
            "children": [],
            "fileName": "Duplicate."
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai/supported-operations/rag",
            "children": [],
            "fileName": "[RAG]Operations_20241022152913543.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/",
            "children": [],
            "fileName": "MuleSoftAIChainConnector_20241022152913902.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-vectors/supported-operations",
            "children": [],
            "fileName": "MACVectorsConnectorOperations_20241022152913929.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/agent",
            "children": [],
            "fileName": "AgentOperations_20241022152914173.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/getting-started",
            "children": [],
            "fileName": "GettingStarted_20241022152914422.json"
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai/getting-started",
            "children": [],
            "fileName": "GettingStarted_20241022152914668.json"
        },
        {
            "url": "https://mac-project.ai/docs/mac-whisperer/supported-operations",
            "children": [],
            "fileName": "MACWhispererConnectorOperations_20241022152914938.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/supported-operations",
            "children": [],
            "fileName": "MuleSoftAIChainConnectorOperations_20241022152915197.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-vectors/connector-overview",
            "children": [],
            "fileName": "MACVectorsConnectorOverview_20241022152915441.json"
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai/supported-operations/agent",
            "children": [],
            "fileName": "[Agent]DefinePromptTemplate_20241022152915682.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai",
            "children": [],
            "fileName": "MuleSoftAIChainConnector_20241022152915707.json"
        },
        {
            "url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/platform",
            "children": [],
            "fileName": "[Agent]List_20241022152915977.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-vectors/getting-started",
            "children": [],
            "fileName": "GettingStarted_20241022152916005.json"
        },
        {
            "url": "https://mac-project.ai/docs/aws-bedrock/supported-operations/chat",
            "children": [],
            "fileName": "[Chat]Answerprompt_20241022152916058.json"
        },
        {
            "url": "https://mac-project.ai/docs/aws-bedrock/connector-overview",
            "children": [],
            "fileName": "AWSBedrockOverview_20241022152916302.json"
        },
        {
            "url": "https://mac-project.ai/docs/mac-whisperer/getting-started",
            "children": [],
            "fileName": "GettingStarted_20241022152916552.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-vectors/supported-operations/embeddings",
            "children": [],
            "fileName": "[Embedding]generatefromtext_20241022152916802.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/rag",
            "children": [],
            "fileName": "RAGOperations_20241022152917050.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-vectors/supported-operations/documents",
            "children": [],
            "fileName": "[Document]parser_20241022152917082.json"
        },
        {
            "url": "https://mac-project.ai/docs/aws-bedrock/getting-started",
            "children": [],
            "fileName": "GettingStarted_20241022152917110.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-vectors",
            "children": [],
            "fileName": "MACVectorsConnector_20241022152917138.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/image-generation",
            "children": [],
            "fileName": "ImageOperations_20241022152917392.json"
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai/",
            "children": [],
            "fileName": "MACEinsteinAIConnector_20241022152917696.json"
        },
        {
            "url": "https://mac-project.ai/docs/einstein-ai/connector-overview",
            "children": [],
            "fileName": "MACEinsteinAIConnectorOverview_20241022152917938.json"
        },
        {
            "url": "https://mac-project.ai/docs/mulechain-ai/supported-operations/embeddings",
            "children": [],
            "fileName": "EmbeddingOperations_20241022152918185.json"
        },
        {
            "url": "https://mac-project.ai/docs/mac-whisperer/supported-operations/text",
            "children": [],
            "fileName": "[Text]tospeech_20241022152918213.json"
        }
    ],
    "fileName": "Introduction_20241022152910742.json"
}
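
Because the output is a tree, a consumer usually needs to walk it to find the saved files. The following Python sketch shows one way to do that. The `collect_files` function is hypothetical, and it assumes that an entry whose fileName is "Duplicate." marks a URL that was already crawled and has no file of its own, so only names ending in `.json` are kept.

```python
def collect_files(node):
    """Return (url, fileName) pairs for a node and all its descendants."""
    pairs = []
    file_name = node.get("fileName", "")
    # Assumption: only names ending in .json refer to saved page content.
    if file_name.endswith(".json"):
        pairs.append((node["url"], file_name))
    for child in node.get("children", []):
        pairs.extend(collect_files(child))
    return pairs


# Shortened copy of the example output above.
result = {
    "url": "https://mac-project.ai/docs",
    "children": [
        {
            "url": "https://mac-project.ai/docs/mac-webcrawler/getting-started",
            "children": [],
            "fileName": "GettingStarted_20241022152911045.json",
        },
        {
            "url": "https://mac-project.ai/docs",
            "children": [],
            "fileName": "Duplicate.",
        },
    ],
    "fileName": "Introduction_20241022152910742.json",
}

for url, file_name in collect_files(result):
    print(f"{file_name}  <-  {url}")
```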