Get Page Insights

This operation fetches insights from a webpage, including:

  • the word count of the page
  • counts of elements or tags such as H1, H2, DIV and P. You can also collect insights for tags of your own choosing by specifying them in the operation's configuration.
  • the link structure, broken down into internal, external and reference links
  • image links
These insights can be used to build your own custom crawl by combining them with the other operations provided by this connector!
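To make the list above concrete, here is a minimal Python sketch of how insights like these could be derived from raw HTML using only the standard library's `html.parser`. It is not the connector's implementation. The names `page_insights`, `InsightParser` and `DEFAULT_TAGS` are hypothetical. Three behaviours are also assumptions, chosen so the result resembles the example output further down this page: treating `#fragment` links as reference links, treating a leading `www.` as the same host, and de-duplicating URLs. Passing your own `tags` tuple mirrors the idea of configuring custom tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical default tag set, mirroring the tag keys in the example pageStats.
DEFAULT_TAGS = ("div", "p", "h1", "h2", "h3", "h4", "h5")


def _host(url):
    """Host name with a leading 'www.' removed (assumption: www is the same site)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def _append_unique(bucket, value):
    """Append value only once, so each list holds distinct entries (assumption)."""
    if value not in bucket:
        bucket.append(value)


class InsightParser(HTMLParser):
    """Collects tag counts, words, links, images and the title of one HTML page."""

    def __init__(self, page_url, tags=DEFAULT_TAGS):
        super().__init__()
        self.page_url = page_url
        self.tag_counts = {tag: 0 for tag in tags}
        self.links = {"reference": [], "internal": [], "external": []}
        self.images = []
        self.words = 0
        self.title_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.tag_counts:
            self.tag_counts[tag] += 1
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self._add_link(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            _append_unique(self.images, urljoin(self.page_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)  # title text is not counted as page words
        elif not self._skip_depth:
            self.words += len(data.split())  # whitespace-separated words only

    def _add_link(self, href):
        if href.startswith("#"):
            # Assumption: same-page fragment links are "reference" links.
            _append_unique(self.links["reference"], href)
            return
        absolute = urljoin(self.page_url, href)
        kind = "internal" if _host(absolute) == _host(self.page_url) else "external"
        _append_unique(self.links[kind], absolute)


def page_insights(page_url, html, tags=DEFAULT_TAGS):
    """Return a dict shaped like the operation's example output."""
    parser = InsightParser(page_url, tags)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    stats = dict(parser.tag_counts)
    stats.update({kind: len(urls) for kind, urls in parser.links.items()})
    stats["images"] = len(parser.images)
    stats["wordCount"] = parser.words
    return {
        "pageStats": stats,
        "links": {**parser.links, "images": parser.images},
        "title": "".join(parser.title_parts).strip(),
        "url": page_url,
    }
```

Keeping the parser event-driven means one pass over the HTML produces every statistic. A real crawler would also need to fetch the page and handle encodings, which this sketch leaves out.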
Get insights of a webpage

Input Fields

Module Configuration

This refers to the MAC Web Crawler Configuration set up in the Getting Started section.

Operation Fields

  • Page URL: The URL of the webpage to fetch insights for.

XML Configuration

Below is the XML configuration for this operation:

<mac-web-crawler:get-page-insights
    doc:name="Get page insights"
    doc:id="08dd9627-5220-41df-a7cf-869bff4eee91"
    config-ref="MAC_WebCrawler_Config"
    url="#[payload.url]"/>

Output Field

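This operation returns a JSON payload with four parts: the page statistics (pageStats), the page's links grouped by type (links), the page title (title) and the page URL (url).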

Example Output

{
    "pageStats": {
        "div": 36,
        "p": 6,
        "reference": 0,
        "internal": 7,
        "external": 2,
        "images": 4,
        "wordCount": 147,
        "h1": 1,
        "h2": 0,
        "h3": 4,
        "h4": 0,
        "h5": 0
    },
    "links": {
        "reference": [],
        "internal": [
            "https://www.mac-project.ai/",
            "https://mac-project.ai/docs",
            "https://mac-project.ai/",
            "https://mac-project.ai/docs/mulechain-ai/getting-started",
            "https://mac-project.ai/docs/contribute",
            "https://mac-project.ai/docs/mulechain-ai/supported-operations",
            "https://mac-project.ai/about"
        ],
        "external": [
            "https://www.linkedin.com/groups/13047000/",
            "https://github.com/MuleSoft-AI-Chain-Project"
        ],
        "images": [
            "https://mac-project.ai/_next/image?url=%2Flogos%2Fmulechain-project-logo.png&w=96&q=75",
            "https://mac-project.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcard-1.b6224663.png&w=3840&q=75",
            "https://mac-project.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcard-operations.d3098f38.png&w=1920&q=75",
            "https://mac-project.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcard-1.dark.fd8b5613.png&w=3840&q=75"
        ]
    },
    "title": "The MuleSoft AI Chain (MAC) Project",
    "url": "https://mac-project.ai/"
}
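To chain this operation into a custom crawl, the links.internal array in the output above is a natural list of pages to visit next. The Python sketch below shows a breadth-first crawl driven by that array. It assumes a hypothetical callable `fetch_insights(url)` that returns a dict shaped like the example output, for instance a wrapper around a Mule flow that invokes Get Page Insights. Neither `crawl` nor `fetch_insights` is part of the connector.

```python
from collections import deque
from typing import Callable, Dict


def crawl(start_url: str,
          fetch_insights: Callable[[str], dict],
          max_pages: int = 50) -> Dict[str, dict]:
    """Breadth-first crawl that follows each payload's links.internal list.

    fetch_insights is a hypothetical stand-in for calling Get Page Insights;
    it must return a dict shaped like the operation's example output.
    """
    visited: Dict[str, dict] = {}  # url -> insights payload, in visit order
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # the same URL may have been queued from several pages
        insights = fetch_insights(url)
        visited[url] = insights
        for link in insights.get("links", {}).get("internal", []):
            if link not in visited:
                queue.append(link)
    return visited
```

For example, summing `p["pageStats"]["wordCount"]` over the returned values gives a total word count for the crawled pages. The example output lists both https://www.mac-project.ai/ and https://mac-project.ai/, so in practice you may want to normalize URLs before checking the visited set. The `max_pages` cap keeps the crawl bounded.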