Get Page Insights
Fetches insights from a webpage. It lets you retrieve things like:
- word count on the page
- counts of elements or tags such as H1, H2, DIV, P, etc. You can also retrieve counts for your own tags by specifying them in the configuration of the operation.
- link structures (broken down into internal, external and reference)
- image links
Input Fields
Module Configuration
This refers to the MAC Web Crawler Configuration set up in the Getting Started section.
Operation Fields
- Page URL: The webpage to fetch the insights for.
XML Configuration
Below is the XML configuration for this operation:
<mac-web-crawler:get-page-insights
    doc:name="Get page insights"
    doc:id="08dd9627-5220-41df-a7cf-869bff4eee91"
    config-ref="MAC_WebCrawler_Config"
    url="#[payload.url]"/>
Output Field
This operation responds with a JSON payload.
Example Output
{
  "pageStats": {
    "div": 36,
    "p": 6,
    "reference": 0,
    "internal": 7,
    "external": 2,
    "images": 4,
    "wordCount": 147,
    "h1": 1,
    "h2": 0,
    "h3": 4,
    "h4": 0,
    "h5": 0
  },
  "links": {
    "reference": [],
    "internal": [
      "https://www.mac-project.ai/",
      "https://mac-project.ai/docs",
      "https://mac-project.ai/",
      "https://mac-project.ai/docs/mulechain-ai/getting-started",
      "https://mac-project.ai/docs/contribute",
      "https://mac-project.ai/docs/mulechain-ai/supported-operations",
      "https://mac-project.ai/about"
    ],
    "external": [
      "https://www.linkedin.com/groups/13047000/",
      "https://github.com/MuleSoft-AI-Chain-Project"
    ],
    "images": [
      "https://mac-project.ai/_next/image?url=%2Flogos%2Fmulechain-project-logo.png&w=96&q=75",
      "https://mac-project.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcard-1.b6224663.png&w=3840&q=75",
      "https://mac-project.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcard-operations.d3098f38.png&w=1920&q=75",
      "https://mac-project.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcard-1.dark.fd8b5613.png&w=3840&q=75"
    ]
  },
  "title": "The MuleSoft AI Chain (MAC) Project",
  "url": "https://mac-project.ai/"
}
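Consuming the Output
A downstream service that receives this payload might reduce it to a few headline figures. The Python sketch below is illustrative only and is not part of the connector. The function name summarize_insights and the keys of the summary it returns are hypothetical. It assumes the payload has the shape shown in the example output, and that the internal, external and reference counts in pageStats correspond to the link lists under links, as they do in that example.

```python
import json
import re

# Shortened copy of the example output above. The link lists are trimmed for
# brevity, so they no longer match the counts reported in "pageStats".
SAMPLE = """
{
  "pageStats": {
    "div": 36, "p": 6,
    "reference": 0, "internal": 7, "external": 2, "images": 4,
    "wordCount": 147,
    "h1": 1, "h2": 0, "h3": 4, "h4": 0, "h5": 0
  },
  "links": {
    "reference": [],
    "internal": ["https://mac-project.ai/docs", "https://mac-project.ai/about"],
    "external": ["https://github.com/MuleSoft-AI-Chain-Project"],
    "images": []
  },
  "title": "The MuleSoft AI Chain (MAC) Project",
  "url": "https://mac-project.ai/"
}
"""


def summarize_insights(payload: dict) -> dict:
    """Reduce a Get Page Insights payload to a few headline figures.

    Hypothetical helper: the summary keys below are this sketch's own
    naming, not something the connector produces.
    """
    stats = payload.get("pageStats", {})
    links = payload.get("links", {})

    # Heading tags (h1, h2, ...) that occur at least once on the page.
    headings_used = sorted(
        tag for tag, count in stats.items()
        if re.fullmatch(r"h[1-6]", tag) and count > 0
    )

    # Total link count, taken from the per-category counts in pageStats.
    total_links = sum(
        stats.get(kind, 0) for kind in ("internal", "external", "reference")
    )

    return {
        "title": payload.get("title", ""),
        "url": payload.get("url", ""),
        "word_count": stats.get("wordCount", 0),
        "headings_used": headings_used,
        "total_links": total_links,
        "external_links": list(links.get("external", [])),
    }


if __name__ == "__main__":
    summary = summarize_insights(json.loads(SAMPLE))
    print(json.dumps(summary, indent=2))
```

Reading every field with .get and a default keeps the helper usable when a page reports no links of some kind or when custom tags change which keys appear in pageStats.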