
Getting Started

Use the Connector in Your Project

Option 1: Maven Central Repository

The connector is published on Maven Central.

Edit File pom.xml

Copy and paste the following Maven dependency into your Mule application's pom.xml file.

pom.xml
<dependency>
   <groupId>io.github.mulesoft-ai-chain-project</groupId>
   <artifactId>mule4-webcrawler-connector</artifactId>
   <version>{version}</version>
   <classifier>mule-plugin</classifier>
</dependency>

Option 2: Local Maven Repository

System Requirements

Before you start, ensure you have the following prerequisites:

  • Java Development Kit (JDK) 11 or 17
  • Apache Maven
  • MuleSoft Anypoint Studio
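Before building, you may want to confirm that the command-line tools listed above are on your PATH. The following is a minimal POSIX shell sketch and is not part of the connector project. The function names java_major and check_prereqs are hypothetical, and the script treats JDK 11 as the minimum only because that is the lowest version this guide lists.

```shell
# Sketch only: hypothetical helpers, not shipped with the connector.

# java_major: print the major version from a `java -version` version string.
#   "1.8.0_392" -> 8 (legacy 1.x scheme), "11.0.21" -> 11, "17" -> 17
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: 1.8.x means Java 8
  esac
  echo "${v%%[._-]*}"     # keep everything before the first '.', '_' or '-'
}

# check_prereqs: verify java, mvn and git are on PATH and Java is 11 or later.
check_prereqs() {
  local tool raw major
  for tool in java mvn git; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      return 1
    fi
  done
  # `java -version` writes to stderr; the version is the first quoted token,
  # e.g. openjdk version "17.0.9" 2023-10-17
  raw=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  major=$(java_major "$raw")
  if [ -z "$major" ] || [ "$major" -lt 11 ]; then
    echo "unsupported or undetected Java version: '$raw'" >&2
    return 1
  fi
  echo "OK: Java $major, $(mvn -v 2>/dev/null | head -n 1)"
}
```

After sourcing the file, run check_prereqs. It either prints the detected Java and Maven versions or names the first missing tool and returns a non-zero status.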

Download the MuleSoft WebCrawler Connector

Clone the MuleSoft WebCrawler Connector repository from GitHub:

git clone https://github.com/MuleSoft-AI-Chain-Project/mule-webcrawler-connector.git
cd mule-webcrawler-connector

Build the Connector (Java 11, 17, 21, 22, and Later)

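Step 1

Export MAVEN_OPTS with the following --add-opens flags. They grant reflective access to JDK internal packages that newer Java versions encapsulate by default: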

export MAVEN_OPTS="--add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.util.regex=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.xml/javax.xml.namespace=ALL-UNNAMED"

Step 2

Run the build command that matches your JDK version. These commands skip the tests and GPG signing.

For Java 17
mvn clean install -Dmaven.test.skip=true -DskipTests -Dgpg.skip -Djdeps.multiRelease=17

For Java 21
mvn clean install -Dmaven.test.skip=true -DskipTests -Dgpg.skip -Djdeps.multiRelease=21

For Java 22
mvn clean install -Dmaven.test.skip=true -DskipTests -Dgpg.skip -Djdeps.multiRelease=22
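If you script the build, you can combine the two steps above. This sketch uses hypothetical helper names (webcrawler_maven_opts, webcrawler_build_cmd). It reproduces the exact flags from Steps 1 and 2 and deliberately rejects Java versions for which this guide lists no command, rather than guessing a -Djdeps.multiRelease value.

```shell
# Sketch only: helper names are hypothetical, not part of the connector repo.

# Step 1: the same --add-opens flags, one package per line for readability.
webcrawler_maven_opts() {
  local opts="" pkg
  for pkg in \
    java.base/java.util \
    java.base/java.net \
    java.base/java.util.regex \
    java.base/java.lang.reflect \
    java.base/java.lang \
    java.xml/javax.xml.namespace
  do
    opts="$opts --add-opens=$pkg=ALL-UNNAMED"
  done
  echo "${opts# }"   # strip the leading space added by the first iteration
}

# Step 2: print the documented build command for a JDK major version.
webcrawler_build_cmd() {
  case "$1" in
    17|21|22)
      echo "mvn clean install -Dmaven.test.skip=true -DskipTests -Dgpg.skip -Djdeps.multiRelease=$1" ;;
    *)
      echo "No build command is documented for Java $1" >&2
      return 1 ;;
  esac
}

# Usage, from inside the cloned mule-webcrawler-connector directory:
#   export MAVEN_OPTS="$(webcrawler_maven_opts)"
#   cmd=$(webcrawler_build_cmd 17) && $cmd
```

Printing the command instead of running it directly lets you inspect the exact flags before starting a build.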

The MAC Project connectors are updated frequently, so the version changes regularly. Replace {version} with the latest release listed in our GitHub repository.

After the build installs the connector into your local Maven repository, add the following dependency to your pom.xml file:

pom.xml
<dependency>
    <groupId>com.mulesoft.connectors</groupId>
    <artifactId>mule4-webcrawler-connector</artifactId>
    <version>{version}</version>
    <classifier>mule-plugin</classifier>
</dependency>
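To confirm that mvn clean install placed the connector in your local repository, you can check for the plugin JAR. This sketch assumes Maven's default local repository (~/.m2/repository) and the standard layout: groupId path, artifactId, version, then an artifactId-version-classifier.jar file. webcrawler_local_jar is a hypothetical helper, and the version you pass must be the one you actually built.

```shell
# Sketch only: print where `mvn install` should have placed the connector JAR.
# Assumes Maven's default local repository unless a path is passed as $2.
webcrawler_local_jar() {
  local version="$1"
  local repo="${2:-$HOME/.m2/repository}"
  local group_path="com/mulesoft/connectors"   # groupId with dots replaced by slashes
  local artifact="mule4-webcrawler-connector"
  echo "$repo/$group_path/$artifact/$version/$artifact-$version-mule-plugin.jar"
}

# Usage (1.0.0 is a placeholder; use the version from the cloned project's pom.xml):
#   ls -l "$(webcrawler_local_jar 1.0.0)"
```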

Connector Configuration

This configuration applies to the Crawl website and Get page insights operations, and it is simple to create. Open Global Elements in your Mule project and create a new configuration. Under Connector Configuration, select MuleSoft WebCrawler Configuration and click OK.

WebCrawler Configuration

To restrict content retrieval to specific elements or tags, enter them in the Tag List, as in the example below.

WebCrawler Configuration with Tags

In the example above, only text from the HTML elements matching p.text-primary and h1.heading-primary is retrieved (or analysed, when using the Get page insights operation).
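The Tag List entries in the example use CSS-style type.class notation. For instance, p.text-primary targets <p> elements whose class attribute includes text-primary. The sketch below uses a hypothetical parse_tag_entry helper that only splits an entry into its element and class parts to make that notation explicit. It assumes the connector follows standard CSS selector semantics and is not the connector's own parser.

```shell
# Sketch only: decompose a Tag List entry written as element.class[.class...].
# Assumes CSS selector semantics; this is not how the connector parses entries.
parse_tag_entry() {
  local entry="$1" element classes=""
  element="${entry%%.*}"                 # text before the first '.'
  case "$entry" in
    *.*) classes=$(printf '%s' "${entry#*.}" | tr '.' ' ') ;;   # each further '.' adds a class
  esac
  echo "element=${element:-*} classes=${classes:-none}"
}

# Usage with the entries from the example above:
#   for e in p.text-primary h1.heading-primary; do parse_tag_entry "$e"; done
```

An entry such as div.note.warning would, under CSS rules, match only <div> elements that carry both classes.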