Getting Started
Use the Connector in Your Project
Option 1: Maven Central Repository
Edit File pom.xml
Copy the following Maven dependency into the pom.xml file of your Mule application.
<dependency>
<groupId>io.github.mulesoft-ai-chain-project</groupId>
<artifactId>mule4-webcrawler-connector</artifactId>
<version>{version}</version>
<classifier>mule-plugin</classifier>
</dependency>
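Before adding the dependency, it can help to confirm that the chosen version resolves from Maven Central. The following is a minimal sketch, assuming a POSIX shell with Maven on the PATH. The function name connector_coords and the CONNECTOR_VERSION variable are illustrative and not part of the connector. The function assembles the coordinates in the groupId:artifactId:version:packaging:classifier form accepted by the dependency:get goal of the Maven Dependency Plugin.

```shell
#!/bin/sh
# Hypothetical helper (not part of the connector): build the Maven
# coordinates for the connector in the form
#   groupId:artifactId:version:packaging:classifier
connector_coords() {
  version="$1"
  printf '%s:%s:%s:jar:%s\n' \
    "io.github.mulesoft-ai-chain-project" \
    "mule4-webcrawler-connector" \
    "$version" \
    "mule-plugin"
}

# Example (not run here; set CONNECTOR_VERSION to a real release first):
#   mvn dependency:get -Dartifact="$(connector_coords "$CONNECTOR_VERSION")"
```

If the download succeeds, the same version string should be safe to use in place of {version} in the dependency above.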
Option 2: Local Maven Repository
System Requirements
Before you start, ensure you have the following prerequisites:
- Java Development Kit (JDK) 11 or 17
- Apache Maven
- MuleSoft Anypoint Studio
Download the MuleSoft WebCrawler Connector
Clone the MuleSoft WebCrawler Connector repository from GitHub:
git clone https://github.com/MuleSoft-AI-Chain-Project/mule-webcrawler-connector.git
cd mule-webcrawler-connector
Build the Connector with Java 11, 17, 21, 22, etc.
Step 1
export MAVEN_OPTS="--add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.util.regex=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.xml/javax.xml.namespace=ALL-UNNAMED"
Step 2
For Java 17
mvn clean install -Dmaven.test.skip=true -DskipTests -Dgpg.skip -Djdeps.multiRelease=17
For Java 21
mvn clean install -Dmaven.test.skip=true -DskipTests -Dgpg.skip -Djdeps.multiRelease=21
For Java 22
mvn clean install -Dmaven.test.skip=true -DskipTests -Dgpg.skip -Djdeps.multiRelease=22
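The three commands above differ only in the value passed to -Djdeps.multiRelease, so the value can be derived from the installed JDK. The following is a sketch under that assumption, not an official build script. The helper names java_major and connector_build_cmd are hypothetical, and applying the same pattern to Java versions other than 17, 21, and 22 is an extrapolation from the commands listed here.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a Java version string.
# Handles modern strings ("17.0.9", "21"), suffixed ones ("22-ea"),
# and the legacy "1.x" scheme ("1.8.0_292" yields 8).
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
  esac
  echo "${v%%[!0-9]*}"    # keep only the leading digits
}

# Hypothetical helper: print the build command for a Java major version,
# following the pattern of the documented commands.
connector_build_cmd() {
  echo "mvn clean install -Dmaven.test.skip=true -DskipTests -Dgpg.skip -Djdeps.multiRelease=$1"
}

# Example (not run here):
#   full=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   sh -c "$(connector_build_cmd "$(java_major "$full")")"
```

Printing the command instead of running it makes it easy to inspect the flags before starting a build.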
The MAC Project connectors are updated frequently, so the version number changes regularly.
Make sure to replace {version} with the latest release from our GitHub repository.
Add the following dependency to your pom.xml file:
<dependency>
<groupId>com.mulesoft.connectors</groupId>
<artifactId>mule4-webcrawler-connector</artifactId>
<version>{version}</version>
<classifier>mule-plugin</classifier>
</dependency>
Connector Configuration
The configuration is applicable to the Crawl and Page operations.
The configuration for the MuleSoft WebCrawler connector is simple to create.
Go to the Global Elements in your MuleSoft project, and create a new configuration. In the Connector Configuration, you will find the MuleSoft WebCrawler Connector Config. Select it and press OK.
[Image: WebCrawler Configuration]
Request Parameters
- User agent: The user agent to use for the request.
- Referrer: The referrer to use for the request (not set during dynamic content retrieval).
Crawler Settings
- Delay (millisec): The delay between page requests in milliseconds.
- Dynamic content retrieval: If enabled, the connector will retrieve dynamic content from the page.
- RAW html: If enabled, the connector will retrieve the raw HTML content of the page.
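To make the request parameters and the delay setting concrete, the sketch below shows what they correspond to at the HTTP level: a User-Agent header, a Referer header, and a pause between successive page requests. It is an illustration only and does not reflect how the connector is implemented. The function names delay_seconds and polite_fetch are hypothetical, and the loop assumes curl is installed and that the local sleep accepts fractional seconds (GNU and BSD sleep do, but POSIX does not require it).

```shell
#!/bin/sh
# Illustration only: NOT the connector's implementation.

# Convert a delay in milliseconds to a seconds string for `sleep`,
# using integer arithmetic only (1500 becomes 1.500).
delay_seconds() {
  printf '%d.%03d\n' "$(( $1 / 1000 ))" "$(( $1 % 1000 ))"
}

# Fetch each URL with the given User-Agent (-A) and Referer (-e) headers,
# pausing between requests.
# Arguments: user agent, referrer, delay in milliseconds, then the URLs.
polite_fetch() {
  ua="$1"; referrer="$2"; delay_ms="$3"
  shift 3
  for url in "$@"; do
    curl -s -A "$ua" -e "$referrer" "$url"
    sleep "$(delay_seconds "$delay_ms")"
  done
}

# Example (not run here; all values are placeholders):
#   polite_fetch "my-crawler/1.0" "https://example.com/" 1500 \
#     "https://example.com/a" "https://example.com/b"
```

A longer delay lowers the load placed on the target site at the cost of a slower crawl.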