Crawler Laravel Package

spatie/crawler

PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.

Respecting robots.txt

By default, the crawler will respect robots data from robots.txt files, meta tags, and response headers. More information on the spec can be found at robotstxt.org.

Parsing robots data is done by the spatie/robots-txt package.

Ignoring robots rules

You can disable all robots checks using the ignoreRobots method.

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->ignoreRobots()
    ->start();

You can re-enable robots checking after disabling it using the respectRobots method.

$crawler = Crawler::create('https://example.com')
    ->ignoreRobots();

// later...
$crawler->respectRobots();

Accepting nofollow links

By default, the crawler will reject all links containing rel="nofollow". You can disable this check using the followNofollow method.

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->followNofollow()
    ->start();

You can re-enable nofollow rejection using the rejectNofollowLinks method.

$crawler = Crawler::create('https://example.com')
    ->followNofollow();

// later...
$crawler->rejectNofollowLinks();
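
To make the nofollow rule concrete, here is a minimal stand-alone sketch of what rejecting such links could look like. It is not the crawler's internal code: the `followableLinks()` function and its `$followNofollow` flag are hypothetical names, and the sketch assumes that `rel` is treated as a space-separated token list, as in the HTML specification.

```php
<?php

// Illustrative only: a stand-alone filter, not the crawler's own implementation.
// followableLinks() and $followNofollow are hypothetical names.
function followableLinks(string $html, bool $followNofollow = false): array
{
    $dom = new DOMDocument();
    // Suppress the warnings libxml raises on imperfect real-world HTML.
    @$dom->loadHTML($html);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href === '') {
            continue; // anchors without a target are not links to crawl
        }

        // rel is a space-separated token list, e.g. "noopener nofollow".
        $rel = strtolower(trim($anchor->getAttribute('rel')));
        $relTokens = preg_split('/\s+/', $rel, -1, PREG_SPLIT_NO_EMPTY);

        if (! $followNofollow && in_array('nofollow', $relTokens, true)) {
            continue; // default behaviour: skip nofollow links
        }

        $links[] = $href;
    }

    return $links;
}

$html = '<a href="/docs">Docs</a> <a href="/login" rel="nofollow">Login</a>';

print_r(followableLinks($html));       // only /docs
print_r(followableLinks($html, true)); // /docs and /login
```

With the flag left at its default, only `/docs` is returned. Passing `true` mirrors the effect of calling followNofollow on the crawler.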

Custom user agent

The user agent configured for the crawler is sent with its requests and is also used when checking robots.txt rules. When you set a custom user agent, robots.txt rules that target that agent will be respected. For example, if the site's robots.txt contains:

User-agent: my-agent
Disallow: /

A crawler that uses my-agent as its user agent will then not crawl any pages on the site.
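
The way a user-agent-specific group applies can be sketched in plain PHP. This is a deliberately simplified illustration, not the logic of spatie/robots-txt: the `isDisallowed()` helper is hypothetical, it assumes PHP 8, it matches agent names exactly (case-insensitively), and it only understands `Disallow` lines with plain path prefixes (no wildcards, and `Allow` lines are read only to detect group boundaries).

```php
<?php

// Simplified sketch, assuming PHP 8. isDisallowed() is a hypothetical helper
// and is not part of spatie/crawler or spatie/robots-txt.
function isDisallowed(string $robotsTxt, string $userAgent, string $path): bool
{
    $groups = [];          // lower-cased agent name => list of disallowed prefixes
    $currentAgents = [];   // agents named by the group currently being read
    $readingRules = false; // true once the current group has seen a rule line

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if ($line === '' || ! str_contains($line, ':')) {
            continue;
        }

        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if ($readingRules) { // a User-agent line after rules starts a new group
                $currentAgents = [];
                $readingRules = false;
            }
            $agent = strtolower($value);
            $currentAgents[] = $agent;
            $groups[$agent] ??= [];
        } elseif ($field === 'disallow' || $field === 'allow') {
            $readingRules = true;
            // Only non-empty Disallow values are stored; an empty one allows everything.
            if ($field === 'disallow' && $value !== '') {
                foreach ($currentAgents as $agent) {
                    $groups[$agent][] = $value;
                }
            }
        }
    }

    // Rules for the specific agent win; otherwise fall back to the "*" group.
    $rules = $groups[strtolower($userAgent)] ?? $groups['*'] ?? [];

    foreach ($rules as $prefix) {
        if (str_starts_with($path, $prefix)) {
            return true;
        }
    }

    return false;
}

$robotsTxt = "User-agent: my-agent\nDisallow: /\n";

var_dump(isDisallowed($robotsTxt, 'my-agent', '/any/page'));    // bool(true)
var_dump(isDisallowed($robotsTxt, 'other-agent', '/any/page')); // bool(false)
```

Because the file above names only my-agent, every path is blocked for that agent, while an agent with no matching group and no `*` fallback is not restricted.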
