Crawler Laravel Package

spatie/crawler

PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.

Concurrency & throttling

Concurrency

To speed up a crawl, the package crawls 10 URLs concurrently by default. You can change this number using the concurrency method.

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->concurrency(1) // crawl URLs one by one
    ->start();

Request delay

By default, there is no delay between requests. If you crawl too aggressively, the server might rate limit you. You can add a pause after every request using the delay method. The value is expressed in milliseconds.

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->delay(150) // wait 150ms after every page
    ->start();

Throttling

For more control over request pacing, you can use a throttle. A throttle is a class that implements Spatie\Crawler\Throttlers\Throttle. When a throttle is set, it takes precedence over the delay method.

Fixed delay

The FixedDelayThrottle works like delay(), but as a class you can pass around and configure independently.

use Spatie\Crawler\Crawler;
use Spatie\Crawler\Throttlers\FixedDelayThrottle;

Crawler::create('https://example.com')
    ->throttle(new FixedDelayThrottle(delayMs: 150))
    ->start();

Adaptive throttle

The AdaptiveThrottle adjusts the delay based on how fast the server responds. When the server is slow, the crawler backs off. When it speeds up, the delay decreases. You can configure minimum and maximum bounds.

use Spatie\Crawler\Crawler;
use Spatie\Crawler\Throttlers\AdaptiveThrottle;

Crawler::create('https://example.com')
    ->throttle(new AdaptiveThrottle(
        minDelayMs: 50,
        maxDelayMs: 5000,
    ))
    ->start();

The delay is calculated as an exponential moving average: each new delay is (currentDelay + latency) / 2, the average of the previous delay and the latest response latency, clamped to the configured bounds.
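
To make the formula concrete, here is a minimal standalone sketch of the update rule. It is not the package's actual implementation: the function name nextDelayMs is hypothetical, and it assumes the latency is measured in milliseconds, like the bounds.

```php
<?php

// Hypothetical helper, not part of spatie/crawler: applies the update rule
// described above. Assumes all values are in milliseconds.
function nextDelayMs(
    float $currentDelayMs,
    float $latencyMs,
    float $minDelayMs,
    float $maxDelayMs
): float {
    // Average the previous delay with the latest observed latency.
    $average = ($currentDelayMs + $latencyMs) / 2;

    // Clamp the result to the configured bounds.
    return max($minDelayMs, min($maxDelayMs, $average));
}

// Feed a few sample latencies through the rule, starting at the minimum delay.
$delay = 50.0;
foreach ([400.0, 400.0, 20000.0, 10.0] as $latency) {
    $delay = nextDelayMs($delay, $latency, 50.0, 5000.0);
    echo $delay, PHP_EOL;
}
```

With bounds of 50 and 5000, the sample latencies produce delays of 225, 312.5, 5000 and 2505: the very slow third response pushes the delay to the upper bound, and the fast fourth response halves it again.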

Custom throttle

You can create your own throttle by implementing the Throttle interface:

use Spatie\Crawler\Throttlers\Throttle;

class MyThrottle implements Throttle
{
    public function sleep(): void
    {
        // Called after each response. Pause here.
    }

    public function recordResponseTime(float $seconds): void
    {
        // Called with the transfer time of each response.
    }
}
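
As an illustration, the skeleton above could be filled in as follows. This is a sketch under assumptions: ProportionalThrottle, its constructor parameters, and the delayMs helper are hypothetical names, and the Throttle interface is redeclared locally as a stand-in so the example runs without the package installed.

```php
<?php

// Stand-in for Spatie\Crawler\Throttlers\Throttle so this sketch runs on its own.
// In a real project, remove this and import the package's interface instead.
interface Throttle
{
    public function sleep(): void;

    public function recordResponseTime(float $seconds): void;
}

// Hypothetical throttle: pauses for a multiple of the last response time.
class ProportionalThrottle implements Throttle
{
    private float $lastResponseSeconds = 0.0;

    public function __construct(
        private float $factor = 2.0,
        private int $maxDelayMs = 2000,
    ) {
    }

    public function recordResponseTime(float $seconds): void
    {
        // Remember the most recent transfer time.
        $this->lastResponseSeconds = $seconds;
    }

    public function delayMs(): int
    {
        // Convert seconds to milliseconds, scale, and cap at the maximum.
        $delayMs = (int) round($this->lastResponseSeconds * 1000 * $this->factor);

        return min($this->maxDelayMs, $delayMs);
    }

    public function sleep(): void
    {
        // usleep() takes microseconds.
        usleep($this->delayMs() * 1000);
    }
}
```

Once it implements the package's interface rather than the stand-in, an instance could be passed to throttle() in the same way as the built-in throttles. Keeping the calculation in a separate delayMs method makes the pacing logic testable without actually sleeping.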

Default scheme

By default, URLs without a scheme are prefixed with https. You can change this using the defaultScheme method.

use Spatie\Crawler\Crawler;

Crawler::create('example.com')
    ->defaultScheme('http')
    ->start();