Weave Code
Code Weaver
Helps Laravel developers discover, compare, and choose open-source packages. See popularity, security, maintainers, and scores at a glance to make better decisions.
Feedback
Share your thoughts, report bugs, or suggest improvements.
Subject
Message

Crawler Laravel Package

spatie/crawler

PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.

View on GitHub
Deep Wiki
Context7

title: Limits weight: 2

By default, the crawler continues until it has crawled every page it can find. This behavior might cause issues if you are working in an environment with limitations such as a serverless environment.

Crawl depth

You can limit how deep the crawler will go using the depth method.

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->depth(2)
    ->start();

A depth of 0 means only the start URL will be crawled. A depth of 1 means the start URL and any pages it links to, and so on.

Crawl and time limits

The crawl behavior can be controlled with these options:

  • limit(): the maximum number of URLs to crawl across all executions
  • limitPerExecution(): how many URLs to process during the current crawl
  • timeLimit(): the maximum execution time in seconds across all executions
  • timeLimitPerExecution(): the maximum execution time in seconds for the current crawl

When any of these limits are reached, the crawler stops and returns a FinishReason from start(). See tracking progress for details.

Using the total crawl limit

The limit() method allows you to limit the total number of URLs to crawl, no matter how often you call the crawler.

use Spatie\Crawler\Crawler;
use Spatie\Crawler\Enums\FinishReason;

$queue = <your queue implementation>;

// Crawls 5 URLs and ends.
$reason = Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(5)
    ->start();

// $reason will be FinishReason::CrawlLimitReached

// Doesn't crawl further as the total limit is reached.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(5)
    ->start();

Using the current crawl limit

The limitPerExecution() method limits how many URLs will be crawled in a single execution. This is especially useful when crawling across multiple requests. This code will process 5 pages with each execution, without a total limit of pages to crawl.

use Spatie\Crawler\Crawler;

$queue = <your queue implementation>;

// Crawls 5 URLs and ends.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limitPerExecution(5)
    ->start();

// Crawls the next 5 URLs and ends.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limitPerExecution(5)
    ->start();

Using time limits

The timeLimit() method sets the maximum execution time across all executions. The timeLimitPerExecution() method sets the maximum execution time for a single crawl. Both accept a value in seconds.

use Spatie\Crawler\Crawler;

// Stop crawling after 60 seconds total
$reason = Crawler::create('https://example.com')
    ->timeLimit(60)
    ->start();

// $reason will be FinishReason::TimeLimitReached if time ran out

// Stop each execution after 30 seconds, but allow resuming
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->timeLimitPerExecution(30)
    ->start();

Combining limits

All limits can be combined to control the crawler:

use Spatie\Crawler\Crawler;

$queue = <your queue implementation>;

// Crawls 5 URLs and ends.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(10)
    ->limitPerExecution(5)
    ->start();

// Crawls the next 5 URLs and ends.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(10)
    ->limitPerExecution(5)
    ->start();

// Doesn't crawl further as the total limit is reached.
Crawler::create('https://example.com')
    ->crawlQueue($queue)
    ->limit(10)
    ->limitPerExecution(5)
    ->start();
Weaver

How can I help you explore Laravel packages today?

Conversation history is not saved when not logged in.
Prompt
Add packages to context
No packages found.
jayeshmepani/jpl-moshier-ephemeris-php
elnasnato/laraliveui
labrodev/rest-sdk
sampaui/sampaui
babelqueue/php-sdk
facebook/capi-param-builder-php
babelqueue/symfony
hamzi/corewatch
minionfactory/raw-hydrator
hexters/coinpayment
rjcodes/rjcms
act-training/laravel-permissions-manager
alimarchal/laravel-chart-of-accounts
babenkoivan/elastic-scout-driver
mkwebdesign/filament-watchdog-v5
renatomarinho/laravel-page-speed
zedmagdy/filament-business-hours
renatovdemoura/blade-elements-ui
devgeek/beacon-admin
benjamin-rqt/data-watcher-bundle