Crawler Laravel Package

spatie/crawler

PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.

Crawl responses

When a URL is successfully crawled, your callback or observer receives a CrawlResponse object. This provides a friendlier API than the raw PSR-7 response:

use Spatie\Crawler\CrawlResponse;

$response->status();        // int (HTTP status code)
$response->body();          // string (the response body, cached)
$response->header('Name');  // ?string (a single header value)
$response->headers();       // array (all headers)
$response->dom();           // Symfony DomCrawler instance
$response->isSuccessful();  // bool (2xx status)
$response->isRedirect();    // bool (3xx status)
$response->foundOnUrl();    // ?string (the URL where this link was found)
$response->linkText();      // ?string (the anchor text of the link)
$response->depth();         // int (how deep this page is from the start URL)
$response->resourceType();     // ResourceType (link, image, script, etc.)
$response->transferStats();    // ?Spatie\Crawler\TransferStatistics (transfer timing and metadata; null for faked responses)
$response->redirectHistory();  // array (list of URLs in the redirect chain)
$response->wasRedirected();    // bool (whether the response went through any redirects)

If you need access to the underlying PSR-7 response:

$response->toPsrResponse(); // Psr\Http\Message\ResponseInterface

Transfer statistics

Each response from a real HTTP request includes a Spatie\Crawler\TransferStatistics object with typed accessors for timing data and other transfer details:

use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        $stats = $response->transferStats();

        if ($stats === null) {
            return; // faked responses carry no transfer statistics
        }

        $stats->transferTimeInMs();   // ?float (total transfer time)
        $stats->effectiveUri();       // string (final URI after redirects)
    })
    ->start();

The transferStats() method returns null for faked responses.

All timing methods return values in milliseconds. They return null when the stat is unavailable (for example, tlsHandshakeTimeInMs() will be null for plain HTTP requests).

$stats = $response->transferStats();

// Timing (all in milliseconds)
$stats->transferTimeInMs();      // ?float (total transfer time)
$stats->connectionTimeInMs();    // ?float (TCP connection time)
$stats->dnsLookupTimeInMs();     // ?float (DNS resolution time)
$stats->tlsHandshakeTimeInMs();  // ?float (SSL/TLS handshake time)
$stats->timeToFirstByteInMs();   // ?float (time to first byte, TTFB)
$stats->redirectTimeInMs();      // ?float (time spent on redirects)

// Other
$stats->effectiveUri();                    // string (final URI after redirects)
$stats->primaryIp();                       // ?string (IP address of the server)
$stats->downloadSpeedInBytesPerSecond();   // ?float (average download speed)
$stats->requestSizeInBytes();              // ?int (size of the HTTP request)
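
Because every timing accessor can return null, code that reports these values needs a fallback. The following is a minimal sketch, assuming a hypothetical helper named formatMs() that is not part of the package; it accepts the ?float returned by any of the timing methods above:

```php
/**
 * Hypothetical helper, not part of spatie/crawler: renders a nullable
 * millisecond value for display, falling back to "n/a" when the stat
 * is unavailable.
 */
function formatMs(?float $ms): string
{
    if ($ms === null) {
        return 'n/a';
    }

    // One decimal place, "." as decimal point, no thousands separator.
    return number_format($ms, 1, '.', '') . ' ms';
}
```

For example, formatMs($stats->tlsHandshakeTimeInMs()) would give "n/a" for a plain HTTP request. Since transferStats() itself can be null for faked responses, the null-safe form formatMs($stats?->transferTimeInMs()) (PHP 8.0 or later) covers both cases.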

Redirect history

When the crawler follows redirects (which is the default), you can inspect the redirect chain for any response:

use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        if ($response->wasRedirected()) {
            echo "{$url} redirected through: " . implode(' → ', $response->redirectHistory()) . "\n";
        }
    })
    ->start();

The redirectHistory() method returns an array of URLs that were visited before reaching the final URL. The wasRedirected() method is a convenience that returns true when the redirect history is not empty.

The crawler follows up to 5 redirects per request by default (Guzzle's built-in limit), which protects against infinite redirect loops. To change this limit, pass a custom allow_redirects option:

use GuzzleHttp\RequestOptions;
use Spatie\Crawler\Crawler;

Crawler::create('https://example.com', [
    RequestOptions::ALLOW_REDIRECTS => [
        'max' => 10,
        'track_redirects' => true,
    ],
])->start();

Keep track_redirects set to true if you want redirectHistory() and wasRedirected() to work. To disable following redirects entirely, set allow_redirects to false.
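
As a sketch of that last case, reusing the same options array as in the previous example, disabling redirects might look like this:

```php
use GuzzleHttp\RequestOptions;
use Spatie\Crawler\Crawler;

// Passing false instead of an options array tells Guzzle not to follow
// redirects at all, so a 3xx response is returned as-is.
Crawler::create('https://example.com', [
    RequestOptions::ALLOW_REDIRECTS => false,
])->start();
```

With this setting, isRedirect() is the accessor to check for a 3xx status on the response.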

Using the DOM crawler

The dom() method returns a Symfony DomCrawler instance, which makes it easy to extract structured data from pages:

use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        // text('') returns an empty string instead of throwing when no node matches
        $title = $response->dom()->filter('title')->text('');
        $h1 = $response->dom()->filter('h1')->text('');

        echo "{$url}: {$title} (h1: {$h1})\n";
    })
    ->start();