
Crawler Laravel Package

spatie/crawler

A PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. You can configure crawl depth, internal-only rules, and per-page callbacks, and a fake mode lets you test crawl logic without making real HTTP requests.


Custom link extraction

You can customize how links are extracted from a page by creating a class that implements the UrlParser interface. The extractUrls method should return an array of ExtractedUrl objects:

use Spatie\Crawler\Enums\ResourceType;
use Spatie\Crawler\ExtractedUrl;
use Spatie\Crawler\UrlParsers\UrlParser;

class MyUrlParser implements UrlParser
{
    /** @return array<int, ExtractedUrl> */
    public function extractUrls(string $html, string $baseUrl): array
    {
        // parse the HTML and return an array of discovered URLs
        return [
            new ExtractedUrl(
                url: 'https://example.com/page',
                linkText: 'Example page',
                resourceType: ResourceType::Link,
            ),
        ];
    }
}

Each ExtractedUrl has the following properties:

url: the discovered URL
linkText: the text content of the link (optional)
resourceType: the type of resource (Link, Image, Script, Stylesheet, or OpenGraphImage)
malformedReason: if set, the URL is treated as malformed and will be skipped
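A custom parser has to decide for each candidate URL whether it can be crawled, and if it cannot, record the reason in `malformedReason`. The sketch below shows one way to make that decision. It uses a hypothetical standalone helper, `malformedReasonFor()`, which is not part of spatie/crawler and relies only on PHP's built-in `parse_url()` and `filter_var()`. It assumes relative hrefs have already been resolved against the base URL, and it treats anything other than an absolute http(s) URL as malformed. Adjust those rules to your needs.

```php
<?php

// Hypothetical helper (not part of spatie/crawler): returns null when $url
// looks like a crawlable absolute http(s) URL, or a short reason string
// that could be used as an ExtractedUrl's malformedReason.
function malformedReasonFor(string $url): ?string
{
    $parts = parse_url($url);

    // parse_url() returns false for seriously malformed URLs.
    if ($parts === false) {
        return 'unparseable URL';
    }

    // Assumption: only http and https links are worth crawling.
    $scheme = strtolower($parts['scheme'] ?? '');
    if ($scheme !== 'http' && $scheme !== 'https') {
        return 'missing or unsupported scheme';
    }

    if (($parts['host'] ?? '') === '') {
        return 'missing host';
    }

    // Final syntax check with PHP's built-in URL validator.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return 'invalid URL syntax';
    }

    return null;
}
```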

By default, the LinkUrlParser is used. It extracts URLs from <a> tags, <link rel="next"> and <link rel="prev"> elements, and <link hreflang> elements. When resource extraction is enabled, it also extracts images, scripts, stylesheets, and Open Graph images.
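The selection rules described above can be approximated with PHP's built-in DOM extension and an XPath union query. This is not the package's LinkUrlParser implementation, and the function name `extractLinkCandidates()` is hypothetical. The sketch has several limitations: it matches `rel` values exactly, so a multi-token value such as `rel="next nofollow"` is missed; it does not resolve relative URLs; and it skips resource extraction entirely.

```php
<?php

// Hypothetical sketch approximating the default link selection:
// <a href>, <link rel="next|prev" href> and <link hreflang href>.
function extractLinkCandidates(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // The union returns matching nodes in document order.
    $nodes = $xpath->query(
        '//a[@href] | //link[@href][@rel="next" or @rel="prev" or @hreflang]'
    );

    $candidates = [];
    foreach ($nodes as $node) {
        $candidates[] = [
            'url' => $node->getAttribute('href'),
            // Only <a> elements carry meaningful link text.
            'linkText' => $node->nodeName === 'a' ? trim($node->textContent) : null,
        ];
    }

    return $candidates;
}
```

Inside a real `extractUrls()` implementation, each candidate would then be resolved against `$baseUrl` and wrapped in an `ExtractedUrl`.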

To use your custom parser, pass it to the urlParser method:

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->urlParser(new MyUrlParser())
    ->start();

Crawling sitemaps

There is a built-in option to parse sitemaps instead of (or in addition to) following links. It supports sitemap index files.

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->parseSitemaps()
    ->start();
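Sitemaps come in two forms. A regular `<urlset>` lists page URLs, while a `<sitemapindex>` lists further sitemap files. The hypothetical `readSitemap()` helper below illustrates what supporting index files involves; it does not reproduce the package's implementation. It uses PHP's DOM extension to tell the two forms apart and collect every `<loc>` value. For an index, a crawler would then fetch each listed sitemap and read it in turn.

```php
<?php

// Hypothetical helper: reports whether $xml is a sitemap index and returns
// all <loc> values (page URLs for a <urlset>, child sitemap URLs for an index).
function readSitemap(string $xml): array
{
    $dom = new DOMDocument();
    if ($xml === '' || !@$dom->loadXML($xml)) {
        throw new InvalidArgumentException('Not a well-formed sitemap document.');
    }

    // Sitemaps use a default namespace, so compare the root's local name.
    $isIndex = $dom->documentElement->localName === 'sitemapindex';

    $locs = [];
    foreach ($dom->getElementsByTagName('loc') as $loc) {
        $locs[] = trim($loc->textContent);
    }

    return ['isIndex' => $isIndex, 'locs' => $locs];
}
```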
