Crawler Laravel Package

spatie/crawler

PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.

Custom link extraction

You can customize how links are extracted from a page by creating a class that implements the UrlParser interface. The extractUrls method should return an array of ExtractedUrl objects:

use Spatie\Crawler\Enums\ResourceType;
use Spatie\Crawler\ExtractedUrl;
use Spatie\Crawler\UrlParsers\UrlParser;

class MyUrlParser implements UrlParser
{
    /** @return array<int, ExtractedUrl> */
    public function extractUrls(string $html, string $baseUrl): array
    {
        // parse the HTML and return an array of discovered URLs
        return [
            new ExtractedUrl(
                url: 'https://example.com/page',
                linkText: 'Example page',
                resourceType: ResourceType::Link,
            ),
        ];
    }
}
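
The placeholder comment in extractUrls can be filled in many ways. As a minimal sketch, assuming PHP 8 with the bundled DOM extension, the hypothetical helper below (extractAnchors is not part of the package) collects the href and text of every <a> element. It returns plain arrays so that it runs without the package installed; inside extractUrls, each entry would be mapped to an ExtractedUrl as in the example above. URL resolution here is deliberately naive: only absolute http(s) and root-relative hrefs are handled, which is an assumption of this sketch and not a description of how LinkUrlParser behaves.

```php
<?php

/**
 * Hypothetical helper (not part of spatie/crawler): collect anchors from HTML.
 *
 * @return array<int, array{url: string, linkText: string}>
 */
function extractAnchors(string $html, string $baseUrl): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build "scheme://host[:port]" from the base URL for root-relative links.
    $origin = null;
    $parts = parse_url($baseUrl);
    if (is_array($parts) && isset($parts['scheme'], $parts['host'])) {
        $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    }

    $found = [];

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));

        if (preg_match('#^https?://#i', $href) === 1) {
            // Already absolute.
            $url = $href;
        } elseif ($origin !== null && str_starts_with($href, '/') && !str_starts_with($href, '//')) {
            // Root-relative: prefix with the origin of the base URL.
            $url = $origin . $href;
        } else {
            // Empty, fragment-only, mailto:, and other forms are not resolved in this sketch.
            continue;
        }

        $found[] = ['url' => $url, 'linkText' => trim($anchor->textContent)];
    }

    return $found;
}
```

Calling extractAnchors('<a href="/about">About</a>', 'https://example.com') yields a single entry whose url is https://example.com/about and whose linkText is About.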

Each ExtractedUrl has the following properties:

  • url: the discovered URL
  • linkText: the text content of the link (optional)
  • resourceType: the type of resource (Link, Image, Script, Stylesheet, or OpenGraphImage)
  • malformedReason: if set, the URL is treated as malformed and will be skipped

By default, the LinkUrlParser is used. It extracts URLs from <a> tags, <link rel="next"> and <link rel="prev"> elements, and <link hreflang> elements. When resource extraction is enabled, it also extracts images, scripts, stylesheets, and Open Graph images.

To use your custom parser, pass it to the urlParser method:

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->urlParser(new MyUrlParser())
    ->start();

Crawling sitemaps

There is a built-in option to parse sitemaps instead of (or in addition to) following links. It supports sitemap index files.

use Spatie\Crawler\Crawler;

Crawler::create('https://example.com')
    ->parseSitemaps()
    ->start();
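
For orientation: a sitemap is an XML document whose <url> entries each carry a <loc> element, and a sitemap index lists further sitemaps in <sitemap> entries with the same <loc> child. The sketch below is not the package's implementation. It is a hypothetical helper (readSitemap is an illustrative name), assuming PHP 8 with the bundled DOM extension, that shows the kind of data a sitemap parser reads.

```php
<?php

/**
 * Hypothetical helper (not part of spatie/crawler): read <loc> values from sitemap XML.
 *
 * @return array{isIndex: bool, locations: array<int, string>}
 */
function readSitemap(string $xml): array
{
    $result = ['isIndex' => false, 'locations' => []];

    // DOMDocument::loadXML() rejects an empty string, so bail out early.
    if (trim($xml) === '') {
        return $result;
    }

    $dom = new DOMDocument();

    // Collect XML parser errors quietly; invalid input just yields an empty result.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false || $dom->documentElement === null) {
        return $result;
    }

    // <sitemapindex> points at other sitemaps; <urlset> lists pages.
    $result['isIndex'] = $dom->documentElement->localName === 'sitemapindex';

    // Unprefixed <loc> elements hold the URL in both document types.
    foreach ($dom->getElementsByTagName('loc') as $loc) {
        $value = trim($loc->textContent);
        if ($value !== '') {
            $result['locations'][] = $value;
        }
    }

    return $result;
}
```

With a helper like this, locations from an index would be fetched and parsed in turn, while locations from a urlset would be queued as pages to crawl.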