- How do I crawl a Laravel site to extract all internal URLs with depth limits?
- Use a crawl profile together with a depth limit. Pass `new CrawlInternalUrls($baseUrl)` to `setCrawlProfile()` so only internal links are followed, and call `setMaximumDepth(3)` to stop three levels deep. The crawler reports each visited page through a `CrawlObserver` you register with `setCrawlObserver()`; collect URLs there. This is useful for site mapping or SEO audits.
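A sketch of how this looks with the package's API (class names as in recent spatie/crawler releases; observer method signatures vary slightly between major versions):

```php
<?php

use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlObservers\CrawlObserver;
use Spatie\Crawler\CrawlProfiles\CrawlInternalUrls;

// Collects every internal URL the crawler visits, up to 3 levels deep.
class UrlCollector extends CrawlObserver
{
    public array $urls = [];

    public function crawled(UriInterface $url, ResponseInterface $response, ?UriInterface $foundOnUrl = null, ?string $linkText = null): void
    {
        $this->urls[] = (string) $url;
    }

    public function crawlFailed(UriInterface $url, RequestException $requestException, ?UriInterface $foundOnUrl = null, ?string $linkText = null): void
    {
        // Log or skip URLs that could not be fetched.
    }
}

$observer = new UrlCollector();

Crawler::create()
    ->setCrawlProfile(new CrawlInternalUrls('https://example.com'))
    ->setMaximumDepth(3)
    ->setCrawlObserver($observer)
    ->startCrawling('https://example.com');

// $observer->urls now holds every internal URL found.
```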
- Can spatie/crawler handle JavaScript-rendered pages like React or Angular apps?
- Yes. Call `executeJavaScript()` on the crawler, which renders each page in headless Chrome via Puppeteer (through spatie/browsershot) before extracting links. This is ideal for SPAs built with React, Vue, or Angular. Note that Node, Puppeteer, and Chrome must be installed on the host for this to work.
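A minimal sketch, assuming Browsershot's prerequisites are installed; `MyObserver` stands in for any `CrawlObserver` subclass of your own:

```php
<?php

use Spatie\Browsershot\Browsershot;
use Spatie\Crawler\Crawler;

Crawler::create()
    ->executeJavaScript() // render each page in headless Chrome before extracting links
    ->setBrowsershot((new Browsershot())->noSandbox()) // optional: customize the Browsershot instance
    ->setCrawlObserver(new MyObserver()) // your own CrawlObserver subclass
    ->startCrawling('https://example.com');
```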
- What Laravel versions does spatie/crawler support?
- The package itself has no Laravel dependency: it is a framework-agnostic PHP library built on Guzzle, so it works in any Laravel version your PHP version supports. Check the package's `composer.json` for the minimum PHP requirement of the release you install.
- How do I test crawl logic without hitting real APIs?
- The crawler has no built-in `fake()` helper, but `Crawler::create()` accepts an array of Guzzle client options, so you can inject a Guzzle `MockHandler` that returns canned HTML. Your observers then run against those responses without any network requests. (The package's own test suite takes a different approach and crawls a small local test server, which is another option.)
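A sketch of the mock-handler approach; `MyObserver` is a placeholder for your own `CrawlObserver` subclass:

```php
<?php

use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;
use Spatie\Crawler\Crawler;

// Queue canned responses; the crawler never touches the network.
$mock = new MockHandler([
    new Response(200, ['Content-Type' => 'text/html'], '<html><a href="/about">About</a></html>'),
    new Response(200, ['Content-Type' => 'text/html'], '<html>About page</html>'),
]);

Crawler::create(['handler' => HandlerStack::create($mock)])
    ->setCrawlObserver(new MyObserver()) // assert on what the observer saw
    ->startCrawling('https://example.com');
```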
- Is spatie/crawler suitable for large-scale crawls (e.g., 10,000+ pages)?
- Yes, but it requires configuration. Raise `setConcurrency()` to fetch several URLs in parallel, and swap the default in-memory queue for a persistent `CrawlQueue` implementation (e.g. backed by Redis or the database) via `setCrawlQueue()`, so a crawl can be resumed or split into batches with `setCurrentCrawlLimit()` across Laravel queue workers. Monitor memory usage, especially with `executeJavaScript()` enabled, since Puppeteer is resource-intensive.
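A hedged sketch of a chunked run; `$persistentQueue` is a hypothetical custom implementation of the package's `CrawlQueue` interface, and `MyObserver` a placeholder observer:

```php
<?php

use Spatie\Crawler\Crawler;

Crawler::create()
    ->setConcurrency(10)              // requests in flight at once
    ->setCurrentCrawlLimit(500)       // crawl at most 500 URLs in this run, then return
    ->setCrawlQueue($persistentQueue) // Redis/DB-backed queue so the next run resumes where this one stopped
    ->setCrawlObserver(new MyObserver())
    ->startCrawling('https://example.com');
```

Re-dispatching a Laravel queue job that repeats this call until the queue is empty lets the full crawl span many short worker runs.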
- How do I integrate crawled data into Laravel’s database?
- Register a `CrawlObserver` with `setCrawlObserver()` and persist each response from its `crawled()` method via Eloquent. The observer receives the URL and a PSR-7 response, so the page body is available as `(string) $response->getBody()`. This works well for SEO tools or content archives.
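A sketch of such an observer; `CrawledPage` is a hypothetical Eloquent model with `url` and `content` columns:

```php
<?php

use App\Models\CrawledPage; // hypothetical Eloquent model
use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlObservers\CrawlObserver;

class PersistPagesObserver extends CrawlObserver
{
    public function crawled(UriInterface $url, ResponseInterface $response, ?UriInterface $foundOnUrl = null, ?string $linkText = null): void
    {
        // Store each successfully fetched page.
        CrawledPage::create([
            'url'     => (string) $url,
            'content' => (string) $response->getBody(),
        ]);
    }

    public function crawlFailed(UriInterface $url, RequestException $requestException, ?UriInterface $foundOnUrl = null, ?string $linkText = null): void
    {
        // Record or ignore failures as needed.
    }
}

Crawler::create()
    ->setCrawlObserver(new PersistPagesObserver())
    ->startCrawling('https://example.com');
```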
- What’s the best way to avoid rate-limiting or IP bans while crawling?
- Respect `robots.txt` (the crawler honors it by default) and add a pause between requests with `setDelayBetweenRequests(2000)` (milliseconds). Set an honest user agent with `setUserAgent('...')`, keep `setConcurrency()` low, and consider proxy rotation for large crawls. Retrying failed URLs with exponential backoff is also recommended.
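Put together, a polite configuration might look like this (`MyObserver` again standing in for your own observer):

```php
<?php

use Spatie\Crawler\Crawler;

Crawler::create()
    ->respectRobots()                // on by default; shown here for clarity
    ->setDelayBetweenRequests(2000)  // 2 seconds between requests (value is in ms)
    ->setUserAgent('MyCrawler/1.0 (+https://example.com/bot)') // identify yourself
    ->setConcurrency(1)              // one request at a time is gentlest on the target
    ->setCrawlObserver(new MyObserver())
    ->startCrawling('https://example.com');
```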
- Can I use spatie/crawler in Laravel Artisan commands?
- Absolutely. Create a custom Artisan command (e.g., `php artisan crawl:seo`) and run the crawler inside its `handle()` method. This is useful for CLI-driven tasks like scheduled site audits or data-extraction pipelines.
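A minimal command sketch; `MyObserver` is a placeholder for your own `CrawlObserver` subclass:

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Spatie\Crawler\Crawler;

class CrawlSeo extends Command
{
    protected $signature = 'crawl:seo {url}';
    protected $description = 'Crawl a site for an SEO audit';

    public function handle(): int
    {
        Crawler::create()
            ->setCrawlObserver(new MyObserver())
            ->startCrawling($this->argument('url'));

        $this->info('Crawl finished.');

        return self::SUCCESS;
    }
}
```

Once registered, it can be scheduled like any other command, e.g. nightly via Laravel's scheduler.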
- Are there alternatives to spatie/crawler for Laravel?
- For HTTP-only crawling, consider Guzzle with custom link-extraction logic; for full browser automation, look at Symfony Panther or Laravel Dusk. However, spatie/crawler stands out for its concurrency (async Guzzle requests), pluggable crawl profiles and queues, and optional JavaScript rendering.
- How do I handle dynamic content that breaks between crawls (e.g., session-dependent pages)?
- Use Puppeteer's headless Chrome to authenticate or set cookies before crawling. Configure the `Browsershot` instance the crawler uses via `setBrowsershot()`: Browsershot supports HTTP basic auth via `authenticate('user', 'pass')` and cookie injection via `useCookies()`. For session persistence, store cookies from a login step and replay them on every request.
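A sketch combining both options; the cookie name and value are illustrative, and `MyObserver` stands in for your own `CrawlObserver` subclass:

```php
<?php

use Spatie\Browsershot\Browsershot;
use Spatie\Crawler\Crawler;

$browsershot = (new Browsershot())
    ->authenticate('user', 'pass')                        // HTTP basic auth for protected pages
    ->useCookies(['session' => 'abc123'], 'example.com'); // replay a stored session cookie

Crawler::create()
    ->executeJavaScript()          // required so Browsershot renders each page
    ->setBrowsershot($browsershot) // every crawled page now carries the auth/cookies
    ->setCrawlObserver(new MyObserver())
    ->startCrawling('https://example.com');
```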