Crawler Laravel Package

spatie/crawler

PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.

Crawl all internal links found on a website

Frequently asked questions about Crawler
How do I integrate Spatie Crawler into a Laravel application for concurrent scraping?
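The crawler does not place itself on Laravel's queue, so do not rely on a `->dispatch()` call on the crawler. Instead, start the crawl from inside a queued Laravel job (processed by Horizon, for example) so that a long crawl never blocks a web request. Set the number of parallel requests with `->concurrency(10)`, and use the `onCrawled` callback to store each result in an Eloquent model or to fire an event. Method names have changed between major versions, so confirm them against the README for the version you install.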
Can Spatie Crawler handle JavaScript-heavy sites like React or Angular SPAs?
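Yes. Call `->executeJavaScript()` to render pages in headless Chrome before links are extracted. Rendering goes through `spatie/browsershot`, which needs Node.js, Puppeteer, and a Chrome or Chromium binary on the machine; Docker is optional and only a convenient way to package them. Test against a sample of target pages first, because complex SPAs often need longer timeouts or extra headless Chrome flags.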
What Laravel versions does Spatie Crawler support, and are there breaking changes?
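The crawler is a framework-agnostic PHP library, so it is not tied to a specific Laravel release; compatibility depends on the PHP version and dependency constraints declared in the package's `composer.json`. Major versions can introduce breaking changes, including changes to the Guzzle and Puppeteer-related dependencies, so read the [changelog](https://github.com/spatie/crawler/blob/main/CHANGELOG.md) before upgrading and update dependencies incrementally in a staging environment.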
How do I test crawl logic without hitting external APIs during development?
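Use the `->fake()` method to serve static HTML instead of making network requests. Example: `->fake(['url' => '<html>...</html>'])`, where each key is a URL and each value is the HTML returned for it. Because no request leaves the process, tests run quickly and deterministically. Laravel's `Http::fake()` only intercepts Laravel's own HTTP client, so use it for other outbound calls your code makes rather than for the crawler's requests.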
What’s the best way to store scraped data in Laravel using Spatie Crawler?
Save each page from the `onCrawled` callback, either to an Eloquent model or to the filesystem. For large crawls, collect rows and insert them in batches, wrap related writes in `DB::transaction()`, or push the persistence work onto queued jobs so that a single process does not time out. Validate scraped values with Laravel's validation rules before storing them.
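The batching idea above can be sketched in plain PHP. This is a minimal illustration rather than part of the package: `BatchBuffer` is a hypothetical helper, the batch size is arbitrary, and the sink is any callable. In a Laravel application the sink might wrap something like `DB::table('pages')->insert($rows)`, where the `pages` table is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Collects scraped rows and hands them to a sink in fixed-size batches.
 * Hypothetical helper for illustration; not part of spatie/crawler.
 */
final class BatchBuffer
{
    /** @var list<array<string, mixed>> */
    private array $rows = [];

    /** @var callable */
    private $sink;

    public function __construct(callable $sink, private int $batchSize = 500)
    {
        if ($batchSize < 1) {
            throw new InvalidArgumentException('Batch size must be at least 1.');
        }
        $this->sink = $sink;
    }

    /** Buffers one row and flushes automatically once the batch is full. */
    public function add(array $row): void
    {
        $this->rows[] = $row;
        if (count($this->rows) >= $this->batchSize) {
            $this->flush();
        }
    }

    /** Sends any buffered rows to the sink and empties the buffer. */
    public function flush(): void
    {
        if ($this->rows === []) {
            return;
        }
        $batch = $this->rows;
        $this->rows = [];
        ($this->sink)($batch);
    }
}

// Usage with an in-memory sink standing in for a database insert.
$written = [];
$buffer = new BatchBuffer(function (array $rows) use (&$written): void {
    $written[] = $rows;
}, 2);

foreach (['/a', '/b', '/c'] as $path) {
    $buffer->add(['url' => 'https://example.com' . $path, 'status' => 200]);
}
$buffer->flush(); // write the final partial batch

echo count($written), " batches written\n"; // prints "2 batches written"
```

Calling `add()` from the crawl callback and `flush()` once the crawl finishes keeps the number of insert statements low without holding the whole crawl in memory.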
How do I avoid rate limiting or IP bans when crawling large sites?
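Leave `robots.txt` handling enabled so that disallowed URLs are skipped, add a delay between requests, and lower concurrency when targeting a single host; check the README for the exact option names and units in your version. To rotate IP addresses, route requests through a proxy using Guzzle's `proxy` request option. Monitor failed requests with the `onFailed` callback and slow down when rate-limit responses such as HTTP 429 become more frequent.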
Can I run Spatie Crawler in a distributed Laravel setup (e.g., multiple queue workers)?
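Yes, but manage crawl state carefully. Closures registered on the crawler (a stop-condition callback, for example) live in one process and cannot be shared between workers, so keep state such as visited URLs and stop conditions in the database (e.g., a `crawls` table) or in Laravel's cache. `->concurrency()` only controls parallel requests inside one crawl process; to split URLs across workers, partition the URL list and dispatch one queued job per partition, for instance with queue batching.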
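One way to split URLs across workers is a stable hash, so that every worker arrives at the same assignment without coordination. The sketch below is plain PHP and makes assumptions: `partitionUrls` is a hypothetical helper, and an MD5 prefix is just one reasonable choice of hash.

```php
<?php

declare(strict_types=1);

/**
 * Assigns each distinct URL to one of $workerCount buckets using a stable
 * hash. Hypothetical helper for illustration; not part of spatie/crawler.
 *
 * @param  list<string> $urls
 * @return array<int, list<string>>
 */
function partitionUrls(array $urls, int $workerCount): array
{
    if ($workerCount < 1) {
        throw new InvalidArgumentException('Worker count must be at least 1.');
    }

    $buckets = array_fill(0, $workerCount, []);

    foreach (array_unique($urls) as $url) {
        // Seven hex digits always fit in a PHP integer, even on 32-bit builds.
        $hash = hexdec(substr(md5($url), 0, 7));
        $buckets[$hash % $workerCount][] = $url;
    }

    return $buckets;
}

$urls = [
    'https://example.com/',
    'https://example.com/about',
    'https://example.com/blog',
    'https://example.com/about', // duplicate, assigned only once
];

$buckets = partitionUrls($urls, 3);

foreach ($buckets as $worker => $assigned) {
    echo "worker {$worker}: ", count($assigned), " URL(s)\n";
}
```

Each bucket can then be handed to its own queued job. Because the assignment depends only on the URL and the worker count, re-running the partition gives the same result, which also makes retries predictable.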
What are the resource requirements for JavaScript rendering with Puppeteer?
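Headless Chrome is memory-hungry: budget roughly 200 MB of RAM per Puppeteer instance as a starting estimate, and measure on your own pages. For high-volume crawls, limit concurrency (e.g., `->concurrency(5)`) or use Docker to isolate Chrome instances. `memory_get_usage()` reports only the PHP process, so watch Chrome's memory with system-level tools; Blackfire can profile the PHP side. For sporadic crawls, consider serverless options (e.g., AWS Lambda).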
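As a worked example of the sizing above, the following sketch turns a memory budget into a concurrency limit. It is plain arithmetic, not a package feature: `maxBrowserConcurrency` is a hypothetical helper, the 200 MB default comes from the rough per-instance figure quoted above, and the 512 MB reserve for PHP and the operating system is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Estimates how many headless browser instances fit in the available memory.
 * Hypothetical helper; the defaults are rough planning figures, not measurements.
 */
function maxBrowserConcurrency(
    int $availableMb,
    int $perInstanceMb = 200,
    int $reservedMb = 512
): int {
    if ($perInstanceMb < 1) {
        throw new InvalidArgumentException('Per-instance memory must be at least 1 MB.');
    }

    // Memory left after setting aside room for PHP and the operating system.
    $usableMb = max(0, $availableMb - $reservedMb);

    // Never return less than 1, since a crawl needs at least one instance.
    return max(1, intdiv($usableMb, $perInstanceMb));
}

echo maxBrowserConcurrency(4096), "\n"; // (4096 - 512) / 200, rounded down: 17
echo maxBrowserConcurrency(1024), "\n"; // (1024 - 512) / 200, rounded down: 2
```

The result is only an upper bound to start from; actual usage depends heavily on the pages being rendered, so lower the limit if the host begins to swap.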
How do I debug failed crawls or Puppeteer errors in production?
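Use `onFailed` callbacks to log errors through Monolog (Laravel's default logger) or to inspect them in Laravel Telescope. For HTTP problems during development, enable Guzzle's `debug` request option to print verbose transfer output. For rendering problems, Chrome's own `--enable-logging --v=1` flags produce a browser log. How Guzzle options and Chrome flags are passed through depends on the crawler version, so check the README rather than assuming helpers such as `->withGuzzleOptions()` or `->withPuppeteerOptions()` are available.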
Are there alternatives to Spatie Crawler for Laravel, and when should I choose them?
For parsing HTML you have already fetched, `symfony/dom-crawler` is enough. For rendering a single page in headless Chrome, `spatie/browsershot` is lighter but lacks crawling features such as link discovery. Use Spatie Crawler if you need concurrent requests, depth control, and optional JavaScript rendering in one package.