spatie/crawler
Fast, concurrent web crawler for PHP. Crawl sites, collect internal URLs with depth limits, and hook into crawl events. Can execute JavaScript via Chrome/Puppeteer for rendered pages. Includes fakes for testing crawl logic without real HTTP requests.
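To make "collect internal URLs with depth limits" concrete, here is a minimal, library-independent sketch in plain PHP. It does not use the spatie/crawler API. The function name collectInternalUrls, the $fetchLinks callable, and the example.test site map are all hypothetical, and an in-memory array stands in for real HTTP requests.

```php
<?php

declare(strict_types=1);

/**
 * Breadth-first, depth-limited walk that keeps only same-host URLs.
 *
 * $fetchLinks is a hypothetical callable standing in for "fetch the page
 * and extract its links".
 *
 * @param callable(string): array $fetchLinks
 * @return array<string, int> URL => depth at which it was first seen
 */
function collectInternalUrls(string $startUrl, int $maxDepth, callable $fetchLinks): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $seen = [$startUrl => 0];
    $queue = [$startUrl];

    while ($queue !== []) {
        $url = array_shift($queue);
        $depth = $seen[$url];

        if ($depth >= $maxDepth) {
            continue; // do not follow links beyond the depth limit
        }

        foreach ($fetchLinks($url) as $link) {
            // A link is "internal" when it shares the start URL's host.
            $isInternal = parse_url($link, PHP_URL_HOST) === $host;
            if ($isInternal && !isset($seen[$link])) {
                $seen[$link] = $depth + 1;
                $queue[] = $link;
            }
        }
    }

    return $seen;
}

// Hypothetical site map used instead of real HTTP requests.
$pages = [
    'https://example.test/'  => ['https://example.test/a', 'https://other.test/x'],
    'https://example.test/a' => ['https://example.test/b'],
    'https://example.test/b' => ['https://example.test/c'],
];

$found = collectInternalUrls(
    'https://example.test/',
    2,
    fn (string $url): array => $pages[$url] ?? []
);

print_r(array_keys($found));
```

With a depth limit of 2, the walk records the start page, /a, and /b. It skips the external other.test link and never reaches /c, because links on pages at the depth limit are not followed.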
Strengths:
fake() method simplifies unit/integration testing, critical for CI/CD pipelines.
Weaknesses:
Crawler::create()->onCrawled(fn() => dispatch(new ProcessDataJob()))).
php artisan crawl:seo).
Crawled, CrawlFailed) for cross-service communication.
CrawlResult table).
robots.txt restrictions or IP bans. Mitigate via:
CrawlDelay directives.
Http::withOptions()).
max_jobs limits or chunked processing.
urls_crawled, failures).
spatie/crawler + laravel-queue-retries).
robots.txt)? Ensure observers log compliance violations.
app/Services/CrawlerService) with methods like crawlAndStore(), extractLinks().
CrawlCommand for CLI execution (e.g., php artisan crawl:site --depth=3).
CrawlStarted, DataExtracted) for decoupled processing.
CrawlJob with handle() calling Crawler::start()).

use Illuminate\Contracts\Queue\ShouldQueue;
use Spatie\Crawler\Crawler;

class CrawlJob implements ShouldQueue {
    public function __construct(private string $url) {}

    public function handle(): void {
        // Crawl from the stored URL; processResponse() is left to the application.
        Crawler::create($this->url)
            ->onCrawled(fn($url, $response) => $this->processResponse($response))
            ->start();
    }
}
fake() for isolated tests (e.g., CrawlerTest class).
Http::fake()) to test observer interactions.
composer require spatie/crawler.
fake() tests.
crawl_results table).
guzzlehttp/guzzle is compatible with your Laravel version.
spatie/browsershot for headless setup.
dom, fileinfo, and mbstring are required.
Spatie\ or App\Crawler\).
composer require spatie/browsershot
php artisan queue:work
fake() tests to validate logic.
composer.json for stability.
spatie/crawler’s onFinished to log FinishReason for debugging.
Crawler::create($url)->setHttpClient(Http::withOptions(['debug