spatie/crawler
PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.
When running the crawler as a long-lived CLI process, you may want to stop it cleanly with Ctrl+C (SIGINT) or SIGTERM instead of killing it mid-request.
The crawler automatically registers signal handlers when the pcntl extension is available. When a signal is received:
- The finishedCrawling() method on your observers is called with FinishReason::Interrupted.
- The start() method returns FinishReason::Interrupted.

```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\Enums\FinishReason;

$reason = Crawler::create('https://example.com')
    ->start();

// start() reports why the crawl ended; a SIGINT/SIGTERM yields Interrupted.
if ($reason === FinishReason::Interrupted) {
    echo "Crawl was interrupted by a signal\n";
}
```
No configuration is needed. This works out of the box on any system where the pcntl PHP extension is loaded.
This is especially useful when combined with crawling across requests. A graceful shutdown ensures that the crawl queue remains in a consistent state, so you can resume crawling later.
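To illustrate the general idea, here is a minimal, hypothetical sketch of the pattern in plain PHP, assuming PHP 7.4+ for arrow functions: pcntl_async_signals() and pcntl_signal() trap SIGINT and SIGTERM, the handler only sets a flag, and the work loop checks that flag between items, so URLs that were never started stay in the queue. This is not spatie/crawler's internal implementation; the GracefulLoop class, its run() and interrupt() methods, and the 'interrupted'/'finished' return strings are illustrative assumptions.

```php
<?php

// Hypothetical sketch of signal-driven graceful shutdown.
// NOT spatie/crawler's code: GracefulLoop and its methods are illustrative names.
final class GracefulLoop
{
    private bool $interrupted = false;

    public function __construct()
    {
        // Only register handlers when pcntl is loaded; without it,
        // Ctrl+C falls back to PHP's default behaviour (immediate exit).
        if (extension_loaded('pcntl')) {
            // Deliver signals asynchronously, without manual pcntl_signal_dispatch().
            pcntl_async_signals(true);
            // The handlers only flip a flag; the loop decides when to stop.
            pcntl_signal(SIGINT, fn () => $this->interrupted = true);
            pcntl_signal(SIGTERM, fn () => $this->interrupted = true);
        }
    }

    // Lets callers (or tests) request a stop without sending a real signal.
    public function interrupt(): void
    {
        $this->interrupted = true;
    }

    /**
     * Process queued URLs one at a time. Items not yet started remain
     * in $queue, so a later run can resume from where this one stopped.
     *
     * @param list<string> $queue
     */
    public function run(array &$queue, callable $process): string
    {
        while ($queue !== []) {
            // Check the flag between items, never in the middle of one.
            if ($this->interrupted) {
                return 'interrupted';
            }
            $url = array_shift($queue);
            $process($url);
        }

        return 'finished';
    }
}
```

With this design, pressing Ctrl+C while run() is processing a URL lets that item complete; the next iteration sees the flag and returns, leaving the remaining URLs in $queue for a later run.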