spatie/crawler
PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.
The crawler uses two handler classes to process Guzzle pool results: one for fulfilled requests and one for failed requests. You can replace these with your own subclasses to customize how responses and errors are processed.
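Guzzle's `Pool` accepts a `fulfilled` and a `rejected` callback. It calls one of them for each settled request, passing the result and the request's index, and the two handler classes correspond to those two roles. The sketch below imitates that routing without Guzzle so it runs on its own. `settle()`, `FulfilledRecorder` and `FailedRecorder` are hypothetical names, and plain strings stand in for PSR-7 responses.

```php
<?php
// Hypothetical stand-ins (not spatie/crawler classes): invokable objects with
// the same (result, index) call shape as the two crawler handlers.
class FulfilledRecorder
{
    public array $bodies = [];

    public function __invoke(string $response, mixed $index): void
    {
        $this->bodies[$index] = $response;
    }
}

class FailedRecorder
{
    public array $errors = [];

    public function __invoke(Exception $exception, mixed $index): void
    {
        $this->errors[$index] = $exception->getMessage();
    }
}

// Imitates a pool settling: each result goes to exactly one callback,
// the way Guzzle's Pool calls its 'fulfilled' or 'rejected' option.
function settle(array $results, callable $onFulfilled, callable $onRejected): void
{
    foreach ($results as $index => $result) {
        if ($result instanceof Exception) {
            $onRejected($result, $index);
        } else {
            $onFulfilled($result, $index);
        }
    }
}

$fulfilled = new FulfilledRecorder();
$failed = new FailedRecorder();

settle(['<html>home</html>', new RuntimeException('timeout'), '<html>about</html>'], $fulfilled, $failed);
```

Because invokable objects satisfy PHP's `callable` type, a handler instance can be passed anywhere a closure is expected.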
To customize how successful responses are processed, create a class that extends CrawlRequestFulfilled:
use Psr\Http\Message\ResponseInterface;
use Spatie\Crawler\Handlers\CrawlRequestFulfilled;
class MyFulfilledHandler extends CrawlRequestFulfilled
{
    public function __invoke(ResponseInterface $response, mixed $index): void
    {
        // Your custom logic for the successful response goes here.
        // Delegate to the base class so the default processing still runs.
        parent::__invoke($response, $index);
    }
}
Then pass it to the crawler:
use Spatie\Crawler\Crawler;
Crawler::create('https://example.com')
    ->fulfilledHandler(MyFulfilledHandler::class)
    ->start();
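The override above runs your own code first and then hands the response to the parent, so the default processing still happens. The sketch below isolates that ordering. `BaseFulfilledHandler` is a hypothetical stand-in for `CrawlRequestFulfilled`, `LoggingFulfilledHandler` is a hypothetical subclass, and a plain string stands in for the PSR-7 response, so the example runs without the package installed.

```php
<?php
// Hypothetical stand-in for CrawlRequestFulfilled: it records when the
// "default handling" runs so the example can show that it still happens.
class BaseFulfilledHandler
{
    public array $log = [];

    public function __invoke(string $response, mixed $index): void
    {
        $this->log[] = "default handling of #{$index}";
    }
}

// Same shape as MyFulfilledHandler: custom logic first, then delegate to the parent.
class LoggingFulfilledHandler extends BaseFulfilledHandler
{
    public function __invoke(string $response, mixed $index): void
    {
        // Custom logic: note the response size before the default handling runs.
        $this->log[] = "custom: #{$index} is " . strlen($response) . ' bytes';

        parent::__invoke($response, $index);
    }
}

$handler = new LoggingFulfilledHandler();
$handler('<html></html>', 0);
```

If an override leaves out the `parent::__invoke()` call, the base class's processing does not run for that response at all. Only skip it when you intend to replace that processing entirely.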
To customize how failed requests are processed, create a class that extends CrawlRequestFailed:
use Exception;
use Spatie\Crawler\Handlers\CrawlRequestFailed;
class MyFailedHandler extends CrawlRequestFailed
{
    public function __invoke(Exception $exception, mixed $index): void
    {
        // Your custom logic for the failed request goes here.
        // Delegate to the base class so the default failure handling still runs.
        parent::__invoke($exception, $index);
    }
}
Then pass it to the crawler:
use Spatie\Crawler\Crawler;
Crawler::create('https://example.com')
    ->failedHandler(MyFailedHandler::class)
    ->start();
The class you pass must extend the corresponding base handler: CrawlRequestFulfilled for fulfilledHandler() and CrawlRequestFailed for failedHandler(). Otherwise, an InvalidCrawlRequestHandler exception is thrown.
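The rule above amounts to a subclass check on the class name you pass. Below is a minimal sketch of such a check, assuming it uses PHP's `is_subclass_of()`. `validateHandlerClass()`, `BaseHandler`, `GoodHandler` and `UnrelatedHandler` are hypothetical names, and `InvalidArgumentException` stands in for the package's `InvalidCrawlRequestHandler`.

```php
<?php
// Hypothetical base class standing in for CrawlRequestFulfilled / CrawlRequestFailed.
class BaseHandler
{
}

class GoodHandler extends BaseHandler
{
}

class UnrelatedHandler
{
}

// Rejects any class name that does not extend $baseClass.
// is_subclass_of() accepts class-name strings and returns false for $baseClass itself.
function validateHandlerClass(string $class, string $baseClass): string
{
    if (! is_subclass_of($class, $baseClass)) {
        throw new InvalidArgumentException("{$class} must extend {$baseClass}");
    }

    return $class;
}

validateHandlerClass(GoodHandler::class, BaseHandler::class); // passes

try {
    validateHandlerClass(UnrelatedHandler::class, BaseHandler::class);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // UnrelatedHandler must extend BaseHandler
}
```

Using `is_a($class, $baseClass, true)` instead would also accept the base class itself.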