spatie/crawler
PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.
The most common use case is to collect all URLs on a site. The foundUrls() method makes this easy:
use Spatie\Crawler\Crawler;

$urls = Crawler::create('https://example.com')
    ->internalOnly()
    ->depth(3)
    ->foundUrls();
This returns an array of CrawledUrl objects. Each CrawledUrl has these properties:
foreach ($urls as $crawledUrl) {
    $crawledUrl->url;          // string
    $crawledUrl->status;       // int (HTTP status code, or 0 if failed)
    $crawledUrl->foundOnUrl;   // ?string
    $crawledUrl->depth;        // int
    $crawledUrl->resourceType; // ResourceType (link, image, script, etc.)
}
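As a rough illustration of working with these properties, the sketch below picks out entries that failed (status 0) or came back with a 4xx/5xx code. So that it runs without the package installed, it uses a hypothetical stand-in class, CrawledUrlStub, that only mirrors the property names listed above. The brokenUrls() helper and the sample data are also made up for this example and are not part of the package.

```php
<?php

// Hypothetical stand-in for the package's CrawledUrl, limited to the
// properties listed above so the sketch runs without the package installed.
final class CrawledUrlStub
{
    public function __construct(
        public string $url,
        public int $status,
        public ?string $foundOnUrl,
        public int $depth,
    ) {}
}

/**
 * Return the entries whose request failed (status 0) or returned a 4xx/5xx code.
 * This helper is an assumption for illustration, not a package function.
 *
 * @param  CrawledUrlStub[]  $urls
 * @return CrawledUrlStub[]
 */
function brokenUrls(array $urls): array
{
    return array_values(array_filter(
        $urls,
        fn (CrawledUrlStub $u): bool => $u->status === 0 || $u->status >= 400,
    ));
}

// Sample data invented for illustration; a real crawl would supply this array.
$urls = [
    new CrawledUrlStub('https://example.com/', 200, null, 0),
    new CrawledUrlStub('https://example.com/about', 200, 'https://example.com/', 1),
    new CrawledUrlStub('https://example.com/old-page', 404, 'https://example.com/about', 2),
    new CrawledUrlStub('https://example.com/timeout', 0, 'https://example.com/', 1),
];

foreach (brokenUrls($urls) as $u) {
    // foundOnUrl is nullable, so fall back to a placeholder when it is missing.
    printf("%s (status %d, found on %s)\n", $u->url, $u->status, $u->foundOnUrl ?? 'n/a');
}
```

With real crawl results, the same filter should apply to the array returned by foundUrls(), provided the objects expose status as described above.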
The resourceType property defaults to ResourceType::Link. When you use alsoExtract() or extractAll(), collected URLs will include the appropriate resource type for each discovered asset. See extracting resources for details.
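One way to use the resource type is to bucket collected URLs by it. The sketch below is only an approximation: ResourceTypeStub and CollectedUrlStub are hypothetical stand-ins modelling the three types named in this section (link, image, script), and groupByResourceType() is an illustrative helper, not a package function. The real enum may have more cases or different names.

```php
<?php

// Hypothetical stand-in for the package's ResourceType enum. Only the three
// types named in this section are modelled; the real enum may differ.
enum ResourceTypeStub
{
    case Link;
    case Image;
    case Script;
}

// Hypothetical stand-in for CrawledUrl, reduced to the two properties used here.
final class CollectedUrlStub
{
    public function __construct(
        public string $url,
        // Mirrors the documented default of ResourceType::Link.
        public ResourceTypeStub $resourceType = ResourceTypeStub::Link,
    ) {}
}

/**
 * Group URL strings under the name of their resource type.
 *
 * @param  CollectedUrlStub[]  $urls
 * @return array<string, string[]>
 */
function groupByResourceType(array $urls): array
{
    $groups = [];
    foreach ($urls as $u) {
        // ->name is the built-in case name available on every PHP enum.
        $groups[$u->resourceType->name][] = $u->url;
    }

    return $groups;
}

// Sample data invented for illustration.
$collected = [
    new CollectedUrlStub('https://example.com/'),
    new CollectedUrlStub('https://example.com/logo.png', ResourceTypeStub::Image),
    new CollectedUrlStub('https://example.com/about'),
    new CollectedUrlStub('https://example.com/app.js', ResourceTypeStub::Script),
];

foreach (groupByResourceType($collected) as $type => $list) {
    printf("%s: %d\n", $type, count($list));
}
```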
Any observers or closure callbacks you've registered will still be called alongside the URL collection.