spatie/crawler
PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.
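A tiny site makes the depth and internal-only rules easier to see. The following is a dependency-free PHP sketch of that traversal, not spatie/crawler's API. `crawlFakeSite()`, its parameters, and the in-memory link map are hypothetical. The walk is a plain sequential breadth-first search, whereas the package fetches pages concurrently via Guzzle. Running against an array instead of HTTP mirrors the idea behind the fake mode.

```php
<?php

/**
 * Hypothetical sketch (NOT spatie/crawler's API): breadth-first crawl over an
 * in-memory link map, honouring a maximum depth and an internal-only rule.
 *
 * @param array<string, list<string>> $links   page URL => URLs linked from that page
 * @param callable(string, int): void $onPage  per-page callback (URL, depth)
 * @return list<string> URLs visited, in crawl order
 */
function crawlFakeSite(array $links, string $startUrl, int $maxDepth, callable $onPage): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $queue = [[$startUrl, 0]];
    $seen = [$startUrl => true];
    $visited = [];

    while ($queue !== []) {
        [$url, $depth] = array_shift($queue);
        $visited[] = $url;
        $onPage($url, $depth);

        // Do not follow links found on pages that sit at the depth limit.
        if ($depth >= $maxDepth) {
            continue;
        }

        foreach ($links[$url] ?? [] as $next) {
            // Internal-only rule: skip links that leave the start host, and skip repeats.
            if (parse_url($next, PHP_URL_HOST) !== $host || isset($seen[$next])) {
                continue;
            }
            $seen[$next] = true;
            $queue[] = [$next, $depth + 1];
        }
    }

    return $visited;
}

// Usage: a fake three-level site with one external link.
$site = [
    'https://example.com/'  => ['https://example.com/a', 'https://other.org/x'],
    'https://example.com/a' => ['https://example.com/b'],
    'https://example.com/b' => ['https://example.com/c'],
];

$visited = crawlFakeSite($site, 'https://example.com/', 2, function (string $url, int $depth): void {
    echo str_repeat('  ', $depth) . $url . PHP_EOL;
});
```

With a maximum depth of 2, `https://example.com/c` (three links from the start) is never visited, and the host check skips the external `other.org` link.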
Use the fake() method to unit-test crawlers without hitting production APIs or external services.
Look Elsewhere If:
For Executives: "This package lets us build a self-service crawler for our internal websites; think of it as a ‘Google for our own content.’ We can build automated audits for broken links, duplicate pages, or SEO issues, which could save QA teams weeks of manual work. It’s lightweight, runs in PHP (our stack), and can even handle JavaScript-heavy sites. For example, we could use it to inventory all pages before a major migration, or to power a scheduled content monitor. There is no license fee: it’s MIT-licensed open source maintained by Spatie, an established PHP package vendor."
For Engineers: "Spatie’s crawler is a batteries-included solution for PHP/Laravel projects that need to crawl internal sites. Key advantages:
It’s not a replacement for Scrapy or Scrapinghub (now Zyte), but it’s well suited to:
findUrls() + status code checks).
Tradeoffs:
Example Use Case:
‘We’re migrating our documentation from Confluence to a Laravel app. Before we lift and shift it, we can use this crawler to:
- Inventory all pages (including nested ones).
- Check for broken links.
- Extract metadata (titles, last-updated dates) for our new CMS.
All of this fits in a script we can run locally or in CI.’"
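The audit in this use case splits into a crawl and a post-processing pass. The sketch below covers only the post-processing, in dependency-free PHP. It assumes the crawl has already collected each page's status code and HTML into an array. That shape is hypothetical; with spatie/crawler you might gather it in your per-page callback. `auditPages()` and `extractTitle()` are illustrative names. The sketch treats any 4xx/5xx status as broken and reads only `<title>`. Last-updated dates depend on how the pages expose them, so they are left out.

```php
<?php

/**
 * Hypothetical post-processing step (not part of spatie/crawler): $pages is
 * assumed to be URL => ['status' => HTTP status code, 'html' => response body].
 *
 * @param array<string, array{status: int, html: string}> $pages
 * @return array{inventory: list<string>, broken: list<string>, titles: array<string, ?string>}
 */
function auditPages(array $pages): array
{
    $report = ['inventory' => [], 'broken' => [], 'titles' => []];

    foreach ($pages as $url => $page) {
        $report['inventory'][] = $url;

        // Treat 4xx and 5xx responses as broken and skip metadata extraction.
        if ($page['status'] >= 400) {
            $report['broken'][] = $url;
            continue;
        }

        $report['titles'][$url] = extractTitle($page['html']);
    }

    sort($report['inventory']);
    return $report;
}

/** Returns the trimmed <title> text, or null if the page has none. */
function extractTitle(string $html): ?string
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them (real-world HTML is often malformed).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Usage with made-up crawl results.
$pages = [
    'https://docs.example.com/'      => ['status' => 200, 'html' => '<html><head><title> Home </title></head><body></body></html>'],
    'https://docs.example.com/old'   => ['status' => 404, 'html' => ''],
    'https://docs.example.com/guide' => ['status' => 200, 'html' => '<p>No title here</p>'],
];

$report = auditPages($pages);
print_r($report['broken']);
```

Keeping this step separate from the crawl makes it unit-testable on fixed input, the same motivation as the fake mode.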