Product Decisions This Supports

Website Audits & SEO Optimization: Automate discovery of all internal links, broken links, or duplicate content for SEO audits or content migration projects.
Data Extraction & Scraping: Build scalable crawlers for extracting structured data from internal websites (e.g., product catalogs, documentation, or legacy systems).
Content Migration: Inventory all pages on a legacy system before migrating to a new platform (e.g., WordPress → Laravel).
Build vs. Buy: Avoid reinventing a crawler from scratch; leverage this package for internal tools where crawling is a core feature.
Roadmap for Analytics Tools: Integrate into a larger analytics platform to track page reach, link equity, or crawlability metrics.
Testing & Validation: Use the fake() method to unit-test crawlers without hitting production APIs or external services.

When to Consider This Package

Internal-Only Crawling: Ideal for crawling your own websites (e.g., staging, production, or legacy systems). Not designed for large-scale public web scraping (e.g., competitor sites).
JavaScript-Rendered Content: Your site relies on client-side rendering (SPAs, React, Vue, etc.), so the crawler must execute JavaScript to see the links and content.
Controlled Depth/Limit: Need to crawl only up to a certain depth (e.g., 3 levels deep) or limit the number of URLs.
Testing-First Development: Prefer writing tests with mocked responses before deploying real crawls.
Laravel/PHP Ecosystem: Already using Laravel or PHP; avoids context-switching to Python/Node.js tools.
Avoid Overkill: Don’t need advanced features like proxy rotation, CAPTCHA solving, or distributed crawling (consider Scrapy or Apify instead).
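
To make the depth and limit controls concrete, here is a minimal, library-free sketch of a breadth-first crawl bounded by a maximum depth and a total URL cap. It deliberately does not use the package's API. The function name crawlBounded, the $fetchLinks callable, and the in-memory $site map are hypothetical stand-ins; the map plays the role that faked responses would play in a test.

```php
<?php

declare(strict_types=1);

/**
 * Breadth-first crawl bounded by depth and total URL count.
 *
 * @param callable(string): string[] $fetchLinks Returns the links found on a URL (hypothetical).
 * @return array<string, int> Map of visited URL => depth at which it was reached.
 */
function crawlBounded(string $startUrl, callable $fetchLinks, int $maxDepth, int $limit): array
{
    $visited = [];
    $seen = [$startUrl => true];
    $queue = [[$startUrl, 0]];

    // Stop when the queue is empty or the total URL cap is reached.
    while ($queue !== [] && count($visited) < $limit) {
        [$url, $depth] = array_shift($queue);
        $visited[$url] = $depth;

        if ($depth >= $maxDepth) {
            continue; // Do not follow links beyond the depth cap.
        }

        foreach ($fetchLinks($url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true; // Queue each URL at most once.
                $queue[] = [$link, $depth + 1];
            }
        }
    }

    return $visited;
}

// In-memory stand-in for a site: URL => links found on that page.
$site = [
    '/'            => ['/docs', '/blog'],
    '/docs'        => ['/docs/a', '/'],
    '/blog'        => ['/blog/1'],
    '/docs/a'      => ['/docs/a/deep'],
    '/blog/1'      => [],
    '/docs/a/deep' => [],
];

$fetch = fn (string $url): array => $site[$url] ?? [];

$result = crawlBounded('/', $fetch, 2, 10);
```

With a depth of 2, the page at /docs/a/deep (three levels down) is never visited; lowering the limit to 3 stops the crawl after the first three URLs regardless of depth.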

Look Elsewhere If:

You need to crawl external, large-scale public websites, where rate limits and anti-bot measures come into play.
You require distributed crawling (e.g., across multiple servers).
You need headless browser automation beyond crawling (e.g., form submission, screenshots).
Your budget allows for managed scraping services (e.g., ScraperAPI, Bright Data).

How to Pitch It (Stakeholders)

For Executives: "This package lets us build a self-service crawler for internal websites; think of it as a ‘Google for our own content.’ We can automate audits for broken links, duplicate pages, or SEO issues, which could save QA teams weeks of manual work. It’s lightweight, runs in PHP (our stack), and can even handle JavaScript-heavy sites. For example, we could use it to inventory all pages before a major migration, or to power a scheduled content monitor. The licensing cost is zero: it’s open-source and maintained by a trusted vendor (Spatie)."

For Engineers: "Spatie’s crawler is a batteries-included solution for PHP/Laravel projects needing to crawl internal sites. Key advantages:

Concurrent requests (via Guzzle) for speed.
JavaScript support (via Puppeteer/Chrome) for SPAs.
Fake mode for testing without hitting real APIs.
Observers & callbacks for extensibility (e.g., logging, analytics).
Depth/limit controls to avoid crawling too wide/deep.

It’s not a replacement for Scrapy or Scrapinghub (now Zyte), but it’s a good fit for:

SEO audits (findUrls() + status code checks).
Data extraction (e.g., scraping product pages from a legacy system).
Pre-migration inventorying (e.g., ‘What pages exist on our old WordPress site?’).

Tradeoffs:

No built-in proxy rotation (add a custom layer, for example through the underlying Guzzle client’s proxy option).
Not for aggressive public scraping (use Apify or Scrapy instead).
Requires PHP (it works with or without Laravel, and the team already uses both).

Example Use Case:

‘We’re migrating our documentation from Confluence to Laravel. Before we lift-and-shift, we can use this crawler to:

Inventory all pages (including nested ones).
Check for broken links.
Extract metadata (titles, last updated) for our new CMS.

All in a script we can run locally or in CI.’"
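
The inventory, broken-link, and metadata steps in this use case can be sketched without the package. The snippet below assumes each crawled page has already been reduced to a status code and an HTML body; the $pages array shape and the buildInventory function are hypothetical, and a real run would fill $pages from the crawler's callbacks rather than from a hard-coded fixture. It only extracts titles (a "last updated" value depends on how the source site exposes it) and needs PHP's DOM extension.

```php
<?php

declare(strict_types=1);

/**
 * Split crawled pages into a title inventory and a list of broken URLs.
 *
 * @param array<string, array{status: int, html: string}> $pages Crawled pages (hypothetical shape).
 * @return array{inventory: array<string, ?string>, broken: string[]}
 */
function buildInventory(array $pages): array
{
    $inventory = [];
    $broken = [];

    foreach ($pages as $url => $page) {
        if ($page['status'] >= 400) {
            $broken[] = $url; // 4xx/5xx responses count as broken links.
            continue;
        }

        if ($page['html'] === '') {
            $inventory[$url] = null; // Nothing to parse; record the page without a title.
            continue;
        }

        $dom = new DOMDocument();
        // Suppress parser warnings caused by imperfect real-world HTML.
        @$dom->loadHTML($page['html']);
        $titleNode = $dom->getElementsByTagName('title')->item(0);
        $inventory[$url] = $titleNode !== null ? trim($titleNode->textContent) : null;
    }

    return ['inventory' => $inventory, 'broken' => $broken];
}

// Hard-coded fixture standing in for real crawl results.
$pages = [
    '/docs/setup' => ['status' => 200, 'html' => '<html><head><title> Setup Guide </title></head><body></body></html>'],
    '/docs/old'   => ['status' => 404, 'html' => ''],
    '/docs/faq'   => ['status' => 200, 'html' => '<html><body><p>No title here</p></body></html>'],
];

$report = buildInventory($pages);
```

Because the function takes plain arrays, it can run locally or in CI against fixture data before any real crawl is wired in.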
